Data storage for blocks in Omniscope workflows

Modified on Thu, 28 Mar, 2024 at 10:41 AM

Omniscope Evo comes with a wide array of Workflow blocks that let you retrieve, transform and publish data. Some of these blocks store or snapshot data, and this article explains how each of them works, to clear up any possible confusion.

The storage behaviours described here apply to Source and Operation blocks, which emit data, but not to Output (publishing) blocks, which neither emit nor store data themselves.

General cases

First, let's talk about how storage works for general cases, where Omniscope stores data for arbitrary blocks within each project's ".data" folder adjacent to the project itself.

Data storage for arbitrary blocks

When you execute a workflow or part of a workflow, Omniscope stores data temporarily for the last block(s) in the chain. This lets you easily build up a workflow by adding blocks, configuring, executing and checking the results in the block's Data tab, then repeating. Unless this data is small or otherwise retained, it will be erased once it is 15 minutes old, or shortly afterwards.
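The cleanup rule above can be pictured with a small sketch. This is only an illustration of the rule as described, not Omniscope's implementation: the `BlockData` record, its `is_small` and `retained` flags, and the `is_eligible_for_cleanup` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass

RETENTION_SECONDS = 15 * 60  # the 15-minute window described in this article


@dataclass
class BlockData:
    """Hypothetical record of one block's stored execution data."""
    name: str
    last_executed: float    # timestamp in seconds
    is_small: bool = False  # assumption: small data is exempt from cleanup
    retained: bool = False  # assumption: e.g. the block feeds a Report


def is_eligible_for_cleanup(block: BlockData, now: float) -> bool:
    """Return True when temporary data is old enough to be erased."""
    if block.is_small or block.retained:
        return False
    return now - block.last_executed >= RETENTION_SECONDS
```

In this sketch a block executed at time 0 would still hold its data one minute later, and would become eligible for cleanup from 900 seconds onwards unless one of the two flags is set.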

You can tell when a block has full execution data stored and available to review, because the block will appear "full" (strongly coloured) rather than "empty" (faintly coloured):

When a block sits in the middle of a workflow as a data transformation step, and the data is too big for Omniscope to retain in full, Omniscope keeps just a random sample. If you open the Data tab, you'll see a watermark on the data saying that large data is kept in full only when a block is directly linked to a Report block.

For example, the workflow here has been fully executed, but the Aggregate 2 block is neither a leaf nor connected to a report; it is a block in the middle that just transforms data.
The block output shows the number of fields and records produced, but in this case only a random sample is kept.
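To illustrate how a block can report a full record count while holding only a random sample, here is a minimal sketch using reservoir sampling. This article does not say which sampling method Omniscope uses, so the technique, the `sample_stream` function and its parameters are assumptions chosen purely for illustration.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def sample_stream(
    records: Iterable[T], k: int, seed: Optional[int] = None
) -> Tuple[int, List[T]]:
    """Count every record in a stream while keeping a uniform sample of at most k."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    total = 0
    for record in records:
        total += 1
        if len(reservoir) < k:
            # Still filling the sample: keep every record.
            reservoir.append(record)
        else:
            # Replace a random slot with probability k / total.
            j = rng.randrange(total)
            if j < k:
                reservoir[j] = record
    return total, reservoir
```

Calling `sample_stream(range(10000), 100)` returns the full count (10000) together with 100 sampled records, which mirrors the block showing its true record count while the Data tab holds only a sample.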

Unless it is explicitly included, temporary execution data is left out of IOZ files (exported projects).

Data storage for Report inputs

When a block is connected to a Report block, it becomes a Data Source for the report, and Omniscope automatically stores its data indefinitely. This ensures your report always has data and can be explored (and perhaps shared with other users) without needing to execute and wait. In this example, we've executed a workflow with a Report block. The upstream blocks no longer retain their (interim) data, but Omniscope stores the data indefinitely for the penultimate block, which holds the report's data:

When exporting a project as an IOZ file, this data is included by default, but you can choose to omit it.

Specific blocks

Now let's talk about blocks where storage is at least one of the block's primary functions.

Text Input block

The Text Input block is a mini spreadsheet-like interface for entering small amounts of data. The data is held within the block itself, as part of the Omniscope project's configuration. It is useful for small datasets such as personal datasets and lookup tables.

This is a special kind of "live query" block (see the blue lightning bolt badge) that never needs to be executed. If it is connected directly to a Report block, the report will immediately reflect any edits, and you can optionally edit within the Report's Table and experimental Editable Table views, for example to correct values.

Data Table block

The Data Table block is a persistent, editable, live query snapshot of the input data. Designed for small to medium sized datasets, it is typically used when you want data from an upstream executing workflow to become an editable snapshot. You might then have a downstream process to periodically save edits back to the original file or database.

Like Text Input, it is a "live query" block, so its data can be edited from within the report (this editing can be disabled).

In this example, some upstream workflow has been snapshotted in a Data Table block (by selecting the Data Table block and clicking Refresh).

Here we are editing a value in the report:

And here you can see that the Data Table block is now out of sync with its input data - i.e. the snapshot has been edited and no longer matches the upstream executed data:

The Data Table block's data is stored next to the project (.iox file), in its sidecar ".data" folder, and is managed by Omniscope. If exported as an IOZ file, it will include the data.
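The refresh, edit and out-of-sync sequence described above can be sketched as follows. This is a toy model only: the `DataTableSketch` class and its methods are hypothetical names, and how Omniscope actually detects that a snapshot differs from its input is not covered in this article.

```python
import copy
from typing import Dict, List

Row = Dict[str, object]


class DataTableSketch:
    """Toy model of an editable snapshot of upstream data."""

    def __init__(self) -> None:
        self.snapshot: List[Row] = []

    def refresh(self, upstream_rows: List[Row]) -> None:
        # Take an independent copy so later edits never touch the upstream data.
        self.snapshot = copy.deepcopy(upstream_rows)

    def edit(self, row_index: int, field: str, value: object) -> None:
        # Edit a single cell of the snapshot, as a report user might.
        self.snapshot[row_index][field] = value

    def in_sync_with(self, upstream_rows: List[Row]) -> bool:
        # The snapshot is in sync only while it still equals the upstream data.
        return self.snapshot == upstream_rows
```

After `refresh`, `in_sync_with` returns True for the same upstream rows; after a single `edit` it returns False, while the upstream rows themselves are unchanged.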

Storage block

The Storage block is a server-wide, named, saved dataset. It can be used as a source and/or output. Think of it as a simple "managed database" where each table has a name. There's no schema, database name, catalog etc., and no database connection details to worry about. The data is managed by Omniscope, and stored within the Omniscope installation.

In these 3 examples we have the same "storage name" configured in each Storage block: "demodemo". This is a name I decided to use to uniquely identify the dataset, server-wide. Once executed, all projects in the server will be able to access this dataset using the Storage block and selecting this name from the list.

Although I'm showing 3 examples in one workflow, they could equally well be in different projects and folders on the server, all reading and/or writing to the same storage table.

The first example shows a table being populated from some upstream workflow - in this case only on first execution.

The second example shows a table being consumed by some downstream workflow.

The third example shows a Storage block inline, with data flowing through it. Whatever data arrives from upstream will pass downstream, in addition to performing the action configured in the block (fill once, refill, or append).
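As a rough sketch of the three actions and the inline pass-through behaviour, the Storage block can be pictured as a named table in a shared dictionary. Everything here is an assumption made for illustration: the `STORE` dictionary stands in for the server-wide storage, the action strings are invented labels, and "fill once" is interpreted as "populate only if the named table does not exist yet".

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical stand-in for the server-wide store: storage name -> rows.
STORE: Dict[str, List[Row]] = {}


def storage_block(name: str, incoming: List[Row], action: str) -> List[Row]:
    """Apply the configured action, then pass the incoming rows downstream."""
    if action == "fill_once":
        # Assumed meaning: only populate the table the first time.
        if name not in STORE:
            STORE[name] = list(incoming)
    elif action == "refill":
        # Replace whatever was stored before.
        STORE[name] = list(incoming)
    elif action == "append":
        # Add the new rows after the existing ones.
        STORE.setdefault(name, []).extend(incoming)
    else:
        raise ValueError(f"unknown action: {action}")
    return incoming


def read_storage(name: str) -> List[Row]:
    """Use the named table as a source, from any project."""
    return list(STORE.get(name, []))
```

In this model, any caller that knows the name "demodemo" can read or write the same table, which is the point of the server-wide name.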

Savepoint block

The latest addition to the family, the Savepoint block, is a very simple, minimally configured block that helps you manage execution and data retention. It represents a 'savepoint' in terms of execution data, within a larger workflow.

It has 2 modes of operation:

Persistent. This serves purely to retain execution data indefinitely, without the 15-minute cleanup mentioned in "General cases / Data storage for arbitrary blocks", above. Useful if you have a complex workflow that you are building up over a long period of time, and want to come back to it the next day without having to re-execute everything.
In this case, data is stored in the project's adjacent ".data" folder, and managed by Omniscope.
Temporary. This serves purely to optimise a single execution of a workflow. Normally Omniscope will "stream" data in chunks from block to block, and in workflows with multiple paths, data may be processed more than once to satisfy downstream pathways (good for fast or small workflows). Using the Savepoint block in Temporary mode causes data to be captured in a temporary location, and thereafter the temporary data is used downstream (good for large data volumes, where data storage is cheap and fast). This ensures the upstream workflow does not execute more than once, and multiple pathways downstream get to reuse the same output. When execution finishes, the data is erased.
In this case, data is stored in an internal temporary location managed by Omniscope.
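The difference between plain streaming and a Temporary savepoint can be sketched with a toy two-branch workflow. This is an analogy in plain Python rather than Omniscope's execution engine: the `upstream` generator, the `run_branches` helper and the `runs` counter are hypothetical, and an in-memory list stands in for the temporary storage location.

```python
from typing import Callable, Dict, Iterator, List

runs: Dict[str, int] = {"upstream": 0}  # counts executions of the upstream step


def upstream() -> Iterator[int]:
    """Hypothetical expensive upstream step that streams records."""
    runs["upstream"] += 1
    yield from range(5)


def run_branches(source: Callable[[], Iterator[int]]) -> List[int]:
    """Two downstream pathways, each pulling from the source independently."""
    total = sum(source())    # first pathway
    largest = max(source())  # second pathway
    return [total, largest]


# Streaming: each pathway pulls from upstream, so the upstream step runs twice.
streamed = run_branches(upstream)
streamed_runs = runs["upstream"]

# Temporary savepoint: capture the upstream output once and reuse it downstream.
runs["upstream"] = 0
captured = list(upstream())
saved = run_branches(lambda: iter(captured))
saved_runs = runs["upstream"]
captured.clear()  # the captured data is erased when execution finishes
```

Both approaches give the same results, but in the streaming case the upstream step runs once per pathway, whereas with the captured copy it runs once in total, at the cost of storing the data for the duration of the execution.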

Wrapping up

And that's it. We may expand this functionality in future (for example, to expand editing capabilities) and will update this article accordingly. If you have any feedback, please let us know at support@visokio.com.