Incremental Data Loading and Editing Using the Data Table Block

Incremental data loading is a common requirement in data workflows — especially when you want to retain and edit previously loaded records while seamlessly adding new ones. With Omniscope, you can achieve this elegantly by using the Data Table block as the central source of truth for editable and accumulating data.


Workflow Pattern: Editable Incremental Loading

The workflow below shows a typical approach to incremental loading using IOD files, with the Data Table at its core:

  1. The input stream brings in fresh batches of data (e.g. daily or scheduled updates).

  2. Data is shaped and filtered through transformation blocks.

  3. A Left Anti Join compares incoming data with existing data (All data) to isolate only new records (see the sketch after this list).

  4. These new records are written to an IOD Output block in append mode — building up a persistent store of all historical records.

  5. The Data Table block then reads from this IOD file, acting as a live, editable interface.

  6. Users can make manual edits in the UI or Report tab. The changes can be saved using a Save Edited Data block, which can be triggered from the Report view.
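
Outside Omniscope, steps 3 and 4 amount to a left anti join followed by an append. Here is a minimal sketch of that logic in pandas, assuming a hypothetical record_id key and illustrative file names (in Omniscope this is done with blocks, not code):

```python
import pandas as pd

# Illustrative stand-ins: "existing" plays the role of the accumulated
# IOD store ("All data"); "incoming" is the fresh batch.
existing = pd.read_parquet("all_data.parquet")
incoming = pd.read_csv("daily_batch.csv")

# Left anti join on the key column: keep only incoming rows whose
# record_id does not already appear in the existing store.
keys = existing[["record_id"]].drop_duplicates()
merged = incoming.merge(keys, on="record_id", how="left", indicator=True)
new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Append only the genuinely new rows back to the persistent store.
updated = pd.concat([existing, new_rows], ignore_index=True)
updated.to_parquet("all_data.parquet", index=False)
```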



✅ What This Pattern Achieves

  • Preserves historical data: old records remain untouched.

  • Avoids duplication: only new records are appended.

  • Allows editing: the table remains editable, searchable, and visible in Reports.


✨ Now Made Simpler: Use Refill Behaviour in the Data Table

With the Refill behaviour setting now available in the Data Table block, the entire workflow can be simplified.

Instead of using an external IOD file to accumulate edited data, the Data Table block itself can now manage both incoming data and incremental logic.


Refill Behaviour Options

You can configure how the Data Table reacts to incoming data using the Refill behaviour setting:

1. Replace existing data

Overwrites the table completely with each execution.
Use this for full-refresh scenarios when you don't need to retain past data or edits.

2. Append all incoming records

Appends every incoming record to the table.
Useful when incoming data is guaranteed to be new (e.g. logs, timestamps), and no deduplication is needed.

3. Append only new records by key

Compares incoming records with existing ones using configured Record Key Fields (like a primary key).
Only records with unseen key values are added.
Existing rows (and their edits) are left untouched.

4. Append only rows with higher field values

Performs true incremental loading by appending only rows where one or more fields (e.g. timestamp, ID) have values strictly greater than what was previously loaded.
Ideal for APIs or systems where data increases over time using numerical or time-based markers.
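
The first two behaviours map onto the simplest dataframe operations. A rough sketch of their semantics in pandas (variable names are illustrative, not Omniscope's internal implementation; modes 3 and 4 are sketched in their dedicated sections below):

```python
import pandas as pd

table = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})     # stored data
incoming = pd.DataFrame({"id": [2, 3], "value": ["b", "c"]})  # new batch

# 1. Replace existing data: discard the stored table entirely.
replaced = incoming.copy()

# 2. Append all incoming records: keep every row, duplicates included,
#    so the row with id=2 now appears twice.
appended = pd.concat([table, incoming], ignore_index=True)
```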




✅ Recommended Setup

Initial Run

Set Refill behaviour to Replace existing data
→ This loads your full initial dataset.

Ongoing Runs

Switch to one of:

  • Append all incoming records, if there's no overlap

  • Append only new records by key, if you want to deduplicate

  • Append only rows with higher field values, for strict incremental logic

This way, you build up your data over time — edits are preserved, and only genuinely new rows are appended.


⚡ Smart Deduplication with Record Key Fields

When using the Append only new records by key mode, you can configure Record Key Fields to define a unique identifier for each row.

  • During a refresh, if a new row has a key that already exists in the table, it will be skipped.

  • Existing rows — including any edits — are left intact.

  • This prevents duplicate records and preserves manual changes.

Use this when:

  • You sync from an API or file that includes previously seen data.

  • You want to build up your dataset gradually without repeating or overwriting.

  • You maintain a stable curated dataset over time.

✅ Text fields are matched case-insensitively.
❗ Incoming batch duplicates are not removed automatically — use a De-Duplicate block if needed.
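
A rough pandas sketch of this skip-by-key rule, reflecting both notes above (the email key field is hypothetical):

```python
import pandas as pd

def append_new_by_key(table: pd.DataFrame, incoming: pd.DataFrame,
                      key: str) -> pd.DataFrame:
    """Append only incoming rows whose key is unseen in `table`.

    Text keys are compared case-insensitively; duplicates *within* the
    incoming batch are kept, matching the caveats above.
    """
    seen = set(table[key].astype(str).str.lower())
    mask = ~incoming[key].astype(str).str.lower().isin(seen)
    return pd.concat([table, incoming[mask]], ignore_index=True)

# "Ana@example.com" is skipped: its key matches the stored
# "ana@example.com" regardless of case.
table = pd.DataFrame({"email": ["ana@example.com"], "score": [1]})
incoming = pd.DataFrame({"email": ["Ana@example.com", "bo@example.com"],
                         "score": [2, 3]})
print(append_new_by_key(table, incoming, "email"))
```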


➕ Incremental Loading by Increasing Field Values

In many systems (e.g. APIs, logs, data streams), each new batch contains records with values that are strictly increasing — such as:

  • Timestamps

  • Auto-increment IDs

  • Sequence numbers

With the Append only rows with higher field values mode, you can:

  • Define one or more Incremental Key Fields

  • Append only rows where all specified field values are higher than the maximum already stored

This gives you a true incremental update mechanism, ideal for:

  • Tailing logs

  • Pulling pages from APIs using a cursor

  • Avoiding reprocessing previously seen rows
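
A minimal sketch of the higher-value rule in pandas, assuming a hypothetical ts timestamp column as the Incremental Key Field:

```python
import pandas as pd

def append_higher(table: pd.DataFrame, incoming: pd.DataFrame,
                  fields: list) -> pd.DataFrame:
    """Append only rows where every listed field exceeds the stored maximum."""
    if table.empty:
        return incoming.copy()
    mask = pd.Series(True, index=incoming.index)
    for f in fields:
        # One high-water mark per incremental key field; the comparison
        # is strict, so rows equal to the previous maximum are skipped.
        mask &= incoming[f] > table[f].max()
    return pd.concat([table, incoming[mask]], ignore_index=True)

# Only the row stamped after the previous load's maximum survives.
table = pd.DataFrame({"ts": pd.to_datetime(["2025-05-01", "2025-05-02"]),
                      "event": ["a", "b"]})
incoming = pd.DataFrame({"ts": pd.to_datetime(["2025-05-02", "2025-05-03"]),
                         "event": ["b", "c"]})
print(append_higher(table, incoming, ["ts"]))
```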


💡 Why This Matters

With these options, the Data Table block becomes more than a simple editable table — it becomes:

  • A live data sink that accumulates your incoming records

  • A curation point where you can safely review and edit data

  • A simplified alternative to complex file-based accumulation workflows

You no longer need complex join logic or IOD outputs to achieve incremental, editable datasets.


✅ Summary of Benefits

  • Preserves manual edits between runs

  • Avoids duplication and protects curated rows

  • Supports incremental updates and streaming-style workflows

  • Feeds live dashboards and reports without delay


Let us know if you'd like help setting this up in your own workflow or if you're migrating from an older pattern!
