Data quality is a pressing concern for organisations, but what exactly does it entail, and how can you measure it effectively? Furthermore, identifying data issues is just the beginning – the real challenge lies in automating their resolution within your data preparation processes.
Omniscope offers a comprehensive solution by enabling the creation of multiple data validation rules and automating transformation processes within a unified environment.
By maintaining and adjusting the data workflow logic over time, you will ensure the data quality model can identify and remedy any anomalies, and that your data is transformed in a consistent and transparent way.
In this article, we will look at:
- Data type and format violations
- Data value issues: duplications, missing values, or data that is out of range
- Inconsistencies in capitalisation and spelling
- Identifying anomalies – visual, mechanical, or analytical approaches
- Rule violation alerts
Diagnostic phase
Every block in the Omniscope workflow area allows you to preview the data (even before the entire dataset is loaded), sort it by clicking on the column header, and view its value distribution in a preview chart for each individual field.
This helps you monitor the effects of transformations at every step of the process.
We would recommend starting your data modelling with visual discovery and exploring the dataset in its raw state before working on it.
Data profiler - visual data exploration for quick diagnostics
Connect your data source to a Report block (be sure to select the 'Data Discovery' report template) to instantly access a combination of macro and micro insights: the Fields View data profiler, complemented by filters and a granular data table.
Let the exploration begin!
Zooming in and out of the dataset and focusing on a particular subset is easy using filtering, selection, and the move/keep buttons. Navigate quickly through the data to decide which fields and values require attention. Your findings should help determine which operations are needed in the prep phase to address any issues identified during exploration.
Plug the data into this same Report after transformation to check if you're happy with the results!
Data type checks
The Field Organiser block lists all the fields in your dataset and allows you to rename them (as shown in the image above), change the data type or format (e.g. switch the date display from dd/MM/yyyy to M-d-yy), duplicate fields, add new fields containing formulae, or remove columns that are not relevant to your analysis. If your data is consistent (always containing the same fields and format) and requires the same intervention each time, this block will automatically detect and resolve these issues.
For auditing purposes, all changes are highlighted, allowing you to easily track schema modifications and reverse changes, including field deletions.
The Bulk Field Organiser block is particularly useful for renaming multiple fields and making bulk changes to data types or formats, especially when working with datasets containing tens or hundreds of fields. For example, you can change the format of all fields containing 'date' in the field name from dd-MM-yyyy to MM-dd-yy.
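If you prefer scripting, the same kind of field rename and bulk date reformatting can be sketched in a few lines of pandas, for example inside a Custom block. This is an illustrative sketch only: the column names, the rename mapping and the date formats below are assumptions, not part of the blocks' actual configuration.

```python
import pandas as pd

# Illustrative sample data: column names and formats are assumptions
df = pd.DataFrame({
    "cust_id": ["C001", "C002"],
    "order_date": ["25/03/2024", "02/04/2024"],     # dd/MM/yyyy
    "shipping_date": ["28/03/2024", "05/04/2024"],
})

# Rename a field (Field Organiser-style change)
df = df.rename(columns={"cust_id": "CustomerID"})

# Bulk change: reformat every field whose name contains 'date'
# from dd/MM/yyyy to MM-dd-yy (Bulk Field Organiser-style change)
for col in [c for c in df.columns if "date" in c.lower()]:
    df[col] = pd.to_datetime(df[col], format="%d/%m/%Y").dt.strftime("%m-%d-%y")

print(df)
```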
Another useful diagnostic tool is the Field Metadata block, which scans all fields and provides key statistics (see below). You can also merge metadata from two data sources to compare them before combining them.
Data schema and data range/value checks
The Validate Data block is a data quality superstar—it allows you to introduce multiple screening rules and evaluate incoming data against a predefined schema. This schema checks:
- Field names, field count, and data type for each field
- Cell values (e.g. ensuring that percentage values do not exceed 100)
- Record count (e.g. ensuring the dataset contains no more and no fewer than N rows)
The fastest way to set up this block is by connecting it to a model dataset with a correct data schema and clicking the option to "Reset to input". This primes the block to receive new batches of data and compare them against the defined rules. It will alert you if any fields are missing or flag unexpected fields or formats in the incoming dataset.
Within the same block, you can set rules for cell value validation and record count checks. For example, after a join, you might expect the number of mismatches to be zero. By adding a Validate Data block, you can be alerted to the appearance of unexpected records—either due to data errors or because the mapping file needs an update.
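To make the idea of screening rules concrete, here is a minimal Python sketch of the same three kinds of checks: field names, a value range, and the record count. It is not the Validate Data block's actual configuration, and the expected schema, field names and thresholds are illustrative assumptions.

```python
import pandas as pd

def validate(df, expected_fields, max_rows):
    """Return a list of warnings; an empty list means the batch passed."""
    issues = []

    # Schema check: field names and field count
    missing = set(expected_fields) - set(df.columns)
    unexpected = set(df.columns) - set(expected_fields)
    if missing:
        issues.append(f"Missing fields: {sorted(missing)}")
    if unexpected:
        issues.append(f"Unexpected fields: {sorted(unexpected)}")

    # Cell value check: percentages must not exceed 100
    if "Discount %" in df.columns and (df["Discount %"] > 100).any():
        issues.append("Discount % contains values above 100")

    # Record count check
    if len(df) > max_rows:
        issues.append(f"Row count {len(df)} exceeds the expected maximum {max_rows}")

    return issues

# Example batch (column names and limits are illustrative assumptions)
batch = pd.DataFrame({"CustomerID": ["C1", "C2"], "Discount %": [15, 120]})
for issue in validate(batch, ["CustomerID", "Discount %"], max_rows=10_000):
    print("WARNING:", issue)
```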
Alerts and Actions
The Validate Data block includes settings that allow you to:
- Set warnings – the data will flow through, but you’ll receive a detailed diagnostics report.
- Create an error – if the data fails the rules, it will not be allowed to pass.
- Trigger an email alert – notify relevant parties to take action.
With these capabilities, the Validate Data block ensures data integrity and helps maintain a clean, structured workflow.
The Validate Data block is useful for catching inconsistencies in your data, especially when dealing with sources that vary between batches—for example, manually edited or authored data.
Important: If you set this block to Warning mode, the Data tab will list all issues found in the dataset. This is particularly useful when multiple issues need to be addressed.
You can switch between Error and Warning mode while working on the remedy. Error mode stops the data from passing through the block, whereas Warning mode allows the data to flow while generating a detailed error report.
Duplicated values
Omniscope offers several methods to help you manage data duplications.
The De-duplicate block identifies and removes duplicate records (e.g. when multiple rows share the same CustomerID but have different name or surname spellings). This block allows you to specify which fields count when flagging a record as a duplicate, so detection is not limited to fully identical rows.
If you want to investigate the causes of duplication and explore variations, you can use the Frequency Table block (from the Analytics section) or the Aggregate block. In both cases, you can combine multiple fields to define what constitutes a unique key for your dataset, then easily sort and filter records where frequency > 1.
In the Aggregate block, you can create a Split on a unique field (e.g. Country ISO or CustomerID) and, on the Results side, apply functions like Unique List or Unique Value Count to identify instances where multiple country or customer name spellings exist (see image below).
Inconsistent capitalisation
The Normalise Case block ensures consistency in the use of upper and lower case letters and automates text corrections. By default, Omniscope will scan each dataset and alert you if this block is needed. You can choose to ignore the warning, as sometimes variations are meaningful or not relevant for the analysis.
Missing data
The data preview chart in each workflow block provides an update on how many missing values are found in each field. A quick sort (simply click on the column header) will list records with missing values at the top, making it easy to inspect them.
In cases where you have values such as "n/a", codes like "999999", or blank cells, you can add a Search/Replace block to enforce uniformity and populate the cells with another value.
In some datasets (e.g. log files or digital media files), an empty cell is not considered missing data—it is assumed that each empty cell takes the value of the last populated cell above it. In this case, you can add a Fill Down block to fill in the blanks for all or just some of the fields.
Empty rows and fields
Some data input files can contain empty rows or fields, which can inflate your data size and cause issues when aggregating, visualising, or merging with other datasets (remember that when joining, two blank values will match each other).
As the name suggests, the Delete Empty Data block will find and remove any rows or columns that are completely empty.
Questionable values - using smart analytics to catch them
Sometimes it can be difficult to spot an outlier just by sorting, filtering or visualising the data. In some cases it is the combination of values in multiple fields that makes the record an outlier. You could automate the process by applying Anomaly Detection and Reduce Dimensions from our Analytics library, or by using your own R and Python code in the Custom block.
Here is an article explaining the process with a worked example.
Where data irregularities can be spotted by analysing the correlation between fields, you can use the Covariance / Correlation block.
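As a rough sketch of what such an analytical check could look like in a Custom block, here is an illustrative Python example using scikit-learn's IsolationForest. The field names, sample values and contamination rate are assumptions, and Omniscope's own Anomaly Detection block may use a different method.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Illustrative data: field names and the contamination rate are assumptions
df = pd.DataFrame({
    "Quantity": [2, 3, 1, 4, 2, 250],
    "UnitPrice": [9.99, 10.49, 9.50, 10.10, 9.80, 0.01],
})

# Score records that are unusual only in the combination of their values
model = IsolationForest(contamination=0.1, random_state=0)
df["anomaly"] = model.fit_predict(df[["Quantity", "UnitPrice"]])  # -1 = outlier

print(df[df["anomaly"] == -1])
```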
Note: Although Omniscope doesn't automatically generate data quality scores, you can easily define your own KPIs in a Field Organiser block, or in a Report block's Content view, by adding a formula and formatting the result as a percentage, such as:
SUBSET_UNIQUECOUNT([Field name]) / ALLRECORDCOUNT()
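For reference, a roughly equivalent uniqueness ratio computed in Python (for example in a Custom block) might look like the sketch below; the field name and sample data are illustrative assumptions, and the Omniscope formula above remains the in-product route.

```python
import pandas as pd

df = pd.DataFrame({"CustomerID": ["C1", "C2", "C2", None]})

# Share of distinct non-missing values over all records, formatted as a percentage
uniqueness = df["CustomerID"].nunique() / len(df)
print(f"{uniqueness:.1%}")   # e.g. 50.0%
```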
By building your own workflow logic and applying the relevant methodology, you will ensure that only clean, consistently prepared data reaches the reports. This gives you full confidence that your decisions will be based on accurate data. Since all steps are pre-defined and transformations are automated (especially if you’re using the Scheduler app), the transition time from raw data to ready data is reduced to just a few seconds, ensuring your team has access to up-to-date information anytime and on any device.
Attached below is a demo file that combines many of the automated data cleaning methods mentioned in this article (IOZ attachment).
Can any automated ETL process catch and fix 100% of the data issues?
The answer depends on the data origins and the nature of the anomalies (are they consistent across batches, or do they vary each time?). In some cases, a degree of manual intervention may be required, and Omniscope can manage this in two ways:
- Exporting 'problematic' data points for manual intervention outside Omniscope
- Allowing viewing and editing within the project (via the Data Table block in the workflow or the Table View in the Report block)
The green block below represents a file output containing mismatched data, which is exported, inspected, and edited manually outside of Omniscope. The corrected data can then be imported back and integrated into the transformation logic.
Different methods for data editing inside Omniscope can be found in this article on our Knowledge Base.