Published On: September 4th, 2024
Categories: Blog

Result Analysis in HEDDA.IO

HEDDA.IO offers comprehensive Business Rules for validating and correcting data. These rules range from very simple checks to rather complex procedures and are defined in HEDDA.IO's Rule Books. The standard output of HEDDA.IO provides detailed information and statistics on the individual rules, so that for each execution the user can see directly how good or bad the quality of the checked data is and how each rule performed.

After reviewing the output, the user can always make decisions within the notebook based on the results for the individual Business Rules, for example saving data in different “pots” or processing it further in different pipelines.

In some situations, however, it is useful not only to look at the aggregated results but also to delve deeper into the data and understand the validation results at row level. HEDDA.IO offers extensive options for this via the “Analyze Result Set” function. This function is available if the user has set the enable_data_upload() option within a run or has validated data in the HEDDA.IO UI using the preview function. In these two modes, HEDDA.IO saves the complete results in its own data lake and makes the data available to the user for detailed analysis.

HEDDA.IO Example COVID-19

 

Access from HEDDA.IO to the data can be configured either using DuckDB or a Trino instance. This allows HEDDA.IO to offer optimum scaling of the function from small to very large data volumes.



The analysis function lets the user filter data by individual domains, by business rules, by status (valid, invalid, corrected) and more. Within an individual data record, the user can see directly which domains HEDDA.IO changed during processing.
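Conceptually, such a filter is a predicate over the row-level results. The sketch below is not HEDDA.IO's API or storage schema: the table `row_results` and its columns (`record_id`, `domain`, `rule`, `status`) are hypothetical, and Python's built-in `sqlite3` merely stands in for an in-process SQL engine so that the example runs without extra installs. A SELECT this simple should work in much the same form on DuckDB or Trino.

```python
import sqlite3

# Hypothetical row-level results: one row per record, domain and rule.
ROWS = [
    (1, "Country", "Country is ISO code", "valid"),
    (1, "Cases", "Cases not negative", "valid"),
    (2, "Country", "Country is ISO code", "corrected"),
    (2, "Cases", "Cases not negative", "invalid"),
    (3, "Cases", "Cases not negative", "invalid"),
]


def build_results(rows):
    """Load the sample rows into an in-memory database."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE row_results "
        "(record_id INTEGER, domain TEXT, rule TEXT, status TEXT)"
    )
    con.executemany("INSERT INTO row_results VALUES (?, ?, ?, ?)", rows)
    return con


def filter_results(con, status=None, domain=None):
    """Return the ids of records matching the optional status and domain filters."""
    query = "SELECT DISTINCT record_id FROM row_results WHERE 1 = 1"
    params = []
    if status is not None:
        query += " AND status = ?"
        params.append(status)
    if domain is not None:
        query += " AND domain = ?"
        params.append(domain)
    query += " ORDER BY record_id"
    return [row[0] for row in con.execute(query, params)]


con = build_results(ROWS)
invalid_cases = filter_results(con, status="invalid", domain="Cases")
corrected = filter_results(con, status="corrected")
```

Combining the optional filters with parameterized `AND` clauses keeps each criterion independent, which mirrors how the filters in the analysis view can be combined freely.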

Enable data upload for deeper analysis in HEDDA.IO UI
Preview COVID-19



For an even deeper dive into the analysis, the user can open an individual data record and trace the complete rule flow for each Rule Book. This shows directly which rules were applied to the record, which were not, and at which point the rule flow for the record was terminated.
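A per-record rule flow can be pictured as an ordered list of steps, each marked as applied or not, with an optional stop. The following is a minimal sketch under assumptions: the `RuleStep` type, the rule names and the `summarize_flow` helper are hypothetical and do not reflect how HEDDA.IO stores or exposes this information.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RuleStep:
    rule: str                 # hypothetical rule name
    applied: bool             # True if the rule's condition matched this record
    terminated: bool = False  # True if the rule flow stopped after this rule


def summarize_flow(steps: List[RuleStep]) -> Dict[str, object]:
    """Split one record's rule flow into applied, skipped and unreached rules."""
    applied: List[str] = []
    skipped: List[str] = []
    unreached: List[str] = []
    stopped_at: Optional[str] = None
    for step in steps:
        if stopped_at is not None:
            # Everything after the terminating rule was never evaluated.
            unreached.append(step.rule)
            continue
        (applied if step.applied else skipped).append(step.rule)
        if step.terminated:
            stopped_at = step.rule
    return {
        "applied": applied,
        "skipped": skipped,
        "unreached": unreached,
        "stopped_at": stopped_at,
    }


# Hypothetical rule flow for a single record.
flow = [
    RuleStep("Trim whitespace", applied=True),
    RuleStep("Country is ISO code", applied=False),
    RuleStep("Cases not negative", applied=True, terminated=True),
    RuleStep("Deaths not above cases", applied=True),
]
summary = summarize_flow(flow)
```

Here `summary["stopped_at"]` names the rule at which the flow ended, and `summary["unreached"]` lists the rules that were never evaluated for this record.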


HEDDA.IO Row Result

 

With these functions, HEDDA.IO gives users a highly detailed way to fully understand how their business rules were processed.

 

DuckDB is an innovative, in-process SQL database that is characterized by its high performance and efficiency. Designed for analytical workloads, DuckDB offers easy integration into existing applications without the need for a separate database server. It is particularly well suited for embedded analytics and offers excellent query performance by utilizing modern CPU architectures. With its flexible and lightweight architecture, DuckDB enables fast and efficient data analysis directly within applications, making it an ideal choice for developers and data analysts.

https://duckdb.org/

Trino, formerly known as PrestoSQL, is a powerful, distributed SQL query engine designed for fast and scalable analysis of data across multiple data sources. With Trino, users can run complex queries on large amounts of data without having to move the data to a central repository first. The engine supports a variety of data sources such as HDFS, S3, RDBMS and NoSQL databases, making it extremely flexible and versatile. Trino is characterized by its high performance and scalability, making it ideal for big data environments and data analytics applications.

https://trino.io/


Tillmann Eitelberg

Tillmann Eitelberg is CEO and co-founder of oh22information services GmbH, which specializes in data management and data governance and offers its own cloud-born data quality solution, HEDDA.IO.

Tillmann is a regular speaker at international conferences and an active blogger and podcaster at DECOMPOSE.IO.

He has open-sourced several SSIS components and is co-author of Power BI for Dummies (German Edition). Tillmann has been a Microsoft Data Platform MVP since 2013. He is a user group leader for the PASS Germany RG Rheinland (Cologne) and was a member of the Microsoft Azure Data Community Advisory Board.
