Release Notes

Stay current with HEDDA.IO. Our Release Notes summarize new functionality, performance improvements, and
fixes – along with the impact on your day-to-day work. Browse by version to see what’s shipped and what’s changed.

2.2.0

New Features

Apache Kafka

  • Added support for Apache Kafka as a new alert sink

Global AlertSinks

  • Global alert sinks can now be configured via appsettings with black- & whitelisted events
  • Supported events:
    • ProjectCreated, ProjectDeleted
    • KnowledgeBaseAdded, KnowledgeBaseDeleted, KnowledgeBasePublished, KnowledgeBaseUpdated, KnowledgeBaseVersionCreated
    • DataLinkCreated, DataLinkUpdated
    • RunCreated, RunUpdated
    • ExecutionFinished, RulebookFailed, BusinessRuleFailed, DatasetRuleFailed
    • ManualTrigger
  • Wildcards * (e.g. Project*, *) are supported
  • Configuration options are described in the DevOps configuration documentation
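
A global alert sink entry in appsettings might look like the following sketch. The section and key names (GlobalAlertSinks, Whitelist, Blacklist, and the Kafka sink settings) are assumptions for illustration only; the DevOps configuration documentation has the authoritative schema.

```json
{
  "GlobalAlertSinks": [
    {
      "Type": "Kafka",
      "BootstrapServers": "broker-1:9092",
      "Topic": "hedda-alerts",
      "Whitelist": [ "Project*", "ExecutionFinished" ],
      "Blacklist": [ "ProjectDeleted" ]
    }
  ]
}
```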

Improvements

  • Monitoring Hub now has a filter for Projects
  • Monitoring Hub now shows the latest Executions by default
  • Code Examples were moved from Run Details to the Knowledge Base Header
  • Added Code Examples for C# Single Row Processing and cURL Single Row Processing

Bugfixes

  • Fixed the width of the Monitoring Hub Statistics Cards
  • Knowledge Base Import: MsSql imports now use the correct precision for number fields
  • Knowledge Base Import: on the Select Knowledge Base step, filtered lists now select the correct items when clicked
  • Add External Connection: fixed a path error after switching the Provider from a file source (e.g. OneLake, ADLS) to a direct one (e.g. Fabric Lakehouse, MsSql)
  • Removed the drag icon from the Dataset Rule Navigation
  • Fixed an issue in the InterOp Runner when processing very large datasets
  • Fixed an issue when starting a new Single Row Processing Execution
  • Fixed an error that could appear when publishing a Knowledge Base after deleting and re-adding Business Rules with the same name

Breaking Changes

In this release we improved the Server Landscape, which leads to a couple of major changes in the setup and operation of HEDDA.IO.

API Server

We removed the API Server from the Server Landscape. All tasks the API Server handled are now part of the Frontend Server, now referred to as the HEDDA.IO Server. This keeps the cost of running and maintaining HEDDA.IO down and reduces the need for duplicated logic.

Note: The PyRunner and DotNetRunner now need to be set up with the Frontend URL.

DotNetRunner

// Prior:
var hedda = Hedda.Create(apiUrl, ...);

// Now:
var hedda = Hedda.Create(frontendUrl, ...);

PyRunner

# Prior:
hedda = Hedda.create(base_url=api_url, ...)

# Now:
hedda = Hedda.create(base_url=frontend_url, ...)

Container App Job / WebRunner

We also redesigned the Container App Job to be a full web service instead of just a worker process, which brings several advantages. With a dedicated server now handling the execution of Validations, the HEDDA.IO Server (formerly the Frontend) can use it for the Preview functionality as well. This reduces cost, as only one heavy server now handles validation requests instead of two. And because it is now a long-running server, the startup times of around 15 seconds previously seen for external Executions are a thing of the past. Another benefit is that Executions can be scaled horizontally if a high frequency of parallel executions is necessary.

Edit Version Executions

In the past it was possible to start a validation against the current Edit Version. This is still possible, but with the introduction of named Edit Versions the call for this Execution has changed slightly.

DotNetRunner

// Prior:
var hedda = Hedda.Create(..., workflowConfig: new() {
  ExecuteAgainstDev = true
});

// Now:
hedda.UseKnowledgeBase(kbName, versionName);

PyRunner

# Prior:
hedda.use_knowledge_base(kb_name, use_current=False)

# Now:
hedda.use_knowledge_base(kb_name, version_name=version_name)

New Features

Git Integration

You can now store and manage your Knowledge Base content directly in a Git repository! This new feature allows you to:

  • Version Control: Track changes to your Knowledge Base and easily revert to previous versions.
  • Edit Versioning: Multiple users can now work on the same Knowledge Base simultaneously, each on their own version (branch) without impacting each other.
  • Multi Environments: You can now configure different Git branches to serve as your Knowledge Bases for different environments (e.g., Development, Staging, Production). This allows for safe testing and deployment of updates before they go live to your users.
  • Automation: Integrate your Knowledge Bases into your CI/CD pipelines for automated updates and deployments.

Improvements

Single Row Processing

We integrated Single Row Processing into the new WebRunner, eliminating the need for the additional Azure Function. With the new architecture we were no longer bound to the stateless nature of Azure Functions and could implement it as a Background Service, also eliminating the need for the MongoDB used for state caching. As a bonus, performance also increased significantly.

There is no need to change previous pipeline calls, as the API surface is essentially the same. But we added two new capabilities.

  1. Bring Your Own ExecutionId: The ExecutionStart call can now include an X-Execution-Id header, which will be used to create the Execution.

    Keep in mind that this must not collide with existing Executions.

  2. If the ExecutionStart option DisableRowStorage is not explicitly set to true, the processed rows are gathered and are then also available later in the Review functionality.

Note: SPR and the Preview functionality now share the same instance. Keep this in mind when determining the necessary scale of the WebRunner Service.

The DotNetRunner was also extended and can be run directly in Single Row Processing mode, enabling it to be instantiated once and validate data as it comes in.

var hedda = Hedda.Create(frontendUrl, apiKey)
  .UseProject(projectName)
  .UseKnowledgeBase(kbName)
  .UseRun(runName)
  .StartSingle();

var rowResult1 = hedda.ProcessRow([col1, col2, col3]);
var rowResult2 = hedda.ProcessRow([col1, col2, col3]);

var heddaResult = hedda.Finish();

Additionally, a verbose ProcessRow endpoint /api/processRow/verbose was added, which changes the response slightly to include the names of the Column, Rulebook, and Business Rule results. Instead of the values being in an array, they are now in an object whose keys are the names of the entities.

Example:

{
  "RowIndex": 1,
  "Valid": true,
  "Error": null,
  "ColumnResults": {
    "Sales Order ID": {
      "Value": "1ed99882-9d6e-4b5c-b6e2-50b00c987573",
      "OriginalValue": "1ed99882-9d6e-4b5c-b6e2-50b00c987573",
      "PhoneticValue": null,
      "MemberResult": null
    },
    "Customer Name": {
      "Value": "Travis Ryan",
      "OriginalValue": "Travis Ryan",
      "PhoneticValue": null,
      "MemberResult": null
    }
  },
  "RulebookResults": {
    "Product validation": true,
    "Sales Data Check": false
  },
  "BusinessRuleResults": {
    "DataType Validation": true,
    "Member Validation": true
  }
}
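
Working with the verbose shape is straightforward: because results are keyed by name, failing Rulebooks can be looked up directly. A small sketch against an abbreviated version of the example payload above (the response shape is taken from that example; variable names are illustrative):

```python
import json

# Abbreviated verbose ProcessRow response, keyed by entity name.
verbose_response = json.loads("""
{
  "RowIndex": 1,
  "Valid": true,
  "RulebookResults": {
    "Product validation": true,
    "Sales Data Check": false
  },
  "BusinessRuleResults": {
    "DataType Validation": true,
    "Member Validation": true
  }
}
""")

# Names of Rulebooks that failed for this row -- possible because the verbose
# endpoint keys results by name instead of using positional arrays.
failed_rulebooks = [name for name, ok in verbose_response["RulebookResults"].items() if not ok]
```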

Search

We have made significant updates to our global search functionality. The new Full Text Search now includes descriptions, enabling you to find relevant results with greater accuracy.

The Search UI has also been redesigned for improved usability and navigation. Additionally, definitions stored in Git repositories are indexed too.

Key improvements include:

  • Enhanced search logic for more accurate results
  • Improved user interface for streamlined searching
  • Increased search scope with inclusion of definitions from Git repositories

PyRunner

The PyRunner was updated and the pure Spark executor was removed. The new default executor is the InterOp executor, which allows for better feature parity with the DotNetRunner, as the latter is essentially utilized under the hood. Execution performance improved vastly with this approach. The external executor is still available and has also received some updates, such as showing the current execution state and improved error handling.

Some improvements were also made in the IPC layer of the InterOp executor. Utilizing Apache Arrow as the transmission format reduces communication time by ~20% in the current Spark version 3.5. With Spark 4.0 (currently in preview) this will get an impressive ~80% boost out of the box. To activate this feature, the config option USE_PYARROW_IPC needs to be set to True.

There is one caveat: PyArrow is not available in every Spark environment. So far we have identified that it is available in Spark native environments but not in Spark Connect environments like Databricks Serverless and Databricks Shared Compute. Databricks Personal Compute and Azure Fabric are supported.
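
How exactly the option is supplied depends on your runner setup; as a sketch, guarding it on PyArrow availability (since not every Spark environment ships it) might look like this. The config-dict shape is an assumption for illustration; consult the PyRunner documentation for the actual mechanism.

```python
# Enable the Arrow IPC path only where PyArrow is actually importable
# (e.g. not in Spark Connect environments such as Databricks Serverless).
try:
    import pyarrow  # noqa: F401
    use_pyarrow_ipc = True
except ImportError:
    use_pyarrow_ipc = False

config = {"USE_PYARROW_IPC": use_pyarrow_ipc}
```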

Benchmarked performance with 4.8M rows (IPC only):

Type                      Time
PyArrow Off               120 sec
PyArrow On (Spark 3.5)     86 sec
PyArrow On (Spark 4.0)     13 sec

Miscellaneous

  • Improved the performance of the Audit queries
  • Expanded Domains used in Business Rules to also check Preparations and Actions
  • During import of Knowledge Bases, a connection is tested before it is saved or data are loaded
  • Improved Environments forms in Excel-Addin and added a button that opens the respective HEDDA Environment
  • Alphabetically ordered Member Search Algorithms, Business Rule Actions and Conditions
  • Added possibility to order Projects by Name, Owner or Creation Date on the Project Overview Page
  • Monitored Run Executions can now be filtered to only show the most recent or last Run Execution
  • Monitored Run Executions can now be shown as aggregated statistic tiles
  • Add/Edit Connection forms now show the currently selected file
  • Added Member Validation and Data Type Validation Filter for Business Rule Filter Options in Preview Page
  • PyRunner External Executions no longer poll for the result file in the data lake but instead poll the new WebRunner Service, so executions that fail fatally no longer wait indefinitely
  • It is now possible to create Connections for Data Links while importing a Knowledge Base
  • HEDDA.IO + Excel Add-In User Manuals are now available for download on the HELP page under Documentation.
  • Improved memory footprint in WebRunner and PreviewRunner

Bugfixes

  • Removed Variable Domains from Default Run Mapping
  • Fixed a pagination bug in Audit Portal in which the user could set a page lower than 1
  • Fixed a bug where the Favorite Button would delete the wrong Favorite
  • Fixed a bug where deleting an object would delete all Favorites
  • Cell colorization for boolean values in the Excel Add-In now works correctly, independently of the user's language settings
  • Prevented saving synonyms with empty values in the Member Drawer
  • Knowledge Base Import from Provider forms now reset on selecting another provider
  • Prevented self-looping Business Rules in Advanced Flow Editor
  • Some UI text overflow improvements
  • Fixed an error that occurred on save, when replacing the Root Business Rule in Advanced Flow Editor
  • Fixed validity icons in Rulebook Flow Display for not executed Rulebooks/Business Rules
  • Disallowed selection of a folder, where a file should be selected in Add/Edit Connection forms
  • Selecting data from a Range that intersects a table in Excel-Addin, now correctly uses selected data instead of the whole table
  • Fixed the Query for items with Data Type Validation Issues in Preview
  • Fixed an issue when validating Data Time columns in Single Row Processing

New Features

Databricks Project Service Principal Authentication

It is now possible to create a Databricks Connection with the configured Project Service Principal.

Project API Key

Similar to User API Keys, Projects can now have API Keys as well. They can be found on the Project dashboard under details below the Project Service Principal.

To create and manage the Project API Key, the User has to have manage permissions for the Project.

A Project API Key can then be used identically to a User API Key. Its permissions are limited to Read/Write for the project by default.

Welcome Page List View

A list view was added to the welcome page as alternative to the current card view of projects. It can be seen using the list icon in the top right of the page.

An Add Project button was added to the top right of the page. It works identically to the Add Project card.

Excel Add-in

The Excel Add-in for HEDDA.IO enables the import of Execution results from any HEDDA.IO instance into Microsoft Excel. Authentication to your HEDDA.IO instance is possible via SSO or API key.

Execution results can be narrowed down by choosing only valid or invalid data sets, and the advanced filter option adds additional customizability to what a result set should include:

  • Include Row Id (Include a Row ID column. This enables the display of Row Details when clicking on a row.)
  • Include Row Validity (Include a color coded column, that shows the validity of a row.)
  • Include Originals (Include columns containing the original values of domains.)
  • Include Rulebooks
  • Include Business Rules
  • Include Variable Domains
  • Include Data Type Validity
  • Include Member Search Validity

More information can be found in the Hedda Excel Add-in documentation.

Improvements

  • Improved loading of Preview Data.
  • Improved the Condition Form to support drag and drop for moving Business Rule Conditions between SubConditions

Bugfixes

  • Fixed some issues where deleting an object would result in a message that no entries in a Sequence are available
  • Fixed an issue on the Analyze Result screen where an error would be shown due to a race condition
  • Using a user as Project Owner who hasn't been used anywhere else in HEDDA before no longer throws an error
  • When adding or editing External Connections, changing to a non-existent Project Service Principal no longer shows an error toast

New Features

Current Date Preparation

A new Preparation Option was introduced which enables an easier retrieval of different Dates based on the Current Date. The following Dates will be provided as UTC Dates:

  • Now
  • Start of today
  • End of today
  • Start of current month
  • End of current month
  • Start of current year
  • End of current year
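
The boundary semantics of these values can be sketched in Python. This is an illustration of the UTC boundaries only, not HEDDA.IO's implementation:

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Start/end of today (UTC): midnight, and one microsecond before the next midnight.
start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
end_of_today = start_of_today + timedelta(days=1) - timedelta(microseconds=1)

# Start/end of current month: day 1, and one microsecond before day 1 of next month.
start_of_month = start_of_today.replace(day=1)
next_month = (start_of_month + timedelta(days=32)).replace(day=1)  # always lands in next month
end_of_month = next_month - timedelta(microseconds=1)

# Start/end of current year: Jan 1, and one microsecond before next Jan 1.
start_of_year = start_of_today.replace(month=1, day=1)
end_of_year = start_of_year.replace(year=start_of_year.year + 1) - timedelta(microseconds=1)
```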

Project Service Principals

It is now possible to add an Azure Service Principal for each Project separately. The following Transports can utilize it:

  • Microsoft Fabric Lakehouse
  • Azure Data Lake Storage
  • Azure Blob Storage
  • Microsoft SQL Server
  • Microsoft OneLake

Microsoft OneLake

Microsoft OneLake has been added as a new Connection Type.
Utilizing a Microsoft OneLake Connection together with Microsoft OneLake Shortcuts makes it possible
to import files from supported 3rd party vendors into HEDDA.IO. Currently these are AWS, Dataverse
and Google Cloud Storage.

Reference Controller – OAuth Details

Added OAuth details to the Reference Controller for the Excel Add-in OAuth authentication workflow. This also requires the new setting AppScopes under AzureAd in the appsettings, or the environment variable AzureAd__AppScopes.
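
In appsettings this might look like the following sketch. Only the AppScopes key under AzureAd is the new setting named above; the placeholder value is an assumption for illustration:

```json
{
  "AzureAd": {
    "AppScopes": "<your-app-scopes>"
  }
}
```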

Improvements

  • The Preview Reader Dropdown will be hidden when a Reader is configured that is always running.
  • Increased performance for some Database queries
  • Added options to the Preview Corrected Filter so it can quickly be switched to include all Domains or all Domains without Variables
  • Improved Domain Selection in Preview Result
  • Added RetryLogic for 429 Exceptions when starting Container App Job
  • Optimized Data Preparation Step for Larger Datasets

Bugfixes

  • Fixed an error with trailing slashes at the end of a Databricks Connection host name
  • Added missing confirmation on DataLink deletion
  • Fixed an issue that caused items in the navigation list not being highlighted anymore
  • Disallowed empty values for API Expiration Date Dropdown
  • Fixed an issue where some tag changes were not displayed in publish dialog
  • Fixed links in tags detail to runs and rulebooks
  • Removed unlink option for tag detail, when kb is not in edit mode
  • Fixed an issue where execution information was loaded multiple times
  • Fixed a possible race condition for external .Net Runner
  • Fixed issue when selecting tag filter in rulebook page
  • Fixed an issue where the Preview count is not correctly calculated if Corrected Filter is selected
  • Fixed an issue in DuckDb Preview Reader where Corrected value is not found if Original was NULL
  • Fixed an issue when discarding Knowledge Base changes with deleted Rulebook, preventing later publishing
  • Fixed a potential issue when publishing a Knowledge Base that has a deleted Rulebook with the same name as a newly generated Rulebook

New Features

Preview Chaining

Preview Chaining is a new feature that enables the use of Execution results from a previous Run as the Data Source for a new Run. Preview Chaining lets you choose between the Live and Edit Version of a Knowledge Base, as well as filter for valid or invalid Execution results. It is incorporated into the existing Preview Button as a new tab called Execution and is currently only usable via the HEDDA.IO web interface.

Azure Data Lake Storage

Azure Data Lake Storage has been added as a new Connection Type for ADLS Gen2 enabled Azure Storage Accounts.

Improvements

Resilience Options

External Runner

Added retry policy configuration options to the HEDDA External Runner:

{
  "ApiRetryAttempts": 3, // Number of retry attempts to the HEDDA API (only applies to request timeouts; default: 3)
  "ApiMaxTimeoutS": 180  // Max wait time in seconds before cancelling a request to the HEDDA API (default: 180)
}

These options only apply to requests made to the HEDDA API.
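
As a sketch of the intended semantics (an illustration, not the External Runner's actual implementation): retries apply only to timeouts, and each attempt is bounded by the max timeout.

```python
def call_with_retry(request_fn, retry_attempts=3, max_timeout_s=180):
    """Retry a request on timeout only, mirroring ApiRetryAttempts / ApiMaxTimeoutS."""
    last_error = None
    for _ in range(retry_attempts):
        try:
            return request_fn(timeout=max_timeout_s)  # each attempt is cancelled after max_timeout_s
        except TimeoutError as err:                   # only timeouts are retried
            last_error = err
    raise last_error

# Demo: a fake request that times out twice, then succeeds on the third attempt.
calls = []
def flaky_request(timeout):
    calls.append(timeout)
    if len(calls) < 3:
        raise TimeoutError("simulated request timeout")
    return "ok"

result = call_with_retry(flaky_request)
```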

Databricks Transport

Added an option to Databricks Connections for setting a maximum request timeout. The default value is 100 seconds.

Miscellaneous

  • Added Last Modified Information for Data Links
  • Added better Logging to Api Project
  • Added possibility to add Details to Executions
    • Added UseDetails to .net Runner Definition, so that additional Information can be stored alongside the execution
  • Added an option (Preview__Reader__ExtensionPath) to change the DuckDB package source for systems behind a firewall
  • Improved DataUpload performance and size limitations
  • Increased performance for some Database queries

Bugfixes

  • Fixed an issue writing back the Data with some unexpected DataTypes
  • Fixed toggle alignment in Knowledge Base Export Drawer
  • Fixed undesired behavior when copying an existing Domain Mapping and changing its values
  • Fixed an issue where it was randomly not possible to upload the result Parquet file in the DotNetRunner
  • Improved error feedback when using the External Connections Test button
  • Fixed Rulebook Search being Case Sensitive
  • Null/Undefined values in the Lookup Preview table will now be displayed as <NULL>
  • Fixed Execution Statistics not being created in some cases
  • Fixed links to Business Rules from the Knowledge Base Dashboard Overview section
  • Fixed an issue where the Run Page would still display the old Mapping after changing it on a Run
  • Fixed an issue where a Knowledge Base could not be deleted
  • Fixed Lookup Preview table overflow

Package Updates

Package               New     Old
DuckDB.NET.Data.Full  1.0.0   0.10.2
Azure.Identity        1.11.4  1.10.4

HEDDA.IO is a modern data quality platform that transforms domain knowledge into automated, scalable data validation and governance.
