Release Notes
Stay current with HEDDA.IO. Our Release Notes summarize new functionality, performance improvements, and
fixes – along with the impact on your day-to-day work. Browse by version to see what’s shipped and what’s changed.
In this release we reworked the Server Landscape, which leads to a couple of major changes in the setup and operation of HEDDA.IO.
We removed the API Server from the Server Landscape. All tasks the API Server handled are now part of the Frontend Server, which is now referred to as the HEDDA.IO Server. This keeps the cost of running and maintaining HEDDA.IO down and reduces the need for duplicated logic.
Note: The PyRunner and DotnetRunner now need to be set up with the Frontend URL.
// Prior:
var hedda = Hedda.Create(apiUrl, ...);
// Now:
var hedda = Hedda.Create(frontendUrl, ...);
# Prior:
hedda = Hedda.create(base_url=api_url, ...)
# Now:
hedda = Hedda.create(base_url=frontend_url, ...)
We also switched the design of the Container App Job from a plain worker process to a full web service, which brings several advantages. With a dedicated server handling the execution of validations, the HEDDA.IO Server (formerly the Frontend) can now use it for the Preview functionality as well, reducing cost: only one heavy server handles validation requests instead of two. Because it is now a long-running server, the startup times of previously around 15 seconds for external executions are a thing of the past. Another benefit is that executions can be scaled horizontally if a high frequency of parallel executions is necessary.
In the past it was possible to start a validation against the current Edit Version. This is still possible, but with the introduction of named Edit Versions the call for this execution has changed slightly.
// Prior:
var hedda = Hedda.Create(..., workflowConfig: new() {
    ExecuteAgainstDev = true
});
// Now:
hedda.UseKnowledgeBase(kbName, versionName);
# Prior:
hedda.use_knowledge_base(kb_name, use_current=False)
# Now:
hedda.use_knowledge_base(kb_name, version_name=version_name)
You can now store and manage your Knowledge Base content directly in a Git repository!
We integrated Single Row Processing into the new Web Runner, eliminating the need for the additional Azure Function. With the new architecture we are no longer bound to the stateless nature of Azure Functions and could implement it as a Background Service, which also eliminates the MongoDB that was used for state caching. As a bonus, performance also increased significantly.
There is no need to change previous pipeline calls, as the API surface is basically the same. But we added two functionalities:
The ExecutionStart call can now include an X-Execution-Id header, which will be used to create the Execution. Keep in mind that this ID should not collide with existing Executions.
If DisableRowStorage is not explicitly set to true, the processed rows are gathered and are then also available later in the Review functionality.
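As an illustration, an ExecutionStart request with a caller-supplied ID might be built like the sketch below. The base URL, endpoint path, and payload field are assumptions; only the X-Execution-Id header and the DisableRowStorage option are taken from the notes above.

```python
import json
import uuid
from urllib.request import Request

# Hypothetical base URL and endpoint path -- check your HEDDA.IO instance for the actual route.
base_url = "https://hedda.example.com"
execution_id = str(uuid.uuid4())  # must not collide with an existing Execution

# Leaving DisableRowStorage unset/false keeps processed rows available for Review later.
body = json.dumps({"DisableRowStorage": False}).encode()

req = Request(
    f"{base_url}/api/executionStart",  # hypothetical path
    data=body,
    method="POST",
    headers={
        "Content-Type": "application/json",
        "X-Execution-Id": execution_id,  # the Execution will be created with this ID
    },
)
# The Request object is only built here; nothing is sent.
```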
Note: SPR and the Preview functionality now share the same instance. Keep this in mind when determining the necessary scale of the WebRunner Service.
The DotnetRunner was also extended and can now run directly in Single Row Processing mode, enabling it to be instantiated once and validate data as it comes in.
var hedda = Hedda.Create(frontendUrl, apiKey)
.UseProject(projectName)
.UseKnowledgeBase(kbName)
.UseRun(runName)
.StartSingle();
var rowResult1 = hedda.ProcessRow([col1, col2, col3]);
var rowResult2 = hedda.ProcessRow([col1, col2, col3]);
var heddaResult = hedda.Finish();
Additionally, a verbose ProcessRow endpoint /api/processRow/verbose was added, which changes the response slightly to include the names of the Column, Rulebook, and BusinessRule results. Instead of the values being in an array, they are now in an object with the keys being the name of the entity.
Examples:
{
  "RowIndex": 1,
  "Valid": true,
  "Error": null,
  "ColumnResults": {
    "Sales Order ID": {
      "Value": "1ed99882-9d6e-4b5c-b6e2-50b00c987573",
      "OriginalValue": "1ed99882-9d6e-4b5c-b6e2-50b00c987573",
      "PhoneticValue": null,
      "MemberResult": null
    },
    "Customer Name": {
      "Value": "Travis Ryan",
      "OriginalValue": "Travis Ryan",
      "PhoneticValue": null,
      "MemberResult": null
    }
  },
  "RulebookResults": {
    "Product validation": true,
    "Sales Data Check": false
  },
  "BusinessRuleResults": {
    "DataType Validation": true,
    "Member Validation": true
  }
}
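For example, the verbose response above can be consumed by name rather than by position. This is a minimal sketch; only the response shape is taken from the example, the helper functions are illustrative.

```python
import json

# The sample verbose response from the example above (abbreviated to the fields used here).
verbose = json.loads("""
{
  "RowIndex": 1,
  "Valid": true,
  "Error": null,
  "ColumnResults": {
    "Sales Order ID": {"Value": "1ed99882-9d6e-4b5c-b6e2-50b00c987573"},
    "Customer Name": {"Value": "Travis Ryan"}
  },
  "RulebookResults": {"Product validation": true, "Sales Data Check": false},
  "BusinessRuleResults": {"DataType Validation": true, "Member Validation": true}
}
""")

def failed_rulebooks(result: dict) -> list[str]:
    """Names of Rulebooks that did not pass, readable directly thanks to the keyed object shape."""
    return [name for name, passed in result["RulebookResults"].items() if not passed]

def column_values(result: dict) -> dict[str, str]:
    """Map each column name to its value."""
    return {name: col["Value"] for name, col in result["ColumnResults"].items()}
```

With the non-verbose endpoint, the same information would only be available positionally in arrays.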
We have made significant updates to our global search functionality. The new Full Text Search now includes descriptions, enabling you to find relevant results with greater accuracy.
The Search UI has also been redesigned for improved usability and navigation. Additionally, definitions stored in Git repositories are indexed too.
The PyRunner was updated: the pure Spark executor was removed, and the new default is the Interop executor, which allows for better feature parity with the DotnetRunner, as it is basically utilized under the hood. Execution performance improved vastly with this approach. The external executor is still available and has also received some updates, such as showing the current execution state and improved error handling.
Some improvements were made in the IPC layer of the Interop executor: utilizing Apache Arrow as the transmission format reduces the communication time by ~20% in the current Spark version 3.5.
With Spark 4.0 (currently in preview) this will get an impressive ~80% boost out of the box.
To activate this feature, the config option USE_PYARROW_IPC needs to be set to True.
There is one caveat: PyArrow is not available in every Spark environment. So far we have identified that it is available in Spark-native environments but not in Spark Connect environments such as Databricks Serverless and Databricks Shared Compute. Databricks Personal Compute and Azure Fabric are supported.
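How the option is supplied depends on your setup. As a sketch, assuming config options can be passed via environment variables (that mechanism is an assumption here; only the option name USE_PYARROW_IPC comes from the notes above):

```python
import os

# Assumption: the PyRunner reads config options from environment variables.
# Only enable PyArrow IPC where PyArrow is actually available (see the caveat above).
os.environ["USE_PYARROW_IPC"] = "True"

def use_pyarrow_ipc() -> bool:
    """Read the flag back, treating anything other than a 'true' string as disabled."""
    return os.environ.get("USE_PYARROW_IPC", "False").lower() == "true"
```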
Benchmarked Performance with 4.8M rows (just IPC):
| Type | Time |
|---|---|
| PyArrow off | 120 sec |
| PyArrow on (Spark 3.5) | 86 sec |
| PyArrow on (Spark 4.0) | 13 sec |
It is now possible to create a Databricks Connection with the configured Project Service Principal.
Similar to User API Keys, Projects can now have API Keys as well. They can be found on the Project dashboard under details below the Project Service Principal.
To create and manage the Project API Key, the User has to have manage permissions for the Project.
A Project API Key can then be used identically to a User API Key. Its permissions are limited to Read/Write for the project by default.
A list view was added to the welcome page as an alternative to the current card view of projects. It can be selected using the list icon in the top right of the page.
An Add Project button was added to the top right of the page. It works identically to the Add Project card.
The Excel Add-in for HEDDA.IO enables the import of Execution results from any HEDDA.IO instance into Microsoft Excel. Authentication to your HEDDA.IO instance is possible via SSO or API key.
Execution results can be narrowed down by choosing only valid or invalid data sets, and the advanced filter option adds additional customizability to what a result set should include.
More information can be found in the Hedda Excel Add-in documentation.
A new Preparation Option was introduced which enables easier retrieval of different dates based on the current date. The dates are provided as UTC dates.
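The exact set of dates is determined by the Preparation Option. As an illustration only (the names below are hypothetical, not the actual option output), derived UTC dates of this kind could look like:

```python
from datetime import datetime, timedelta, timezone

# Always anchor on the current UTC time, matching the UTC guarantee above.
now = datetime.now(timezone.utc)
today = now.date()

# Hypothetical examples of dates derived from the current date:
derived = {
    "Today": today,
    "Yesterday": today - timedelta(days=1),
    "StartOfMonth": today.replace(day=1),
    "StartOfYear": today.replace(month=1, day=1),
}
```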
It is now possible to add an Azure Service Principal for each Project separately, which supported Transports can utilize.
Microsoft OneLake has been added as a new Connection Type.
Utilizing a Microsoft OneLake Connection together with Microsoft OneLake Shortcuts makes it possible
to import files from supported 3rd party vendors into HEDDA.IO. Currently these are AWS, Dataverse
and Google Cloud Storage.
Added OAuth details to the Reference Controller for the Excel Add-in OAuth authentication workflow.
This also requires the new setting AppScopes under AzureAd in the appsettings, or the environment variable AzureAd__AppScopes.
Preview Chaining is a new feature that enables the use of Execution results from a previous Run as the data source for a new Run.
Preview Chaining lets you choose between the Live and Edit Version of a Knowledge Base, as well as filter for valid or invalid Execution results.
It is incorporated into the existing Preview button as a new tab called Execution and is currently only usable via the HEDDA.IO web interface.
Azure Data Lake Storage has been added as a new Connection Type for ADLS Gen2 enabled Azure Storage Accounts.
Added retry policy configuration options to the HEDDA External Runner:
{
"ApiRetryAttempts": 3, //Amount of retry attempts to HEDDA API (Only applies to request timeouts) (Default = 3)
"ApiMaxTimeoutS": 180 //Max wait time before cancelling a request to HEDDA API (Default = 180)
}
These options only apply to requests made to the HEDDA API.
Added an option to Databricks Connections for setting a maximum request timeout. The default value is 100 seconds.
Added UseDetails to the .NET Runner definition, so that additional information can be stored alongside the execution.
Preview__Reader__ExtensionPath
| Package | New | Old |
|---|---|---|
| DuckDB.NET.Data.Full | 1.0.0 | 0.10.2 |
| Azure.Identity | 1.11.4 | 1.10.4 |
HEDDA.IO is a modern data quality platform that transforms domain knowledge into automated, scalable data validation and governance.