Published On: May 7th, 2024
Categories: Blog

Member Search

A domain in HEDDA.IO has several properties. In addition to a name and a description, it is assigned one of many data types, together with type-specific configuration such as length or precision.

One of the standout features of HEDDA.IO is Member Search. It stores the values of a domain together with their assigned states: whether a value is a main value or a synonym, and whether it is valid or invalid.

A synonym refers to a main value: a record that contains the synonym is corrected or standardized to that main value. Consider a practical example with the “Country” domain. This domain should contain all valid ISO 3166-1 alpha-2 codes, which are entered as valid main values in Member Search. If, for example, the values “Deutschland”, “Germany” or “DEU” appear when checking data from various source systems, they can be assigned to the main value “DE” in advance as synonyms. As a result, these values are automatically changed to “DE”.
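The synonym lookup described above can be pictured as a simple mapping. The following Python sketch is illustrative only and does not show HEDDA.IO's actual API; the names `MAIN_VALUES`, `SYNONYMS` and `standardize` are assumptions made for this example.

```python
# Illustrative sketch only; not the HEDDA.IO API.
from typing import Optional

# Valid main values of a hypothetical "Country" domain (two-letter codes).
MAIN_VALUES = {"DE", "FR"}

# Each synonym points to the main value it should be corrected to.
SYNONYMS = {
    "Deutschland": "DE",
    "Germany": "DE",
    "DEU": "DE",
}


def standardize(value: str) -> Optional[str]:
    """Return the main value for a known member, or None if the value is unknown."""
    if value in MAIN_VALUES:
        return value
    return SYNONYMS.get(value)


for raw in ["Germany", "DEU", "DE", "Atlantis"]:
    print(raw, "->", standardize(raw))
```

Keeping synonyms as references to a main value, rather than as independent entries, means that a correction only has to be defined once per main value.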

The Valid or Invalid status applies to synonyms as well as to main values. If incorrect data such as “Sample country” or “Test”, which cannot be meaningfully assigned to a valid main value, arrives from any source system, it can be flagged as invalid so that the check always reports an error for it. Data that HEDDA.IO cannot assign to any member is labelled “New” in the output by default and is therefore also treated as valid.

The Algorithm feature improves data quality by detecting misspelled values and standardizing them. More than 15 phonetic and distance algorithms are currently available for this purpose, and they can be selected depending on the type of data, the language or the input method. For example, the Levenshtein algorithm is well suited to article numbers, Cologne phonetics to surnames from German-speaking countries, and keyboard distance to input errors from a call centre. If, for example, the value “Deuschlant” is found during the check, the algorithm assigns it to the synonym “Deutschland”, which results in the valid main value “DE”. In the same way, a value such as “Muster” is matched to a member such as “Musterland” that has been flagged as invalid, and is therefore labelled as invalid.
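To make the distance-based matching more concrete, here is a minimal Python sketch of the Levenshtein edit distance and a nearest-member lookup. It is not HEDDA.IO's implementation; the function names and the `max_distance` threshold are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; not the HEDDA.IO implementation.

def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def closest_member(value, members, max_distance=2):
    """Return the member nearest to `value`, or None if none is close enough.

    `max_distance` is a hypothetical threshold chosen for this example.
    """
    if not members:
        return None
    best = min(members, key=lambda m: levenshtein(value.lower(), m.lower()))
    if levenshtein(value.lower(), best.lower()) <= max_distance:
        return best
    return None


members = ["Deutschland", "Germany", "DEU"]
print(closest_member("Deuschlant", members))  # Deutschland (edit distance 2)
```

The matched member can then be resolved like any other synonym, so “Deuschlant” ends up as the main value “DE”. A phonetic algorithm would replace the distance function with a comparison of phonetic codes.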

Maintaining members within HEDDA.IO is easy and intuitive, whether through the user-friendly HEDDA.UI or by pasting them directly from an Excel sheet. Values marked as “New” during a check are written to the integrated internal stage. The Data Steward can import this stage into HEDDA.IO as members at any time and assign each value its Synonym/Main and Valid/Invalid status. With each execution, data can be refined and incorporated, continually improving the knowledge base.

If valid values are available from the beginning, e.g. from a master data system, the data can also be maintained externally. For this purpose, HEDDA.IO can establish an external connection to various systems and load the data from there. Currently supported sources include Parquet files in Azure Data Lake Storage Gen2, Databricks and Microsoft SQL Server, among others.

If data is no longer to be maintained and the current status is to be considered final, the domain can also be set to “Closed”. This means that all values that cannot be assigned to a maintained synonym or main value are automatically set to Invalid.
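The difference between an open and a closed domain can be sketched as follows. This is a simplified Python model under stated assumptions, not HEDDA.IO's API: the `Member` class, the `MEMBERS` table and the `check` function are hypothetical names introduced for this example.

```python
# Illustrative sketch only; not the HEDDA.IO API.
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Member:
    main_value: Optional[str]  # main value to standardize to; None if there is none
    valid: bool                # Valid/Invalid status of this member


# Hypothetical member list for a "Country" domain.
MEMBERS: Dict[str, Member] = {
    "DE": Member("DE", True),        # main value
    "Germany": Member("DE", True),   # synonym of "DE"
    "Test": Member(None, False),     # known bad value, flagged as invalid
}


def check(value: str, closed: bool) -> Tuple[str, str]:
    """Return (output value, status) for one input value."""
    member = MEMBERS.get(value)
    if member is not None:
        status = "Valid" if member.valid else "Invalid"
        return (member.main_value or value, status)
    # Unknown value: "New" in an open domain, "Invalid" in a closed one.
    return (value, "Invalid" if closed else "New")


print(check("Germany", closed=False))
print(check("Atlantis", closed=False))
print(check("Atlantis", closed=True))
```

Only the handling of unknown values changes when the domain is closed; maintained members keep their assigned status in both cases.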

With these capabilities, HEDDA.IO is a robust solution for data validation, correction and standardization. Built-in analysis tools give users detailed insight into their data, the rules applied to it and the resulting changes, which supports informed decision-making within organizational workflows.