In recent years, notebooks have become the de facto standard development environment in data engineering and data science. They are web-based applications that let developers combine live coding, visualization, and documentation in a single environment. For highly data-driven work in particular, having code, results, and explanations side by side can be the decisive factor in focusing on the essentials and achieving success.
Versatile areas of application
Notebooks cover numerous practical areas of application such as:
- data cleansing and data transformation
- numerical simulation
- statistical modelling
- data visualization
- machine learning
and much more.
In addition, depending on the environment, notebooks support various popular programming languages such as Python, Scala, SQL, and .NET languages like C#. This makes them versatile tools that data engineers and data scientists can use for a wide range of purposes.
Development and execution in different environments
Because notebooks only provide the environment for development, the code written in them can be executed on a variety of other environments. Visual Studio Code, for example, can execute code directly on one's own computer. Spark-based systems such as Databricks or Azure Synapse Analytics, on the other hand, let users massively parallelize their applications on a Spark cluster and thus gain access to hundreds of cores and terabytes of memory.
Two different runners
HEDDA.IO currently provides two runtimes, known as runners, for the .NET Interactive and PySpark environments. The big advantage is that complex, predefined Data Quality business rules can be applied to the data with just a few lines of code, so the data can be quickly profiled, validated, and cleansed.
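Profiling typically means computing summary statistics for each column before writing or applying rules. The following is a minimal standard-library sketch of what such a per-column profile might contain. It is not the HEDDA.IO runner API. `profile_column` is a hypothetical helper, and a dedicated data quality tool would usually report considerably more, such as data types, patterns, and value distributions.

```python
from collections import Counter


def profile_column(values):
    """Return a few basic profile statistics for one column.

    Hypothetical helper for illustration only; not part of HEDDA.IO.
    """
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)  # frequency of each non-null value
    return {
        "count": len(values),                  # total number of values
        "nulls": len(values) - len(non_null),  # missing values
        "distinct": len(counts),               # distinct non-null values
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


countries = ["DE", "AT", None, "DE", "CH"]
profile = profile_column(countries)
print(profile)
```

For the sample column, the profile reports five values, one null, three distinct codes, and "DE" as the most frequent value.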
The greatest strength of HEDDA.IO? Unlike many other Data Quality environments, it executes the rules directly on the respective systems rather than in the HEDDA.IO environment. In other words, the data is not brought to the application for validation and transformation; instead, the rules are brought to the data!
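The sketch below makes the idea of bringing rules to the data concrete in plain Python. It does not reproduce the HEDDA.IO runner API. The `Rule` class, the `run_rules` function, and the example rules are all hypothetical. Rules are ordinary objects that run in the same process that already holds the records, so only the rule definitions need to reach the runtime and the data never leaves it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical business rule for illustration; not part of HEDDA.IO."""
    name: str
    column: str
    is_valid: Callable[[object], bool]
    # Optional fix-up applied to an invalid value (None = leave it unchanged).
    cleanse: Optional[Callable[[object], object]] = None


def run_rules(records, rules):
    """Validate and cleanse records locally, where the data already lives.

    Returns the cleansed records and a list of (row_index, rule_name) violations.
    """
    cleansed, violations = [], []
    for i, record in enumerate(records):
        row = dict(record)  # copy so the input records stay untouched
        for rule in rules:
            value = row.get(rule.column)
            if not rule.is_valid(value):
                violations.append((i, rule.name))
                if rule.cleanse is not None:
                    row[rule.column] = rule.cleanse(value)
        cleansed.append(row)
    return cleansed, violations


# Example rules: two-letter upper-case country codes and non-negative amounts.
rules = [
    Rule("country_code_format", "country",
         lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha() and v.isupper(),
         cleanse=lambda v: v.strip().upper() if isinstance(v, str) else v),
    Rule("amount_non_negative", "amount",
         lambda v: isinstance(v, (int, float)) and v >= 0),
]

records = [
    {"country": "DE", "amount": 10.0},
    {"country": " at ", "amount": 5.5},
    {"country": "CH", "amount": -3.0},
]

cleansed, violations = run_rules(records, rules)
print(cleansed)
print(violations)
```

Here the second record's country code is trimmed and upper-cased to "AT". The negative amount in the third record is reported as a violation but left unchanged, because that rule defines no cleansing step. In a PySpark or .NET Interactive notebook, the same principle would mean the rule logic executes on the cluster or kernel that already holds the DataFrame.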
With this innovative approach, HEDDA.IO users not only benefit from the greatest possible flexibility but also work efficiently and effectively in a familiar environment where they are most productive.
HEDDA.IO runners at a glance
All in all, HEDDA.IO gives data engineers and data scientists an exciting new way to optimize their working conditions and significantly improve the quality of their output.