Published On: November 2nd, 2023
Categories: Blog

HEDDA.IO WebRunner – Accessing the Best Data Quality Service from Anywhere

The HEDDA.IO WebRunner is an online service designed to streamline the execution of HEDDA.IO data quality processes. With the WebRunner, users can access HEDDA.IO’s advanced data quality features from any location with an internet connection.

One of the most significant advantages of the WebRunner is its integration with existing data pipelines. Because it is invoked over HTTP, HEDDA.IO can be incorporated into a range of workflows, whether they run on Azure Data Factory, Azure Synapse Analytics, Databricks or any other service that can send HTTP requests.

Spark Integration

One of the standout features of HEDDA.IO WebRunner is its seamless integration with Spark, specifically its integrated support for Spark on Azure Synapse Analytics Spark Pools. This integration unlocks a range of benefits for executing data quality processes, leading to enhanced speed and efficiency in data processing.

By configuring HEDDA.IO WebRunner to run Spark Jobs on Azure Synapse Analytics Spark Pools, we can tap into distributed processing and the scalability of Spark. Even with very large volumes of data, Spark Jobs handle the workload by distributing the processing across multiple nodes within a cluster. Scaling the Spark Jobs lets data quality processes finish in a timely manner on large and complex datasets, because the work runs in parallel across the cluster and the overall processing time drops.

Moreover, leveraging Spark for data quality processes on Azure Synapse Analytics Spark Pools brings additional advantages. It enables seamless integration with other Spark-based tools and libraries, providing access to a rich ecosystem of data processing capabilities. This allows for advanced transformations, aggregations, and analytics to be applied to the data, further enhancing the insights derived from the data quality process.

Use Case

Automate Data Quality with HEDDA.IO WebRunner by Triggering it on New File Upload to Storage Account

We can leverage the powerful capabilities of HEDDA.IO WebRunner by ensuring it is triggered whenever a new file is uploaded to our Storage Account. This automation eliminates the need for manual intervention and enables efficient data processing.

HEDDA.IO WebRunner offers various approaches to achieve this seamless integration, with Event Grid serving as the underlying mechanism. By utilizing Event Grid, we can establish a reliable connection between our Azure Storage Account and HEDDA.IO WebRunner, allowing for immediate execution whenever new raw data is detected.

Let’s explore the available methods for accomplishing this integration:

  • Logic App Execution – One option is to incorporate HEDDA.IO WebRunner within a Logic App. Within this Logic App, we can include additional activities that align with our specific requirements. For instance, we can configure the Logic App to notify our team about the event triggered by the new file upload. By setting up the Logic App to be triggered when a new blob is created in our Azure Storage Account, we ensure that HEDDA.IO WebRunner is automatically invoked, initiating the Data Quality process.
  • Azure Data Factory Pipeline Execution – Alternatively, we can seamlessly integrate HEDDA.IO WebRunner into our Azure Data Factory pipeline. By leveraging Event Grid in this context, we can trigger the execution of the pipeline whenever the predefined conditions, such as a new file upload to the Storage Account, are met. This enables the seamless integration of HEDDA.IO WebRunner within our existing data processing workflows, ensuring consistent and automated Data Quality checks.
  • Direct WebRunner Trigger – Lastly, if we prefer a direct approach, we can leverage the integrated support of Event Grid to trigger HEDDA.IO WebRunner. By creating the necessary configuration within Event Grid, we can establish a direct connection between the file upload event and the execution of HEDDA.IO WebRunner. This method provides a streamlined and simplified setup, ensuring that the Data Quality process initiates promptly upon file upload.
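To make the common pattern behind these three options concrete, here is a minimal Python sketch of what happens between the file upload event and the WebRunner call. It is an illustration under stated assumptions, not the HEDDA.IO API: the endpoint URL, the `x-api-key` header and the `sourceFile` payload field are hypothetical placeholders, since this article does not specify the WebRunner request format. The only parts taken from Azure are the `Microsoft.Storage.BlobCreated` event type and the `data.url` field of the Event Grid event schema (the CloudEvents schema uses `type` instead of `eventType`).

```python
import json
import urllib.request
from typing import Callable, Iterable, List, Optional

# Hypothetical endpoint: the real WebRunner URL is not given in this article.
WEBRUNNER_URL = "https://example.invalid/webrunner/run"

# Event type emitted by Azure Storage (Event Grid schema) when a blob is created.
BLOB_CREATED = "Microsoft.Storage.BlobCreated"


def blob_url_from_event(event: dict) -> Optional[str]:
    """Return the blob URL for a BlobCreated event, or None for any other event."""
    if event.get("eventType") != BLOB_CREATED:
        return None
    return event.get("data", {}).get("url")


def build_run_request(blob_url: str, api_key: str) -> urllib.request.Request:
    """Build the HTTP POST that would start a run.

    The payload field and the header name are assumptions for illustration only.
    """
    payload = json.dumps({"sourceFile": blob_url}).encode("utf-8")
    return urllib.request.Request(
        WEBRUNNER_URL,
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def handle_events(
    events: Iterable[dict],
    api_key: str,
    send: Callable[[urllib.request.Request], object] = urllib.request.urlopen,
) -> List[str]:
    """Send one run request per BlobCreated event; return the blob URLs that triggered a run."""
    triggered = []
    for event in events:
        blob_url = blob_url_from_event(event)
        if blob_url is None:
            continue  # ignore deletes, renames and other event types
        send(build_run_request(blob_url, api_key))
        triggered.append(blob_url)
    return triggered
```

In a Logic App or Data Factory pipeline, the filtering and the HTTP call are configured as trigger conditions and a web activity rather than written by hand; the sketch only shows the logic those steps carry out. The `send` parameter is injectable so the handler can be exercised without network access.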

Conclusion

By adopting any of these approaches, we can harness the power of HEDDA.IO WebRunner and Event Grid to automate our Data Quality process effectively, ensuring that our data remains accurate, consistent, and reliable throughout its lifecycle.

HEDDA.IO WebRunner’s integrated support for Spark, particularly when utilized on Azure Synapse Analytics Spark Pools, brings significant benefits to data quality processes. The use of Spark Jobs enables faster and more efficient data processing, the ability to scale up to handle large datasets, and access to a comprehensive set of data processing capabilities.
