The WebRunner Azure Function accesses a source data lake, loads the data from a parquet file, runs the data against HEDDA.IO, and then saves the valid, invalid, corrected, and uncorrected datasets to a destination data lake.
Important! The function currently only reads and outputs parquet files.
- Azure Data Lake Storage Gen2
- Function App
Azure Data Lake Gen2
If you don’t already have an Azure Data Lake Gen2, follow this tutorial.
If you don’t already have a Function App, follow this tutorial.
Request Body (json)
{
    "SourceStorageFileSystem" : "",
    "SourceParquetFileName" : "",
    "SourceStorageAccountName" : "",
    "SourceStorageAccountKey" : "",
    "DestinationStorageFileSystem" : "",
    "DestinationStorageAccountName" : "",
    "DestinationStorageAccountKey" : "",
    "ProjectName" : "",
    "KnowledgeBaseName" : "",
    "RunName" : "",
    "Mapping" : "",
    "ApiKey" : ""
}
Every property must have a valid value for the function to run successfully. All properties in the JSON object above take string values.
- SourceStorageFileSystem – The name of the Source blob container.
- SourceParquetFileName – The name of the file, including any folders it resides in, separated by “/”.
- SourceStorageAccountName – The name of the Source storage account.
- SourceStorageAccountKey – The key to the Source storage account.
- DestinationStorageFileSystem – The name of the Destination blob container.
- DestinationStorageAccountName – The name of the Destination storage account.
- DestinationStorageAccountKey – The key to the Destination storage account.
- ProjectName – The name of the HEDDA.IO project in which the Knowledge Base is located.
- KnowledgeBaseName – The name of the Knowledge Base you want to run your dataset against.
- RunName – The name of the HEDDA.IO Run which you want to use.
- Mapping – The name of the mapping that you want to use for the Run.
- ApiKey – The key to your HEDDA.IO profile, which provides access to the projects you have
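Because every property is required and must be a string, it can help to check a Request Body before sending it. The following is a minimal sketch, not part of the WebRunner function itself: the helper names are hypothetical, the function URL is a placeholder for your own deployment, and the use of an HTTP POST with a JSON body is an assumption.

```python
import json
import urllib.request

# Property names taken from the Request Body described above.
REQUIRED_PROPERTIES = (
    "SourceStorageFileSystem",
    "SourceParquetFileName",
    "SourceStorageAccountName",
    "SourceStorageAccountKey",
    "DestinationStorageFileSystem",
    "DestinationStorageAccountName",
    "DestinationStorageAccountKey",
    "ProjectName",
    "KnowledgeBaseName",
    "RunName",
    "Mapping",
    "ApiKey",
)


def find_invalid_properties(body):
    """Return the required property names that are missing, not strings, or empty."""
    return [
        name
        for name in REQUIRED_PROPERTIES
        if not isinstance(body.get(name), str) or not body[name].strip()
    ]


def build_request(function_url, body):
    """Build (but do not send) a JSON POST request for the function.

    function_url is a placeholder; POST is an assumption about the trigger.
    """
    invalid = find_invalid_properties(body)
    if invalid:
        raise ValueError("Missing or empty properties: " + ", ".join(invalid))
    return urllib.request.Request(
        function_url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request. Validating locally first avoids a round trip to the function for a body that is known to be incomplete.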
The function workflow is as follows:
- Fetches the input parquet file from the specified source location using ParquetVerleger.
- Runs the data against HEDDA.IO, passing some of the Request Body properties as arguments, and stores the result in a variable called heddaResult, of type Hedda.DTO.Workflow.
- Filters heddaResult into four DataTables using DataWraper functions: valid, invalid, corrected, and uncorrected.
- Saves each DataTable as a parquet file to the specified destination location using ParquetVerleger.
The function outputs four datasets, each a differently filtered view of the HEDDA.IO run result.
Here are the four datasets that will be saved as parquet files by the function:
- Valid – data that has successfully passed the validation process.
- Invalid – data that has failed to pass the validation process.
- Corrected – data that was corrected according to domain member’s configuration or business rule actions.
- Uncorrected/Unchanged – data that was left untouched.
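The four-way split can be pictured with a small sketch. How HEDDA.IO actually marks rows is not described here, so the `is_valid` and `was_corrected` fields below are hypothetical, as is the assumption that validity and correction are independent (so each row lands in one dataset of each pair).

```python
def split_results(rows):
    """Group rows into the four output datasets.

    Each row is a dict with two hypothetical boolean fields:
    is_valid (passed validation) and was_corrected (value was changed).
    """
    datasets = {"valid": [], "invalid": [], "corrected": [], "uncorrected": []}
    for row in rows:
        # Validity decides between the valid and invalid datasets.
        datasets["valid" if row["is_valid"] else "invalid"].append(row)
        # Correction status decides between corrected and uncorrected.
        datasets["corrected" if row["was_corrected"] else "uncorrected"].append(row)
    return datasets


rows = [
    {"county": "A", "is_valid": True, "was_corrected": False},
    {"county": "B", "is_valid": True, "was_corrected": True},
    {"county": "C", "is_valid": False, "was_corrected": False},
]
result = split_results(rows)
```

If the real datasets turn out to be mutually exclusive, the second lookup would need to be adjusted accordingly.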
The data lake where these files are saved must be specified in the Request Body. Each file is saved under the Project folder, then the Knowledge Base folder, then the Run folder, and finally the ExecutionID + current date folder, in this exact order.
Execution ID is a unique identifier for the current execution of the function.
Let’s say we have this Request Body that we will be sending to the function.
{
    "SourceStorageFileSystem" : "data/",
    "SourceParquetFileName" : "USPopulation/UsPopulationByCounty.parquet",
    "SourceStorageAccountName" : "heddatestaccount1",
    "SourceStorageAccountKey" : "heddatestaccountkey",
    "DestinationStorageFileSystem" : "heddaresult/",
    "DestinationStorageAccountName" : "heddatestaccount2",
    "DestinationStorageAccountKey" : "heddatestaccount2key",
    "ProjectName" : "MyProject",
    "KnowledgeBaseName" : "MyKnowledgeBase",
    "RunName" : "DotNetRun",
    "Mapping" : "FullMapping",
    "ApiKey" : "heddaApiKey"
}
The files will be saved at the following file paths:
- Valid data: /MyProject/MyKnowledgeBase/DotNetRun/ExecutionID_current-date/ValidData.parquet
- Invalid data: /MyProject/MyKnowledgeBase/DotNetRun/ExecutionID_current-date/InvalidData.parquet
- Corrected data: /MyProject/MyKnowledgeBase/DotNetRun/ExecutionID_current-date/CorrectedData.parquet
- Uncorrected data: /MyProject/MyKnowledgeBase/DotNetRun/ExecutionID_current-date/UncorrectedData.parquet
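The folder layout can be expressed as a short path-building sketch. The function name is hypothetical, and the exact format of the Execution ID and of the date is not specified here, so both are passed in as already formatted strings and joined with an underscore as in the example paths.

```python
# File names as they appear in the example paths above.
DATASET_FILE_NAMES = {
    "valid": "ValidData.parquet",
    "invalid": "InvalidData.parquet",
    "corrected": "CorrectedData.parquet",
    "uncorrected": "UncorrectedData.parquet",
}


def build_output_paths(project, knowledge_base, run, execution_id, run_date):
    """Return the destination path of each dataset.

    Folder order: Project, Knowledge Base, Run, then ExecutionID_date.
    execution_id and run_date are plain strings; their real format is
    an assumption left to the caller.
    """
    folder = f"/{project}/{knowledge_base}/{run}/{execution_id}_{run_date}"
    return {
        name: f"{folder}/{file_name}"
        for name, file_name in DATASET_FILE_NAMES.items()
    }


paths = build_output_paths(
    "MyProject", "MyKnowledgeBase", "DotNetRun", "ExecutionID", "current-date"
)
```

With the placeholder values used here, `paths["valid"]` is `/MyProject/MyKnowledgeBase/DotNetRun/ExecutionID_current-date/ValidData.parquet`, matching the first path in the list.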