Orchestrating Multiple Azure Data Factory Pipelines in Airflow
This DAG demonstrates orchestrating multiple Azure Data Factory (ADF) pipelines using Airflow to perform classic ELT operations.
Run this DAG
1. Install the Astronomer CLI (skip this step if you already have it installed):
2. Download the repository:
3. Navigate to where the repository was cloned and start the DAG:
Orchestrating Azure Data Factory Pipelines in Airflow
This DAG orchestrates two Azure Data Factory (ADF) pipelines, extractDailyExchangeRates and loadDailyExchangeRates, that together perform a classic ELT workflow. The pipelines extract daily currency exchange rates from an API, persist the data to a data lake in Azure Blob Storage, run data-quality checks on the staged data, and finally load a daily aggregate table in Azure SQL Database using SCD Type 2 logic.
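As a rough illustration of what SCD Type 2 logic means for a table of exchange rates, here is a minimal Python sketch. It is not the stored procedure the pipelines actually use, which runs as T-SQL in Azure SQL Database; the RateRow fields, the apply_scd2 helper, and the sample rates are all hypothetical. The idea is that a changed value closes out the current row and appends a new version, while an unchanged value leaves the table alone.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RateRow:
    """One version of an exchange rate (hypothetical schema)."""
    base: str
    target: str
    rate: float
    valid_from: date
    valid_to: Optional[date] = None  # None means this version is still current


def apply_scd2(table: List[RateRow], base: str, target: str,
               rate: float, as_of: date) -> None:
    """Apply one incoming rate to the table in place using Type 2 rules."""
    current = next(
        (r for r in table
         if r.base == base and r.target == target and r.valid_to is None),
        None,
    )
    if current is not None:
        if current.rate == rate:
            return  # unchanged: keep the current version open
        current.valid_to = as_of  # changed: close out the old version
    # First sighting or changed value: append a new current version.
    table.append(RateRow(base, target, rate, valid_from=as_of))


# Made-up sample values, for illustration only.
history: List[RateRow] = []
apply_scd2(history, "USD", "JPY", 113.5, date(2021, 10, 1))
apply_scd2(history, "USD", "JPY", 113.5, date(2021, 10, 2))  # no change, no new row
apply_scd2(history, "USD", "JPY", 114.0, date(2021, 10, 3))  # new version
```

After these three calls, history holds two rows: the first closed on 2021-10-03 and the second still open, so earlier rates remain queryable.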
The extractDailyExchangeRates ADF pipeline extracts data from the open Exchange Rate API for the USD and EUR currencies and stores the response in a "landing" container in Azure Blob Storage. It then copies the extracted data to a "data-lake" container, loads the landed data into a staging table in Azure SQL Database via a T-SQL stored procedure, and finally deletes the landed data file.
The loadDailyExchangeRates ADF pipeline checks the quality of the ingested currency codes against a dimensional reference dataset. If the check passes, another T-SQL stored procedure inserts the data into a daily aggregate table of exchange rates comparing the US dollar, euro, Japanese yen, and Swiss franc.
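The data quality check described above amounts to a set comparison between the ingested currency codes and the reference dataset. The sketch below shows that idea in plain Python. The real check runs inside the ADF pipeline against Azure SQL Database, so the reference set, the function names, and the sample inputs here are assumptions for illustration only.

```python
from typing import Iterable, Set

# Hypothetical stand-in for the dimensional reference dataset.
REFERENCE_CURRENCY_CODES: Set[str] = {"USD", "EUR", "JPY", "CHF"}


def unknown_currency_codes(ingested: Iterable[str], reference: Set[str]) -> Set[str]:
    """Return the ingested codes that do not appear in the reference set."""
    return set(ingested) - reference


def passes_quality_check(ingested: Iterable[str], reference: Set[str]) -> bool:
    """The check passes only when every ingested code is a known reference code."""
    return not unknown_currency_codes(ingested, reference)


# Made-up staged batches: one clean, one containing an unrecognized code.
clean_batch = ["USD", "EUR", "JPY"]
dirty_batch = ["USD", "XXX"]

clean_ok = passes_quality_check(clean_batch, REFERENCE_CURRENCY_CODES)
dirty_ok = passes_quality_check(dirty_batch, REFERENCE_CURRENCY_CODES)
dirty_unknown = unknown_currency_codes(dirty_batch, REFERENCE_CURRENCY_CODES)
```

Returning the offending codes rather than only a boolean makes a failed check easier to diagnose, since the unknown values can be logged before the load step is skipped.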
Airflow Version
2.2.0
Providers
apache-airflow-providers-microsoft-azure==3.2.0
Connections Required
- Azure Data Factory
For more information on how to connect to Azure Data Factory from Airflow, see this guide.
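With an Azure Data Factory connection in place, triggering the two pipelines in order could look roughly like the sketch below. This is not the DAG shipped in the repository. The connection ID, DAG ID, start date, and the task_id_for and build_dag helpers are hypothetical, and the sketch assumes the resource group and factory name are configured on the connection. Airflow is imported inside build_dag only so that the module can be loaded without Airflow installed; in a real DAG file you would assign the result at module level (dag = build_dag()) so the scheduler can discover it.

```python
from datetime import datetime
from typing import List

# Pipeline names are taken from the description above, in execution order.
ADF_PIPELINES: List[str] = ["extractDailyExchangeRates", "loadDailyExchangeRates"]

# Assumed Airflow connection ID; use the ID your connection was created with.
ADF_CONN_ID = "azure_data_factory"


def task_id_for(pipeline_name: str) -> str:
    """Derive a readable Airflow task_id from an ADF pipeline name."""
    return f"run_{pipeline_name}"


def build_dag():
    """Build a DAG that triggers each ADF pipeline in sequence."""
    # Imported lazily so this module loads even where Airflow is absent.
    from airflow import DAG
    from airflow.models.baseoperator import chain
    from airflow.providers.microsoft.azure.operators.data_factory import (
        AzureDataFactoryRunPipelineOperator,
    )

    with DAG(
        dag_id="adf_exchange_rates_sketch",  # hypothetical DAG name
        start_date=datetime(2021, 10, 1),    # arbitrary example date
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        tasks = [
            AzureDataFactoryRunPipelineOperator(
                task_id=task_id_for(name),
                pipeline_name=name,
                azure_data_factory_conn_id=ADF_CONN_ID,
            )
            for name in ADF_PIPELINES
        ]
        chain(*tasks)  # extract task runs before load task
    return dag
```

The sequencing only makes sense if each task blocks until its pipeline run finishes. That is the operator's default in recent provider releases, but it is worth confirming against the provider version you have installed.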