Dynamic dbt Data Pipeline
An example of a dbt pipeline that generates tasks dynamically from a manifest.json file.
Airflow DAGs for dbt
The code in this repository is meant to accompany this blog post on beginner and advanced implementation concepts at the intersection of dbt and Airflow.
To run these DAGs locally:
- Download the Astro CLI.
- Download and run Docker.
- Clone this repository and cd into it.
- Run astro dev start to spin up a local Airflow environment and run the accompanying DAGs on your machine.
dbt project setup
We are currently using the jaffle_shop sample dbt project.
The only files required for the Airflow DAGs to run are dbt_project.yml, profiles.yml, and target/manifest.json, but we included the models for completeness. If you would like to try these DAGs with your own dbt workflow, feel free to drop in your own project files.
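Since the DAGs build their tasks from target/manifest.json, it may help to see roughly how that file can be read. The following is a minimal sketch, not the code used in this repository: load_model_graph and dbt_run_command are hypothetical helper names, and the sketch assumes model ids of the form model.<project>.<name>. It relies only on the manifest's nodes mapping, where each node carries a resource_type and a depends_on.nodes list.

```python
import json
from pathlib import Path


def load_model_graph(manifest_path):
    """Return {model_id: [upstream model ids]} read from a dbt manifest.json."""
    manifest = json.loads(Path(manifest_path).read_text())
    nodes = manifest["nodes"]
    graph = {}
    for node_id, node in nodes.items():
        if node.get("resource_type") != "model":
            continue  # skip seeds, tests, snapshots, and other node types
        upstream = node.get("depends_on", {}).get("nodes", [])
        # Keep only dependencies that are themselves models in this manifest.
        graph[node_id] = [
            dep for dep in upstream
            if nodes.get(dep, {}).get("resource_type") == "model"
        ]
    return graph


def dbt_run_command(node_id):
    """Build a dbt command for one model.

    Assumption: the id looks like model.<project>.<name>, so the model
    name is the last dot-separated part.
    """
    model_name = node_id.split(".")[-1]
    return f"dbt run --select {model_name}"
```

In a DAG file, each key of the returned mapping could become one BashOperator task running the generated command, with each listed upstream id set as a task dependency.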
Notes
- To use these DAGs, Airflow 2.2+ is required. These DAGs have been tested with Airflow 2.2.0.
- If you make changes to the dbt project, you will need to run dbt compile to update the manifest.json file. This may be done manually during development, as part of a CI/CD pipeline, or as a separate step in a production pipeline run before the Airflow DAG is triggered.
- The sample dbt project contains a profiles.yml file, which is configured to use environment variables. The database credentials from an Airflow connection are passed as environment variables to the BashOperator tasks running the dbt commands.
- Each DAG runs a dbt_seed task at the beginning that loads sample data into the database. This is simply for the purpose of this demo.
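The note above about passing connection credentials as environment variables can be sketched as follows. This is an illustration under stated assumptions rather than the repository's code: dbt_env_from_connection is a hypothetical helper, and the DBT_* variable names are placeholders that would need to match whatever names profiles.yml reads through env_var(). The conn argument only needs the host, login, password, port, and schema attributes that an Airflow Connection exposes, so a plain stand-in object is used here to keep the example runnable without Airflow.

```python
import os
from types import SimpleNamespace


def dbt_env_from_connection(conn, base_env=None):
    """Map connection fields to environment variables for a dbt command.

    The DBT_* names are hypothetical; use the names your profiles.yml expects.
    """
    # Start from an existing environment so PATH and similar variables survive,
    # since an explicit env replaces the inherited one for the subprocess.
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "DBT_HOST": conn.host or "",
        "DBT_USER": conn.login or "",
        "DBT_PASSWORD": conn.password or "",
        "DBT_PORT": str(conn.port or ""),  # environment values must be strings
        "DBT_DBNAME": conn.schema or "",
    })
    return env


# Stand-in for an Airflow Connection (in a DAG this could come from
# BaseHook.get_connection("<your_conn_id>")).
conn = SimpleNamespace(
    host="localhost", login="dbt", password="secret", port=5432, schema="analytics"
)
env = dbt_env_from_connection(conn, base_env={"PATH": "/usr/bin"})
```

The resulting dictionary could then be handed to a task through the BashOperator env argument, so the dbt process sees the credentials without them being written into the project files.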