Orchestrate Databricks Jobs with Airflow

Example DAG from the Astronomer Databricks tutorial.

Last Updated: Aug. 26, 2021

Run this DAG

1. Install the Astronomer CLI (skip this step if you already have it).

2. Download the repository.

3. Navigate to where the repository was cloned and start the DAG.

airflow-databricks-tutorial

This repo contains an Astronomer project with example DAGs showing how to use Airflow to orchestrate Databricks jobs. The Astronomer Databricks tutorial discusses the DAGs and concepts in depth.

Tutorial Overview

This tutorial has one DAG showing how to use the following Databricks operators:

  • DatabricksRunNowOperator
  • DatabricksSubmitRunOperator
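
As a rough illustration of how these two operators can sit together in one DAG, here is a minimal sketch. It is not the DAG shipped in this repo. It assumes Airflow 2.x with the apache-airflow-providers-databricks package installed and a connection named databricks_default. The DAG id, notebook path, job ID, Spark version, and node type are all hypothetical placeholders, and the helper functions build_new_cluster, build_notebook_task, and build_dag are names invented for this example.

```python
from datetime import datetime


def build_new_cluster(num_workers: int = 2) -> dict:
    """Return a cluster spec for a one-off run (all values are placeholders)."""
    return {
        "spark_version": "7.3.x-scala2.12",  # placeholder: pick a runtime your workspace offers
        "node_type_id": "i3.xlarge",  # placeholder: depends on your cloud provider
        "num_workers": num_workers,
    }


def build_notebook_task(notebook_path: str) -> dict:
    """Return the notebook task spec that points at a workspace notebook."""
    return {"notebook_path": notebook_path}


def build_dag():
    """Build a DAG that submits a one-off run, then triggers an existing job."""
    # Imported lazily so the helpers above can be used without Airflow installed.
    from airflow import DAG
    from airflow.providers.databricks.operators.databricks import (
        DatabricksRunNowOperator,
        DatabricksSubmitRunOperator,
    )

    with DAG(
        dag_id="databricks_example_sketch",  # hypothetical DAG id
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # Airflow 2.x argument; newer releases use `schedule`
        catchup=False,
    ) as dag:
        # Submits a new one-time run on a cluster created just for that run.
        submit_run = DatabricksSubmitRunOperator(
            task_id="submit_notebook_run",
            databricks_conn_id="databricks_default",
            new_cluster=build_new_cluster(),
            notebook_task=build_notebook_task(
                "/Users/someone@example.com/example-notebook"  # hypothetical path
            ),
        )

        # Triggers a job that already exists in the Databricks workspace.
        run_now = DatabricksRunNowOperator(
            task_id="run_existing_job",
            databricks_conn_id="databricks_default",
            job_id=1234,  # hypothetical job ID
        )

        submit_run >> run_now

    return dag
```

To let the scheduler discover a DAG built this way, a file in the project's dags folder would assign the result at module level, for example dag = build_dag().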

Getting Started

The easiest way to run these example DAGs is to use the Astronomer CLI to get an Airflow instance up and running locally:

  1. Install the Astronomer CLI
  2. Clone this repo somewhere locally and navigate to it in your terminal
  3. Initialize an Astronomer project by running astro dev init
  4. Start Airflow locally by running astro dev start
  5. Navigate to localhost:8080 in your browser and you should see the tutorial DAGs there
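
If you would rather confirm from a script that the local instance from step 5 is up, the sketch below polls the webserver's /health endpoint. It assumes Airflow 2.x, where that endpoint returns JSON with a status for the metadatabase and the scheduler. The function names is_healthy and check_local_airflow are made up for this example.

```python
import json
from urllib.request import urlopen


def is_healthy(payload: dict) -> bool:
    """Return True only if both reported components have status 'healthy'."""
    components = ("metadatabase", "scheduler")
    return all(
        payload.get(name, {}).get("status") == "healthy" for name in components
    )


def check_local_airflow(
    base_url: str = "http://localhost:8080", timeout: float = 5.0
) -> bool:
    """Fetch <base_url>/health and evaluate the returned JSON payload."""
    with urlopen(f"{base_url}/health", timeout=timeout) as response:
        return is_healthy(json.load(response))
```

Calling check_local_airflow() after astro dev start should return True once the scheduler and database are ready. It raises a URLError if nothing is listening on the port yet.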