Sends lineage data from tasks to DataHub.

Last Updated: Dec. 2, 2022

Access Instructions

Install the DataHub provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.


To configure the backend, first create a DataHub connection with the Airflow CLI, then enable the backend in the [lineage] section of airflow.cfg:

# For REST-based:
airflow connections add --conn-type 'datahub_rest' 'datahub_rest_default' --conn-host 'http://localhost:8080'
# For Kafka-based (standard Kafka sink config can be passed via extras):
airflow connections add --conn-type 'datahub_kafka' 'datahub_kafka_default' --conn-host 'broker:9092' --conn-extra '{}'

# In airflow.cfg:
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {
    "datahub_conn_id": "datahub_rest_default",
    "capture_ownership_info": true,
    "capture_tags_info": true,
    "graceful_exceptions": true }
# The indentation of the datahub_kwargs continuation lines is important!
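
The note about indentation can be checked before restarting Airflow. The following is a minimal sketch, assuming that airflow.cfg follows standard INI continuation rules (as Python's configparser does) and that datahub_kwargs is read as JSON. The names CFG_TEXT and load_datahub_kwargs are hypothetical helpers for illustration and are not part of the DataHub provider.

```python
import configparser
import json

# A copy of the [lineage] section as it might appear in airflow.cfg.
# The continuation lines of datahub_kwargs are indented on purpose.
CFG_TEXT = """\
[lineage]
backend = datahub_provider.lineage.datahub.DatahubLineageBackend
datahub_kwargs = {
    "datahub_conn_id": "datahub_rest_default",
    "capture_ownership_info": true,
    "capture_tags_info": true,
    "graceful_exceptions": true }
"""


def load_datahub_kwargs(cfg_text: str) -> dict:
    """Parse the [lineage] section and decode datahub_kwargs as JSON.

    Hypothetical helper: it only mirrors how an INI parser treats
    multi-line values; it does not call into Airflow or DataHub.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Indented lines are joined onto the datahub_kwargs value,
    # so the full JSON object is available here.
    return json.loads(parser.get("lineage", "datahub_kwargs"))


kwargs = load_datahub_kwargs(CFG_TEXT)
print(kwargs["datahub_conn_id"])
```

If the continuation lines are not indented, an INI parser treats each of them as a separate option, the datahub_kwargs value is cut off after the opening brace, and JSON decoding fails.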
