ElasticsearchTaskHandler


ElasticsearchTaskHandler is a Python log handler that reads logs from Elasticsearch. Note that Airflow does not handle indexing logs into Elasticsearch; instead, it flushes logs into local files, and additional software such as Filebeat and Logstash is required to index those logs into Elasticsearch. To efficiently query and sort Elasticsearch results, this handler assumes each log message has a log_id field composed of the task instance (ti) primary keys: log_id = {dag_id}-{task_id}-{execution_date}-{try_number}. Log messages with a given log_id are sorted by offset, a unique integer that indicates each message's order. Timestamps are unreliable for ordering because multiple log messages may share the same timestamp.
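The ordering scheme above can be illustrated with a small sketch. This is not Airflow's implementation; it only mirrors the stated assumptions: a log_id built from the task-instance keys, and ordering by the integer offset rather than by timestamp.

```python
# Minimal sketch (not Airflow's code) of the documented log_id format
# and offset-based ordering.

def make_log_id(dag_id: str, task_id: str, execution_date: str, try_number: int) -> str:
    # Mirrors the assumed format: {dag_id}-{task_id}-{execution_date}-{try_number}
    return f"{dag_id}-{task_id}-{execution_date}-{try_number}"

def sort_messages(messages: list[dict]) -> list[dict]:
    # Each message carries an integer "offset"; timestamps are deliberately
    # ignored because several messages can share the same timestamp.
    return sorted(messages, key=lambda m: m["offset"])
```

In practice, the indexing pipeline (e.g. Filebeat/Logstash) is responsible for attaching these fields to each document before it reaches Elasticsearch.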


Last Updated: Feb. 12, 2023

Access Instructions

Install the Elasticsearch provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired parameters.

Parameters

base_log_folder: Base log folder in which to place logs.
filename_template: Template string for the log file name.
