MetastorePartitionSensor

Apache Hive

An alternative to the HivePartitionSensor that talks directly to the MySQL database. It was created after observing suboptimal queries generated by the Metastore Thrift service when querying subpartitioned tables. The Thrift service's queries were written in a way that did not leverage the indexes.


Last Updated: Dec. 27, 2022

Access Instructions

Install the Apache Hive provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
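
As a rough sketch of the import-and-instantiate step, the snippet below collects the documented parameters and passes them to the sensor. The import path is the one the Apache Hive provider is believed to use and should be checked against your installed version. The table name, schema, task_id, and connection id are hypothetical placeholders, and the two helper functions are illustrative, not part of the provider.

```python
from typing import Any, Dict


def sensor_kwargs(table: str, partition_name: str, schema: str,
                  mysql_conn_id: str) -> Dict[str, Any]:
    """Collect the documented parameters plus the task_id every Airflow task needs."""
    return {
        "task_id": f"wait_for_{table}",   # hypothetical naming scheme
        "schema": schema,
        "table": table,
        "partition_name": partition_name,
        "mysql_conn_id": mysql_conn_id,
    }


def make_sensor(**kwargs: Any):
    """Instantiate the sensor; the import is deferred so this file loads without Airflow."""
    # Assumed import path; verify it against the provider version you have installed.
    from airflow.providers.apache.hive.sensors.metastore_partition import (
        MetastorePartitionSensor,
    )
    return MetastorePartitionSensor(**kwargs)


# Placeholder values for illustration only.
kwargs = sensor_kwargs(
    table="page_views",
    partition_name="ds=2016-01-01",
    schema="analytics",
    mysql_conn_id="metastore_mysql",
)
```

Inside a DAG file, `wait_for_partition = make_sensor(**kwargs)` would create the task. A top-level import works just as well when the provider is always installed.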

Parameters

schema: the schema
table (required): the table
partition_name (required): the partition name, as defined in the PARTITIONS table of the Metastore. The order of the fields matters. Examples: ds=2016-01-01, or ds=2016-01-01/sub=foo for a subpartitioned table
mysql_conn_id: a reference to the MySQL conn_id for the Metastore
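
Because field order matters in partition_name, it can help to build the string from an ordered sequence of fields instead of concatenating by hand. The helper below is a hypothetical convenience, not part of the provider. It only reproduces the key=value/key=value format shown in the examples above.

```python
from typing import Iterable, Tuple


def build_partition_name(fields: Iterable[Tuple[str, str]]) -> str:
    """Join (key, value) pairs as key=value segments separated by '/', preserving order."""
    return "/".join(f"{key}={value}" for key, value in fields)


# Single-level partition.
single = build_partition_name([("ds", "2016-01-01")])

# Subpartitioned table: fields must appear in the table's partition order.
nested = build_partition_name([("ds", "2016-01-01"), ("sub", "foo")])

# Same fields in a different order yield a different name, which would not match.
swapped = build_partition_name([("sub", "foo"), ("ds", "2016-01-01")])
```

The result can be passed directly as the partition_name argument of the sensor.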

