MsSqlToHiveOperator

Apache Hive

Moves data from Microsoft SQL Server to Hive. The operator runs your query against Microsoft SQL Server and stores the result in a local file before loading it into a Hive table. If the create or recreate argument is set to True, CREATE TABLE and DROP TABLE statements are generated. Hive data types are inferred from the cursor’s metadata. Note that the table generated in Hive uses STORED AS textfile, which isn’t the most efficient serialization format. If a large amount of data is loaded and/or the table is queried heavily, you may want to use this operator only to stage the data into a temporary table before loading it into its final destination using a HiveOperator.
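The staging pattern described above might look like the following sketch. The table names, the task id, and the connection id are hypothetical placeholders, and the final table is assumed to already exist in a more efficient format (for example ORC). The HiveOperator import path assumes the Apache Hive provider package is installed.

```python
# Hypothetical table names; adjust to your own schema.
STAGING_TABLE = "staging.orders_txt"  # textfile table loaded by MsSqlToHiveOperator
FINAL_TABLE = "warehouse.orders"      # assumed to exist already, e.g. STORED AS ORC

# HQL that copies the staged rows into the final table.
PROMOTE_HQL = f"""
INSERT OVERWRITE TABLE {FINAL_TABLE}
SELECT * FROM {STAGING_TABLE}
"""


def build_promote_task(dag):
    """Return a HiveOperator that runs PROMOTE_HQL.

    Needs Airflow and the Apache Hive provider at call time.
    """
    # Imported lazily so the HQL above can be inspected without Airflow installed.
    from airflow.providers.apache.hive.operators.hive import HiveOperator

    return HiveOperator(
        task_id="promote_orders",             # hypothetical task id
        hql=PROMOTE_HQL,
        hive_cli_conn_id="hive_cli_default",  # placeholder connection id
        dag=dag,
    )
```

In a DAG, this task would run downstream of the MsSqlToHiveOperator task that fills the staging table.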


Last Updated: Mar. 21, 2023

Access Instructions

Install the Apache Hive provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
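As a minimal sketch of these two steps: the provider is typically installed with `pip install apache-airflow-providers-apache-hive`, and the operator is then imported from the provider's transfers module. Every value below (query, table name, partition, connection ids, table properties) is a hypothetical placeholder chosen for illustration; only the argument names come from the parameter list on this page.

```python
# Argument names follow the parameter list on this page;
# all values are hypothetical placeholders.
MSSQL_TO_HIVE_KWARGS = {
    "task_id": "mssql_to_hive_orders",
    "sql": "SELECT id, customer, amount FROM dbo.orders",
    "hive_table": "staging.orders_txt",  # dot notation: database.table
    "create": True,                      # create the table if it doesn't exist
    "recreate": False,                   # don't drop and recreate on every run
    "partition": {"ds": "{{ ds }}"},     # templated partition value
    "delimiter": ",",
    "mssql_conn_id": "mssql_default",        # placeholder connection id
    "hive_cli_conn_id": "hive_cli_default",  # placeholder connection id
    "tblproperties": {"source": "mssql"},
}


def build_transfer_task(dag):
    """Instantiate the operator inside the given DAG.

    Needs Airflow and the Apache Hive provider at call time.
    """
    # Imported lazily so the argument dict can be inspected without Airflow installed.
    from airflow.providers.apache.hive.transfers.mssql_to_hive import (
        MsSqlToHiveOperator,
    )

    return MsSqlToHiveOperator(dag=dag, **MSSQL_TO_HIVE_KWARGS)
```

Calling `build_transfer_task(dag)` inside your DAG file attaches the task to that DAG.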

Parameters

sql (required): SQL query to execute against the Microsoft SQL Server database. (templated)
hive_table (required): Target Hive table; use dot notation to target a specific database. (templated)
create: Whether to create the table if it doesn’t exist.
recreate: Whether to drop and recreate the table at every execution.
partition: Target partition as a dict of partition columns and values. (templated)
delimiter: Field delimiter in the file.
mssql_conn_id: Source Microsoft SQL Server connection.
hive_cli_conn_id: Reference to the Hive CLI connection id.
tblproperties: TBLPROPERTIES of the Hive table being created.
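To make the roles of create, partition, delimiter, and tblproperties concrete, here is an illustrative sketch of how a table definition could be derived from cursor metadata. It is not the provider's actual code: the function names, the type-code mapping, and the choice of STRING for partition columns are assumptions made for this example.

```python
# Hypothetical mapping from database type codes to Hive types.
# Real type codes depend on the SQL Server driver in use.
HYPOTHETICAL_TYPE_MAP = {1: "INT", 2: "FLOAT"}


def infer_hive_types(cursor_description, type_map):
    """Map each (name, type_code, ...) entry to a Hive type, defaulting to STRING."""
    return {col[0]: type_map.get(col[1], "STRING") for col in cursor_description}


def build_create_table(table, fields, partition=None, delimiter=",", tblproperties=None):
    """Assemble a CREATE TABLE statement for a delimited textfile table."""
    columns = ",\n    ".join(f"`{name}` {hive_type}" for name, hive_type in fields.items())
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n    {columns})\n"
    if partition:
        # Assumption: partition columns are declared as STRING.
        parts = ", ".join(f"`{key}` STRING" for key in partition)
        ddl += f"PARTITIONED BY ({parts})\n"
    ddl += "ROW FORMAT DELIMITED\n"
    ddl += f"FIELDS TERMINATED BY '{delimiter}'\n"
    ddl += "STORED AS textfile\n"
    if tblproperties:
        props = ", ".join(f"'{k}'='{v}'" for k, v in tblproperties.items())
        ddl += f"TBLPROPERTIES({props})\n"
    return ddl + ";"


# Stand-in for cursor.description: (column name, type code) pairs.
description = [("id", 1), ("amount", 2), ("customer", 99)]
fields = infer_hive_types(description, HYPOTHETICAL_TYPE_MAP)
ddl = build_create_table(
    "staging.orders_txt",
    fields,
    partition={"ds": "2023-03-21"},
    tblproperties={"source": "mssql"},
)
print(ddl)
```

Columns whose type code is not in the mapping fall back to STRING, which is a safe default for a delimited text file.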

