S3ToHiveOperator
Apache Hive
Moves data from S3 to Hive. The operator downloads a file from S3 and stores it locally before loading it into a Hive table. If the create or recreate argument is set to True, CREATE TABLE and DROP TABLE statements are generated. Hive data types are inferred from the cursor's metadata.
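To make the create/recreate behavior concrete, here is a minimal, self-contained sketch of how a CREATE TABLE statement can be assembled from a field_dict of column names to Hive types. This is purely illustrative: the real operator delegates DDL generation to its Hive hook, and the helper name below (build_create_table) is hypothetical.

```python
# Illustrative sketch only: shows the kind of DDL the create/recreate
# flags produce. The real operator builds this internally via its Hive
# hook; build_create_table is a hypothetical name for this walkthrough.

def build_create_table(table: str, field_dict: dict[str, str],
                       delimiter: str = ",", recreate: bool = False) -> str:
    """Return (optional DROP TABLE +) CREATE TABLE DDL for a delimited text table."""
    columns = ", ".join(f"`{name}` {dtype}" for name, dtype in field_dict.items())
    ddl = ""
    if recreate:
        # recreate drops any existing table before creating it again
        ddl += f"DROP TABLE IF EXISTS {table};\n"
    ddl += (
        f"CREATE TABLE IF NOT EXISTS {table} ({columns}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        "STORED AS textfile;"
    )
    return ddl

print(build_create_table("staging.orders", {"id": "BIGINT", "amount": "DOUBLE"}))
```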
Access Instructions
Install the Apache Hive provider package into your Airflow environment.
Import the operator into your DAG file and instantiate it with your desired parameters.
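As a sketch, a minimal DAG using the operator might look like the following. The S3 key, connection IDs, table name, and column types are placeholders, and the DAG settings assume a recent Airflow 2.x release.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.transfers.s3_to_hive import S3ToHiveOperator

# Placeholder S3 key, connection IDs, and table name.
with DAG(
    dag_id="s3_to_hive_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    load_orders = S3ToHiveOperator(
        task_id="load_orders",
        s3_key="s3://my-bucket/exports/orders.csv",   # placeholder key
        field_dict={"id": "BIGINT", "amount": "DOUBLE"},
        hive_table="staging.orders",
        delimiter=",",
        headers=True,          # first line of the file is a header row
        check_headers=True,    # validate headers against field_dict keys
        create=True,           # emit CREATE TABLE IF NOT EXISTS
        aws_conn_id="aws_default",
        hive_cli_conn_id="hive_cli_default",
    )
```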
Parameters
Documentation
Note that the table generated in Hive uses STORED AS textfile, which isn't the most efficient serialization format. If a large amount of data is loaded and/or if the table gets queried considerably, you may want to use this operator only to stage the data into a temporary table before loading it into its final destination using a HiveOperator.
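That staging pattern can be sketched as the following two-task DAG; the table names, HQL, and the assumption that the final table already exists in a columnar format (e.g. ORC) are all illustrative.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hive.operators.hive import HiveOperator
from airflow.providers.apache.hive.transfers.s3_to_hive import S3ToHiveOperator

with DAG(
    dag_id="s3_to_hive_staged_example",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    # Stage the raw file into a temporary textfile table.
    stage = S3ToHiveOperator(
        task_id="stage_orders",
        s3_key="s3://my-bucket/exports/orders.csv",  # placeholder key
        field_dict={"id": "BIGINT", "amount": "DOUBLE"},
        hive_table="staging.orders_tmp",
        recreate=True,  # drop and recreate the staging table each run
    )

    # Rewrite the staged rows into the final table; warehouse.orders is
    # assumed to already exist with an efficient format such as ORC.
    finalize = HiveOperator(
        task_id="finalize_orders",
        hql="INSERT OVERWRITE TABLE warehouse.orders "
            "SELECT * FROM staging.orders_tmp",
    )

    stage >> finalize
```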