SqoopOperator

Apache Sqoop

Execute a Sqoop job. Documentation for Apache Sqoop can be found here: https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html


Last Updated: Feb. 16, 2023

Access Instructions

Install the Apache Sqoop provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
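Assuming a pip-based Airflow environment, installing the provider looks like this (apache-airflow-providers-apache-sqoop is the standard Apache Sqoop provider distribution):

```shell
# Install the Apache Sqoop provider for Airflow (pip-based environment assumed)
pip install apache-airflow-providers-apache-sqoop
```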

Parameters

conn_id (str)
cmd_type (str): command to execute, either "export" or "import"
schema: schema name
table: table to read
query: import the result of an arbitrary SQL query. Instead of using the table, columns, and where arguments, you can specify a SQL statement with the query argument. You must also specify a destination directory with target_dir.
target_dir: HDFS destination directory where the data from the RDBMS will be written
append: append data to an existing dataset in HDFS
file_type: "avro", "sequence", or "text". Imports data in the specified format. Defaults to "text".
columns: columns to import from the table
num_mappers: use n mapper tasks to import/export in parallel
split_by: column of the table used to split work units
where: WHERE clause to use during import
export_dir: HDFS Hive database directory to export to the RDBMS
input_null_string: the string to be interpreted as null for string columns
input_null_non_string: the string to be interpreted as null for non-string columns
staging_table: the table in which data will be staged before being inserted into the destination table
clear_staging_table: indicate that any data present in the staging table can be deleted
enclosed_by: sets a required field-enclosing character
escaped_by: sets the escape character
input_fields_terminated_by: sets the input field separator
input_lines_terminated_by: sets the input end-of-line character
input_optionally_enclosed_by: sets a field-enclosing character
batch: use batch mode for the underlying statement execution
direct: use the direct export fast path
driver: manually specify the JDBC driver class to use
verbose: switch to more verbose logging for debugging purposes
relaxed_isolation: use the read-uncommitted isolation level
hcatalog_database: specifies the database name for the HCatalog table
hcatalog_table: the argument value for this option is the HCatalog table
create_hcatalog_table: whether Sqoop should create the HCatalog table passed in
properties: additional JVM properties passed to Sqoop
extra_import_options: extra import options to pass as a dict. If a key doesn't take a value, pass an empty string for it. Do not include the "--" prefix on Sqoop option names.
extra_export_options: extra export options to pass as a dict. If a key doesn't take a value, pass an empty string for it. Do not include the "--" prefix on Sqoop option names.
libjars: optional comma-separated list of jar files to include in the classpath

