DatabricksCopyIntoOperator

Databricks

Executes a COPY INTO command against a Databricks SQL endpoint or a Databricks cluster. The COPY INTO statement is constructed from the individual parameters described below.


Last Updated: Nov. 26, 2022

Access Instructions

Install the Databricks provider package (apache-airflow-providers-databricks) into your Airflow environment.

Import the operator into your DAG file and instantiate it with your desired parameters.
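
A minimal sketch of importing and instantiating the operator in a DAG file, assuming an Airflow 2.x environment with the Databricks provider installed; the connection id, endpoint name, table name, and file location below are placeholder values:

from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks_sql import DatabricksCopyIntoOperator

with DAG(
    dag_id="example_databricks_copy_into",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    # Load CSV files from object storage into a table via COPY INTO.
    copy_into_table = DatabricksCopyIntoOperator(
        task_id="copy_data_into_table",
        databricks_conn_id="databricks_default",  # placeholder connection id
        sql_endpoint_name="my_endpoint",          # or pass http_path instead
        table_name="my_schema.my_table",
        file_location="s3://my-bucket/data/",
        file_format="CSV",
        format_options={"header": "true"},
    )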

Parameters

table_name (Required): Name of the table. (templated)
file_location (Required): Location of the files to import. (templated)
file_format (Required): File format. Supported formats are CSV, JSON, AVRO, ORC, PARQUET, TEXT, BINARYFILE.
databricks_conn_id: Reference to the Databricks connection id. (templated)
http_path: Optional string specifying the HTTP path of the Databricks SQL endpoint or cluster. If not specified, it must either be set in the Databricks connection's extra parameters, or sql_endpoint_name must be provided.
sql_endpoint_name: Optional name of the Databricks SQL endpoint. If not specified, http_path must be provided as described above.
session_configuration: Optional dictionary of Spark session parameters. Defaults to None. If not specified, it may be set in the Databricks connection's extra parameters.
http_headers: Optional list of (k, v) pairs that will be set as HTTP headers on every request.
catalog: Optional initial catalog to use. Requires DBR version 9.0+.
schema: Optional initial schema to use. Requires DBR version 9.0+.
client_parameters: Additional parameters internal to the Databricks SQL connector.
files: Optional list of files to import. Cannot be specified together with pattern. (templated)
pattern: Optional regex string matching names of files to import. Cannot be specified together with files.
expression_list: Optional string that will be used in the SELECT expression.
credential: Optional credential configuration for authentication against the source location.
storage_credential: Optional Unity Catalog storage credential for the destination.
encryption: Optional encryption configuration for the specified location.
format_options: Optional dictionary with options specific to the given file format.
force_copy: Optional bool to control forcing of the data import (can also be specified in copy_options).
validate: Optional configuration for schema and data validation. True forces validation of all rows; an integer N validates only the first N rows.
copy_options: Optional dictionary of copy options. Currently, only the force option is supported. A usage sketch combining several of these parameters follows this list.
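
A short sketch, not from the provider documentation, illustrating several of the optional parameters together; the HTTP path, table name, file location, and pattern are placeholder values:

# Hypothetical task combining optional parameters; all identifiers,
# paths, and the HTTP path below are placeholders.
copy_json_files = DatabricksCopyIntoOperator(
    task_id="copy_json_files",
    databricks_conn_id="databricks_default",
    http_path="/sql/1.0/endpoints/1234567890abcdef",  # alternative to sql_endpoint_name
    table_name="main.default.events",                 # catalog.schema.table
    file_location="abfss://container@account.dfs.core.windows.net/events/",
    file_format="JSON",
    pattern="events-.*[.]json",  # mutually exclusive with files
    force_copy=True,             # equivalent to copy_options={"force": "true"}
    validate=100,                # validate only the first 100 rows
)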

Documentation

See also

For more information on how to use this operator, take a look at the guide: DatabricksCopyIntoOperator
