SparkSqlOperator

Apache Spark · Certified

Executes a Spark SQL query.


Last Updated: Oct. 23, 2022

Access Instructions

Install the Apache Spark provider package (apache-airflow-providers-apache-spark) into your Airflow environment.

Import the operator into your DAG file and instantiate it with your desired parameters, as shown in the sketch below.
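A minimal sketch of those two steps, assuming an Airflow 2.x environment; the task_id and the table in the query are placeholders, not part of the operator's API:

# Install the provider first, e.g.: pip install apache-airflow-providers-apache-spark
from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

# Instantiate inside a DAG definition; the query references a hypothetical table.
count_rows = SparkSqlOperator(
    task_id="count_rows",
    sql="SELECT COUNT(*) FROM my_table",
)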

Parameters

sql (required): The SQL query to execute. (templated)
conf: Arbitrary Spark configuration property.
conn_id: The connection ID string.
total_executor_cores: (Standalone & Mesos only) Total cores for all executors. (Default: all available cores on the worker)
executor_cores: (Standalone & YARN only) Number of cores per executor. (Default: 2)
executor_memory: Memory per executor, e.g. 1000M, 2G. (Default: 1G)
keytab: Full path to the file that contains the keytab.
master: spark://host:port, mesos://host:port, yarn, or local. (Default: the host and port set in the Connection, or "yarn")
name: Name of the job.
num_executors: Number of executors to launch.
verbose: Whether to pass the verbose flag to spark-sql.
yarn_queue: The YARN queue to submit to. (Default: the queue value set in the Connection, or "default")
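A fuller sketch exercising several of these parameters inside a DAG. The DAG id, start date, query, and resource values are illustrative assumptions, not recommended settings; conn_id is shown with the provider's default connection ID:

from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_sql import SparkSqlOperator

with DAG(
    dag_id="spark_sql_example",   # hypothetical DAG id
    start_date=datetime(2022, 10, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    aggregate = SparkSqlOperator(
        task_id="aggregate_events",
        conn_id="spark_sql_default",  # the provider's default Spark SQL connection
        sql="SELECT status, COUNT(*) FROM events GROUP BY status",  # placeholder query
        master="yarn",                # or spark://host:port, mesos://host:port, local
        executor_cores=2,
        executor_memory="2G",
        num_executors=4,
        yarn_queue="default",
        verbose=True,
    )

Values passed here override anything set in the referenced Connection; omitted parameters fall back to the defaults listed above.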

Documentation

Executes a Spark SQL query. The operator wraps the spark-sql binary, which must be available on the PATH of the Airflow worker running the task.

See also

For more information on how to use this operator, take a look at the guide: SparkSqlOperator
