SparkSqlHook


This hook is a wrapper around the spark-sql binary; it requires that spark-sql be available on the PATH.



Access Instructions

Install the Apache Spark provider package into your Airflow environment.

Import the hook into your DAG file and instantiate it with your desired parameters, as in the sketch below.
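A minimal sketch of both steps (the query below is a placeholder, not a default):

```python
# The provider package is assumed to be installed, e.g.:
#   pip install apache-airflow-providers-apache-spark
from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook

# Placeholder SQL; "spark_sql_default" is the hook's default connection ID.
hook = SparkSqlHook(
    sql="SELECT COUNT(*) FROM my_table",
    conn_id="spark_sql_default",
)
```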

Parameters

sql: The SQL query to execute
conf: Arbitrary Spark configuration properties, as comma-separated PROP=VALUE pairs
conn_id: The Airflow connection ID string (Default: "spark_sql_default")
total_executor_cores: (Standalone & Mesos only) Total cores for all executors (Default: all the available cores on the worker)
executor_cores: (Standalone & YARN only) Number of cores per executor (Default: 2)
executor_memory: Memory per executor (e.g. 1000M, 2G) (Default: 1G)
keytab: Full path to the file that contains the keytab
master: spark://host:port, mesos://host:port, yarn, or local (Default: the host and port set in the Connection, or "yarn")
name: Name of the job
num_executors: Number of executors to launch
verbose: Whether to pass the verbose flag to spark-sql
yarn_queue: The YARN queue to submit to (Default: the queue value set in the Connection, or "default")
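A sketch of a fuller instantiation showing how several of these parameters fit together (all values are illustrative, not recommendations; the query and job name are made up):

```python
from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook

# Illustrative values only; tune them for your cluster.
hook = SparkSqlHook(
    sql="SELECT name, COUNT(*) FROM events GROUP BY name",  # placeholder query
    conn_id="spark_sql_default",
    master="yarn",                 # or spark://host:port, mesos://host:port, local
    executor_cores=2,              # cores per executor
    executor_memory="2G",          # memory per executor
    num_executors=4,               # number of executors to launch
    name="example-spark-sql-job",  # hypothetical job name
    verbose=True,                  # pass --verbose to spark-sql
    yarn_queue="default",          # YARN queue to submit to
)
```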

Documentation

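A minimal usage sketch, assuming the placeholder query and connection from above. run_query() builds the spark-sql command line from the constructor arguments and executes it in a subprocess:

```python
from airflow.providers.apache.spark.hooks.spark_sql import SparkSqlHook

hook = SparkSqlHook(
    sql="SELECT COUNT(*) FROM my_table",  # placeholder query
    conn_id="spark_sql_default",
)

# Runs the spark-sql binary with the configured query (-e for an inline
# query) and raises an AirflowException on a non-zero exit code.
hook.run_query()
```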
