LivyOperator

Apache Livy

This operator wraps the Apache Livy batch REST API, allowing you to submit a Spark application to the underlying cluster.


Last Updated: Feb. 22, 2023

Access Instructions

Install the Apache Livy provider package (apache-airflow-providers-apache-livy) into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
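
As a minimal sketch of the two steps above, the snippet below imports the operator and instantiates it inside a DAG. The DAG id, task id, jar path, and main class are hypothetical placeholders, and `schedule=None` assumes Airflow 2.4 or later. The import guard is there only so the snippet can also be run outside an Airflow environment; a real DAG file would import unconditionally.

```python
from datetime import datetime

# Keyword arguments for the operator. The task id, jar path, and class name
# are hypothetical placeholders; adjust them to your own application.
livy_task_kwargs = {
    "task_id": "submit_spark_pi",
    "file": "/opt/spark/examples/jars/spark-examples.jar",
    "class_name": "org.apache.spark.examples.SparkPi",
    "args": ["10"],
    "livy_conn_id": "livy_default",  # a pre-defined Livy connection
    "polling_interval": 30,          # poll for completion every 30 seconds
}

try:
    from airflow import DAG
    from airflow.providers.apache.livy.operators.livy import LivyOperator
except ImportError:
    # Airflow or the Livy provider is not installed in this environment.
    DAG = LivyOperator = None

if LivyOperator is not None:
    with DAG(
        dag_id="livy_spark_pi",          # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule=None,                   # assumes Airflow 2.4+
        catchup=False,
    ) as dag:
        submit_spark_pi = LivyOperator(**livy_task_kwargs)
```

Keeping the arguments in a plain dictionary is only a convenience for this sketch; passing them directly as keyword arguments to `LivyOperator` is equivalent.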

Parameters

file (required): path of the file containing the application to execute. (templated)
class_name: name of the application's Java/Spark main class. (templated)
args: application command line arguments. (templated)
jars: jars to be used in this session. (templated)
py_files: Python files to be used in this session. (templated)
files: files to be used in this session. (templated)
driver_memory: amount of memory to use for the driver process. (templated)
driver_cores: number of cores to use for the driver process. (templated)
executor_memory: amount of memory to use per executor process. (templated)
executor_cores: number of cores to use for each executor. (templated)
num_executors: number of executors to launch for this session. (templated)
archives: archives to be used in this session. (templated)
queue: name of the YARN queue to which the application is submitted. (templated)
name: name of this session. (templated)
conf: Spark configuration properties. (templated)
proxy_user: user to impersonate when running the job. (templated)
livy_conn_id: reference to a pre-defined Livy connection.
livy_conn_auth_type: the auth type for the Livy connection.
polling_interval: time in seconds between polls for job completion. No polling is done for values <= 0.
extra_options: a dictionary of options, where each key is a string and the value depends on the option being modified.
extra_headers: a dictionary of headers passed with the HTTP request to Livy.
retry_args: arguments that define the retry behaviour. See the Tenacity documentation at https://github.com/jd/tenacity
deferrable: run the operator in deferrable mode.
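
To illustrate how several of these parameters fit together, here is a hedged sketch of a PySpark submission that sets resources, a YARN queue, Spark configuration, and Jinja-templated values. All paths, names, the queue, and the proxy user are hypothetical, and `schedule=None` assumes Airflow 2.4 or later. As before, the import guard only lets the snippet run where Airflow is not installed.

```python
from datetime import datetime

# Hypothetical job definition; every path, name, and size is illustrative.
spark_job_kwargs = {
    "task_id": "daily_aggregation",
    "file": "hdfs:///apps/etl/daily_aggregation.py",
    "py_files": ["hdfs:///apps/etl/helpers.zip"],
    # Templated fields are rendered by Airflow before the task runs.
    "args": ["--run-date", "{{ ds }}"],
    "name": "daily_aggregation_{{ ds_nodash }}",
    # Resources for the driver and the executors.
    "driver_memory": "2g",
    "driver_cores": 1,
    "executor_memory": "4g",
    "executor_cores": 2,
    "num_executors": 4,
    "queue": "etl",
    "conf": {"spark.sql.shuffle.partitions": "200"},
    "proxy_user": "etl_service",
    # A positive value makes the operator poll until the batch finishes.
    "polling_interval": 60,
}

try:
    from airflow import DAG
    from airflow.providers.apache.livy.operators.livy import LivyOperator
except ImportError:
    # Airflow or the Livy provider is not installed in this environment.
    DAG = LivyOperator = None

if LivyOperator is not None:
    with DAG(
        dag_id="livy_daily_aggregation",  # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule=None,                    # assumes Airflow 2.4+
        catchup=False,
    ) as dag:
        daily_aggregation = LivyOperator(**spark_job_kwargs)
```

With `polling_interval` left at a non-positive value, the task would submit the batch without waiting for it to finish, so a positive interval is the safer choice when downstream tasks depend on the job's result.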

