LivyOperatorAsync

Astronomer ProvidersCertified

This operator wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster asynchronously.

Access Instructions

Install the Astronomer Providers provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

filepath of the file containing the application to execute (required).
class_namename of the application Java/Spark main class.
argsapplication command line arguments.
jarsjars to be used in this sessions.
py_filespython files to be used in this session.
filesfiles to be used in this session.
driver_memoryamount of memory to use for the driver process.
driver_coresnumber of cores to use for the driver process.
executor_memoryamount of memory to use per executor process.
executor_coresnumber of cores to use for each executor.
num_executorsnumber of executors to launch for this session.
archivesarchives to be used in this session.
queuename of the YARN queue to which the application is submitted.
namename of this session.
confSpark configuration properties.
proxy_useruser to impersonate when running the job.
livy_conn_idreference to a pre-defined Livy Connection.
polling_intervaltime in seconds between polling for job completion. If poll_interval=0, in that case return the batch_id and if polling_interval > 0, poll the livy job for termination in the polling interval defined.
extra_optionsAdditional option can be passed when creating a request. For example, run(json=obj) is passed as aiohttp.ClientSession().get(json=obj)
extra_headersA dictionary of headers passed to the HTTP request to livy.
retry_argsArguments which define the retry behaviour. See Tenacity documentation at https://github.com/jd/tenacity

Documentation

This operator wraps the Apache Livy batch REST API, allowing to submit a Spark application to the underlying cluster asynchronously.

Was this page helpful?