EmrContainerOperator

Amazon

An operator that submits jobs to EMR on EKS virtual clusters.


Last Updated: Mar. 2, 2023

Access Instructions

Install the Amazon provider package (apache-airflow-providers-amazon) into your Airflow environment.

Import the operator into your DAG file and instantiate it with the parameters you need.
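As a rough illustration of the two steps above, the sketch below collects the required parameters in a dictionary and passes them to the operator. The cluster ID, role ARN, log group name, task ID, and release label are hypothetical placeholders, and the `job_driver` and `configuration_overrides` shapes assume the EMR on EKS StartJobRun request format. The import path assumes a recent Amazon provider release; older releases may expose the operator from a different module.

```python
# Hypothetical placeholder values; replace them with your own resources.
VIRTUAL_CLUSTER_ID = "your-virtual-cluster-id"
EXECUTION_ROLE_ARN = "arn:aws:iam::123456789012:role/your-emr-eks-job-role"

# Spark job definition, assuming the StartJobRun "jobDriver" structure.
JOB_DRIVER = {
    "sparkSubmitJobDriver": {
        "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
        "sparkSubmitParameters": "--conf spark.executor.instances=2",
    }
}

# Optional monitoring configuration (the log group name is a placeholder).
CONFIGURATION_OVERRIDES = {
    "monitoringConfiguration": {
        "cloudWatchMonitoringConfiguration": {
            "logGroupName": "/emr-eks-jobs",
            "logStreamNamePrefix": "airflow",
        }
    }
}

OPERATOR_KWARGS = {
    "task_id": "run_spark_pi",            # hypothetical task ID
    "name": "pi.py",                      # required: name of the job run
    "virtual_cluster_id": VIRTUAL_CLUSTER_ID,
    "execution_role_arn": EXECUTION_ROLE_ARN,
    "release_label": "emr-6.3.0-latest",  # example release label
    "job_driver": JOB_DRIVER,
    "configuration_overrides": CONFIGURATION_OVERRIDES,
    "aws_conn_id": "aws_default",
    "wait_for_completion": True,
}


def build_task():
    """Instantiate the operator from the keyword arguments above.

    The import is kept inside the function only so that the dictionaries
    can be inspected on a machine without Airflow installed.
    """
    from airflow.providers.amazon.aws.operators.emr import EmrContainerOperator

    return EmrContainerOperator(**OPERATOR_KWARGS)
```

In a real DAG file you would normally import the operator at the top of the module and create the task inside a `with DAG(...)` block, so that it is attached to the DAG.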

Parameters

name (required): The name of the job run.
virtual_cluster_id (required): The EMR on EKS virtual cluster ID.
execution_role_arn (required): The IAM role ARN associated with the job run.
release_label (required): The Amazon EMR release version to use for the job run.
job_driver (required): Job configuration details, e.g. the Spark job parameters.
configuration_overrides: The configuration overrides for the job run, specifically either application configuration or monitoring configuration.
client_request_token: The client idempotency token of the job run request. Specify a unique ID here to prevent the same job from being started twice. If no token is provided, a UUIDv4 token is generated for you.
aws_conn_id: The Airflow connection used for AWS credentials.
wait_for_completion: Whether the operator waits for the job run to complete.
poll_interval: Time (in seconds) to wait between two consecutive calls to check the job run status on EMR.
max_tries: Deprecated; use max_polling_attempts instead.
max_polling_attempts: Maximum number of times to poll for the job run to finish. Defaults to None, which polls until the job run is no longer in a pending, submitted, or running state.
tags: The tags assigned to job runs. Defaults to None.
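To make the interaction between wait_for_completion, poll_interval, and max_polling_attempts more concrete, here is a minimal plain-Python sketch of the polling behaviour described above. It is not the provider's implementation: `poll_job_state`, `get_state`, and the `sleep` argument are hypothetical names introduced for illustration, and the state strings assume the uppercase names used by EMR on EKS.

```python
import time
from typing import Callable, Optional

# States in which a job run is still in progress, per the parameter
# description above (assumed to be reported as uppercase strings).
IN_PROGRESS_STATES = {"PENDING", "SUBMITTED", "RUNNING"}


def poll_job_state(
    get_state: Callable[[], str],
    poll_interval: float,
    max_polling_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until the job run leaves the in-progress states.

    Returns the final state, or None if max_polling_attempts is reached
    while the job run is still in progress. With max_polling_attempts=None
    the loop continues until the job run finishes.
    """
    attempts = 0
    while True:
        state = get_state()
        attempts += 1
        if state not in IN_PROGRESS_STATES:
            return state
        if max_polling_attempts is not None and attempts >= max_polling_attempts:
            return None
        sleep(poll_interval)


# Usage with a fake status source standing in for the EMR API.
fake_states = iter(["PENDING", "SUBMITTED", "RUNNING", "COMPLETED"])
final_state = poll_job_state(lambda: next(fake_states), poll_interval=0)
```

Leaving max_polling_attempts at None trades a bounded wait for simplicity; setting it gives the task a predictable upper bound of roughly poll_interval times max_polling_attempts seconds spent waiting.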

Documentation

An operator that submits jobs to EMR on EKS virtual clusters.

See also

For more information on how to use this operator, take a look at the guide: Submit a job to an Amazon EMR virtual cluster
