EmrCreateJobFlowOperator
Creates an EMR JobFlow, reading the config from the EMR connection. A dictionary of JobFlow overrides can be passed that overrides the config from the connection.
Access Instructions
Install the Amazon provider package into your Airflow environment.
Import the module into your DAG file and instantiate it with your desired params.
Parameters
aws_conn_id: The Airflow connection used for AWS credentials. If this is None or empty, the default boto3 behaviour is used. If running Airflow in a distributed manner with aws_conn_id as None or empty, the default boto3 configuration is used (and must be maintained on each worker node).
emr_conn_id: Amazon Elastic MapReduce connection. Used to receive an initial Amazon EMR cluster configuration: the boto3.client('emr').run_job_flow request body. If this is None or empty, or the connection does not exist, an empty initial configuration is used.
job_flow_overrides: boto3-style arguments, or a reference to an arguments file (must be '.json'), that override specific emr_conn_id extra parameters. (templated)
region_name: Region name passed to EmrHook.
wait_for_completion: Whether to finish the task immediately after creation (False) or wait for job flow completion (True).
waiter_max_attempts: Maximum number of polling attempts before failing.
waiter_delay: Number of seconds between polls of the job flow state.
waiter_countdown: Maximum number of seconds to wait for job flow completion (only in combination with wait_for_completion=True; None means no limit). Deprecated: please use waiter_max_attempts.
waiter_check_interval_seconds: Number of seconds between polls of the job flow state. Defaults to 60 seconds. Deprecated: please use waiter_delay.
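The relationship between job_flow_overrides and the emr_conn_id configuration can be sketched as a top-level dictionary merge: keys present in the overrides replace the corresponding keys from the connection, and untouched keys pass through. The configs below are illustrative placeholders, and this plain-dict merge is a simplified model of the behaviour, not the provider's actual implementation:

```python
# Hypothetical base config stored in the EMR connection's extras
# (the boto3 run_job_flow request body).
connection_config = {
    "Name": "default-cluster",
    "ReleaseLabel": "emr-6.7.0",
    "JobFlowRole": "EMR_EC2_DefaultRole",
}

# Hypothetical overrides passed to the operator (templated at runtime).
job_flow_overrides = {
    "Name": "nightly-analytics-cluster",
    "ServiceRole": "EMR_DefaultRole",
}

# Top-level merge: override keys win, the rest come from the connection.
run_job_flow_kwargs = {**connection_config, **job_flow_overrides}
```

Here "Name" is taken from the overrides, while "ReleaseLabel" and "JobFlowRole" still come from the connection; if emr_conn_id resolves to no configuration, the overrides alone form the request body.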
Documentation
See also
For more information on how to use this operator, take a look at the guide: Create an EMR job flow