GlueJobOperator

Amazon

Creates an AWS Glue job. AWS Glue is a serverless Spark ETL service for running Spark jobs on the AWS cloud. Supported languages: Python and Scala.

Last Updated: Mar. 21, 2023

Access Instructions

Install the Amazon provider package into your Airflow environment.

Import the module into your DAG file and instantiate the operator with your desired parameters.
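
As a minimal sketch of the two steps above: the provider is typically installed with `pip install apache-airflow-providers-amazon`, after which the operator can be imported from `airflow.providers.amazon.aws.operators.glue`. The job name, S3 path, role name, task ID, and DAG ID below are hypothetical placeholders, and the `schedule` argument assumes Airflow 2.4 or later.

```python
from datetime import datetime

# Hypothetical values for illustration only; replace them with your own.
GLUE_TASK_ARGS = {
    "task_id": "submit_glue_job",
    "job_name": "example_glue_job",
    "script_location": "s3://example-bucket/scripts/etl.py",
    "iam_role_name": "ExampleGlueRole",
}


def build_dag():
    """Build a one-task DAG that submits the Glue job.

    Airflow is imported inside the function so this module can be loaded
    even where Airflow or the Amazon provider is not installed.
    """
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    with DAG(
        dag_id="example_glue_dag",
        start_date=datetime(2023, 1, 1),
        schedule=None,  # assumes Airflow 2.4+; trigger the DAG manually
        catchup=False,
    ) as dag:
        GlueJobOperator(**GLUE_TASK_ARGS)
    return dag
```

In an actual DAG file, assign the result at module level (for example `dag = build_dag()`) so that the scheduler can discover it.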

Parameters

job_name: Unique job name per AWS account.
script_location: Location of the ETL script. Must be a local or S3 path.
job_desc: Job description details.
concurrent_run_limit: The maximum number of concurrent runs allowed for a job.
script_args: ETL script arguments and AWS Glue arguments (templated).
retry_limit: The maximum number of times to retry this job if it fails.
num_of_dpus: Number of AWS Glue DPUs to allocate to this job.
region_name: AWS region name (example: us-east-1).
s3_bucket: S3 bucket where logs and the local ETL script will be uploaded.
iam_role_name: AWS IAM role for Glue job execution.
create_job_kwargs: Extra arguments for Glue job creation.
run_job_kwargs: Extra arguments for the Glue job run.
wait_for_completion: Whether to wait for the job run to complete. (default: True)
verbose: If True, Glue job run logs are shown in the Airflow task logs. (default: False)
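
The parameters listed above might be combined as in the following sketch. Every value is a hypothetical placeholder. The `--run_date` argument name, the `GlueVersion` entry, and the `Timeout` entry are assumptions chosen for illustration: the extra keyword arguments are expected to match what the underlying AWS Glue API accepts for job creation and job runs, so check them against your Glue setup before use.

```python
# Hypothetical parameter values for illustration only.
GLUE_JOB_PARAMS = {
    "job_name": "example_glue_job",             # unique per AWS account
    "script_location": "s3://example-bucket/scripts/etl.py",
    "job_desc": "Nightly ETL load",
    "concurrent_run_limit": 1,
    # script_args is templated, so Jinja expressions are rendered at run time.
    "script_args": {"--run_date": "{{ ds }}"},
    "retry_limit": 2,
    "num_of_dpus": 2,
    "region_name": "us-east-1",
    "s3_bucket": "example-bucket",              # logs and local scripts go here
    "iam_role_name": "ExampleGlueRole",
    "create_job_kwargs": {"GlueVersion": "3.0"},  # assumed extra creation option
    "run_job_kwargs": {"Timeout": 60},            # assumed extra run option (minutes)
    "wait_for_completion": True,
    "verbose": False,
}


def make_glue_task(task_id="run_glue_job", **overrides):
    """Instantiate the operator from the defaults above plus any overrides.

    The import is local so the parameter dictionary can be inspected
    without Airflow installed.
    """
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    params = {**GLUE_JOB_PARAMS, **overrides}
    return GlueJobOperator(task_id=task_id, **params)
```

Inside a DAG context, a call such as `make_glue_task(verbose=True)` would create the task with Glue run logs forwarded to the Airflow task logs.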

Documentation

See also

For more information on how to use this operator, take a look at the guide: Submit an AWS Glue job
