GlueJobOperator
Creates an AWS Glue job. AWS Glue is a serverless ETL service for running Spark jobs on the AWS cloud. Language support: Python and Scala
Access Instructions
Install the Amazon provider package into your Airflow environment.
Import the operator into your DAG file and instantiate it with your desired params.
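As a minimal sketch: the provider package installs with pip install apache-airflow-providers-amazon, and in recent provider versions the operator imports from the path below. The job name and S3 script path are illustrative placeholders.

```python
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

# Minimal instantiation; the job name and S3 script path are placeholders.
submit_glue_job = GlueJobOperator(
    task_id="submit_glue_job",
    job_name="my_glue_job",
    script_location="s3://my-bucket/scripts/etl.py",
)
```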
Parameters
job_name: unique job name per AWS account
script_location: location of the ETL script; must be a local or S3 path
job_desc: job description details
concurrent_run_limit: the maximum number of concurrent runs allowed for the job
script_args: ETL script arguments and AWS Glue arguments (templated)
retry_limit: the maximum number of times to retry this job if it fails
num_of_dpus: number of AWS Glue DPUs to allocate to this job
region_name: AWS region name (example: us-east-1)
s3_bucket: S3 bucket where logs and the local ETL script will be uploaded
iam_role_name: AWS IAM role for Glue job execution
create_job_kwargs: extra arguments for Glue job creation
run_job_kwargs: extra arguments for the Glue job run
wait_for_completion: whether to wait for job run completion (default: True)
verbose: if True, Glue job run logs are shown in the Airflow task logs (default: False)
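A fuller instantiation might combine several of the parameters above. This is a hedged sketch: the DAG id, job name, bucket, role, and region values are illustrative placeholders rather than defaults, and the DAG keyword arguments assume Airflow 2.4+.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

with DAG(
    dag_id="glue_etl_example",   # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule=None,               # Airflow 2.4+ keyword; run manually
    catchup=False,
) as dag:
    run_etl = GlueJobOperator(
        task_id="run_etl",
        job_name="nightly_sales_etl",                          # unique per AWS account (placeholder)
        script_location="s3://example-bucket/scripts/etl.py",  # local or S3 path (placeholder)
        s3_bucket="example-bucket",                            # logs and a local script are uploaded here
        iam_role_name="GlueJobRole",                           # placeholder IAM role for job execution
        region_name="us-east-1",
        num_of_dpus=10,
        retry_limit=2,
        concurrent_run_limit=1,
        script_args={"--input_date": "{{ ds }}"},              # templated at runtime
        wait_for_completion=True,                              # block until the Glue run finishes
        verbose=True,                                          # surface Glue run logs in the task logs
    )
```

Because wait_for_completion defaults to True, the task occupies a worker slot until the Glue run finishes; setting it to False returns immediately after submission.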
Documentation
See also
For more information on how to use this operator, take a look at the guide: Submit an AWS Glue job