S3Hook

Amazon

Interact with Amazon Simple Storage Service (S3). Provides a thick wrapper around boto3.client("s3") and boto3.resource("s3").

Last Updated: Jan. 24, 2023

Access Instructions

Install the Amazon provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired parameters.
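
A minimal usage sketch follows; the connection ID, bucket, and key are placeholder assumptions, not values the hook requires:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# "aws_default", the bucket, and the key are placeholders -- substitute your own.
hook = S3Hook(aws_conn_id="aws_default")

# Check whether a key exists, then read its contents as a string.
if hook.check_for_key(key="data/report.csv", bucket_name="my-bucket"):
    contents = hook.read_key(key="data/report.csv", bucket_name="my-bucket")
```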

Parameters

transfer_config_args: Configuration object for managed S3 transfers.
extra_args: Extra arguments that may be passed to the download/upload operations.
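
A sketch of both parameters in use; the tuning values, encryption setting, and bucket/file names below are illustrative assumptions:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

hook = S3Hook(
    aws_conn_id="aws_default",
    # Forwarded to boto3.s3.transfer.TransferConfig for managed transfers.
    transfer_config_args={"use_threads": True, "max_concurrency": 4},
    # Forwarded as ExtraArgs to the underlying upload/download calls.
    extra_args={"ServerSideEncryption": "AES256"},
)

# The extra_args above apply to this upload.
hook.load_file(
    filename="/tmp/report.csv",
    key="data/report.csv",
    bucket_name="my-bucket",
    replace=True,
)
```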

Documentation

Interact with Amazon Simple Storage Service (S3). Provides a thick wrapper around boto3.client("s3") and boto3.resource("s3").

See also

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/customizations/s3.html#s3-transfers

  • For allowed upload extra arguments see boto3.s3.transfer.S3Transfer.ALLOWED_UPLOAD_ARGS.

  • For allowed download extra arguments see boto3.s3.transfer.S3Transfer.ALLOWED_DOWNLOAD_ARGS.

Additional arguments (such as aws_conn_id) may be specified and are passed down to the underlying AwsBaseHook.
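
For instance, a minimal sketch, assuming a connection named my_aws_connection exists in your Airflow environment:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# "my_aws_connection" is an assumed Airflow connection ID; region_name is
# handled by the underlying AwsBaseHook rather than by S3Hook itself.
hook = S3Hook(aws_conn_id="my_aws_connection", region_name="eu-west-1")
```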

Example DAGs

Machine Learning Pipelines with AWS SageMaker

This DAG shows an example implementation of machine learning model orchestration using Airflow and AWS SageMaker.

Amazon
AI + Machine Learning
Executing Predictions with AWS SageMaker

This DAG shows an example implementation of executing predictions from a machine learning model using AWS SageMaker.

Amazon
AI + Machine Learning, Data Science
Extract Zendesk to Snowflake

Uploads the following Zendesk objects to S3: Tickets, Organizations, Users. From S3, the data is loaded into Snowflake. Loads can be daily or full extracts.

Snowflake, Amazon, Apache Airflow, HTTP
ETL/ELT, Storage, Databases
Validate Files Uploaded to S3

This is a very simple DAG showing a minimal EL data pipeline with a data integrity check. A single file is uploaded to S3, then its ETag is verified against the MD5 hash of the local file; a sketch of this check follows below.

Apache Airflow, Amazon
Data Management & Governance, ETL/ELT
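
A minimal sketch of that check, assuming a hypothetical bucket, key, and local path, and noting that an S3 ETag only equals the MD5 digest for non-multipart uploads:

```python
import hashlib

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

hook = S3Hook(aws_conn_id="aws_default")

# Fetch the uploaded object and its ETag.
obj = hook.get_key(key="data/report.csv", bucket_name="my-bucket")
etag = obj.e_tag.strip('"')  # boto3 wraps the ETag value in quotes

# Compute the MD5 hash of the local file for comparison.
with open("/tmp/report.csv", "rb") as f:
    local_md5 = hashlib.md5(f.read()).hexdigest()

assert etag == local_md5, "S3 ETag does not match the local MD5 hash"
```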
Perform Data Integrity Checks from S3 to Redshift

This is the second in a series of DAGs showing an EL pipeline with data integrity checking of data in S3 as well as Redshift.

Apache Airflow, Amazon, Postgres
Data Management & Governance, ETL/ELT, Databases
Advanced Data Integrity Checks from S3 to Redshift

This is the third in a series of DAGs showing an EL pipeline with data integrity and data quality checking for data in S3 and Redshift using ETag verification and row-based data quality checks where t…

Apache Airflow, Amazon, Postgres
Data Management & Governance, ETL/ELT, Databases
Move Files in S3 with Dynamic Task Mapping

This DAG shows an example implementation of sorting files in an S3 bucket into two different buckets based on logic involving the content of the files using dynamic task mapping with the expand_kwargs… A sketch of this pattern follows below.

Amazon
Airflow Fundamentals, ETL/ELT
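
A sketch of the pattern under stated assumptions: hypothetical bucket names, a placeholder content-based routing rule, and Airflow 2.4+ for expand_kwargs:

```python
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.operators.s3 import S3CopyObjectOperator

with DAG("s3_dynamic_copy_sketch", start_date=datetime(2023, 1, 1), schedule=None):

    @task
    def build_copy_kwargs() -> list:
        """Build one kwargs dict per file, routing on file content."""
        hook = S3Hook(aws_conn_id="aws_default")
        keys = hook.list_keys(bucket_name="my-ingest-bucket", prefix="incoming/")
        return [
            {
                "source_bucket_key": key,
                "dest_bucket_key": key,
                # Placeholder routing rule -- substitute your own content check.
                "dest_bucket_name": "bucket-a"
                if "keyword" in hook.read_key(key, "my-ingest-bucket")
                else "bucket-b",
            }
            for key in keys
        ]

    # One mapped task instance is created per kwargs dict returned above.
    S3CopyObjectOperator.partial(
        task_id="copy_files",
        source_bucket_name="my-ingest-bucket",
    ).expand_kwargs(build_copy_kwargs())
```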
Datasets ML Example Publish

This DAG publishes a dataset that is used by a separate consumer DAG to execute predictions from a machine learning model using AWS SageMaker.

Amazon
Airflow Fundamentals, ETL/ELT, AI + Machine Learning
Crate Data Quality Checks DAG

Uploads local files to S3, then loads them into CrateDB and checks several data quality properties.

Postgres, Amazon, Slack, Common SQL
ETL/ELT, Data Quality, Databases
