Operator that moves data from MongoDB (via pymongo) to Amazon S3 (via boto).


Last Updated: Oct. 23, 2022

Access Instructions

Install the Amazon provider package into your Airflow environment.

Import the operator into your DAG file and instantiate it with your desired parameters.


Parameters

- mongo_conn_id: reference to a specific MongoDB connection
- aws_conn_id: reference to a specific S3 connection
- mongo_collection (required): reference to a specific collection in your MongoDB database
- mongo_query (required): query to execute; either a dict (a find query) or a list of dicts (an aggregate pipeline)
- mongo_projection: optional parameter to filter the fields returned by the query. It can be a list of field names to include or a dictionary for excluding fields (e.g. projection={"_id": 0})
- s3_bucket (required): reference to a specific S3 bucket to store the data
- s3_key (required): the S3 key under which the file will be stored
- mongo_db: reference to a specific MongoDB database
- replace: whether or not to replace the file in S3 if it already exists
- allow_disk_use: enables writing to temporary files when handling large datasets. This only takes effect when mongo_query is a list, i.e. when running an aggregate pipeline
- compression: type of compression to use for the output file in S3. Currently only gzip is supported
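The parameters above map onto a simple export flow: run the query, serialize the returned documents, optionally gzip the result, and upload it to the given S3 key. Below is a minimal stdlib sketch of the serialize-and-compress step; the function name is illustrative (not the operator's internal API), and the list of dicts stands in for the pymongo query results.

```python
import gzip
import json


def serialize_docs(docs, compression=None):
    """Join documents as newline-delimited JSON; gzip if requested.

    Illustrative stand-in for the operator's transform step; `docs`
    plays the role of documents returned by the MongoDB query.
    """
    payload = "\n".join(json.dumps(doc) for doc in docs)
    data = payload.encode("utf-8")
    if compression == "gzip":
        data = gzip.compress(data)
    return data


# Example: two fake query results, compressed as when compression="gzip" is set.
docs = [{"order_id": 1, "status": "shipped"}, {"order_id": 2, "status": "pending"}]
blob = serialize_docs(docs, compression="gzip")
restored = gzip.decompress(blob).decode("utf-8").splitlines()
```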



See also

For more information on how to use this operator, take a look at the guide: MongoDB To Amazon S3 transfer operator

