CloudDataTransferServiceS3ToGCSOperator

Google

Synchronizes an S3 bucket with a Google Cloud Storage bucket using the Google Cloud Storage Transfer Service.

Last Updated: Feb. 25, 2023

Access Instructions

Install the Google provider package (apache-airflow-providers-google) into your Airflow environment.

Import the operator into your DAG file and instantiate it with the parameters described below.

Parameters

s3_bucket (required): The S3 bucket that contains the source objects. (templated)
gcs_bucket (required): The destination Google Cloud Storage bucket where the files will be stored. (templated)
s3_path: Optional root path where the source objects are located. (templated)
gcs_path: Optional root path for the transferred objects. (templated)
project_id: Optional ID of the Google Cloud Console project that owns the job.
aws_conn_id: The connection ID to use when connecting to the source S3 bucket.
gcp_conn_id: The destination connection ID to use when connecting to Google Cloud Storage.
delegate_to: Google account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
description: Optional description of the transfer service job.
schedule: Optional transfer service schedule. If not set, the transfer job runs once, as soon as the operator runs. The format is described at https://cloud.google.com/storage-transfer/docs/reference/rest/v1/transferJobs, with two additional conveniences: dates can be passed as datetime.date, and times can be passed as datetime.time.
object_conditions: Optional transfer service object conditions; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
transfer_options: Optional transfer service transfer options; see https://cloud.google.com/storage-transfer/docs/reference/rest/v1/TransferSpec
wait: Whether to wait for the transfer to finish. Must be set to True if delete_job_after_completion is set to True.
timeout: Time, in seconds, to wait for the operation to end. Defaults to 60 seconds if not specified.
google_impersonation_chain: Optional Google service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)
delete_job_after_completion: If True, delete the job after it completes. If set to True, wait must also be set to True.
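
The interaction between schedule, wait and delete_job_after_completion can be easier to see in code. The following is a minimal sketch, not part of the provider: build_s3_to_gcs_kwargs is a hypothetical helper, the bucket names and dates are placeholders, and the schedule keys (scheduleStartDate, scheduleEndDate, startTimeOfDay) are assumed to follow the Schedule object of the transferJobs REST reference linked above. It only assembles keyword arguments, so it runs without Airflow installed.

```python
import datetime


def build_s3_to_gcs_kwargs(s3_bucket, gcs_bucket, start, end, time_of_day,
                           wait=True, delete_job_after_completion=False):
    """Hypothetical helper: assemble keyword arguments for the operator."""
    # The parameter list states that delete_job_after_completion=True
    # is only valid together with wait=True, so check that up front.
    if delete_job_after_completion and not wait:
        raise ValueError("delete_job_after_completion=True requires wait=True")
    return {
        "s3_bucket": s3_bucket,
        "gcs_bucket": gcs_bucket,
        # Key names are assumed from the transferJobs REST Schedule object;
        # dates and times are passed as datetime.date / datetime.time.
        "schedule": {
            "scheduleStartDate": start,
            "scheduleEndDate": end,
            "startTimeOfDay": time_of_day,
        },
        "wait": wait,
        "delete_job_after_completion": delete_job_after_completion,
    }


# Placeholder values for illustration only.
kwargs = build_s3_to_gcs_kwargs(
    "my-s3-bucket",
    "my-gcs-bucket",
    start=datetime.date(2023, 3, 1),
    end=datetime.date(2023, 3, 31),
    time_of_day=datetime.time(hour=2, minute=30),
)
```

In a DAG file, a dictionary like this could then be unpacked into the operator, for example CloudDataTransferServiceS3ToGCSOperator(task_id="...", **kwargs).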

Documentation

Warning

This operator is NOT idempotent. Each run creates a new transfer job in Google Cloud, so running it repeatedly creates multiple jobs.

Example:

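    from airflow.providers.google.cloud.operators.cloud_storage_transfer_service import (
        CloudDataTransferServiceS3ToGCSOperator,
    )

    # my_dag is assumed to be a DAG object defined elsewhere in the file.
    s3_to_gcs_transfer_op = CloudDataTransferServiceS3ToGCSOperator(
        task_id="s3_to_gcs_transfer_example",
        s3_bucket="my-s3-bucket",
        project_id="my-gcp-project",
        gcs_bucket="my-gcs-bucket",
        dag=my_dag,
    )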
