GCSSynchronizeBucketsOperator

Google

Synchronizes the contents of buckets, or of directories within buckets, in Google Cloud Storage.

Last Updated: Feb. 25, 2023

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
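
As a rough illustration of the two steps above, the sketch below assumes the provider was installed with `pip install apache-airflow-providers-google` and that Airflow 2.4 or later is in use (for the `schedule` argument). The DAG id, task id, bucket names, and directory paths are hypothetical placeholders, not values from this page.

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.gcs import GCSSynchronizeBucketsOperator

with DAG(
    dag_id="example_gcs_sync",  # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule=None,  # run only when triggered manually (Airflow 2.4+)
    catchup=False,
) as dag:
    sync_buckets = GCSSynchronizeBucketsOperator(
        task_id="sync_buckets",  # hypothetical task name
        source_bucket="my-source-bucket",  # placeholder bucket name
        destination_bucket="my-destination-bucket",  # placeholder bucket name
        source_object="data/",  # placeholder root sync directory
        destination_object="backup/data/",  # placeholder root sync directory
        recursive=True,
        allow_overwrite=False,
        delete_extra_files=False,
        gcp_conn_id="google_cloud_default",
    )
```

Leaving `allow_overwrite` and `delete_extra_files` set to False is the conservative choice for a first run, since nothing in the destination is replaced or removed.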

Parameters

source_bucket (Required): The name of the bucket containing the source objects.
destination_bucket (Required): The name of the bucket containing the destination objects.
source_object: The root sync directory in the source bucket.
destination_object: The root sync directory in the destination bucket.
recursive: If True, subdirectories will be considered.
allow_overwrite: If True, files will be overwritten when a mismatched file is found. By default, overwriting files is not allowed.
delete_extra_files: If True, deletes files in the destination that are not found in the source. By default, extra files are not deleted. Note: this option can delete data quickly if you specify the wrong source/destination combination.
gcp_conn_id: (Optional) The connection ID used to connect to Google Cloud.
delegate_to: The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
impersonation_chain: Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account (templated).
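
To make the effect of allow_overwrite and delete_extra_files more concrete, here is a minimal pure-Python sketch of a sync plan. It is not the provider's implementation: the function `plan_sync`, the checksum dictionaries, and the file names are all hypothetical, and the sketch assumes that "extra" files are those present in the destination but absent from the source.

```python
def plan_sync(source, destination, allow_overwrite=False, delete_extra_files=False):
    """Return (to_copy, to_overwrite, to_delete) as sorted lists of object names.

    `source` and `destination` map object names to checksums (hypothetical model).
    """
    # Objects that exist only in the source are always copied.
    to_copy = sorted(name for name in source if name not in destination)
    # Objects present on both sides with different checksums are "mismatched".
    mismatched = sorted(
        name
        for name in source
        if name in destination and source[name] != destination[name]
    )
    # Objects that exist only in the destination are "extra".
    extra = sorted(name for name in destination if name not in source)
    # Each flag must be set explicitly before anything is replaced or removed.
    to_overwrite = mismatched if allow_overwrite else []
    to_delete = extra if delete_extra_files else []
    return to_copy, to_overwrite, to_delete


source = {"a.csv": "111", "b.csv": "222", "c.csv": "333"}
destination = {"b.csv": "222", "c.csv": "999", "old.csv": "000"}

default_plan = plan_sync(source, destination)
full_plan = plan_sync(source, destination, allow_overwrite=True, delete_extra_files=True)
print(default_plan)  # (['a.csv'], [], [])
print(full_plan)     # (['a.csv'], ['c.csv'], ['old.csv'])
```

With both flags left at their defaults, only the missing object is copied; the mismatched and extra objects in the destination are left untouched.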

Documentation

The source_object and destination_object parameters describe the root sync directory in each bucket. If they are not passed, the entire bucket will be synchronized. When set, they should point to directories.

Note

The synchronization of individual files is not supported. Only entire directories can be synchronized.
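
One way to picture a root sync directory and the recursive flag is as a prefix filter over object names. The sketch below is an assumption-laden illustration, not the operator's code: the helper `select_objects` and the object names are hypothetical, and it assumes a directory is represented by a name prefix ending in "/".

```python
def select_objects(object_names, root=None, recursive=True):
    """Return the object names that fall under the root sync directory."""
    prefix = ""
    if root:
        # Treat the root as a directory by ensuring a trailing slash.
        prefix = root if root.endswith("/") else root + "/"
    selected = []
    for name in object_names:
        if not name.startswith(prefix):
            continue  # outside the root sync directory
        relative = name[len(prefix):]
        if not relative:
            continue  # the directory placeholder itself
        if not recursive and "/" in relative:
            continue  # lives in a subdirectory
        selected.append(name)
    return selected


names = ["data/a.csv", "data/2023/b.csv", "logs/run.txt", "readme.txt"]

whole_bucket = select_objects(names)
data_tree = select_objects(names, root="data")
data_top_level = select_objects(names, root="data/", recursive=False)
print(whole_bucket)
print(data_tree)
print(data_top_level)
```

Under this model, passing a file name such as data/a.csv as the root would be read as the directory prefix data/a.csv/ and match nothing, which is consistent with the note that individual files cannot be synchronized.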

See also

For more information on how to use this operator, take a look at the guide: GCSSynchronizeBuckets
