GCSListObjectsOperator

Google

Lists all objects in the bucket whose names match the given string prefix and delimiter.


Last Updated: Feb. 25, 2023

Access Instructions

Install the Google provider package (apache-airflow-providers-google) into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

bucket (required): The Google Cloud Storage bucket in which to find the objects. (templated)
prefix: Prefix string that filters for objects whose names begin with it. (templated)
delimiter: The delimiter by which to filter the objects. (templated) For example, to list the CSV files in a directory in GCS, use delimiter='.csv'.
gcp_conn_id: (Optional) The connection ID used to connect to Google Cloud.
delegate_to: The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
impersonation_chain: Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)

Documentation


This operator returns a Python list of the matching object names, which downstream tasks can retrieve through XCom.

Example:

The following operator lists all the Avro files in the sales/sales-2017 folder of the data bucket.

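from airflow.providers.google.cloud.operators.gcs import GCSListObjectsOperator

# List every object under sales/sales-2017/ that matches the .avro delimiter.
GCS_Files = GCSListObjectsOperator(
    task_id='GCS_Files',
    bucket='data',
    prefix='sales/sales-2017/',
    delimiter='.avro',
    gcp_conn_id=google_cloud_conn_id,  # connection ID defined elsewhere in the DAG file
)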
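
Because the operator's return value is a plain list of object names, a downstream task can work with it like any other Python list. The sketch below is illustrative only: the helper name to_gcs_uris and the sample object name are assumptions, not part of the provider package. It shows one thing a downstream callable might do with the list, namely turning the names into gs:// URIs.

```python
from typing import List, Optional


def to_gcs_uris(bucket: str, object_names: Optional[List[str]]) -> List[str]:
    """Build gs:// URIs from a bucket name and a list of object names.

    Hypothetical helper for illustration. In a DAG, a PythonOperator
    callable could obtain object_names with
    ti.xcom_pull(task_ids='GCS_Files'), which yields None when the
    upstream task pushed nothing, so None is treated as an empty list.
    """
    return [f"gs://{bucket}/{name}" for name in (object_names or [])]


if __name__ == "__main__":
    # Sample value standing in for the list pulled from XCom (made up).
    listed = ["sales/sales-2017/january.avro"]
    for uri in to_gcs_uris("data", listed):
        print(uri)
```

Keeping the list handling in a plain function like this makes it easy to unit test without a running Airflow environment; only the thin callable that performs the XCom pull needs the task context.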
