BigQueryCreateEmptyDatasetOperator

Google

This operator creates a new, empty dataset in your BigQuery project. See https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource


Last Updated: Mar. 16, 2023

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
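When choosing params, note that the dataset can be identified either through dataset_id or inside dataset_reference. The following is a minimal sketch, not the provider's implementation, of how those inputs could be merged into a single request body shaped like the REST Dataset resource. The helper name build_dataset_body is hypothetical, and the rule that values already present in dataset_reference take precedence is an assumption of this sketch.

```python
from typing import Any, Dict, Optional


def build_dataset_body(
    project_id: str,
    dataset_id: Optional[str] = None,
    location: Optional[str] = None,
    dataset_reference: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge the inputs into a Dataset-style body (hypothetical helper)."""
    # Copy the inputs so the caller's dictionaries are not mutated.
    body: Dict[str, Any] = dict(dataset_reference or {})
    ref: Dict[str, Any] = dict(body.get("datasetReference", {}))

    # Assumption of this sketch: IDs already set in dataset_reference win.
    ref.setdefault("projectId", project_id)
    if "datasetId" not in ref:
        if dataset_id is None:
            raise ValueError(
                "Provide dataset_id or set datasetId in dataset_reference."
            )
        ref["datasetId"] = dataset_id

    body["datasetReference"] = ref
    if location is not None:
        # Keep a location from dataset_reference if one was given there.
        body.setdefault("location", location)
    return body


# All values below are placeholders.
body = build_dataset_body(
    project_id="my-project",
    dataset_id="new_dataset",
    location="EU",
    dataset_reference={"friendlyName": "New Dataset"},
)
```

The resulting dictionary uses the REST field names datasetReference, projectId, datasetId, friendlyName and location; whether to pass such a body or the individual params is a matter of preference.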

Parameters

project_id: The ID of the project in which to create the dataset.
dataset_id: The ID of the dataset. Not required if datasetId is set in dataset_reference.
location: The geographic location where the dataset should reside.
dataset_reference: A dataset reference that can be provided as the request body. More info: https://cloud.google.com/bigquery/docs/reference/rest/v2/datasets#resource
gcp_conn_id: (Optional) The connection ID used to connect to Google Cloud.
delegate_to: The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
impersonation_chain: Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, each identity in the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account (templated).
if_exists: What Airflow should do if the dataset already exists. If set to log, the task instance (TI) is marked as successful and an error message is logged. Set to ignore to ignore the error, to fail to fail the TI, or to skip to skip it.
exists_ok: Deprecated. Use if_exists="ignore" instead.

Example:

    from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyDatasetOperator

    create_new_dataset = BigQueryCreateEmptyDatasetOperator(
        # Dataset IDs may contain only letters, numbers, and underscores.
        dataset_id='new_dataset',
        project_id='my-project',
        dataset_reference={"friendlyName": "New Dataset"},
        gcp_conn_id='_my_gcp_conn_',
        task_id='newDatasetCreator',
        dag=dag,  # an existing DAG object
    )
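The four if_exists modes described above can be illustrated with a small, self-contained sketch. It only models the documented outcomes and is not the provider's code: the function handle_existing_dataset, the IfExists enum, and the DatasetExistsError exception are hypothetical names introduced for illustration, and the returned state strings are stand-ins for Airflow's task states.

```python
import logging
from enum import Enum

log = logging.getLogger(__name__)


class IfExists(Enum):
    """The documented values of the if_exists parameter."""
    LOG = "log"
    IGNORE = "ignore"
    FAIL = "fail"
    SKIP = "skip"


class DatasetExistsError(Exception):
    """Hypothetical stand-in for the error that fails the task."""


def handle_existing_dataset(dataset_id: str, if_exists: str) -> str:
    """Return the resulting task state when the dataset already exists."""
    mode = IfExists(if_exists)  # raises ValueError for an unknown mode
    if mode is IfExists.IGNORE:
        # Swallow the conflict silently; the task succeeds.
        return "success"
    if mode is IfExists.LOG:
        # Log an error message, but still let the task succeed.
        log.error("Dataset %s already exists.", dataset_id)
        return "success"
    if mode is IfExists.SKIP:
        # The task is skipped rather than succeeded or failed.
        return "skipped"
    # Remaining mode is FAIL: surface the conflict as a task failure.
    raise DatasetExistsError(f"Dataset {dataset_id} already exists.")
```

For instance, handle_existing_dataset("new_dataset", "skip") yields "skipped", while the "fail" mode raises the stand-in exception.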

Documentation


See also

For more information on how to use this operator, take a look at the guide: Create dataset
