CloudDataFusionStartPipelineOperator

Google

Starts a Cloud Data Fusion pipeline. Works for both batch and stream pipelines.

View on GitHub

Last Updated: Feb. 25, 2023

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

Documentation

Connections

Parameters

pipeline_nameRequiredYour pipeline name.

instance_nameRequiredThe name of the instance.

success_statesIf provided the operator will wait for pipeline to be in one of the provided states.

pipeline_timeoutHow long (in seconds) operator should wait for the pipeline to be in one of success_states. Works only if success_states are provided.

locationRequiredThe Cloud Data Fusion location in which to handle the request.

runtime_argsOptional runtime args to be passed to the pipeline

namespaceIf your pipeline belongs to a Basic edition instance, the namespace ID is always default. If your pipeline belongs to an Enterprise edition instance, you can create a namespace.

api_versionThe version of the api that will be requested for example ‘v3’.

gcp_conn_idThe connection ID to use when fetching connection info.

delegate_toThe account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.

impersonation_chainOptional service account to impersonate using short-term credentials, or chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant Service Account Token Creator IAM role to the directly preceding identity, with first account from the list granting this role to the originating account (templated).

asynchronousFlag to return after submitting the pipeline ID to the Data Fusion API. This is useful for submitting long-running pipelines and waiting on them asynchronously using the CloudDataFusionPipelineStateSensor

deferrableRun operator in the deferrable mode. Is not related to asynchronous parameter. While asynchronous parameter gives a possibility to wait until pipeline reaches terminate state using sleep() method, deferrable mode checks for the state using asynchronous calls. It is not possible to use both asynchronous and deferrable parameters at the same time.

poll_intervalPolling period in seconds to check for the status. Used only in deferrable mode.

Documentation

Starts a Cloud Data Fusion pipeline. Works for both batch and stream pipelines.