DataprocCreateBatchOperator

Google

Creates a batch workload.

Last Updated: Feb. 25, 2023

Access Instructions

Install the Google provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.
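The two steps above might look like the following minimal sketch. It assumes the provider is installed from the `apache-airflow-providers-google` package and that Airflow 2.4 or later is in use (for the `schedule` argument). The project ID, region, bucket URI, batch ID, and the helper names `make_batch_config` and `build_dag` are hypothetical placeholders, not values from this page.

```python
from datetime import datetime


def make_batch_config(main_python_file_uri: str) -> dict:
    """Build a minimal PySpark batch definition (hypothetical helper)."""
    return {"pyspark_batch": {"main_python_file_uri": main_python_file_uri}}


def build_dag():
    """Create a DAG containing a single DataprocCreateBatchOperator task."""
    # Imported lazily so the helper above can be used without Airflow installed.
    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateBatchOperator,
    )

    with DAG(
        dag_id="example_dataproc_batch",
        start_date=datetime(2023, 1, 1),
        schedule=None,  # assumes Airflow 2.4+; trigger manually
        catchup=False,
    ) as dag:
        DataprocCreateBatchOperator(
            task_id="create_batch",
            project_id="my-gcp-project",  # placeholder project
            region="us-central1",  # placeholder region
            batch=make_batch_config("gs://my-bucket/jobs/job.py"),  # placeholder URI
            batch_id="example-batch-001",  # placeholder ID
        )
    return dag
```

In a DAG file, assign the result at module level (for example `dag = build_dag()`) so that the scheduler can discover it.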

Parameters

project_id: Optional. The ID of the Google Cloud project that the cluster belongs to. (templated)
region: Required. The Cloud Dataproc region in which to handle the request. (templated)
batch (required): The batch to create. (templated)
batch_id (marked required by the operator; described as Optional): The ID to use for the batch, which will become the final component of the batch’s resource name. This value must be 4-63 characters. Valid characters are /[a-z][0-9]-/. (templated)
request_id: Optional. A unique ID used to identify the request. If the server receives two CreateBatchRequest requests with the same ID, the second request is ignored and the first google.longrunning.Operation created and stored in the backend is returned.
retry: A retry object used to retry requests. If None is specified, requests will not be retried.
result_retry: Result retry object used to retry requests. It is used to decrease the delay between executing chained tasks in a DAG by specifying the exact number of seconds for executing.
timeout: The amount of time, in seconds, to wait for the request to complete. Note that if retry is specified, the timeout applies to each individual attempt.
metadata: Additional metadata that is provided to the method.
gcp_conn_id: The connection ID to use when connecting to Google Cloud.
impersonation_chain: Optional service account to impersonate using short-term credentials, or a chained list of accounts required to get the access_token of the last account in the list, which will be impersonated in the request. If set as a string, the account must grant the originating account the Service Account Token Creator IAM role. If set as a sequence, the identities from the list must grant the Service Account Token Creator IAM role to the directly preceding identity, with the first account in the list granting this role to the originating account. (templated)
asynchronous: Flag to return immediately after submitting the batch to the Dataproc API. This is useful for creating long-running batches and waiting on them asynchronously using the DataprocBatchSensor.
deferrable: Run the operator in deferrable mode.
polling_interval_seconds: Time, in seconds, to wait between calls to check the run status.
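As an illustration of the batch_id constraints and the asynchronous flag described above, the sketch below checks an ID locally and then pairs the operator with a DataprocBatchSensor. It reads the stated rule as "4 to 63 characters drawn from lowercase letters, digits, and hyphens", which is an interpretation rather than a documented regular expression. The helper names `is_valid_batch_id` and `add_async_batch_tasks`, the task IDs, and the 30-second poke interval are assumptions made for the example.

```python
import re

# Assumed reading of the documented rule: 4-63 chars from a-z, 0-9 and "-".
_BATCH_ID_PATTERN = re.compile(r"[a-z0-9-]{4,63}")


def is_valid_batch_id(batch_id: str) -> bool:
    """Return True if batch_id satisfies the assumed length and character rule."""
    return _BATCH_ID_PATTERN.fullmatch(batch_id) is not None


def add_async_batch_tasks(dag, *, project_id, region, batch, batch_id):
    """Attach a fire-and-forget create task and a sensor that waits on it."""
    # Imported lazily so the validator can be used without Airflow installed.
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateBatchOperator,
    )
    from airflow.providers.google.cloud.sensors.dataproc import DataprocBatchSensor

    if not is_valid_batch_id(batch_id):
        raise ValueError(f"Invalid batch_id: {batch_id!r}")

    create = DataprocCreateBatchOperator(
        task_id="create_batch_async",
        dag=dag,
        project_id=project_id,
        region=region,
        batch=batch,
        batch_id=batch_id,
        asynchronous=True,  # return once the batch has been submitted
    )
    wait = DataprocBatchSensor(
        task_id="wait_for_batch",
        dag=dag,
        project_id=project_id,
        region=region,
        batch_id=batch_id,
        poke_interval=30,  # seconds between status checks (arbitrary choice)
    )
    create >> wait
    return create, wait
```

Validating the ID before the task is created surfaces naming mistakes when the DAG is parsed instead of when the API call fails.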

