BeamBasePipelineOperator

Apache Beam

Abstract base class for Beam Pipeline Operators.


Last Updated: Jan. 23, 2023

Access Instructions

Install the Apache Beam provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

runner: Runner on which the pipeline will be run. Defaults to "DirectRunner". Other possible options: DataflowRunner, SparkRunner, FlinkRunner, PortableRunner. See BeamRunnerType and https://beam.apache.org/documentation/runners/capability-matrix/
default_pipeline_options: Map of default pipeline options.
pipeline_options: Map of pipeline options, given as a dictionary. Each value can be one of several types:
  - If the value is None, the single option --key (without a value) is added.
  - If the value is False, the option is skipped.
  - If the value is True, the single option --key (without a value) is added.
  - If the value is a list, one option is added per element. For example, if the value is ['A', 'B'] and the key is key, the options --key=A --key=B are added.
  - Values of any other type are replaced with their Python textual representation.
  When defining labels (the labels option), you can also provide a dictionary.
gcp_conn_id: Optional. The connection ID to use when connecting to Google Cloud Storage if the Python file is on GCS.
delegate_to: Optional. The account to impersonate using domain-wide delegation of authority, if any. For this to work, the service account making the request must have domain-wide delegation enabled.
dataflow_config: Optional. Dataflow configuration, used when the runner type is set to DataflowRunner. Defaults to None.
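
The value rules described for pipeline_options can be illustrated with a short sketch. The helper name options_to_args and the option names in the example are hypothetical and chosen only for illustration; this is not the provider's own code, just one way the documented rules could be expressed.

```python
from __future__ import annotations

from typing import Any


def options_to_args(options: dict[str, Any]) -> list[str]:
    """Hypothetical helper: turn an options dict into command-line flags.

    Follows the rules listed for pipeline_options; not the provider's code.
    """
    args: list[str] = []
    for key, value in options.items():
        if value is None or value is True:
            # None and True both produce a bare flag without a value.
            args.append(f"--{key}")
        elif value is False:
            # False means the option is skipped entirely.
            continue
        elif isinstance(value, list):
            # A list produces one --key=item flag per element.
            args.extend(f"--{key}={item}" for item in value)
        else:
            # Anything else is rendered with its Python text representation.
            args.append(f"--{key}={value}")
    return args


# Illustrative option names only; they are not real Beam options.
example = {
    "flag_on": True,
    "flag_off": False,
    "key": ["A", "B"],
    "name": "demo",
    "bare": None,
}
print(options_to_args(example))
```

Under these assumptions, the False entry is dropped, the list expands into two --key flags, and both the True and None entries become bare flags.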

