BaseBranchOperator

Apache Airflow Certified

This is a base class for creating operators with branching functionality, similarly to BranchPythonOperator.

View on GitHub

Last Updated: Nov. 27, 2022

Access Instructions

Install the Apache Airflow provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

Documentation

Parameters

task_ida unique, meaningful id for the task

ownerthe owner of the task. Using a meaningful description (e.g. user/person/team/role name) to clarify ownership is recommended.

emailthe ‘to’ email address(es) used in email alerts. This can be a single email or multiple ones. Multiple addresses can be specified as a comma or semicolon separated string or by passing a list of strings.

email_on_retryIndicates whether email alerts should be sent when a task is retried

email_on_failureIndicates whether email alerts should be sent when a task failed

retriesthe number of retries that should be performed before failing the task

retry_delaydelay between retries, can be set as timedelta or float seconds, which will be converted into timedelta, the default is timedelta(seconds=300).

retry_exponential_backoffallow progressively longer waits between retries by using exponential backoff algorithm on retry delay (delay will be converted into seconds)

max_retry_delaymaximum delay interval between retries, can be set as timedelta or float seconds, which will be converted into timedelta.

start_dateThe start_date for the task, determines the execution_date for the first task instance. The best practice is to have the start_date rounded to your DAG’s schedule_interval. Daily jobs have their start_date some day at 00:00:00, hourly jobs have their start_date at 00:00 of a specific hour. Note that Airflow simply looks at the latest execution_date and adds the schedule_interval to determine the next execution_date. It is also very important to note that different tasks’ dependencies need to line up in time. If task A depends on task B and their start_date are offset in a way that their execution_date don’t line up, A’s dependencies will never be met. If you are looking to delay a task, for example running a daily task at 2AM, look into the TimeSensor and TimeDeltaSensor. We advise against using dynamic start_date and recommend using fixed ones. Read the FAQ entry about start_date for more information.

end_dateif specified, the scheduler won’t go beyond this date

depends_on_pastwhen set to true, task instances will run sequentially and only if the previous instance has succeeded or has been skipped. The task instance for the start_date is allowed to run.

wait_for_downstreamwhen set to true, an instance of task X will wait for tasks immediately downstream of the previous instance of task X to finish successfully or be skipped before it runs. This is useful if the different instances of a task X alter the same asset, and this asset is used by tasks downstream of task X. Note that depends_on_past is forced to True wherever wait_for_downstream is used. Also note that only tasks immediately downstream of the previous task instance are waited for; the statuses of any tasks further downstream are ignored.

daga reference to the dag the task is attached to (if any)

priority_weightpriority weight of this task against other task. This allows the executor to trigger higher priority tasks before others when things get backed up. Set priority_weight as a higher number for more important tasks.

weight_ruleweighting method used for the effective total priority weight of the task. Options are: { downstream | upstream | absolute } default is downstream When set to downstream the effective weight of the task is the aggregate sum of all downstream descendants. As a result, upstream tasks will have higher weight and will be scheduled more aggressively when using positive weight values. This is useful when you have multiple dag run instances and desire to have all upstream tasks to complete for all runs before each dag can continue processing downstream tasks. When set to upstream the effective weight is the aggregate sum of all upstream ancestors. This is the opposite where downstream tasks have higher weight and will be scheduled more aggressively when using positive weight values. This is useful when you have multiple dag run instances and prefer to have each dag complete before starting upstream tasks of other dags. When set to absolute, the effective weight is the exact priority_weight specified without additional weighting. You may want to do this when you know exactly what priority weight each task should have. Additionally, when set to absolute, there is bonus effect of significantly speeding up the task creation process as for very large DAGs. Options can be set as string or using the constants defined in the static class airflow.utils.WeightRule

queuewhich queue to target when running this job. Not all executors implement queue management, the CeleryExecutor does support targeting specific queues.

poolthe slot pool this task should run in, slot pools are a way to limit concurrency for certain tasks

pool_slotsthe number of pool slots this task should use (>= 1) Values less than 1 are not allowed.

slatime by which the job is expected to succeed. Note that this represents the timedelta after the period is closed. For example if you set an SLA of 1 hour, the scheduler would send an email soon after 1:00AM on the 2016-01-02 if the 2016-01-01 instance has not succeeded yet. The scheduler pays special attention for jobs with an SLA and sends alert emails for SLA misses. SLA misses are also recorded in the database for future reference. All tasks that share the same SLA time get bundled in a single email, sent soon after that time. SLA notification are sent once and only once for each task instance.

execution_timeoutmax time allowed for the execution of this task instance, if it goes beyond it will raise and fail.

on_failure_callbacka function to be called when a task instance of this task fails. a context dictionary is passed as a single parameter to this function. Context contains references to related objects to the task instance and is documented under the macros section of the API.

on_execute_callbackmuch like the on_failure_callback except that it is executed right before the task is executed.

on_retry_callbackmuch like the on_failure_callback except that it is executed when retries occur.

on_success_callbackmuch like the on_failure_callback except that it is executed when the task succeeds.

pre_executea function to be called immediately before task execution, receiving a context dictionary; raising an exception will prevent the task from being executed. |experimental|

post_executea function to be called immediately after task execution, receiving a context dictionary and task result; raising an exception will prevent the task from succeeding. |experimental|

trigger_ruledefines the rule by which dependencies are applied for the task to get triggered. Options are: { all_success | all_failed | all_done | all_skipped | one_success | one_done | one_failed | none_failed | none_failed_min_one_success | none_skipped | always} default is all_success. Options can be set as string or using the constants defined in the static class airflow.utils.TriggerRule

resourcesA map of resource parameter names (the argument names of the Resources constructor) to their values.

run_as_userunix username to impersonate while running the task

max_active_tis_per_dagWhen set, a task will be able to limit the concurrent runs across execution_dates.

executor_configAdditional task-level configuration parameters that are interpreted by a specific executor. Parameters are namespaced by the name of executor. Example: to run this task in a specific docker container through the KubernetesExecutor MyOperator(..., executor_config={ "KubernetesExecutor": {"image": "myCustomDockerImage"} } )

do_xcom_pushif True, an XCom is pushed containing the Operator’s result

task_groupThe TaskGroup to which the task should belong. This is typically provided when not using a TaskGroup as a context manager.

docAdd documentation or notes to your Task objects that is visible in Task Instance details View in the Webserver

doc_mdAdd documentation (in Markdown format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

doc_rstAdd documentation (in RST format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

doc_jsonAdd documentation (in JSON format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

doc_yamlAdd documentation (in YAML format) or notes to your Task objects that is visible in Task Instance details View in the Webserver

Documentation

This is a base class for creating operators with branching functionality, similarly to BranchPythonOperator.

Users should subclass this operator and implement the function choose_branch(self, context). This should run whatever business logic is needed to determine the branch, and return either the task_id for a single task (as a str) or a list of task_ids.

The operator will continue with the returned task_id(s), and all other tasks directly downstream of this operator will be skipped.