ExternalPythonOperator

Apache Airflow

Allows one to run a function in a virtualenv that is not re-created but used as is without the overhead of creating the virtualenv (with certain caveats).


Last Updated: Apr. 8, 2023

Access Instructions

Install the Apache Airflow provider package into your Airflow environment.

Import the module into your DAG file and instantiate the operator with your desired parameters.

Parameters

python (required): Full, filesystem-specific path to the Python binary inside the virtualenv that should be used (in its VENV/bin folder). It should be an absolute path, so it usually starts with “/” or “X:/” depending on the filesystem and OS.
python_callable (required): A Python function, defined with def and with no references to outside variables, that will be run in the virtualenv.
use_dill: Whether to use dill instead of pickle (the default) to serialize the arguments and the result. This allows more complex types, but if dill is not preinstalled in your virtualenv, the task will fail with use_dill enabled.
op_args: A list of positional arguments to pass to python_callable.
op_kwargs: A dict of keyword arguments to pass to python_callable.
string_args: Strings made available to python_callable at runtime, as a list[str], through the global variable virtualenv_string_args. Note that the arguments are split by newline.
templates_dict: A dictionary whose values are templates that the Airflow engine renders sometime between __init__ and execute; the rendered values are made available in your callable’s context.
templates_exts: A list of file extensions to resolve while processing templated fields, for example ['.sql', '.hql'].
expect_airflow: Whether to expect Airflow to be installed in the target environment. If true, the operator raises a warning if Airflow is not installed, and it attempts to load Airflow macros when starting.
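To make the python, python_callable, op_args, op_kwargs and string_args parameters more concrete, here is a simplified, stdlib-only sketch of the general idea: ship the callable's source text plus pickled arguments to another interpreter, run it there, and read the pickled result back. This is an illustration under stated assumptions, not Airflow's actual implementation; the names run_in_external_python, CALLABLE_SOURCE and SCRIPT_TEMPLATE are hypothetical, sys.executable stands in for a virtualenv interpreter path, and dill, templating, macro loading and newline splitting of string_args are not reproduced.

```python
import pickle
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Illustrative callable, kept as source text: only this text reaches the
# external interpreter, so it must import what it needs inside its body.
CALLABLE_SOURCE = textwrap.dedent(
    """
    def add_and_scale(a, b, factor=1):
        import operator  # imports live inside the function
        return operator.mul(a + b, factor)
    """
)

# Hypothetical driver script (not Airflow's real template). It exposes
# string_args as the global virtualenv_string_args, loads the pickled
# arguments, calls the function, and pickles the result.
SCRIPT_TEMPLATE = """
import pickle
import sys

virtualenv_string_args = {string_args!r}

{callable_source}

with open(sys.argv[1], "rb") as f:
    arguments = pickle.load(f)
result = {callable_name}(*arguments["args"], **arguments["kwargs"])
with open(sys.argv[2], "wb") as f:
    pickle.dump(result, f)
"""


def run_in_external_python(python, callable_source, callable_name,
                           op_args=(), op_kwargs=None, string_args=()):
    """Run callable_name from callable_source with the interpreter at `python`."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        script = tmp_path / "script.py"
        input_path = tmp_path / "input.pkl"
        output_path = tmp_path / "output.pkl"

        script.write_text(SCRIPT_TEMPLATE.format(
            string_args=list(string_args),
            callable_source=callable_source,
            callable_name=callable_name,
        ))
        # Serialize positional and keyword arguments with pickle.
        with open(input_path, "wb") as f:
            pickle.dump({"args": list(op_args), "kwargs": dict(op_kwargs or {})}, f)

        # check=True raises CalledProcessError if the external interpreter fails.
        subprocess.run([python, str(script), str(input_path), str(output_path)],
                       check=True)

        with open(output_path, "rb") as f:
            return pickle.load(f)


if __name__ == "__main__":
    # sys.executable stands in for a virtualenv path such as .../VENV/bin/python.
    print(run_in_external_python(sys.executable, CALLABLE_SOURCE, "add_and_scale",
                                 op_args=[2, 3], op_kwargs={"factor": 10}))
```

Because both sides exchange pickled data, a round-trip like this only works when the two interpreters can read each other's serialized objects, which is one plausible way to see why op_args, op_kwargs and return values are tied to compatible Python versions.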

Documentation


The function must be defined using def and must not be part of a class. All imports must happen inside the function, and no variables outside its scope may be referenced. A global variable named virtualenv_string_args will be available, populated from string_args. In addition, you can pass arguments through op_args and op_kwargs, and you can use a return value. Note that if your virtualenv runs a different Python major version than Airflow, you cannot use return values, op_args, op_kwargs, or any macros that are provided to Airflow through plugins. You can still use string_args, though.
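The rules above can be illustrated with a callable written to satisfy them: defined with def at module level, importing everything it needs inside its body, and reading virtualenv_string_args as a global. This is a minimal sketch; the function name summarize and its parameters are hypothetical, and the module-level virtualenv_string_args assignment is only a local stand-in so the function can be tried outside Airflow, where the operator would populate that global from string_args instead.

```python
# Local stand-in only: under the operator, this global comes from string_args.
virtualenv_string_args = ["12.5", "7.0", "30.25"]


def summarize(threshold, unit="ms"):
    # Every import happens inside the function body; imports at the top of
    # the DAG file are not visible in the external virtualenv.
    import statistics

    # virtualenv_string_args is a list[str]; convert explicitly.
    values = [float(s) for s in virtualenv_string_args]
    return {
        "mean": round(statistics.mean(values), 2),
        "above_threshold": sum(v > threshold for v in values),
        "unit": unit,
    }


print(summarize(10))  # {'mean': 16.58, 'above_threshold': 2, 'unit': 'ms'}
```

In a DAG, such a function would be passed as python_callable, with threshold supplied through op_args and unit through op_kwargs; the returned dict is only usable when the virtualenv's Python major version matches Airflow's.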

If Airflow is installed in the external environment in a different version than the one used by the operator, the operator will fail.
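Since a version mismatch makes the operator fail, it can help to check ahead of time which version of a distribution the target interpreter has. The following is a hypothetical pre-flight helper, not something the operator provides: installed_version and check_same_version are illustrative names, and the approach assumes the target interpreter runs Python 3.8+ (for importlib.metadata).

```python
import subprocess
import sys

# Code executed by the target interpreter: prints the installed version of the
# distribution named in sys.argv[1], or prints nothing if it is not installed.
_PROBE = (
    "import sys, importlib.metadata as md\n"
    "try:\n"
    "    print(md.version(sys.argv[1]))\n"
    "except md.PackageNotFoundError:\n"
    "    pass\n"
)


def installed_version(python, distribution):
    """Return the version of `distribution` in the interpreter at `python`, or None."""
    completed = subprocess.run(
        [python, "-c", _PROBE, distribution],
        capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip() or None


def check_same_version(python, distribution, expected):
    """Raise if `distribution` is installed at `python` with a version other than `expected`."""
    found = installed_version(python, distribution)
    if found is not None and found != expected:
        raise RuntimeError(
            f"{distribution} {found} in {python} does not match expected {expected}"
        )
    return found
```

For example, check_same_version("/path/to/VENV/bin/python", "apache-airflow", airflow.__version__) would raise before the task runs if the virtualenv carries a different Airflow version; the interpreter path here is a placeholder.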

See also

For more information on how to use this operator, take a look at the guide: ExternalPythonOperator
