ray_task

Ray

Wraps a function to be executed on the Ray cluster.

Access Instructions

Install the Ray provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

python_callableOptional[Callable]Function to be invoked on the Ray cluster.
http_conn_idstrHttp connection id for conenction to ray.
ray_worker_poolOptional[str]The pool that controls the amount of parallel clients created to access the Ray cluster.
eagerOptional[bool]Whether to run the the function on the coordinator process (on the Ray cluster) or to send the function to a remote task. You should set this to False normally.

Documentation

The return values of the function will be cached on the Ray object store. Downstream tasks must be ray tasks too, as the dependencies will be fetched from the object store. The RayBackend will need to be setup in your Dockerfile to use this decorator.

Use as a task decorator:

from ray_provider.decorators import ray_task
def ray_example_dag():
@ray_task("ray_conn_id")
def sum_cols(df: pd.DataFrame) -> pd.DataFrame:
return pd.DataFrame(df.sum()).T

Was this page helpful?