ray_task
Wraps a function to be executed on the Ray cluster.
Access Instructions
Install the Ray provider package into your Airflow environment.
Import the module into your DAG file and instantiate it with your desired params.
Parameters
Documentation
The return values of the function are cached in the Ray object store. Downstream tasks must also be Ray tasks, because their dependencies are fetched from the object store. The RayBackend must be set up in your Dockerfile to use this decorator.
Use as a task decorator:
    import pandas as pd
    from ray_provider.decorators import ray_task

    def ray_example_dag():

        @ray_task("ray_conn_id")
        def sum_cols(df: pd.DataFrame) -> pd.DataFrame:
            # Sum each column and return the totals as a single-row dataframe.
            return pd.DataFrame(df.sum()).T
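The requirement that downstream tasks also be Ray tasks can be illustrated with a toy model. The sketch below is not the provider's implementation and does not use Ray or Airflow. `OBJECT_STORE`, `ObjectRef`, `toy_ray_task`, `build_rows`, and `plain_sum_cols` are hypothetical names invented for this illustration, and a plain dictionary stands in for the Ray object store. It assumes only what the text above states: a decorated task caches its return value in the store, and a decorated downstream task fetches its inputs from there.

```python
import functools
import uuid

# Hypothetical in-memory stand-in for the Ray object store (illustration only).
OBJECT_STORE = {}


class ObjectRef:
    """Opaque handle to a value held in OBJECT_STORE."""

    def __init__(self, key):
        self.key = key


def _resolve(value):
    # Fetch the real value if we were handed a reference; pass anything else through.
    return OBJECT_STORE[value.key] if isinstance(value, ObjectRef) else value


def toy_ray_task(func):
    """Toy decorator: resolve reference arguments, run func, cache the result."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        args = [_resolve(a) for a in args]
        kwargs = {k: _resolve(v) for k, v in kwargs.items()}
        result = func(*args, **kwargs)
        key = uuid.uuid4().hex
        OBJECT_STORE[key] = result  # cache the return value in the store
        return ObjectRef(key)       # downstream tasks only see the reference

    return wrapper


@toy_ray_task
def build_rows():
    return [[1, 2], [3, 4]]


@toy_ray_task
def sum_cols(rows):
    # Column-wise sum over a list of rows.
    return [sum(col) for col in zip(*rows)]


def plain_sum_cols(rows):
    # Same logic, but undecorated: it receives the raw reference, not the data.
    return [sum(col) for col in zip(*rows)]


if __name__ == "__main__":
    ref = sum_cols(build_rows())
    print(OBJECT_STORE[ref.key])  # [4, 6]
```

In this toy model, calling `plain_sum_cols(build_rows())` raises a `TypeError`, because the undecorated function is handed an `ObjectRef` rather than the underlying rows. This mirrors, in simplified form, why a task that is not a Ray task cannot consume the output of one.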
Example DAGs
An example DAG manipulating a Pandas dataframe using the Ray backend.
A DAG using a basic Jupyter notebook model that pulls the HIGGS dataset, splits training and testing data, and creates/validates a model using XGBoost on Ray.
Using the Ray backend and XGBoost to tune a simple Jupyter notebook model that pulls the HIGGS dataset.
Build a dataframe from the breast cancer dataset.
Build a random dataframe task.
Build a dataframe task.