WhylogsConstraintsOperator
Creates a whylogs Constraints report from a Constraints object or from one of the pre-defined constraint factories, as the example below shows. Currently the API requires you to have already profiled a DataFrame, so you have to point the operator to the location where that profile is stored. The operator then runs a constraint suite that checks which conditions passed and which failed. You can also use it to stop a pipeline run when some criterion is not met.
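As a rough illustration of the pass/fail flow described above, the sketch below evaluates a small suite of checks against precomputed summary statistics. If any check fails, it raises an exception, which is how an Airflow task signals failure. This is a library-free sketch. `CheckResult`, `run_suite`, `enforce`, `ConstraintsFailedError`, and the summary keys are all hypothetical and are not part of whylogs or the provider's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    """Outcome of one check (hypothetical type, not the whylogs report format)."""

    name: str
    passed: bool


class ConstraintsFailedError(Exception):
    """Hypothetical error; in Airflow, any exception raised by a task fails that task."""


def run_suite(
    summary: Dict[str, float],
    checks: Dict[str, Callable[[Dict[str, float]], bool]],
) -> List[CheckResult]:
    # Evaluate every check against the same summary and record pass/fail
    return [CheckResult(name, bool(check(summary))) for name, check in checks.items()]


def enforce(results: List[CheckResult]) -> None:
    # Stop the run if at least one condition failed, naming the failed ones
    failed = [r.name for r in results if not r.passed]
    if failed:
        raise ConstraintsFailedError(f"failed constraints: {failed}")


# Hypothetical summary statistics standing in for a stored profile
summary = {"column_1.min": 0.5, "column_1.null_fraction": 0.1}
checks = {
    "column_1 min > 0.0": lambda s: s["column_1.min"] > 0.0,
    "column_1 has no nulls": lambda s: s["column_1.null_fraction"] == 0.0,
}

results = run_suite(summary, checks)
try:
    enforce(results)
    outcome = "all constraints passed"
except ConstraintsFailedError as err:
    outcome = f"stopped: {err}"
print(outcome)
```

Keeping evaluation (`run_suite`) separate from enforcement (`enforce`) lets the same results be reported in full even when the run is stopped.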
Access Instructions
Install the whylogs provider package into your Airflow environment.
Import the module into your DAG file and instantiate the operator with your desired parameters.
Documentation
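```python
from datetime import datetime

from airflow import DAG
from whylogs.core.constraints.factories import greater_than_number
from whylogs_provider.operators.whylogs import WhylogsConstraintsOperator

TASK_ID = "column_1_check"
PROFILE_PATH = "s3://some/prefix/to/a/profile.bin"

with DAG(dag_id="constraints_example", start_date=datetime.now()) as dag:
    # One task per constraint: this task only evaluates a greater-than-0.0 check on column_1
    op = WhylogsConstraintsOperator(
        task_id=TASK_ID,
        profile_path=PROFILE_PATH,
        reader="s3",
        constraint=greater_than_number(column_name="column_1", number=0.0),
    )
    op
```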
Running one constraint per task gives finer granularity. You can quickly identify which check failed, and you can make the DAG more lenient by letting some core checks break the pipeline while others only raise a warning. If you would rather run all checks in a single task, inject a `Constraints` object instead. The following example demonstrates how to do it:
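Here is a minimal sketch of the lenient policy described above, written in plain Python and assuming each check's outcome is already known. When a check marked as core fails, the sketch raises an exception, which fails the task, so downstream tasks with Airflow's default trigger rule do not run. Other failures are only logged as warnings. `apply_policy`, `CoreCheckFailed`, and the check names are hypothetical, and the operator itself may expose this choice differently.

```python
import logging
from typing import Dict, List, Set

logger = logging.getLogger("constraints_sketch")


class CoreCheckFailed(Exception):
    """Hypothetical error: raising it inside a task is what would fail that task."""


def apply_policy(outcomes: Dict[str, bool], core: Set[str]) -> List[str]:
    """Break on failed core checks; only warn about failed non-core checks.

    `outcomes` maps a check name to whether it passed. Returns the failed
    non-core checks that were downgraded to warnings.
    """
    failed = [name for name, ok in outcomes.items() if not ok]
    core_failures = [name for name in failed if name in core]
    if core_failures:
        raise CoreCheckFailed(f"core checks failed: {core_failures}")
    for name in failed:
        logger.warning("non-core check failed: %s", name)
    return failed


# Hypothetical outcomes: the core check passed, a secondary one failed
outcomes = {"column_1 greater than 0.0": True, "column_2 mean in range": False}
warned = apply_policy(outcomes, core={"column_1 greater than 0.0"})

# A failing core check breaks the run instead of warning
try:
    apply_policy({"column_1 greater than 0.0": False}, core={"column_1 greater than 0.0"})
    core_broke = False
except CoreCheckFailed:
    core_broke = True
```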
```python
from datetime import datetime

import whylogs as why
from airflow import DAG
from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import (
    mean_between_range,
    null_percentage_below_number,
    smaller_than_number,
)
from whylogs_provider.operators.whylogs import WhylogsConstraintsOperator

TASK_ID = "column_1_check"
PROFILE_PATH = "s3://some/prefix/to/a/profile.bin"


def build_constraints():
    # Read the existing profile so the builder can attach constraints to its columns
    profile_view = why.reader("s3").read(path=PROFILE_PATH)
    builder = ConstraintsBuilder(dataset_profile_view=profile_view)
    builder.add_constraint(smaller_than_number(column_name="bp", number=20.0))
    builder.add_constraint(mean_between_range(column_name="s3", lower=-1.5, upper=1.5))
    builder.add_constraint(null_percentage_below_number(column_name="sex", number=0.0))
    constraints = builder.build()
    return constraints


with DAG(dag_id="constraints_example", start_date=datetime.now()) as dag:
    # A single task evaluates every constraint in the suite
    op = WhylogsConstraintsOperator(
        task_id=TASK_ID,
        profile_path=PROFILE_PATH,
        reader="s3",
        constraints=build_constraints(),
    )
    op
```
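To give some intuition for what the suite above evaluates, the sketch below re-implements the three checks in plain Python over hypothetical in-memory columns. whylogs evaluates constraints against a profile's summary metrics rather than raw rows. The semantics used here are assumptions inferred from the factory names, not the library's definitions: the statistic each check compares, strict versus inclusive bounds, and treating the null percentage as a fraction. `max_below`, `mean_in_range`, `null_fraction_at_most`, and the sample data are all hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional

Column = List[Optional[float]]  # None marks a missing value


def max_below(values: Column, number: float) -> bool:
    # Assumed semantics of smaller_than_number: the column maximum (nulls ignored) is below `number`
    return max(v for v in values if v is not None) < number


def mean_in_range(values: Column, lower: float, upper: float) -> bool:
    # Assumed semantics of mean_between_range: inclusive bounds on the mean (nulls ignored)
    return lower <= mean(v for v in values if v is not None) <= upper


def null_fraction_at_most(values: Column, number: float) -> bool:
    # Assumed semantics of null_percentage_below_number: share of nulls as a fraction, inclusive bound
    nulls = sum(1 for v in values if v is None)
    return nulls / len(values) <= number


# Hypothetical data for the three columns used in the example
data: Dict[str, Column] = {
    "bp": [12.0, 18.5, 15.0],
    "s3": [-0.2, 0.4, 1.9],
    "sex": [1.0, None, 2.0],
}

report = {
    "bp max < 20.0": max_below(data["bp"], 20.0),
    "s3 mean in [-1.5, 1.5]": mean_in_range(data["s3"], -1.5, 1.5),
    "sex null fraction <= 0.0": null_fraction_at_most(data["sex"], 0.0),
}
for name, passed in report.items():
    print(f"{name}: {'PASS' if passed else 'FAIL'}")
```

Under the inclusive-bound assumption, `number=0.0` for the null check means the column must contain no missing values at all, so the `sex` column with one null fails.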
To learn more about running constraint checks with whylogs, check out our [docs and examples](https://whylogs.readthedocs.io/).