WhylogsConstraintsOperator

whylogs

Creates a whylogs Constraints report from a Constraints object or from one of the pre-defined constraint factories, as the examples below show. The API currently requires a profiled DataFrame to already exist, so you must point the operator to a location where a dataset profile is stored. The operator then runs the constraint suite and reports which conditions passed or failed. You can also use it to stop a DAG run when a criterion is not met.


Last Updated: Aug. 22, 2022

Access Instructions

Install the whylogs provider package into your Airflow environment.

Import the module into your DAG file and instantiate it with your desired params.

Parameters

profile_path (Optional, str): The location of the dataset profile, used when you want to run a single built-in constraint. Defaults to None.
reader (Optional, str): The whylogs profile reader to use. The available readers are described in [our docs](https://whylogs.readthedocs.io/en/latest/index.html). Defaults to None.
constraint (:class:`MetricConstraint`, Optional): A MetricConstraint object, which can be created with the existing whylogs constraint factories, as the first example below shows. Defaults to None.
constraints (:class:`Constraints`, Optional): A Constraints object holding a user-defined constraint suite, as the second example below shows. Defaults to None.
break_pipeline (bool): Whether to raise an Airflow exception and stop the current DAG execution when a constraint is not met. Defaults to False.
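
To make the effect of break_pipeline more concrete, here is a minimal, library-free sketch of the decision described above: failed constraints are always collected, and an exception is raised only when break_pipeline is set. The function `evaluate_report`, the `PipelineStop` exception, and the `(name, passed, failed)` row layout are hypothetical names chosen for illustration. This is not the provider's implementation, and the real operator raises an Airflow exception instead.

```python
from typing import List, Tuple

# Hypothetical report row: (constraint name, checks passed, checks failed).
ReportRow = Tuple[str, int, int]


class PipelineStop(Exception):
    """Stand-in for the Airflow exception the real operator would raise."""


def evaluate_report(report: List[ReportRow], break_pipeline: bool = False) -> List[str]:
    """Return the names of failed constraints; raise if break_pipeline is set."""
    failed = [name for name, _passed, n_failed in report if n_failed > 0]
    if failed and break_pipeline:
        raise PipelineStop(f"Constraints failed: {', '.join(failed)}")
    return failed


# Illustrative data only, not output from a real profile.
report = [
    ("column_1 greater than number 0.0", 1, 0),
    ("bp smaller than number 20.0", 0, 1),
]

# Lenient mode: failures are returned, execution continues.
lenient_failures = evaluate_report(report, break_pipeline=False)

# Strict mode: the same failures stop the run.
try:
    evaluate_report(report, break_pipeline=True)
    stopped = False
except PipelineStop:
    stopped = True
```

In a DAG, the same idea would translate to setting break_pipeline=True on the tasks that guard core checks and leaving the default on the ones that should only warn.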

Documentation

The following example runs a single built-in constraint, created with one of the whylogs constraint factories, against a profile stored on S3:

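from datetime import datetime

from airflow import DAG
from whylogs.core.constraints.factories import greater_than_number
from whylogs_provider.operators.whylogs import WhylogsConstraintsOperator

TASK_ID = "column_1_check"
PROFILE_PATH = "s3://some/prefix/to/a/profile.bin"

with DAG(dag_id='constraints_example', start_date=datetime.now()) as dag:
    # Check that the values in column_1 are greater than 0.0
    op = WhylogsConstraintsOperator(
        task_id=TASK_ID,
        profile_path=PROFILE_PATH,
        reader="s3",
        constraint=greater_than_number(column_name="column_1", number=0.0),
    )
    op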

Running one constraint per task makes it easy to identify which check has failed, and lets you make the DAG stop on some core checks while only raising a warning on others. If you instead want to run all checks in a single task, pass a :class:`Constraints` object to the operator. The following example demonstrates how to do it:

from datetime import datetime

import whylogs as why
from airflow import DAG
from whylogs.core.constraints import ConstraintsBuilder
from whylogs.core.constraints.factories import (
    smaller_than_number,
    mean_between_range,
    null_percentage_below_number,
)
from whylogs_provider.operators.whylogs import WhylogsConstraintsOperator

TASK_ID = "column_1_check"
PROFILE_PATH = "s3://some/prefix/to/a/profile.bin"

def build_constraints():
    # Read the stored profile and build a suite of constraints on top of it
    profile_view = why.reader("s3").read(path=PROFILE_PATH)
    builder = ConstraintsBuilder(dataset_profile_view=profile_view)
    builder.add_constraint(smaller_than_number(column_name="bp", number=20.0))
    builder.add_constraint(mean_between_range(column_name="s3", lower=-1.5, upper=1.5))
    builder.add_constraint(null_percentage_below_number(column_name="sex", number=0.0))
    constraints = builder.build()
    return constraints

with DAG(dag_id='constraints_example', start_date=datetime.now()) as dag:
    # Run the whole suite in a single task
    op = WhylogsConstraintsOperator(
        task_id=TASK_ID, profile_path=PROFILE_PATH, reader="s3", constraints=build_constraints()
    )
    op

To learn more about running constraint checks with whylogs, see our [docs and examples](https://whylogs.readthedocs.io/).
