SQLColumnCheckOperator
Common SQL
Performs one or more of the templated checks in the column_checks dictionary. Checks are performed on a per-column basis specified by the column_mapping.
Access Instructions
Install the Common SQL provider package into your Airflow environment.
Import the operator into your DAG file and instantiate it with your desired parameters.
Documentation
Each check in the column_mapping can take one or more of the following options:
- equal_to: an exact value the result must equal; cannot be combined with the other comparison options
- greater_than: a value the result must be strictly greater than
- less_than: a value the result must be strictly less than
- geq_to: a value the result must be greater than or equal to
- leq_to: a value the result must be less than or equal to
- tolerance: how far the result may deviate from the expected value, given as a fraction (e.g. 0.2 allows 20%)
- partition_clause: an extra clause added to the WHERE statement to partition the data for this check only
Example column_mapping:
{
    "col_name": {
        "null_check": {
            "equal_to": 0,
            "partition_clause": "foreign_key IS NOT NULL",
        },
        "min": {
            "greater_than": 5,
            "leq_to": 10,
            "tolerance": 0.2,
        },
        "max": {"less_than": 1000, "geq_to": 10, "tolerance": 0.01},
    }
}
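The comparison options can be illustrated with a short Python sketch that evaluates one already-computed check result against one check from the example mapping. check_passes is a hypothetical helper, not the Common SQL provider's implementation. It assumes tolerance is a fraction that loosens each bound by tolerance * |bound| in the permissive direction, and it rejects equal_to combined with another comparison, as the option list requires.

```python
from typing import Any, Mapping, Optional

# Options that compare the query result against a bound. "tolerance" and
# "partition_clause" modify a check rather than comparing on their own.
COMPARISONS = {"equal_to", "greater_than", "less_than", "geq_to", "leq_to"}


def check_passes(result: float, options: Mapping[str, Any]) -> bool:
    """Hypothetical sketch: does `result` satisfy every comparison in `options`?

    Assumes `tolerance` is a fraction (0.2 means 20%) that loosens each bound
    by tolerance * |bound| in the permissive direction.
    """
    used = COMPARISONS & set(options)
    if "equal_to" in used and len(used) > 1:
        raise ValueError("equal_to cannot be combined with other comparison options")

    tol: Optional[float] = options.get("tolerance")

    def slack(bound: float) -> float:
        # Allowed deviation around a bound; zero when no tolerance is set.
        return abs(bound) * tol if tol is not None else 0.0

    passed = True
    if "equal_to" in options:
        b = options["equal_to"]
        passed &= b - slack(b) <= result <= b + slack(b)
    if "greater_than" in options:
        b = options["greater_than"]
        passed &= result > b - slack(b)
    if "geq_to" in options:
        b = options["geq_to"]
        passed &= result >= b - slack(b)
    if "less_than" in options:
        b = options["less_than"]
        passed &= result < b + slack(b)
    if "leq_to" in options:
        b = options["leq_to"]
        passed &= result <= b + slack(b)
    return passed


# The "min" check taken from the example column_mapping.
min_check = {"greater_than": 5, "leq_to": 10, "tolerance": 0.2}
print(check_passes(4.5, min_check))  # True: bounds loosen to > 4 and <= 12
print(check_passes(3, min_check))    # False
```

With a 0.2 tolerance, the min check's bounds of greater than 5 and at most 10 loosen to greater than 4 and at most 12, so 4.5 passes while 3 still fails. A check without tolerance, such as null_check, must match its bound exactly.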
- param partition_clause
a partial SQL statement added to the WHERE clause of the query the operator builds, so that every check runs only on the matching partition of the table, e.g. "date = '1970-01-01'"
- param conn_id
the connection ID used to connect to the database
- param database
name of the database, which overrides the one defined in the connection
- param accept_none
whether to accept None values returned by the query. If True, None is converted to 0.
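The partition_clause and accept_none parameters can be illustrated with a small pure-Python sketch. The helpers where_clause, build_check_query, and normalize_result, the table name orders, and the column amount are all hypothetical and not part of the Common SQL provider. The SQL templates in CHECK_TEMPLATES, and the rule that an operator-level partition_clause is joined to a per-check one with AND, are assumptions about how such a query could be built, not the operator's exact SQL.

```python
from typing import Optional

# Assumed SQL templates for the check names used in the example mapping;
# the provider's own templates may differ.
CHECK_TEMPLATES = {
    "null_check": "SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END)",
    "min": "MIN({column})",
    "max": "MAX({column})",
}


def where_clause(partition_clause: Optional[str], check_clause: Optional[str]) -> str:
    """Join the operator-level and per-check clauses with AND (an assumption)."""
    parts = [c for c in (partition_clause, check_clause) if c]
    return " WHERE " + " AND ".join(parts) if parts else ""


def build_check_query(
    table: str,
    column: str,
    check: str,
    partition_clause: Optional[str] = None,
    check_clause: Optional[str] = None,
) -> str:
    """Build one hypothetical check query for a single column."""
    expr = CHECK_TEMPLATES[check].format(column=column)
    return f"SELECT {expr} AS {column}_{check} FROM {table}" + where_clause(
        partition_clause, check_clause
    )


def normalize_result(value: Optional[float], accept_none: bool) -> float:
    """Apply the accept_none rule to a single query result."""
    if value is None:
        if accept_none:
            # e.g. MIN() over a partition with no rows returns NULL; treat it as 0.
            return 0
        # Assumption: without accept_none, a None result is reported as an error.
        raise ValueError("query returned None and accept_none is False")
    return value


print(build_check_query("orders", "amount", "min", partition_clause="date = '1970-01-01'"))
# SELECT MIN(amount) AS amount_min FROM orders WHERE date = '1970-01-01'
print(normalize_result(None, accept_none=True))  # 0
```

Table and column names are interpolated directly into the SQL string here, which is reasonable only because they come from the DAG author rather than from untrusted input.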
See also
For more information on how to use this operator, take a look at the guide: Check SQL Table Columns
Example DAGs
Example DAG showcasing loading, transforming, and data quality checking with multiple datasets in Snowflake.
Imports local files to S3, then into CrateDB, and checks several data quality properties.