Observability & Data Quality
SqlCAD includes built-in features to monitor the health of your data pipeline and ensure data quality.
Overview
The observability framework consists of three parts:
- Metadata Schema (
sqlcad_meta): A system schema that stores statistics and test results. - Automatic Stats Collection: Every load operation captures basic statistics.
- Data Quality Assertions: You can define tests directly in your model definitions.
1. System Schema (sqlcad_meta)
SqlCAD automatically creates and manages a sqlcad_meta schema.
sqlcad_meta.table_stats: Stores full-table health snapshots (row counts, size).sqlcad_meta.load_stats: Tracks the impact of every load operation (rows inserted/updated).sqlcad_meta.test_results: Stores the history of all data quality test executions (PASS/FAIL status).
2. Automatic Stats Collection
When you build a project with materialization: table, SqlCAD generates load scripts (e.g., incremental merges or SCD2 updates). These scripts automatically inject a step to record the number of rows processed into sqlcad_meta.load_stats.
No configuration is required for this behavior.
3. Data Quality Assertions
You can define data quality tests using the tests metadata block on any ENTITY, DIM, FACT, MART, REPORT, or AGGREGATION.
Supported Tests
not_null: Checks that specific columns contain no NULL values.distinct: Checks that a column (or combination of columns) is unique across the table.expression: A custom SQL condition that identifies failing rows.
Example
DEFINE REPORT SalesLineItems { materialization: table, unique_key: [OrderID, ProductID], tests: - not_null: [OrderID, ProductID, LineTotal] - distinct: [OrderID, ProductID] - expression: "LineTotal >= 0" # Fails if LineTotal is negative } AS ( ... )
Verification Scripts
When you run sqlcad build, a verify folder is generated in your output directory (e.g., output/verify/). This contains SQL scripts (verify_{model_name}.sql) that run your assertions.
Orchestration
These scripts are designed to be run by your orchestrator immediately after the load step.
# Example orchestration flow duckdb warehouse.db < output/recurring/gold/saleslineitems.sql # Load Data duckdb warehouse.db < output/verify/verify_saleslineitems.sql # Verify Data
If any test fails (finds > 0 failing rows), a failure record is written to sqlcad_meta.test_results with status FAIL.