DSL Reference
SqlCAD uses an "Extended SQL" format. It wraps standard SQL in DEFINE <BLOCK> <NAME> structures.
Metadata Styles
Metadata can be defined in two styles:
1. Braceless Style (Preferred)
Clean and readable. Uses standard YAML block style. No trailing commas required.
DEFINE DIM Users primary_key: user_id AS ( ... )
2. Brace Style
Useful for single-line definitions or existing models. Uses YAML flow style inside braces.
DEFINE DIM Users { primary_key: user_id } AS ( ... )
Common Metadata
All blocks support:
description(string): Documentation for the block.layer(string): Overrides default output folder (e.g.,base,silver,gold).source_file(string): Path to an external .sql file to use instead of theAS (...)body.tests(list): Data quality assertions (e.g.,not_null,distinct,expression). See Observability.
DEFINE TABLE
Defines a raw source table and optionally maps it to physical files for automated loading.
Metadata:
source(dict, optional): Configuration for physical data sources (e.g., CSV/Parquet files).path(string): Path or glob pattern to source files (e.g.,data/exports/*.csv).format(string): File format (csv,parquet,json). Defaults to file extension.selection(string): Strategy for selecting files from a glob pattern.all: (Default) Read all matching files.latest: Select only the single file with the most recent modification time.
table(string): Name of an existing table in the database (alternative topath).
columns(dict): Map ofColumnNametoTypeOR a configuration object.- Simple Style:
ColumnName: TypeString(e.g.,UserID: INTEGER) - Mapped Style:
type(string): Target SQL type.source(string, optional): Raw column name in the source file.expression(string, optional): SQL expression for transformation (e.g.,CAST(price AS DECIMAL) / 100).
- Simple Style:
external(bool): Iftrue, skips column validation. Useful for migration.
Example:
DEFINE TABLE RawCategories { source: { path: "simulation/exports/categories_*.csv", format: "csv", selection: "latest" }, columns: { CategoryID: { type: INTEGER, source: category_id }, CategoryName: { type: TEXT, source: category_name }, Description: TEXT // Maps to "Description" if source not provided } }
DEFINE DIM
Defines a Dimension in the Silver Layer. Dimensions are stateful objects that track history (SCD Type 2).
Metadata:
primary_key(string): Required. The unique natural identifier for the object.materialization(string):view(default) ortable.temporal(dict, optional): Customizes SCD2 behavior.valid_from_col(string): Defaults tovalid_from.valid_to_col(string): Defaults tovalid_to.
cluster_by(list): Columns to cluster by (e.g.,[CustomerID]).auto_optimize(bool): Iftrue, enables default clustering (PK).
Implicit Columns:
valid_from(TIMESTAMP)valid_to(TIMESTAMP)is_current(BOOLEAN)updated_at(TIMESTAMP)
Example:
DEFINE DIM Customer { primary_key: CustomerID } AS ( SELECT * FROM raw.customers )
DEFINE FACT
Defines a Fact in the Silver Layer. Facts are immutable event records.
Metadata:
temporal(dict): Required.event_time_col(string): The timestamp column representing when the event occurred.
materialization(string):view(default) ortable.partition_by(string/expr): Partition expression (e.g.,DATE(OrderDate)).cluster_by(list): Columns to cluster by.auto_optimize(bool): Iftrue, enables default partitioning (DATE of event_time).
Example:
DEFINE FACT Sales { temporal: { event_time_col: OrderDate } } AS ( SELECT * FROM raw.sales )
DEFINE ENTITY
Legacy / Abstract. Defines a generic Silver Layer object.
Ideally, use DIM or FACT for specific semantics. ENTITY can be used for standard tables that don't fit strict Dim/Fact patterns or for Type 1 dimensions.
Metadata:
primary_key(string): Required.type(string):dimensionorfact(deprecated hint).temporal(dict): Enables SCD2 logic (same as DIM).
DEFINE REPORT
Defines a final, user-facing output (e.g., for BI tools).
Rules:
- Must specify explicit
columns. NoSELECT *. - Automatically includes a
reference_timecolumn (defaults toCURRENT_TIMESTAMPif omitted).
Metadata:
temporal(dict): Point-in-time configuration (optional).columns(dict): Required. Explicit schema definition.materialization(string):view(default) ortable.
Example:
DEFINE REPORT MonthlyRevenue { columns: { Month: DATE, Revenue: NUMERIC } } AS ( ... )
DEFINE AGGREGATION
Defines a reusable intermediate metric layer (e.g., "Customer Stats").
Rules:
- SQL Body must contain a
GROUP BYclause. - Automatically includes a
reference_timecolumn.
Metadata:
primary_key(string): Required.materialization(string):view(default) ortable.
Example:
DEFINE AGGREGATION CustomerStats { primary_key: CustomerID } AS ( SELECT CustomerID, SUM(Amount) ... GROUP BY CustomerID )
DEFINE FEATURE_SET
Defines a collection of ML features, automatically handling Point-in-Time (PIT) correctness.
Styles:
- Compositional (Preferred): References a list of existing
DEFINE FEATUREblocks. - Monolithic: Defines SQL logic directly (discouraged).
Metadata:
entity(string): Required. The anchor entity (e.g.,Customers).features(list): List of feature names to include.temporal(dict):point_in_time(bool): Iftrue, ensures features are joined respecting validity windows.
Example (Compositional):
DEFINE FEATURE_SET CustomerFeatures { entity: Customers temporal: { point_in_time: true } features: - customer_lifetime_value - customer_last_click }
DEFINE MART
Generic / Legacy. Defines a flexible gold layer table. prefer REPORT or AGGREGATION.
Metadata:
temporal(dict):event_time_col(string): Used to calculate Point-in-Time joins against upstream Entities.
STRATEGY(string):INCREMENTAL: Generates an append-only load script.MERGE: Generates an upsert load script (Update existing, Insert new).
INCREMENT_BY(string): The column to use for high-water mark (required ifSTRATEGYis set).unique_key(list/string): Required forSTRATEGY: MERGE. Keys to match records for updates.materialization(string):view(default) ortable.
Example:
DEFINE MART UserGrowth temporal: event_time_col: signup_date STRATEGY: INCREMENTAL INCREMENT_BY: signup_date AS ( ... )
DEFINE JOIN
Defines a relationship edge in the graph.
Metadata:
from(string): Name of the starting Entity.to(string): Name of the target Entity.type(string):one_to_one,one_to_many,many_to_one,many_to_many.
Example:
DEFINE JOIN user_orders ON from: Orders to: Users type: many_to_one AS ( LEFT JOIN Users u ON u.id = Orders.user_id )
DEFINE FEATURE
Defines a reusable ML feature logic.
Metadata:
entity(string): The entity this feature attaches to.join_key(string): The column to join on.timestamp(string): The column representing the event time.aggregation(string): SQL aggregation (e.g.,COUNT(*)).window_unit(string):daysorrows.window(int): Number of units to look back.order_by(string): Column to sort by (required if unit isrows).
Example:
DEFINE FEATURE last_3_orders entity: Users join_key: user_id timestamp: order_date, aggregation: SUM(amount) window_unit: rows AS ( SELECT * FROM Orders )
DEFINE FEATURE_TABLE
Defines a dataset generation task (Training or Scoring) by combining a Feature Set with a population and time spine.
Metadata:
feature_set(string): Required. Name of theFEATURE_SETto use.spine(dict): Required.entity_source(string): SQL query returning the entities (e.g.,SELECT CustomerID FROM Customers).time(dict):start_date(string/date): Start of the date range.end_date(string/date): End of the date range. Defaults toCURRENT_DATE.
materialization(string):table(recommended) orview.primary_key(list): Required for tables. Composite key[spine_date, EntityID].
Example:
DEFINE FEATURE_TABLE CustomerTrainingData { materialization: table, feature_set: CustomerFeatures, columns: spine_date: TIMESTAMP CustomerID: TEXT customer_lifetime_orders: INTEGER primary_key: - spine_date - CustomerID spine: entity_source: "SELECT CustomerID FROM Customers" time: start_date: "2023-01-01" end_date: "2023-12-31" }