DSL Reference

SqlCAD uses an "Extended SQL" format. It wraps standard SQL in DEFINE <BLOCK> <NAME> structures.

Metadata Styles

Metadata can be defined in two styles:

1. Braceless Style (Preferred)

Clean and readable. Uses standard YAML block style. No trailing commas required.

DEFINE DIM Users
    primary_key: user_id
AS ( ... )

2. Brace Style

Useful for single-line definitions or existing models. Uses YAML flow style inside braces.

DEFINE DIM Users { primary_key: user_id } AS ( ... )

Common Metadata

All blocks support:

description (string): Documentation for the block.
layer (string): Overrides default output folder (e.g., base, silver, gold).
source_file (string): Path to an external .sql file to use instead of the AS (...) body.
tests (list): Data quality assertions (e.g., not_null, distinct, expression). See Observability.

DEFINE TABLE

Defines a raw source table and optionally maps it to physical files for automated loading.

Metadata:

source (dict, optional): Configuration for physical data sources (e.g., CSV/Parquet files).
- path (string): Path or glob pattern to source files (e.g., data/exports/*.csv).
- format (string): File format (csv, parquet, json). Defaults to file extension.
- selection (string): Strategy for selecting files from a glob pattern.
  - all: (Default) Read all matching files.
  - latest: Select only the single file with the most recent modification time.
- table (string): Name of an existing table in the database (alternative to path).
columns (dict): Map of ColumnName to Type OR a configuration object.
- Simple Style: ColumnName: TypeString (e.g., UserID: INTEGER)
- Mapped Style:
  - type (string): Target SQL type.
  - source (string, optional): Raw column name in the source file.
  - expression (string, optional): SQL expression for transformation (e.g., CAST(price AS DECIMAL) / 100).
external (bool): If true, skips column validation. Useful for migration.

Example:

DEFINE TABLE RawCategories {
    source: {
        path: "simulation/exports/categories_*.csv",
        format: "csv",
        selection: "latest"
    },
    columns: {
        CategoryID: { type: INTEGER, source: category_id },
        CategoryName: { type: TEXT, source: category_name },
        Description: TEXT  // Maps to "Description" if source not provided
    }
}

DEFINE DIM

Defines a Dimension in the Silver Layer. Dimensions are stateful objects that track history (SCD Type 2).

Metadata:

primary_key (string): Required. The unique natural identifier for the object.
materialization (string): view (default) or table.
temporal (dict, optional): Customizes SCD2 behavior.
- valid_from_col (string): Defaults to valid_from.
- valid_to_col (string): Defaults to valid_to.
cluster_by (list): Columns to cluster by (e.g., [CustomerID]).
auto_optimize (bool): If true, enables default clustering (PK).

Implicit Columns:

valid_from (TIMESTAMP)
valid_to (TIMESTAMP)
is_current (BOOLEAN)
updated_at (TIMESTAMP)

Example:

DEFINE DIM Customer {
    primary_key: CustomerID
} AS ( SELECT * FROM raw.customers )

DEFINE FACT

Defines a Fact in the Silver Layer. Facts are immutable event records.

Metadata:

temporal (dict): Required.
- event_time_col (string): The timestamp column representing when the event occurred.
materialization (string): view (default) or table.
partition_by (string/expr): Partition expression (e.g., DATE(OrderDate)).
cluster_by (list): Columns to cluster by.
auto_optimize (bool): If true, enables default partitioning (DATE of event_time).

Example:

DEFINE FACT Sales {
    temporal: { event_time_col: OrderDate }
} AS ( SELECT * FROM raw.sales )

DEFINE ENTITY

Legacy / Abstract. Defines a generic Silver Layer object. Ideally, use DIM or FACT for specific semantics. ENTITY can be used for standard tables that don't fit strict Dim/Fact patterns or for Type 1 dimensions.

Metadata:

primary_key (string): Required.
type (string): dimension or fact (deprecated hint).
temporal (dict): Enables SCD2 logic (same as DIM).

DEFINE REPORT

Defines a final, user-facing output (e.g., for BI tools).

Rules:

Must specify explicit columns. No SELECT *.
Automatically includes a reference_time column (defaults to CURRENT_TIMESTAMP if omitted).

Metadata:

temporal (dict): Point-in-time configuration (optional).
columns (dict): Required. Explicit schema definition.
materialization (string): view (default) or table.

Example:

DEFINE REPORT MonthlyRevenue {
    columns: { Month: DATE, Revenue: NUMERIC }
} AS ( ... )

DEFINE AGGREGATION

Defines a reusable intermediate metric layer (e.g., "Customer Stats").

Rules:

SQL Body must contain a GROUP BY clause.
Automatically includes a reference_time column.

Metadata:

primary_key (string): Required.
materialization (string): view (default) or table.

Example:

DEFINE AGGREGATION CustomerStats {
    primary_key: CustomerID
} AS ( SELECT CustomerID, SUM(Amount) ... GROUP BY CustomerID )

DEFINE FEATURE_SET

Defines a collection of ML features, automatically handling Point-in-Time (PIT) correctness.

Styles:

Compositional (Preferred): References a list of existing DEFINE FEATURE blocks.
Monolithic: Defines SQL logic directly (discouraged).

Metadata:

entity (string): Required. The anchor entity (e.g., Customers).
features (list): List of feature names to include.
temporal (dict):
- point_in_time (bool): If true, ensures features are joined respecting validity windows.

Example (Compositional):

DEFINE FEATURE_SET CustomerFeatures {
    entity: Customers
    temporal: { point_in_time: true }
    features:
        - customer_lifetime_value
        - customer_last_click
}

DEFINE MART

Generic / Legacy. Defines a flexible gold layer table. prefer REPORT or AGGREGATION.

Metadata:

temporal (dict):
- event_time_col (string): Used to calculate Point-in-Time joins against upstream Entities.
STRATEGY (string):
- INCREMENTAL: Generates an append-only load script.
- MERGE: Generates an upsert load script (Update existing, Insert new).
INCREMENT_BY (string): The column to use for high-water mark (required if STRATEGY is set).
unique_key (list/string): Required for STRATEGY: MERGE. Keys to match records for updates.
materialization (string): view (default) or table.

Example:

DEFINE MART UserGrowth
    temporal:
        event_time_col: signup_date
    STRATEGY: INCREMENTAL
    INCREMENT_BY: signup_date
AS ( ... )

DEFINE JOIN

Defines a relationship edge in the graph.

Metadata:

from (string): Name of the starting Entity.
to (string): Name of the target Entity.
type (string): one_to_one, one_to_many, many_to_one, many_to_many.

Example:

DEFINE JOIN user_orders ON
    from: Orders
    to: Users
    type: many_to_one
AS ( 
    LEFT JOIN Users u ON u.id = Orders.user_id
)

DEFINE FEATURE

Defines a reusable ML feature logic.

Metadata:

entity (string): The entity this feature attaches to.
join_key (string): The column to join on.
timestamp (string): The column representing the event time.
aggregation (string): SQL aggregation (e.g., COUNT(*)).
window_unit (string): days or rows.
window (int): Number of units to look back.
order_by (string): Column to sort by (required if unit is rows).

Example:

DEFINE FEATURE last_3_orders
    entity: Users
    join_key: user_id
    timestamp: order_date,
    aggregation: SUM(amount)
    window_unit: rows
AS ( SELECT * FROM Orders )

DEFINE FEATURE_TABLE

Defines a dataset generation task (Training or Scoring) by combining a Feature Set with a population and time spine.

Metadata:

feature_set (string): Required. Name of the FEATURE_SET to use.
spine (dict): Required.
- entity_source (string): SQL query returning the entities (e.g., SELECT CustomerID FROM Customers).
- time (dict):
  - start_date (string/date): Start of the date range.
  - end_date (string/date): End of the date range. Defaults to CURRENT_DATE.
materialization (string): table (recommended) or view.
primary_key (list): Required for tables. Composite key [spine_date, EntityID].

Example:

DEFINE FEATURE_TABLE CustomerTrainingData {
    materialization: table,
    feature_set: CustomerFeatures,
    columns:
        spine_date: TIMESTAMP
        CustomerID: TEXT
        customer_lifetime_orders: INTEGER
    primary_key:
        - spine_date
        - CustomerID
    spine:
        entity_source: "SELECT CustomerID FROM Customers"
        time:
            start_date: "2023-01-01"
            end_date: "2023-12-31"
}

Introduction

Concepts

Guides

Reference

DSL Reference

Metadata Styles

1. Braceless Style (Preferred)

2. Brace Style

Common Metadata

DEFINE TABLE

DEFINE DIM

DEFINE FACT

DEFINE ENTITY

DEFINE REPORT

DEFINE AGGREGATION

DEFINE FEATURE_SET

DEFINE MART

DEFINE JOIN

DEFINE FEATURE

DEFINE FEATURE_TABLE