DENIS IL.
Engineering Insights · 6 min read

Why AI Needs a Semantic Layer

LLMs can reason over data fluently. The problem is not capability; it is context. Without a semantic layer, AI analytics is a probabilistic exercise. With one, the business logic behind every answer becomes deterministic.

Semantic Layer · AI Analytics · dbt · LLM
01

The Problem

Ask an AI to analyze your data without a semantic layer and it will try. It will read column names, infer relationships, guess at business rules, and produce an answer that sounds confident and is often wrong in ways that are impossible to detect from the output alone.

The issue is not that modern LLMs are incapable of reasoning over data; they are remarkably good at it. The issue is that they are reasoning over the wrong thing: raw table structures, physical column names, and implicit business logic that lives in analysts' heads rather than in code.

A column called 'amt' could be gross revenue, net revenue, refund amount, or tax. A table called 'sessions' could mean user sessions, gaming sessions, or therapy sessions. Without explicit context, the AI guesses. And guesses compound.

02

What the Semantic Layer Provides

  • Business concept definitions: 'Revenue' is defined once as a specific dbt metric with explicit filters, date logic, and known caveats. The AI queries the concept, not the column.
  • Metric governance: every metric has exactly one definition, tested on every pipeline run. The same question always produces the same answer from the same governed data asset.
  • Natural language grounding: business vocabulary maps to governed data assets. 'Last month's active customers' resolves to a specific mart table, specific columns, and specific filter logic, not a best guess.
  • Context injection: before answering any question, the AI receives the relevant semantic definitions (business rules, filter constraints, known edge cases). This makes AI reasoning explainable, not opaque.

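The four capabilities above can be illustrated with a small in-memory registry. This is a minimal sketch under stated assumptions, not the API of dbt, Cube.dev, or any other tool: the `Metric` and `SemanticRegistry` classes, the `fct_orders` table, the `amount_net` column, and the revenue definition itself are all hypothetical. Registration enforces one definition per concept (governance), `resolve` maps business vocabulary to that definition or fails loudly (grounding), and `context_for` renders the text a model would receive before answering (context injection).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One governed business concept. All names used here are hypothetical."""
    name: str
    description: str
    expression: str                  # SQL aggregate over governed columns
    table: str                       # governed mart table
    filters: tuple[str, ...] = ()
    caveats: tuple[str, ...] = ()
    synonyms: tuple[str, ...] = ()   # business vocabulary that means this metric


class SemanticRegistry:
    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}
        self._vocabulary: dict[str, str] = {}  # lower-cased phrase -> metric name

    def register(self, metric: Metric) -> None:
        # Governance: each concept and each phrase may have exactly one definition.
        phrases = {p.lower() for p in (metric.name, *metric.synonyms)}
        if metric.name in self._metrics:
            raise ValueError(f"metric {metric.name!r} is already defined")
        for phrase in phrases:
            if phrase in self._vocabulary:
                raise ValueError(f"{phrase!r} already maps to {self._vocabulary[phrase]!r}")
        # Only mutate state once every check has passed.
        self._metrics[metric.name] = metric
        for phrase in phrases:
            self._vocabulary[phrase] = metric.name

    def resolve(self, phrase: str) -> Metric:
        # Grounding: a phrase resolves to a governed definition, never to a guess.
        name = self._vocabulary.get(phrase.lower())
        if name is None:
            raise KeyError(f"no governed definition for {phrase!r}")
        return self._metrics[name]

    def context_for(self, phrase: str) -> str:
        # Context injection: the definition text handed to the model before it answers.
        m = self.resolve(phrase)
        lines = [f"Metric: {m.name}", f"Meaning: {m.description}",
                 f"Compute: {m.expression} FROM {m.table}"]
        lines += [f"Filter: {f}" for f in m.filters]
        lines += [f"Caveat: {c}" for c in m.caveats]
        return "\n".join(lines)


registry = SemanticRegistry()
registry.register(Metric(
    name="revenue",
    description="Net revenue after refunds, excluding tax.",  # illustrative rule only
    expression="SUM(amount_net)",
    table="fct_orders",
    filters=("status = 'completed'",),
    caveats=("Attributed to order date, not payment date.",),
    synonyms=("net revenue", "sales"),
))
print(registry.context_for("Sales"))
```

Validating before mutating keeps a rejected registration from leaving the registry half-updated, which is what lets "exactly one definition" hold as an invariant rather than a convention.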
03

The Architecture

The semantic layer sits between the data warehouse and the AI. In practice, this means dbt metric definitions (or a dedicated semantic layer tool such as Cube.dev) that codify business logic and expose it through a programmatic API.
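As a hedged illustration of business logic exposed through a programmatic API, the sketch below compiles one governed definition into SQL. It is not the dbt Semantic Layer or Cube.dev interface; `MetricDefinition`, `compile_metric`, the table and column names, and the Postgres-style `DATE_TRUNC` dialect are all assumptions. The caller chooses only a metric, a time grain, and a date range; filters and date logic come from the definition, so an identical request always compiles to identical SQL.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical governed definition; field names are illustrative."""
    name: str
    expression: str               # aggregate over governed columns
    table: str                    # governed mart table
    time_column: str              # column that carries the metric's date logic
    filters: tuple[str, ...] = ()


ALLOWED_GRAINS = frozenset({"day", "week", "month", "quarter"})


def compile_metric(metric: MetricDefinition, grain: str, start: date, end: date) -> str:
    """Render a metric request as SQL over the half-open range [start, end)."""
    if grain not in ALLOWED_GRAINS:
        raise ValueError(f"unsupported grain {grain!r}")
    if start >= end:
        raise ValueError("start must be earlier than end")
    conditions = [
        f"{metric.time_column} >= DATE '{start.isoformat()}'",
        f"{metric.time_column} < DATE '{end.isoformat()}'",
        *metric.filters,  # business rules travel with the definition, not the caller
    ]
    where = "\n  AND ".join(conditions)
    return (
        f"SELECT DATE_TRUNC('{grain}', {metric.time_column}) AS period,\n"
        f"       {metric.expression} AS {metric.name}\n"
        f"FROM {metric.table}\n"
        f"WHERE {where}\n"
        "GROUP BY 1\n"
        "ORDER BY 1"
    )


revenue = MetricDefinition(
    name="revenue",
    expression="SUM(amount_net)",
    table="analytics.fct_orders",
    time_column="order_date",
    filters=("status = 'completed'",),
)
print(compile_metric(revenue, "month", date(2024, 1, 1), date(2024, 4, 1)))
```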

When the AI receives a question, it first resolves business terms against the semantic layer. 'Revenue' becomes a specific SQL expression with known semantics. 'Active customers' resolves to a mart table with an explicit activity definition. Only then does the AI generate the query and retrieve the answer.
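The resolve-first order can be sketched as follows. This is a minimal illustration under several assumptions: the business term is taken as already extracted from the question, the `CONCEPTS` vocabulary, the `analytics.mart_customer_activity` table, its columns, and the activity rule are hypothetical, and the SQL is assembled directly from the resolved definitions rather than written by a model. What it shows is the ordering: an unknown term raises an error instead of being guessed, and "last month" is computed by fixed date logic before any query exists.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Concept:
    expression: str
    table: str
    time_column: str
    filters: tuple[str, ...]


# Hypothetical governed vocabulary; a real system would load this from the semantic layer.
CONCEPTS = {
    "active customers": Concept(
        expression="COUNT(DISTINCT customer_id)",
        table="analytics.mart_customer_activity",
        time_column="activity_date",
        filters=("event_type IN ('login', 'purchase')",),  # explicit activity rule
    ),
}


def last_month(today: date) -> tuple[date, date]:
    """Half-open range [first day of previous month, first day of current month)."""
    end = today.replace(day=1)
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


def build_last_month_query(term: str, today: date) -> str:
    # 1. Resolve the business term first; refuse rather than guess.
    concept = CONCEPTS.get(term.strip().lower())
    if concept is None:
        raise LookupError(f"{term!r} has no governed definition")
    # 2. Resolve time language with fixed date logic.
    start, end = last_month(today)
    # 3. Only now produce SQL, built entirely from the resolved definitions.
    conditions = [
        f"{concept.time_column} >= DATE '{start.isoformat()}'",
        f"{concept.time_column} < DATE '{end.isoformat()}'",
        *concept.filters,
    ]
    return f"SELECT {concept.expression} FROM {concept.table} WHERE " + " AND ".join(conditions)


print(build_last_month_query("Active customers", today=date(2024, 3, 15)))
```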

The result is analytics where every AI-generated answer can be traced to a specific, governed metric definition. Not a hallucination. Not a guess. A deterministic lookup of business logic defined by humans and enforced by code.
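One way to make that traceability concrete is to return every answer together with a fingerprint of the exact definition that produced it. The sketch below is an illustration under assumptions, not a feature of any particular tool: it runs a hypothetical revenue definition against an in-memory SQLite table filled with toy rows, and attaches a SHA-256 hash of the definition's canonical JSON, so any two answers can be compared by the definition they came from.

```python
import hashlib
import json
import sqlite3
from dataclasses import dataclass

# Hypothetical governed definition; the field names are illustrative only.
REVENUE = {
    "name": "revenue",
    "expression": "SUM(amount_net)",
    "table": "fct_orders",
    "filters": ["status = 'completed'"],
}


@dataclass(frozen=True)
class TracedAnswer:
    value: float
    metric: str
    definition_sha256: str  # fingerprint of the exact definition that produced the value
    sql: str


def fingerprint(definition: dict) -> str:
    """Hash a canonical JSON form, so identical definitions get identical fingerprints."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def answer(conn: sqlite3.Connection, definition: dict) -> TracedAnswer:
    # Build the query only from the definition, then record which definition was used.
    where = " AND ".join(definition["filters"]) or "1 = 1"
    sql = f"SELECT {definition['expression']} FROM {definition['table']} WHERE {where}"
    (value,) = conn.execute(sql).fetchone()
    return TracedAnswer(value=value, metric=definition["name"],
                        definition_sha256=fingerprint(definition), sql=sql)


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fct_orders (amount_net REAL, status TEXT)")
conn.executemany("INSERT INTO fct_orders VALUES (?, ?)",
                 [(100.0, "completed"), (40.0, "refunded"), (60.0, "completed")])
result = answer(conn, REVENUE)
print(result.value, result.metric, result.definition_sha256[:12])
```

If someone later edits the definition, answers computed under the old and new versions carry different fingerprints, which is what lets a reviewer tell whether a changed number reflects changed data or a changed rule.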

04

The Takeaway

The semantic layer is not an AI optimization. It is a prerequisite.

Every hour spent on prompt engineering to compensate for missing business context is an hour that should have been spent building the semantic layer. The investment compounds — every metric definition added to the semantic layer is immediately available to every AI query, every dashboard, and every downstream system.