dbt Labs
AI-ready data in practice: What dbt Semantic Layer and dbt's MCP server and agent skills do for your team
When it comes to getting their data AI-ready, many organizations start with cleaning and structuring their data and then simply stop. This is an important first step, but it's not the last step, because AI-ready data relies heavily on context: the layer of meaning that explains what your data actually represents.
You need to gather as much information as you can about that data: Where are data points coming from? Which team defines the metric? Which team owns inputting this data into a system? Without answers to questions like these, even clean, well-structured data can lead AI astray.
One way to think about AI is as a great teammate that knows SQL and analytics really, really well but knows zero about your organization. An agent doesn't know the different acronyms used in your industry, for example, and it doesn't understand your business goals. For AI to work effectively and efficiently, you need to give it all that important context to make the data meaningful.
In practice, teams use dbt's AI capabilities to make data meaningful to AI agents. dbt lives on top of tools like Snowflake, BigQuery, and Databricks to transform data without having to use stored procedures or other data transformation techniques, and there are three key pieces to dbt's AI stack: the dbt Semantic Layer, dbt MCP server, and dbt agent skills. Here's what they are, how they work together, and how to use them to ensure high-quality, AI-ready data.
The semantic data layer is your lens
The semantic data layer provides all of the context that the AI will need to understand your data: the structure of the data, how you work with the data, and what exists in the data.
I think of it like this: I have very bad eyesight. When I take my glasses off, I can still see things, but they are far from in focus. There will be some things that I miss and other things that are incomplete in my vision because I can't fully see everything. When I put my glasses on, I'm able to see clearly and completely. This is essentially what a semantic layer does for your data.
A generic semantic layer is like buying plain, off-the-rack reading glasses. It makes things somewhat clearer; you will get answers some, but not all, of the time, and you're not getting the most detailed vision possible.
A governed, dbt-backed semantic layer gives you prescription lenses that are custom-focused for your business's vision, signed off by someone trusted, and updated through scheduled exams as your vision (your data, your definitions, your business) changes. AI wearing drugstore readers might see something somewhat clearly, but it'll squint and occasionally need to guess. AI wearing your prescription sees exactly what your business means by "revenue," "active customer," or "churn" and keeps seeing correctly as those definitions evolve.
So when we talk about gathering context around data, most of that context is typically handled within the semantic layer. This is especially true when it comes to what certain columns mean, what certain metrics are, and how different values or properties are to be calculated.
You don't need a perfect semantic layer to start
You can get a lot of use out of dbt's AI tooling even without a semantic data layer in place. The semantic layer is mainly used for conversational AI, letting agents query your actual data and return reliable AI outputs. But if you want to use dbt's AI tooling for development workflows, you don't need it. There are still things that you can do with dbt's AI tools outside of it, like diagnosing job failures, finding column-level lineage, and other things that really speed up your workflow.
Don't let not having your data fully cleaned up, or not yet having your data fully defined in the semantic layer, be what stops you from using dbt's AI tools. You can absolutely start using them now, and you can even use some of them to help build your semantic layer as you go.
Three pieces of the dbt AI stack
Terms like "agent skills" and "MCP server" can be intimidating when you first hear them. Let's demystify these.
MCP server: the tools. An MCP server is a set of tools like API calls that can be used to communicate with applications on the backend. Its function is to give the agent instructions on how to make those calls and how to use what it gets back. For example, there's a tool called list_metrics used to pull data from the semantic layer, and another one called get_job_run_error for diagnosing failures available as functions in the dbt MCP server. The dbt MCP server grounds those interactions in structured, dbt-native context, so agents are working from what your data actually means, not guessing from static documentation.
Agent skills: the instructions. dbt's agent skills are workflow instructions that give your agent proven, opinionated guidance for common dbt tasks like writing tests, debugging failures, defining metrics, and handling migrations. They load on demand and only when relevant. An agent skill gives the agent a set of clear instructions needed to complete a specific task. Skills also provide the agent with rules and guardrails: never do this; here are common pitfalls you may run into; here are things that you need to look out for.
How the semantic layer, MCP, and agent skills fit together: Each piece has a distinct role, and together they cover everything an agent needs to work effectively with your data. The semantic layer provides the context, MCP provides governed access, and agent skills provide the proven workflows agents need to query the data or to get the tools they need out of the MCP server.
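To make the "tools" part of that picture concrete, here is a minimal Python sketch of how an agent-side loop might dispatch a tool call by name. It does not connect to the dbt MCP server or use any MCP client library: the registry, the stub functions, the run_id parameter, and the canned return values are all hypothetical stand-ins. Only the tool names list_metrics and get_job_run_error come from the text above.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stubs standing in for tools an MCP server might expose.
# Real tools would talk to dbt; these just return canned data.
def list_metrics() -> List[Dict[str, str]]:
    return [{"name": "revenue", "description": "Sum of completed order totals"}]

def get_job_run_error(run_id: int) -> Dict[str, Any]:
    # run_id is an assumed parameter name, used only for illustration.
    return {"run_id": run_id, "error": "Database Error in model orders: column 'amt' not found"}

# A registry mapping tool names to callables (an assumption of this sketch).
TOOLS: Dict[str, Callable[..., Any]] = {
    "list_metrics": list_metrics,
    "get_job_run_error": get_job_run_error,
}

def call_tool(name: str, **arguments: Any) -> Any:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**arguments)

metrics = call_tool("list_metrics")
failure = call_tool("get_job_run_error", run_id=42)
print(metrics[0]["name"], "|", failure["error"])
```

In a real setup the MCP server supplies the tool list and the agent decides which tool to call; the dictionary lookup here only illustrates that each tool is a named, callable unit with structured input and output.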
dbt's AI tools in production to speed up data development
The best way to understand how these three pieces work together is to see them in action. One of our clients, a very large technology company, used them to feed structured data into a Slack channel where dbt errors are automatically sent. They hooked dbt's MCP server, along with Claude, into that error triage channel to look at those job failures and actually diagnose them.
The integration uses the get_job_run_error tool in the dbt MCP server (the same tool mentioned earlier) to fetch the error, and then has the agent analyze what happened and why. By the time a developer actually gets to that error, they're able to see a quick triage that was already done, along with some possible solutions.
This integration is not fully set up for self-healing just yet. There are definitely controls around the AI, and it doesn't get everything right all of the time, but it's a huge time saver. Instead of having to go into the dbt platform and dig through the logs to find the specific problem, you have it all laid out there by your agent.
That same team is also working on a GitHub action: if somebody creates a model and doesn't include a semantic layer definition, the agent will try to create one and send it back to the developer with a note: here's what I created, add on to it to make your semantic layer. The goal is to encourage that hygiene of getting that context as a natural part of the workflow, rather than an afterthought.
And, notably, both of these are use cases that don't require a semantic layer at all.
Where to start: pilot small and smart
If you're ready to include AI in your data pipelines, the most important advice I can give is to do it in steps. Home in on one business unit that is willing to work with you on a pilot program for AI readiness, and focus on gathering semantics around the data for that small subset.
(Pilots within the data team itself, like the error triage example above, are a great place to start. They can be very useful, and they don't require a well-crafted semantic layer to work effectively. So there's no reason to wait!)
Gathering that semantic information, though, will really allow you to get your feet under you when it comes to building a semantic layer, and it will allow you to iterate very quickly. When you collaborate with one team in a pilot project, you're able to break things and learn from your mistakes before bringing it out to more business units.
So: start small, really focus in on what you're able to do (and what you reasonably can do), and then apply what you learned. Then you can use the momentum you gain by providing something great to that particular team or business unit to expand the semantic layer to more teams across your org.
Why semantic standards matter: Open Semantic Interchange
Once you're ready to build out your semantic layer, it's important to understand that, right now, basically every data tool implements semantic definitions in its own proprietary format. Power BI has one, Omni has one, Databricks has one, Snowflake has one, and of course we have one.
That fragmentation creates a portability problem: if your semantic layer definitions (metric names, calculations, business logic) are expressed in a format that's specific to one tool, you can't move them to another tool without rebuilding everything from scratch. So, for example, if you define "monthly recurring revenue" in dbt's Semantic Layer and then want to also expose that definition in Power BI or Snowflake, you'd have to redefine it natively in each system. Besides the redundancy, that creates inconsistency risk and a lot of maintenance overhead.
This is why dbt, along with Snowflake, Databricks, and a large number of other major organizations in data, has joined an initiative called the Open Semantic Interchange (OSI). The v0.1 vendor-neutral spec is already live and open source. It's an industry-wide specification that standardizes how we exchange semantic metadata across analytics, AI, and BI platforms. The OSI spec serves as a common language for metrics, dimensions, and relationships, so metrics can be interpreted consistently across tools (e.g., Snowflake, Tableau, dbt) while minimizing vendor lock-in.
The dbt Semantic Layer complements the spec by making those definitions operational: you define and govern your metrics in the dbt Semantic Layer using MetricFlow, and OSI provides the interchange format to move those definitions across other tools like Snowflake and Tableau. Author once, use everywhere.
It's the same reason we needed MCP in the first place: when there's no common standard, every tool reinvents the wheel and nothing moves cleanly between systems. A shared standard changes that.
It's been a big few months of shipping at dbt. We've got a lot to cover — from the dbt Developer Agent going into preview, to making the upgrade to the dbt Fusion engine self-serve, to new ways to lock down your account security, to quality-of-life improvements for practitioners who live in the IDE. Here's everything that's landed since January.
AI that works with your data, not around it
dbt gets an AI-native developer: the dbt Developer Agent (Preview)
General-purpose coding agents are now everywhere, ready to help anyone code. But the question we kept hearing from teams this year was some version of: can we get an agent that actually works like an analytics engineer? One that understands my whole dbt project? One that can read the graphs, knows the lineage, validates before it touches anything, and helps me build dbt models without breaking anything?
This is why we've built the dbt Developer Agent, which is now available in Preview for dbt platform customers with dbt Copilot enabled.
Simply describe the change you want to make — rename a model, add a metric, migrate a stored procedure, fix a failing build — and the agent reads your graph, understands what's upstream and downstream, and drafts the edits across every file that needs to move. SQL, YAML configs, tests, documentation: coordinated changes in one pass. That means less time context-switching between files, fewer broken builds, and data work that ships faster.
→ Read our full announcement blog to learn more
dbt Agent Skills - GA
Earlier this year we released dbt Agent Skills — an open-source repository of best practices that teach generalist coding agents how to think like an analytics engineer that actually understands how to work with dbt projects.
Skills are structured knowledge files that agents load on demand. They encode things like: when to preview data before writing tests, how to structure a semantic model, how to debug a job failure without chasing the wrong root cause.
Securely connect dbt to your favorite AI tools (Beta)
The dbt MCP server now supports OAuth, so you can connect OAuth-enabled AI tools (Claude, ChatGPT, Glean, and others) to dbt using your existing dbt login. No token management, no configuration hand-off to an admin. Your identity, properly permissioned and secure, in a few clicks.
Remote MCP Server: Admin API support + product docs tools
Two new sets of tools landed in the dbt Remote MCP Server. First, the MCP server now supports Admin API calls — which means AI assistants (Claude, Cursor, etc.) can help troubleshoot job errors directly, not just write queries. Second, the MCP server now includes search_product_docs and get_product_doc_pages tools that pull from docs.getdbt.com in real time, so you get answers grounded in the actual docs rather than training data.
Bring your own Anthropic key
dbt Copilot now supports BYOK (bring your own key) for Anthropic, so teams can power their AI workflows in the dbt platform using their own Anthropic API key — with the usage, cost, and data handling that comes with it.
BYOK is also available for OpenAI and Azure OpenAI, giving teams flexibility to build with the model provider that fits their security, compliance, and cost requirements.
Getting to Fusion just got a lot easier
The big headline on the Fusion side this cycle is that adoption is now self-serve in the dbt platform. By accelerating your upgrade to Fusion, you can take advantage of 30x faster parsing time, richer metadata for AI, real-time feedback on SQL as you type, and more. But upgrading your projects manually one by one could take hours or days, so why not let dbt do the hard parts for you?
Upgrade to Fusion project by project
If you're a dbt platform customer, you can now see which of your projects are eligible for Fusion and move them one at a time, directly from the platform UI. Pick a project, follow the prompts – no ticket, no wait, no overhead.
Fusion migration skill (Beta)
Upgrading to Fusion shouldn't mean fixing conformance errors manually. The Fusion migration skill in the dbt Developer Agent brings an automated approach to getting your projects Fusion-ready, faster:
- It classifies every conformance failure,
- Applies only validated high-confidence fixes automatically, and
- Walks you through medium-confidence changes with clear diffs and your approval.
Blocked issues (those caused by Fusion bugs or framework limitations) are surfaced immediately, with context and a path forward. No wasted effort chasing unfixable errors. The skill re-validates after every fix to handle cascading errors correctly and ends every session with a transparent report. This gives you faster triage, safer fixes, and trust in your upgrade.
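The triage flow described above can be pictured as a simple routing step. This is a sketch under stated assumptions, not the skill's implementation: the Failure class, the error codes, and the confidence labels ("high", "medium", "blocked") are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Failure:
    code: str        # hypothetical identifier for a conformance failure
    confidence: str  # assumed labels: "high", "medium", or "blocked"

def triage(failures: List[Failure]) -> Dict[str, List[str]]:
    """Route each failure to auto-fix, human approval, or the blocked list."""
    plan: Dict[str, List[str]] = {"auto_fix": [], "needs_approval": [], "blocked": []}
    for failure in failures:
        if failure.confidence == "high":
            plan["auto_fix"].append(failure.code)
        elif failure.confidence == "medium":
            plan["needs_approval"].append(failure.code)
        else:
            # Anything that is not clearly fixable is surfaced, never auto-applied.
            plan["blocked"].append(failure.code)
    return plan

example = [Failure("E001", "high"), Failure("E002", "medium"), Failure("E003", "blocked")]
print(triage(example))
```

The design choice worth noting is the default branch: an unrecognized confidence label falls through to "blocked" rather than being fixed automatically, which matches the conservative posture the text describes.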
How to get started:
- In dbt Studio: Find a job or project that's ineligible for Fusion. Attempt the Fusion run so you can see the build conformance errors. Studio will surface a new entry point directly in that conformance error experience so you don't have to dig through error logs. From there, launch the conformance skill and enjoy!
- Via VS Code: The dbt VS Code extension now makes Fusion setup and upgrade significantly easier. When you're ready to upgrade your project, you can run the CLI onboarding flow in the terminal or let an AI agent handle it via the dbt Developer Agent or Cursor, no command line required.
More from Fusion this cycle:
Beyond easier adoption, we've invested in making the engine faster and more capable.
- UDF-aware deferral. When you run with --defer and --state, dbt now resolves function() calls from the state manifest — so models that depend on UDFs don't require you to rebuild those functions in your current target first.
- Python UDFs are now supported on Snowflake and BigQuery in the Fusion engine CLI.
- DuckDB support (Beta). Run local dbt projects without a warehouse account. Useful for testing, exploration, and CI scenarios where warehouse costs matter.
- Apache Spark 3.0 (Beta). Fusion engine CLI support for Spark means faster compilation and execution for Spark-based dbt projects – no Python runtime, no subprocess overhead.
For dbt platform customers:
- dbt compare from local dev to CI. You can now compare changes at every stage of your workflow. In local development, the dbt VS Code extension previews how your edits affect your data (added/removed rows, join verification) before you open a PR. Then at the CI stage, dbt compare runs in orchestration on Fusion, giving you model-level diffs as part of your pipeline gate automatically.
- Fusion release tracks give you control over your update cadence: Nightly, Stable, Extended, and Fallback. Choose the release track that matches your team's stability requirements, risk tolerance and change management processes.
- New projects default to Fusion Stable. New environments in Developer, Starter, and Enterprise accounts now provision on the "Fusion Stable" release track by default – for any supported adapter (Snowflake, Redshift, BigQuery, Databricks).
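The added/removed-row comparison mentioned in the dbt compare item above can be pictured as a set difference on primary keys. The sketch below is not how dbt compare is implemented; the diff_rows function, the "id" key, and the sample rows are assumptions made for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

def diff_rows(before: List[Row], after: List[Row], key: str = "id") -> Dict[str, List[Any]]:
    """Compare two versions of a table by primary key."""
    before_by_key = {row[key]: row for row in before}
    after_by_key = {row[key]: row for row in after}
    added = sorted(after_by_key.keys() - before_by_key.keys())
    removed = sorted(before_by_key.keys() - after_by_key.keys())
    # Keys present in both versions whose row contents differ.
    changed = sorted(
        k for k in before_by_key.keys() & after_by_key.keys()
        if before_by_key[k] != after_by_key[k]
    )
    return {"added": added, "removed": removed, "changed": changed}

before = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
after = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
print(diff_rows(before, after))
```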
Want to fast-track your migration to Fusion? Use our quickstart guide.
→ Quickstart guide for Fusion
For dbt builders: Developer experience improvements
This cycle we focused on the things practitioners have been asking for: faster navigation in the IDE, more context at a glance, broader warehouse support for query history, and a meaningfully simplified semantic layer spec.
Studio IDE: search, replace, and command palette
The Studio IDE now has search and replace across your project, a command palette, and the ability to jump to symbols and run IDE configuration commands. These capabilities have been long-requested, and now they're here.
Studio IDE: Better status bar
The status bar now surfaces deferral settings, dbt version, and project status with quicker access to change them.
Model query history: Databricks and Redshift — Beta
Model query history now supports Databricks and Redshift in addition to Snowflake and BigQuery. If you're on either of those warehouses and want to understand query patterns at the model level, this is now available in beta.
New semantic layer YAML spec
The new semantic layer YAML specification introduces several key changes: semantic models are now embedded within model YAML entries (no more managing entries across multiple files), measures are now simple metrics, and frequently-used options are promoted to top-level keys. This is a meaningful spec simplification making it easier for anyone maintaining a semantic layer, and a lower barrier to adoption for those who haven't yet. The new specification is live in dbt Core v1.12 and on the dbt platform "Latest" release track.
→ Migrate to the latest YAML spec
Access to dbt that's secure, governed and self-serve
We shipped several updates this cycle to make security configuration simpler — and in most cases, self-serve.
Global login — GA
There's now a universal login URL that shows all the accounts you have access to across regions and tenancies, in one place. This is available now for multi-tenant accounts with an account-specific domain; single-tenant support is coming soon.
Self-serve private endpoints — Beta
You can now configure Snowflake PrivateLink endpoints directly in the dbt platform without filing a support ticket. Go to Account settings → Integrations → Private endpoints to request and manage Snowflake PrivateLink endpoints on AWS. If establishing secure connectivity for your dbt setup has been a multi-week support ticket process, that changes now.
Connection profiles — GA
Profiles let you define and manage connections, credentials, and attributes for deployment environments at the project level. dbt automatically creates profiles for your existing projects and environments, so there's nothing to migrate. Useful for teams that want more structured control over how credentials and connections are organized across environments.
Account-level Slack and Microsoft Teams notifications — GA
Job notifications can now be sent to Slack and Teams channels configured at the account level, not just per-job. This makes it easier to set up centralized alerting without touching every job's configuration. Both Slack and Teams notifications are now generally available.
→ Slack notifications · Teams notifications
dbt Core v1.12 is here in Beta
The dbt language is continuing to evolve, and dbt Core v1.12 reflects that momentum. The beta release includes contributions from across the community.
What's in v1.12:
- New on_error config to control whether downstream models run when an upstream model fails. Set on_error: continue on a model to allow downstream nodes to still attempt to execute even when it errors.
- Define project variables in root-level vars.yml to reference them within dbt_project.yml or to keep dbt_project.yml slim.
- New selector method (selector:my_selector) to reference a named selector from selectors.yml inside --select or --exclude to combine with other selectors, graph operators, and set operators.
- Support for the new semantic layer spec simplifies how you define metrics and dimensions by embedding semantic annotations directly alongside each model.
- Expansions of user-defined functions (UDFs):
  - Use public third-party PyPI packages in your Python UDFs with the new packages config.
  - Write UDF logic in JavaScript.
  - Overloaded UDFs: define multiple functions with the same name but different argument signatures.
- Execute ad hoc database statements (no macro needed) with dbt run-operation --sql
- Improvements to exception handling so error messages are clearer and stack traces are easier to interpret.
- and more coming soon!
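As a rough mental model for the on_error item in the list above, the toy runner below walks a DAG in order and decides whether each node runs. It is only one reading of the one-line description, not dbt Core's scheduler: the run_dag function, the status strings, and the model names are hypothetical, and anything other than "continue" is treated as "skip downstream".

```python
from typing import Dict, List, Set

def run_dag(order: List[str], parents: Dict[str, List[str]],
            failing: Set[str], on_error: Dict[str, str]) -> Dict[str, str]:
    """order must be topologically sorted; failing lists nodes that error when run."""
    status: Dict[str, str] = {}
    for node in order:
        blocked = False
        for parent in parents.get(node, []):
            if status[parent] == "skipped":
                blocked = True
            elif status[parent] == "error" and on_error.get(parent) != "continue":
                blocked = True
        if blocked:
            status[node] = "skipped"
        elif node in failing:
            status[node] = "error"
        else:
            status[node] = "success"
    return status

order = ["stg_orders", "orders", "revenue_report"]
parents = {"orders": ["stg_orders"], "revenue_report": ["orders"]}
default_run = run_dag(order, parents, failing={"stg_orders"}, on_error={})
continue_run = run_dag(order, parents, failing={"stg_orders"}, on_error={"stg_orders": "continue"})
print(default_run)
print(continue_run)
```

In a real project a downstream model may still fail if the relation it selects from was never built; the toy runner ignores that and only shows the difference between skipping and attempting. The v1.12 upgrade guide is the place to confirm the actual behavior.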
→ Learn more in the v1.12 upgrade guide
What's next
There's always more coming. Stay tuned on our blog for the latest announcements.
In the meantime, the features above are live. If you have questions, find us in #product-updates in the dbt Community Slack. Or contact us to see what dbt can do for your data team.
See us in San Francisco this June
We're at Snowflake Summit June 1-4 (Booth #2112) and Databricks Data+AI Summit June 15-18 (Booth #430). We'll have live demos, the team on site, and a lot to show you.
VS Code Extension
The free dbt VS Code extension is the best way to develop locally in dbt.
Coding agents are doing a tremendous amount of useful work today. Since Claude Code dropped last year, followed by Opus 4.5 and GPT 5.2, software engineering has very clearly gone through a phase change. We've gone from copilot-style autocomplete to agents that can run end-to-end across the SDLC.
But anyone who's tried to point one of these agents at a dbt project has hit the same wall I have. Ask a coding agent to build a new dbt model, and it'll happily make five or six changes across your DAG, then try to run the new model at the end.
It breaks. The agent didn't know which columns existed, didn't iteratively run queries as it walked the DAG, and didn't think about lineage or contracts.
The technology is marvelous. The agents simply haven't been taught how to do data work yet. Without a governed foundation—your actual models, lineage, contracts, and metrics—a coding agent is working from guesswork. It can write SQL that looks right and still return numbers no one can verify.
That's what we're fixing with dbt agent skills.
Coding agents are generalist agents, including for data
We call them coding agents, but that framing undersells what they are. These agents can do all types of work. One of those types is data work, which is where most of you reading this probably want to put them.
The catch is that the agents have been specialized for coding workflows. There's a long list of small tweaks and improvements that make them slot neatly into a software engineering loop. Data has its own additional bits that haven't been baked in by default: understanding data lineage, respecting contracts, iteratively running queries to validate as you go, and knowing when to materialize what.
A lot of teams have been layering those in by hand with AGENTS.md files. But there's a ceiling to how big an AGENTS.md can get before it becomes its own problem.
What agent skills are, and what dbt's are doing
Agent skills are a protocol Anthropic released late last year and donated to an open foundation. They're how you give an agent context into specific processes and workflows it needs to know about, packaged as markdown plus optional supporting scripts that the agent loads when relevant.
We've taken everything dbt Labs has learned about analytics engineering and ported it into a series of agent skills, in an open repo that works with any agent supporting the protocol. You can think of agent skills as dbt best practices ported directly into your agent. That includes:
- Building a model iteratively rather than one-shotting it
- Writing unit tests
- Understanding your Directed Acyclic Graph (DAG)
- Building your semantic layer
- Debugging incremental models (which any longtime dbt user has spent their share of time on)
The goal is to have the entire Analytics Development Lifecycle (ADLC) captured within skills, so the combination of a generalist coding agent and the dbt skills gives you a powerful data agent out of the box.
That's part of the way there. The rest is your custom context, the things only your organization knows. Naming conventions, materialization choices, source quirks, gotchas. Skills are designed for that too: anyone can write one for their own project, and the strongest setups combine the general dbt skills with org-specific custom skills layered on top.
How Factory scaled up with dbt
Nikhil Harithas from Factory has been building this out from scratch. Factory is in the business of bringing autonomy to software engineering through Droids, their generalist agents that work across the SDLC, including the terminal, web, CLI, GitHub, and Teams. Nikhil's a field engineer there, and over the last few months, he's been standing up Factory's data posture using dbt as the central piece.
Factory's materialization strategy changed as the team scaled up. Factory is a younger company, so the default was to materialize as late as possible, with a blanket rule that worked fine in the early days.
As certain queries took long enough that things started getting expensive, and as the team wanted fresher data, that rule needed to change. The math, Nikhil notes, is both money and time, and also how many people you have to maintain the pipeline.
Up until a few weeks ago, Factory was rebuilding entire tables on every single run because that was good enough. The shift to incremental builds came because that was the only way to run more often without blowing up cost. Incremental builds are finickier than full rebuilds, so testing started to matter a lot more.
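The difference between a full rebuild and an incremental build can be sketched in a few lines of Python. This is a simplified illustration, not Factory's pipeline or dbt's incremental materialization: the function names, the "id" and "updated_at" columns, and the high-water-mark strategy are assumptions.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def full_rebuild(source: List[Row]) -> List[Row]:
    """Reprocess every source row on every run."""
    return list(source)

def incremental_build(existing: List[Row], source: List[Row],
                      ts: str = "updated_at", key: str = "id") -> Tuple[List[Row], int]:
    """Process only rows newer than the latest timestamp already in the table."""
    high_water = max((row[ts] for row in existing), default=None)
    new_rows = [row for row in source if high_water is None or row[ts] > high_water]
    merged = {row[key]: row for row in existing}
    for row in new_rows:
        merged[row[key]] = row  # insert new keys, overwrite updated ones
    return list(merged.values()), len(new_rows)

existing = [{"id": 1, "updated_at": 1}, {"id": 2, "updated_at": 2}]
source = [{"id": 1, "updated_at": 1}, {"id": 2, "updated_at": 3}, {"id": 3, "updated_at": 3}]
table, processed = incremental_build(existing, source)
print(processed, "rows processed instead of", len(full_rebuild(source)))
```

The sketch also hints at why incremental builds are finickier: a late-arriving row whose timestamp is at or below the high-water mark would be silently missed, which is the kind of implicit assumption tests are meant to catch.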
As Nikhil puts it: "I wasn't as familiar with the dbt test suite until a couple of weeks ago, when I was like, okay, it's time to shore up all these kind of implicit contracts in these table definitions. We realized that every row has to have a distinct ID of some kind, depending on the table."
Tests went on the most important tables first. The principle behind it: "How can you increase individual leverage as far as you can by systematizing as much as you can, by giving Droid the same kind of feedback you or I would?"
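The "every row has to have a distinct ID" contract is what dbt's built-in unique and not_null data tests check. As a language-neutral illustration of the logic (not dbt's implementation, which compiles tests to SQL), here is a minimal Python sketch; the failing_ids function and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List

def failing_ids(rows: List[Dict[str, Any]], id_column: str = "id") -> Dict[str, Any]:
    """Report null IDs and IDs that appear more than once."""
    ids = [row.get(id_column) for row in rows]
    null_count = sum(1 for value in ids if value is None)
    counts = Counter(value for value in ids if value is not None)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return {"null_count": null_count, "duplicates": duplicates}

rows = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": None}]
report = failing_ids(rows)
print(report)
```

A table passes when the report shows zero nulls and no duplicates, which is the kind of direct feedback an agent can act on.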
Lessons learned from the build-out
Nikhil's top-line claim from the build-out: work that would have taken five or six people was done by one and a half people in a couple of months. That alone is worth taking seriously.
A few of the patterns he ran into:
Build vs. operate are different motions, and you need to teach the agent both. Nikhil's framing: "What is it like to develop in dbt, and what is it like to operationalize in dbt?" Those things are related but slightly different. The historical reason agents have struggled with data teams more than they've struggled with generalist software engineering is, ultimately, a context problem on both fronts. Data is a mix of writing code and running operational workflows. It's closer to SRE work than pure software engineering in places.
Skill creep is real. Early on, Nikhil ran into trouble with too many skills, which led to inconsistent triggering and ambiguity about which skill applied. The fix: Be intentional about how many skills you have, and make each one denser. Skills you're confident the agent will discover on its own can stay as habit-forming background. The high-value skills are the ones that anchor behavior on the most important tasks.
Say it louder in AGENTS.md. Skills get auto-invoked sometimes, but you shouldn't bet on it. AGENTS.md is what's guaranteed to be in context, so Nikhil's pattern is to point at the relevant skills there explicitly: "You're going to be using BigQuery and dbt and a few other vendors that Extract, Transform, and Load (ETL) data to us. You have to pay attention to the skills of these particular frameworks, because there's going to be how you do anything at all."
Build the repo so agents bump into skills naturally. Even when a skill isn't auto-invoked, an agent grepping around to get its bearings should run into it. The mere mention of "dbt" anywhere in the project should surface the relevant SKILL.md in the search results.
Skill golf. Doug Bady at dbt coined this. The practice: go through your skills and try to rip out everything you can to make them as tight as possible. Distractions in context cost you.
Documentation as a hook. Factory now has a check that fires when someone changes a column or table. It requires documentation to land somewhere in the project before the change can merge. The agent doesn't have to write the docs by hand, but the structural requirement creates a flywheel where good behavior produces more good behavior.
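The documentation hook can be approximated with a small check that a CI step might run. This is a sketch, not Factory's actual check: the function names, the "table.column" naming, and the idea of passing in a dictionary of descriptions are assumptions.

```python
from typing import Dict, Iterable, List, Optional

def undocumented(changed_columns: Iterable[str], docs: Dict[str, Optional[str]]) -> List[str]:
    """Return changed columns that have no non-empty description."""
    return sorted(c for c in changed_columns if not (docs.get(c) or "").strip())

def check(changed_columns: Iterable[str], docs: Dict[str, Optional[str]]) -> int:
    """Return a process-style exit code: 0 if everything is documented, 1 otherwise."""
    missing = undocumented(changed_columns, docs)
    for name in missing:
        print(f"missing documentation: {name}")
    return 1 if missing else 0

docs = {"orders.id": "Primary key of the order", "orders.amount": "  "}
exit_code = check(["orders.id", "orders.amount", "orders.region"], docs)
print("exit code:", exit_code)
```

A merge gate would fail on a nonzero exit code, which is what turns documentation from a suggestion into a structural requirement.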
Building faster feedback loops
"Look at the last 20 years of software engineering," Nikhil says.. "We've gotten faster because of feedback loops. It's easier to write tests, easier to write integration tests, logs are easier to look at. Humans are getting faster feedback loops. We have to give the same thing to agents."
Nikhil's framing of where coding agents are today is worth leaning on. We're at a point, he says, where "if you can imagine it and if you are determined to build it, you can build it." That's a pretty magical thing to be able to say.
But it feels like not everyone is experiencing that reality. The reason, in his view, is that while a lot of things that used to be difficult are now easy, some of the things that were hard are still hard. Integrations. Human context and assumed knowledge. Naming things, famously.
If you're going to put one thing on your list this week, Nikhil's call was to figure out how to give your agents access to as many tools and as much direct feedback as you can. Nothing is more powerful than read-only access to the database. If an agent can't query the database, everything goes slower.
The reframe he offered is the one I'd lead with: What would need to be true for you to give an agent access to your database? That's the question. Maybe the work this week isn't a data engineering task at all. Maybe it's setting up your environment so you'd feel safe handing an agent that access. Once they have it, they fly.
This is a singular moment in technological progression. The combination of generalist coding agents, dbt agent skills, the dbt MCP server, your own custom skills, and platforms like Droid is making it possible for very small teams to do work that used to require very large ones.
Get involved. The dbt agent skills repo is open, and we want to see what you build with it.
dbt-core · v1.11.9
dbt-core 1.11.9 - May 06, 2026
Fixes
- Fix static_analysis: off being interpreted as boolean false instead of string "off" in manifest.json (#12015)
- Fix state:modified not detecting .yml property changes for resource_type:function (#12547)
Under the Hood
- Update jsonschemas for more accurate deprecation warnings: macro.config should not warn (#12670)
Dependencies
- Bump libpq-dev in Docker image from 13.23-0+deb11u1 to 13.23-0+deb11u2 to fix build failure due to superseded package version (#NA)
- Bump libpq-dev in Docker image from 13.23-0+deb11u2 to 13.23-0+deb11u3 to fix build failure due to superseded package version (#NA)
Contributors
- @Thrasi (#12547)
- @b-per (#12015)
- @michelleark (#12670)
- @tauhid621 (#NA, #NA)
Meet Antigravity: Google's agentic IDE enters the dbt orbit
There's a new player in the Agentic IDE space, and it's coming in hot.
Enter Antigravity, Google's entrance into the world of AI-powered development environments. I spent some time with it this past weekend. Paired it with Gemini 3. Let's just say…I did not expect to be that impressed. The power jump compared to traditional IDE workflows (and some other popular agentic IDEs) is significant, especially when you bring it into the dbt universe.
Let's break down what it is, how it fits with dbt, and a few pro tips to get the most out of it.
What is Antigravity?
At its core, Antigravity is a fork of Visual Studio Code. That's great news because it means most of the extensions, tooling, and workflows you already love just work out of the box.
But Antigravity isn't "just VS Code with a new coat of paint." It's built for an agent-first experience. You're not just coding, you're collaborating with AI agents that can reason across your project, propose plans, and execute tasks.
Think less autocomplete. Think more "co-pilot who drank three espressos and read your entire repo."
Getting started with dbt in Antigravity
If you're working with dbt, step one is easy: install the official dbt extension.
You can read about it here: https://docs.getdbt.com/docs/about-dbt-extension
With the dbt extension installed, you immediately get:
- Column-level lineage
- Query preview
- Rich dbt-aware IDE features
- Improved navigation across models, sources, and tests
In other words, your IDE understands dbt instead of just politely pretending to.
From there, you can immediately start using the built-in agent to help generate SQL and YAML files. The agent scaffolds models, adds tests, and writes the YAML documentation you keep forgetting.
Turning it up: Add the dbt MCP server
If you really want to unlock Antigravity's potential for dbt work, give it access to dbt's structured context. The dbt MCP server is how you do that: it surfaces your project graph, model definitions, lineage, test results, and semantic layer to the agent in a governed, queryable way.
Inside the agent window:
- Click the three dots in the top-right.
- Select MCP servers.
- Add the dbt MCP server.
Eventually, the dbt MCP server will be available in the market, where you'll be able to choose it from the dropdown menu. For now, you can edit the mcp_config provided by Antigravity, and it does the rest. The configuration options are also documented here: https://docs.getdbt.com/docs/dbt-ai/about-mcp
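To make that concrete, here is a rough sketch in Python that prints a config entry you could paste into mcp_config. It assumes the common MCP client layout (an "mcpServers" object) and a local dbt-mcp launched through uvx. The environment variable names and paths are assumptions to check against the dbt MCP docs linked above, since your setup may need different or additional values.

```python
import json


def build_mcp_config(project_dir: str, dbt_path: str) -> dict:
    """Build an MCP client config entry that launches dbt-mcp locally.

    The "mcpServers" layout and the uvx launcher follow the common MCP
    client convention; the env var names are assumptions to verify
    against the dbt MCP docs for your setup.
    """
    return {
        "mcpServers": {
            "dbt": {
                "command": "uvx",
                "args": ["dbt-mcp"],
                "env": {
                    "DBT_PROJECT_DIR": project_dir,  # path to your dbt project
                    "DBT_PATH": dbt_path,            # path to the dbt executable
                },
            }
        }
    }


if __name__ == "__main__":
    # Placeholder paths; replace with your own.
    config = build_mcp_config("/path/to/project", "/path/to/dbt")
    print(json.dumps(config, indent=2))
```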
This significantly expands what your local agent can do.
And here's where things get interesting.
You're no longer just asking for code snippets. You're enabling deeper project-level awareness and workflows.
MCP servers that pair beautifully
Beyond dbt, there are several MCP servers that elevate the experience:
- GitHub
- Google BigQuery
- AlloyDB
- Dataplex
Now imagine this workflow:
- A ticket is opened.
- The agent reads it.
- It reviews data classification tags in Dataplex.
- It generates SQL, YAML, and tests.
- It writes a pull request in GitHub.
- CI kicks off dbt orchestration and validates everything.
That's not autocomplete. That's workflow acceleration.
We're talking about reducing friction across development, governance, and deployment in one unified environment.
Pro tips for working with Antigravity + dbt
After a few days of experimenting, here are some practical lessons.
1. Pair Antigravity with the Gemini CLI
Use Antigravity for:
- Multi-agent brainstorming
- Large architectural work
- Implementation planning
Use the Gemini CLI for:
- Focused terminal tasks
- Deep maintenance
- Headless execution
- Specific, scoped operations
Together, they create a powerful balance between high-level reasoning and low-level precision.
2. Define rules for your agent
Create global or workspace rules to guide how your agent behaves:
- ~/.gemini/GEMINI.md
- .agent/rule/
You can define:
- Naming conventions
- SQL style standards
- Testing requirements
- Documentation expectations
Think of it as training your agent to be a senior analytics engineer instead of an enthusiastic intern.
3. Add skills for specific tasks
You can extend your agent with specialized skills depending on what you're building. dbt-specific skills are a great place to start: https://docs.getdbt.com/blog/dbt-agent-skills
Skills help tailor the agent's behavior so it understands how to approach dbt models, testing strategies, documentation, and more.
4. Break large tasks into smaller ones
This one's critical. Antigravity is very good at:
- Creating implementation plans
- Designing step-by-step execution strategies
It is even better when you:
- Break requests into smaller, precise tasks
- Start at task 1
- Move sequentially
With AI agents, smaller and more specific is almost always better. Think iterative, not monolithic.
Final thoughts
Google entering the agentic IDE space with Antigravity feels like a meaningful shift, especially for analytics engineers living in the dbt ecosystem.
Because it's built on Visual Studio Code, adoption is frictionless. Because it supports MCP servers, it's extensible. Because it integrates deeply with Gemini, it's powerful.
And because it can help you write SQL, YAML, tests, PRs, and documentation…it might just give you your weekends back.
No promises. But it's a strong start.
last updated on May 06, 2026
This will be a longer blog. There will be some code if you want to follow along; otherwise, feel free to skim for a glimpse into a simple guide for AI. Technical requirements: Python, Git, dbt Core/dbt Fusion, or the dbt platform.
TL;DR: This post is about exploring what happens when you plug AI into a dbt project and let it do things. By experimenting with Gemini, Google Agent Development Kit, the dbt MCP server, and the dbt Fusion engine, with dbt's structured context as the foundation, I built a working agent just to see how far it could go as a starter project. It's less "here's a perfect solution" and more "let's see what's possible." And that's what made it fun.
I started my career as a software engineer. The kind who lived in an editor all day, shipping code, breaking things, and fixing them again. Over time, my day job shifted. I was still close to the code, but no longer in it the way I used to be as I switched to different roles. Then AI tools showed up and quietly changed the rules of the game.
Suddenly, working with code felt…fun again.
I stopped asking, "Do I still remember how to build this?" The question became, "What can I build now?" At the same time, I switched to being more in the data world and started thinking about how I could bring that same energy here. Then dbt Labs released the dbt Fusion engine and I saw a true path forward to something exciting. With dbt's rock-solid foundation for analytics engineering and Google's AI tooling opening the door to agentic workflows, I decided to explore what it looks like to pair the two hands-on.
The result of that exploration is a working dbt agent, powered by Google's ADK framework. It's not meant to be magic or perfect. It's meant to be practical: a starting point for solving common dbt problems, poking at what's available out of the box, and experimenting with what happens when you give dbt a little bit of autonomy.
This post is a walkthrough of that journey. What I built, why I built it, and how AI changed the way I think about getting started with dbt again.
Defining terms
Before jumping into the fun stuff, here are a few terms worth knowing.
LLM (large language model): An LLM is the "brain" behind modern AI tools. It's what reads, writes, and reasons about text (and increasingly, code and data). It's like a very fast reader who has seen a lot of books and code. You ask it a question, and it predicts the best next words to respond with, often in surprisingly useful ways.
MCP (model context protocol): MCP is a standard way for AI models to safely interact with tools, systems, and data without hard-coding custom integrations everywhere. Think of MCP like a universal remote for AI. Instead of teaching the AI how to use every tool differently, MCP gives it a consistent set of buttons and rules so it doesn't accidentally do something wild.
Agent: An agent is an AI system that can reason, decide what to do next, and take actions using tools rather than just answering questions. A normal AI answers questions. An agent gets a goal, figures out steps, uses tools, and checks its own work. Think of it as a very junior, but very fast, teammate.
dbt MCP: The dbt MCP server exposes dbt capabilities like metadata, models, tests, and commands as tools an AI agent can safely use. Instead of an AI guessing how dbt works, dbt MCP gives it a rulebook and a toolbox. The agent can ask things like "What models exist?" or "Run this dbt command" without breaking anything.
Gemini: Gemini is Google's family of AI models, designed to handle reasoning, code, and multi-step problem solving at scale. Gemini is the brain I'm plugging into this system. It's the part doing the thinking, reading dbt projects, understanding context, and deciding what to try next.
Google ADK (agent development kit): Google ADK is a framework for building AI agents: defining how they think, what tools they can use, and how they interact with systems. If the agent is the worker and Gemini is the brain, ADK is the job description. It defines what the agent is allowed to do, how it calls tools, and how everything stays organized and safe.
Why this matters for dbt
All of these pieces (LLMs, agents, MCP, and Google's ADK) matter to dbt because they finally let AI move from suggesting things to safely doing things in analytics engineering.
What really unlocked my excitement here was the dbt Fusion engine. Real-time parsing and a smart, deterministic compiler mean AI no longer has to "hope" its SQL or YAML is correct. Every output can be validated immediately against the warehouse, the project graph, and dbt's rules. That's a huge shift.
Instead of treating AI like a clever autocomplete, Fusion makes it possible to treat AI like a junior analytics engineer:
- It can propose models, tests, and metrics
- Fusion can instantly tell us whether they compile, parse, and conform
- Mistakes become feedback loops, not production risks
When you combine:
- Agents that can reason and act
- MCP that enforces safe, intentional tool use
- Gemini for multi-step reasoning
- Google ADK to orchestrate everything
- dbt Fusion as the guardrails and truth source
You get something genuinely new: an AI workflow that can iterate on data logic in real time, with confidence. For someone who has never been deep in data engineering code, this felt like having a safety net that made building fun, and that's what pushed me to see how far this could go.
Part 1: A beginning
Like most beginnings, let's start small. We are going to build an enterprise data scientist agent that would make Skynet jealous. Just kidding, we will get the dbt MCP to work.
I won't do an in-depth guide here but will provide some links:
Introducing the dbt MCP Server: This is a fantastic blog by Jason Ganz explaining a few of the key operating principles of dbt's MCP.
There are two important takeaways from that blog:
- dbt MCP can be deployed locally or connected remotely
- It grants us access to most dbt functionality through tools
There is a quickstart guide located here if you want to try it out: https://docs.getdbt.com/docs/dbt-ai/setup-local-mcp
Lastly, there is the repo itself, where you can find the tools diagram and even some agent examples that have been put together in the examples folder. If you want this to run locally, clone this repo: https://github.com/dbt-labs/dbt-mcp
Once your python environment is set up, requirements installed, and environment variables configured, you should be good to go.
You can then use Claude, Cursor, Antigravity or other clients to connect and confirm it's working. Here is what success looks like with Claude. If you have any problems up to this point, please revisit the MCP quickstart guide:
Now if I prompt Claude to list my tools, it would provide a list across a few categories:
This is an important milestone. These tools let you explore your dbt project, query metrics, analyze lineage, monitor job runs, and troubleshoot issues. Most people never have to go any further.
But for those who do want to go deeper: Try asking questions about your data, do some codegen in Cursor or Antigravity, translate syntax, break up stored procedures…it's limited only by your creativity.
Part 2: Let's get agentic
Alright, let's add Google's Agent Development Kit (ADK) to the mix.
If you want to go deep, Google's official documentation is available here: https://docs.google.com/document/d/1yYaRUUJddrY5PZIJHLZD1a54PHvqCswRJu8TGXnjc%5F8/edit?tab=t.0
It goes far beyond what we'll cover here and dives into the full breadth of functionality ADK brings to the table.
At a high level, ADK provides:
- Orchestration for adaptive agent behavior
- Multi-agent architecture support
- A rich, extensible tool ecosystem
In short, it gives you the structure and tooling needed to build serious agent-based systems, and importantly, to take something from prototype to production.
We won't be publishing any agents today, but ADK absolutely provides the scaffolding to do exactly that when you're ready.
I chose Python as my language of choice and started with the official quickstart guide: https://google.github.io/adk-docs/get-started/python/
That gives you a clean project structure right out of the gate:
my_agent/
│── agent.py # main agent code
│── .env # API keys or project IDs
│── __init__.py
From there, you've got two easy ways to run your agent:
Command-line access:
adk run my_agent
Or… the much nicer option, the web interface:
adk web --port 8000
That spins up a beautiful local test interface where you can interact with your agent in real time. It's fast, clean, and makes experimentation far more enjoyable than staring at raw terminal output.
Part 3: Customize the agent
If you're looking for the simplest possible agent experience, start with the example in the dbt Labs dbt-mcp repository:
https://github.com/dbt-labs/dbt-mcp/blob/main/examples/google%5Fadk%5Fagent/main.py
That example does a great job of keeping things simple. It:
- Reads your .env file to locate your dbt MCP endpoint
- Registers the available MCP tools
- Exposes those tools through an agent interface
In just a few lines of code, you have an agent that can talk directly to your dbt environment. Clean. Practical. Effective.
Part 4: Extend
Of course, we couldn't stop there. I was just getting started.
I created an extended example here:
https://github.com/StephenR-DBT/dbt-gemini-agent-starter
The goal was to push beyond a single-tool agent and explore orchestration across multiple tools and subagents. Specifically, I built three components:
1. dbt_compile
A local dbt compilation tool with detailed JSON log analysis.
This runs dbt compile (using the Fusion engine) to validate SQL before anything ever hits the warehouse. That means:
- Syntax validation
- Model resolution checks
- Dependency validation
- Structured log parsing for intelligent feedback
In practice, this allows an agent to generate SQL, validate it locally, detect issues, and automatically iterate before shipping anything downstream. It's like giving your agent a pre-flight checklist.
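For a feel of what that pre-flight check involves, here is a minimal sketch in Python. It is not the code from the starter repo. It assumes the dbt Core structured-log shape, where each JSON log line carries an "info" object with "level" and "msg", and it assumes `dbt` is on your PATH. Fusion's log output may differ, so treat the parsing as something to adapt.

```python
import json
import subprocess
from typing import Iterable, List


def extract_errors(log_lines: Iterable[str]) -> List[str]:
    """Collect error messages from JSON-formatted log lines.

    Assumes each line is a JSON object with an "info" object holding
    "level" and "msg". Lines that are not JSON, or that lack those
    keys, are skipped.
    """
    errors = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        info = event.get("info") if isinstance(event, dict) else None
        if isinstance(info, dict) and info.get("level") == "error":
            errors.append(info.get("msg", ""))
    return errors


def compile_project(project_dir: str) -> List[str]:
    """Run `dbt compile` with JSON logs and return any error messages."""
    result = subprocess.run(
        ["dbt", "--log-format", "json", "compile", "--project-dir", project_dir],
        capture_output=True,
        text=True,
    )
    return extract_errors(result.stdout.splitlines())
```

An empty list from `compile_project` means nothing was logged at error level. Anything else is feedback the agent can read and act on before trying again.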
2. dbt_mcp_toolset
Cloud-based dbt platform operations via MCP.
This exposes the full dbt MCP toolset to the agent, including access to:
- Project metadata
- Model definitions
- Lineage information
- Intelligent querying capabilities
Instead of guessing about the structure of a project, the agent can inspect it directly. It can reason over metadata the same way an analytics engineer would.
3. dbt_model_analyzer
A specialized subagent focused on data modeling analysis.
This was the fun part.
Rather than giving one monolithic agent every responsibility, I created a purpose-built subagent that focuses purely on modeling logic: structure, best practices, and design patterns. It's narrower in scope and offers a fun perspective.
Why this matters
What this experiment showed me is that agents don't just "generate SQL."
They can:
- Validate their own work before execution
- Use metadata to reason about the broader system
- Delegate to specialized subagents
- Iteratively fix issues they create
In other words, they can participate meaningfully in the development lifecycle and not just at the prompt layer, but at the systems layer.
And most importantly, it demonstrated something creative and genuinely new: a development loop where AI doesn't just produce code, but critiques, validates, and improves it using the same tooling we rely on as engineers.
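Stripped of the tooling, that loop is simple. Here is a minimal sketch in Python with stand-in functions: `fake_generate` plays the role of the model and `fake_validate` plays the role of a compile step. Both are hypothetical placeholders, not part of ADK, dbt MCP, or the starter repo.

```python
from typing import Callable, List, Tuple


def refine(
    generate: Callable[[List[str]], str],
    validate: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, List[str]]:
    """Generate a candidate, validate it, and feed errors back until clean.

    generate: takes the previous round's errors and returns new SQL
    validate: returns a list of error messages (empty means valid)
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    errors: List[str] = []
    candidate = ""
    for _ in range(max_rounds):
        candidate = generate(errors)
        errors = validate(candidate)
        if not errors:
            break
    return candidate, errors


def fake_generate(errors: List[str]) -> str:
    # Stand-in for an LLM call: fixes the typo once it sees an error.
    return "select * from orders" if errors else "select * form orders"


def fake_validate(sql: str) -> List[str]:
    # Stand-in for a compile step: flags one known typo.
    return ["syntax error near 'form'"] if " form " in sql else []


if __name__ == "__main__":
    print(refine(fake_generate, fake_validate))
```

The cap on rounds matters: if the validator keeps returning errors, the loop hands back the last attempt along with its errors instead of spinning forever.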
That's where this starts to feel less like a demo… and more like the beginning of a new workflow.
Please reach out to me if you create anything fun. I'll have demos, blogs, and things going forward, and I would love to see what the community creates with the dbt MCP server.
VS Code Extension
The free dbt VS Code extension is the best way to develop locally in dbt.
Tableau and dbt: structured context for reliable AI analytics
last updated on May 06, 2026
We spend plenty of time talking about the dbt MCP server. And for good reason. It's practical, reliable, and genuinely useful when you're building analytics workflows around dbt.
But there's another player that deserves equal airtime: the MCP for Tableau.
So let's fix that.
Why pair dbt and Tableau?
dbt handles transformation logic and ensures every metric definition is versioned, tested, and governed. Tableau handles visualization and distribution. One shapes the data; the other tells the story.
Individually, their MCPs are powerful. Together, they're streamlined.
When both are wired into your agentic environment, you're no longer bouncing between tooling contexts. You can:
- Inspect and adjust dbt models
- Validate exposures and lineage
- Explore Tableau metadata
- Align dashboards with transformed models
- Iterate with a single conversational thread
No context loss.
The setup (it's easy)
This isn't a 17-step integration guide. It's simple.
Using your preferred agentic IDE or client, whether that's Cursor, Claude, Antigravity, or another MCP-capable tool, add both the dbt MCP configuration and the Tableau MCP configuration to the required config file.
That's it.
Save the file. Restart if needed. Wait for the green arrows.
Once both MCPs are live, they're available to any prompt you run in that environment. No special invocation rituals. No manual switching. Just ask.
Let's explore a few ideas
1. Impact analysis dashboard
Touching a critical model like fct_revenue has downstream consequences that are easy to miss.
- Trace lineage of the model in dbt.
- Pull performance metrics from the dbt Semantic Layer.
- Search Tableau for every dashboard/workbook using that model.
- Automatically generate an impact report showing: downstream dependencies, dashboard usage stats, and potential stakeholders impacted.
You go from "Uh oh, did I break something?" to "Here's exactly what I need to check" in minutes.
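The core of that report is a graph walk. Here is a minimal sketch in Python, assuming the lineage from dbt and the dashboard references from Tableau have already been merged into one parent-to-children mapping. The model and dashboard names below are made up for illustration.

```python
from collections import deque
from typing import Dict, List


def downstream(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return every node reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Hypothetical lineage: dbt models plus the Tableau dashboards that read them.
LINEAGE = {
    "fct_revenue": ["rpt_monthly_revenue", "dashboard:Executive KPIs"],
    "rpt_monthly_revenue": ["dashboard:Finance Monthly"],
}

if __name__ == "__main__":
    for node in downstream(LINEAGE, "fct_revenue"):
        print(node)
```

In the agentic version, the agent builds the mapping by calling the two MCP servers; the traversal itself stays the same.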
2. Data quality health monitor
Keep your analytics trustworthy with a unified view:
- Check dbt model health (tests, source freshness).
- Pull trend metrics from the dbt Semantic Layer.
- Snap in Tableau views for the same metrics.
- Generate a health report with recommendations, like how often each dataset should be refreshed.
Think of it as a fitness tracker for your data pipelines. Green arrows = data is in shape; red = time for a pit stop.
3. Metric reconciliation detective
The classic "why don't the numbers match?" problem? Solved.
- Query a metric through dbt Semantic Layer (e.g., monthly revenue).
- Query the same metric via Tableau's published data source.
- Retrieve the compiled SQL from both systems.
- Compare and highlight discrepancies automatically.
Your CFO will finally stop asking why the dashboard number differs from the report. Mystery solved. This works because the dbt Semantic Layer is the one place where 'monthly revenue' has a single definition.
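Once both sides have returned numbers, the comparison step is small. Here is a minimal sketch in Python, assuming each system's result has been reduced to a period-to-value mapping. The function name, the tolerance, and the sample figures are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Diff = Tuple[str, Optional[float], Optional[float]]


def reconcile(
    semantic_layer: Dict[str, float],
    tableau: Dict[str, float],
    tolerance: float = 0.01,
) -> List[Diff]:
    """Return (period, semantic_layer_value, tableau_value) where they differ.

    A period missing from one side is reported with None for that side.
    """
    diffs: List[Diff] = []
    for period in sorted(set(semantic_layer) | set(tableau)):
        a = semantic_layer.get(period)
        b = tableau.get(period)
        if a is None or b is None or abs(a - b) > tolerance:
            diffs.append((period, a, b))
    return diffs


if __name__ == "__main__":
    # Made-up monthly revenue figures for illustration.
    from_dbt = {"2026-01": 100.0, "2026-02": 250.0}
    from_tableau = {"2026-01": 100.0, "2026-02": 245.5, "2026-03": 10.0}
    for row in reconcile(from_dbt, from_tableau):
        print(row)
```

The output tells you where to look; comparing the compiled SQL from both systems is what tells you why.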
4. Self-service analytics enablement
Empower your team without endless hand-holding:
- User asks: "What revenue metrics can I analyze by region?"
- List available metrics from dbt Semantic Layer.
- Search Tableau for existing dashboards with those metrics.
- Show screenshots if dashboards exist; otherwise, query dbt and suggest creating a new viz.
It's like having an analytics concierge always ready to point people to the right metric or dashboard.
5. Performance optimization finder
Stop slow queries before they slow you down:
- Get dbt model performance metrics (execution time trends).
- Identify Tableau dashboards querying those slow models.
- Analyze Tableau query patterns and data retrieval efficiency.
- Recommend optimizations, e.g., "This dbt model takes 10 min but Tableau only uses 3 columns; trim it down."
It's the intersection of observability, efficiency, and a tiny bit of magic.
The takeaway
Individually, dbt and Tableau MCPs are powerful. Together, they turn what used to be multi-step, context-switch-heavy tasks into single-threaded, agent-powered workflows. One config file, two green arrows, endless possibilities.
dbt-core · v1.11.8
dbt-core 1.11.8 - April 08, 2026
Fixes
- Add @requires.catalogs decorator to compile command to fix REST Catalog-Linked database compilation (#12353)
- Improve logic for detecting config with missing plus prefix in dbt_project.yml (#12371)
- Add config and allow meta and docs to exist under it for macros (#12383, #9447)
- DBT_ENGINE prefixed env vars picked up by CLI (#12583)
- Allow deferral for UDFs (#12080)
- Resolve full node description while running udfs for better logging (#12600)
- Raise PropertyMovedToConfigDeprecation instead of MissingArgumentsPropertyInGenericTestDeprecation (#12572)
- Add @requires.catalogs decorator to test command to fix custom catalog integration support (#12662)
- Raise custom key in config deprecation warning for invalid config keys in dbt_project.yml (#12542)
- Ensure MAX_GROUPING_TOKENS and MAX_GROUPING_DEPTH default to None independently (#12694)
- Ensure property depr checks check for aliases with plus prefix (#12327)
Docs
- Enable display of unit tests (dbt-docs/#501)
- Unit tests not rendering (dbt-docs/#506)
- Add support for Saved Query node (dbt-docs/#486)
- Fix npm security vulnerabilities as of June 2024 (dbt-docs/#513)
- Bump form-data from 3.0.1 to 3.0.4 (dbt-docs/#554)
- Add support for UDF (function) resource type in lineage graph (dbt-docs/#574)
Under the Hood
- Unpin sqlparse dependency, and introduce --sqlparse CLI option for configuring sqlparse limits (#12329)
Contributors
The root cause of AI hallucinations in data contexts
AI hallucinations in data analytics typically stem from three interconnected problems.
First, ambiguous or undefined metrics create confusion. When business terms lack clear, standardized definitions, AI systems must make assumptions about what users are asking for. A column labeled "RevAdj_2023" might mean different things to different teams, and without clear metadata or context, an AI system cannot reliably interpret it.
Second, inconsistent data definitions across teams compound the problem. One department might define "revenue" as gross sales; another subtracts discounts and returns. When an AI system queries data from multiple sources with conflicting definitions, it produces outputs that appear authoritative but are fundamentally unreliable.
Third, ungoverned data access lets AI systems query raw, unvalidated tables directly. Without guardrails, these systems might pull from outdated sources, apply incorrect business logic, or combine incompatible datasets — all while presenting results that look perfectly legitimate to end users.
How a semantic layer addresses these challenges
A semantic layer provides a centralized framework that defines key metrics and business logic, embedding the metadata and context AI systems need to function reliably. Rather than letting AI query raw database tables directly, a semantic layer acts as an intermediary that enforces consistency and accuracy.
When properly implemented, a semantic layer transforms how AI systems interact with data. Instead of making assumptions about undefined terms, the system queries only pre-approved, governed metrics. If a user asks about "total adjusted revenue" and no such metric exists, the semantic layer flags the query as invalid and suggests valid alternatives — such as "revenue adjusted for discounts and returns" or "revenue of active accounts."
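The reject-and-suggest behavior can be illustrated with a small sketch in Python. It assumes a hypothetical in-memory registry of governed metrics and a simple word-overlap rule for suggestions. It is not how the dbt Semantic Layer is implemented; it only shows the shape of the interaction.

```python
from typing import Dict

# Hypothetical governed metrics: name -> description.
METRICS: Dict[str, str] = {
    "revenue_adjusted_for_discounts_and_returns": "Revenue net of discounts and returns",
    "revenue_of_active_accounts": "Revenue from active accounts only",
    "order_count": "Number of completed orders",
}


def resolve_metric(requested: str, metrics: Dict[str, str] = METRICS) -> Dict[str, object]:
    """Return the metric if it is governed; otherwise reject with suggestions."""
    key = requested.strip().lower().replace(" ", "_")
    if key in metrics:
        return {"valid": True, "metric": key, "suggestions": []}
    # Suggest governed metrics that share at least one word with the request.
    words = set(key.split("_"))
    suggestions = sorted(name for name in metrics if words & set(name.split("_")))
    return {"valid": False, "metric": None, "suggestions": suggestions}


if __name__ == "__main__":
    print(resolve_metric("total adjusted revenue"))
```

The important property is that an unknown term never silently maps to a guess: the caller gets an explicit rejection and a list of governed names to choose from.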
This creates a hub-and-spoke architecture. Metrics are defined once in a central location (the hub) and queried by any number of downstream systems (the spokes) — whether BI tools, embedded applications, or AI interfaces. Every endpoint accesses the same centralized definitions, ensuring consistency across the organization.
Consistency for reliable insights
A semantic layer eliminates the metric inconsistencies that cause AI systems to produce unreliable outputs. By aligning all metrics to single, standardized definitions, organizations ensure that AI systems always query trustworthy data. When the finance team and marketing team both ask about last quarter's revenue, they receive identical answers — because both pull from the same governed metric definition, regardless of which tool or interface they use.
This consistency extends beyond simple calculations. A semantic layer captures the complete business logic behind each metric: how it should be aggregated, what dimensions it can be sliced by, and what relationships exist between different data entities. This context allows AI systems to understand not just what data exists, but how it should be used.
Governance to protect and standardize
Effective governance is critical for AI success. A semantic layer enforces governance by restricting access to sensitive metrics, tracking changes to definitions with clear audit trails, and preventing unauthorized data access. Teams can be scoped to only the metrics relevant to their function — preventing a customer-facing AI agent from inadvertently exposing sensitive internal data, for example.
Governance also ensures consistency when business definitions change. Imagine the executive team updates the definition of "adjusted revenue" to include a new discount category. Without a semantic layer, that change requires manual updates across every BI tool, dashboard, and AI system that references that metric — a tedious, error-prone process that inevitably creates inconsistencies. With a semantic layer, the definition is updated once centrally, and all connected systems automatically use the new logic. AI interfaces, LLMs, and human users always work with the latest approved definitions.
Context for smarter decision-making
AI systems need more than data — they need context. A semantic layer provides this by embedding metadata and explicitly defining relationships between data elements. It links tables together (connecting "Customer ID" in a customers table to transactions, for example) so AI systems understand how purchases relate to customers or revenue to products. Defining these relationships explicitly means joins between tables are always performed correctly.
The semantic layer also standardizes business logic, embedding rules like "revenue = price − discounts − returns" to prevent mismatched definitions. Each metric includes comprehensive metadata: a clear name, a description of what it measures, the calculation logic, and guidelines for appropriate usage. This eliminates the ambiguity that leads to AI hallucinations.
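As an illustration of defining a rule once and reusing it everywhere, here is a minimal sketch in Python. The `Metric` class and field names are hypothetical and are not the dbt Semantic Layer's own format; the formula is the one quoted above.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    formula: Callable[[Dict[str, float]], float]


# Defined once (the hub); every consumer calls the same formula.
REVENUE = Metric(
    name="revenue",
    description="Price minus discounts minus returns",
    formula=lambda row: row["price"] - row["discounts"] - row["returns"],
)


def evaluate(metric: Metric, row: Dict[str, float]) -> float:
    """Apply a metric's single governed formula to one row of inputs."""
    return metric.formula(row)


if __name__ == "__main__":
    row = {"price": 100.0, "discounts": 10.0, "returns": 5.0}
    print(REVENUE.name, evaluate(REVENUE, row))
```

Because the name, description, and calculation travel together, a dashboard and an AI interface that both call `evaluate(REVENUE, ...)` cannot drift apart.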
Real-world impact on AI accuracy
The difference between AI systems with and without a semantic layer is substantial. When an AI system has access to well-defined metrics, clear business logic, and proper context about data relationships, it can provide reliable answers to complex questions — rather than interpolating from ambiguous source data.
Consider the earlier example of a retail company. With a semantic layer in place, when someone asks about "adjusted revenue for Product X," the AI system doesn't guess. It recognizes that multiple valid metrics exist and prompts the user to clarify: "Did you mean revenue adjusted for discounts and returns, or revenue adjusted for currency fluctuations?" This guided approach ensures users get accurate answers while building confidence in the AI system.
Speed and scalability for AI adoption
Beyond accuracy, a semantic layer accelerates AI adoption by improving query performance and enabling reuse. Through smart caching and precomputed metrics, AI systems deliver results faster — pulling from validated metric stores rather than scanning raw tables for every query. When AI systems are slow, users abandon them and revert to manual processes or ad hoc data team requests.
The semantic layer also streamlines scaling by letting teams reuse standardized metrics across projects. Instead of rebuilding logic for every new AI initiative, teams leverage existing governed definitions. As AI adoption grows, quality and consistency don't degrade.
Building AI on the right foundation
The potential of AI to transform how organizations work with data is real. But that potential is only realized when AI systems are built on a foundation of consistent, governed, well-contextualized data.
A semantic layer isn't optional for AI projects — it's a prerequisite. It provides the guardrails that prevent hallucinations, the consistency that builds trust, and the context that enables sophisticated analysis. For data engineering leaders evaluating AI initiatives, the question isn't whether to implement a semantic layer — it's how quickly.
Organizations using dbt already have a significant advantage. The dbt Semantic Layer translates dbt models into well-defined business metrics, creating a foundation for clean, reliable, AI-ready data. It integrates seamlessly with existing dbt workflows, ensuring data is accurate, governed, and aligned with business goals. By defining semantic models alongside data transformations, teams create a single source of truth that serves both human analysts and AI systems.
Before your next AI initiative, verify that your data is ready: Are metric definitions clear and standardized? Is data access governed? Is business logic codified and version-controlled? If any of those are uncertain, a semantic layer is where to start.
Get started with dbt for free and build the governed data foundation your AI strategy needs, or talk to our team about implementing the dbt Semantic Layer at scale.