# From Schema as Code to Schema as Context

> AI agents need more than DDL. Codifying the full database lifecycle (schema, dependencies, ownership, history) gives LLMs the context to query correctly.

Tianzhou | 2026-05-08 | Source: https://www.bytebase.com/blog/schema-as-code-to-schema-as-context/

---

For over two decades, the industry has been telling teams to treat database schemas like application code: version them, review them, ship them through a pipeline. That advice still holds. But a new consumer has shown up at the table, and it doesn't read pull requests the way your DBA does.

LLMs and AI agents are increasingly the ones [generating SQL](https://spider2-sql.github.io/), proposing migrations, and executing data changes. For these actors, a versioned migration file is nowhere near enough. They need **context**.

## Schema as Code Was a Necessary First Step

The [Database-as-Code](https://martinfowler.com/articles/evodb.html) movement (let's set aside the migration-based vs. state-based debate for now) brought real discipline to schema management. Tools like Liquibase and Flyway gave teams version-controlled migrations applied through CI/CD pipelines and a single source of truth, the same way Infrastructure as Code did. Bytebase extended this with review workflows, access control, and governance.

Here's the catch. A `CREATE TABLE` statement tells you the shape of the data. It tells you nothing about who is allowed to see or modify that data, which columns hold PII or financial or health records, what masking rules kick in when different roles query it, or what approval a change must clear before it touches production.

For a human developer, that knowledge lives in tribal memory and old Slack threads. For an AI agent, if it isn't codified, it doesn't exist.

## Agents Need More than DDL

When a text-to-SQL agent is pointed at a production database, the typical setup feeds it a schema dump (table names, column types, maybe a few comments) and asks it to generate queries.

This works for demos. It falls apart in production.

The agent doesn't know that `hr.employees.salary` is classified as confidential and should be masked. It doesn't know that `customer.ssn` needs a compliance-mandated masking algorithm. It has no idea that querying `payments` requires just-in-time access approval that expires in one hour.

The real risk isn't that the agent writes a bad query. It's that the agent writes a **correct** query that should never have run in the first place.
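To make that concrete, here is a minimal sketch of a pre-flight check an agent runtime might run before executing a query. Everything in it is an assumption for illustration: the `POLICY` table, the `Grant` type, and the `preflight` function are hypothetical names, not a Bytebase API, and the list of referenced tables is passed in directly rather than parsed from SQL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy metadata: which tables require just-in-time (JIT) approval.
POLICY = {
    "payments": {"requires_jit": True},
    "orders": {"requires_jit": False},
}


@dataclass(frozen=True)
class Grant:
    """A just-in-time access grant for one principal on one table (illustrative)."""
    principal: str
    table: str
    expires_at: datetime


def preflight(principal, tables, grants, now):
    """Return the reasons a query must not run; an empty list means it may proceed."""
    problems = []
    for table in tables:
        rule = POLICY.get(table)
        if rule is None:
            # Unknown tables are denied: no codified policy means no context.
            problems.append(f"{table}: no policy on record")
            continue
        if not rule["requires_jit"]:
            continue
        has_live_grant = any(
            g.principal == principal and g.table == table and g.expires_at > now
            for g in grants
        )
        if not has_live_grant:
            problems.append(f"{table}: just-in-time approval missing or expired")
    return problems


now = datetime(2026, 5, 8, 12, 0, tzinfo=timezone.utc)
grant = Grant("agent-1", "payments", expires_at=now + timedelta(hours=1))

# Within the one-hour window the query may proceed.
print(preflight("agent-1", ["payments"], [grant], now))  # []
# Two hours later the same, syntactically correct query is blocked.
print(preflight("agent-1", ["payments"], [grant], now + timedelta(hours=2)))
```

The SQL itself never changes between the two calls. What changes is the context around it, which is exactly the information a schema dump leaves out.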

## Context Is Everything Surrounding the Schema

The schema (tables, columns, indexes, constraints) is the structural skeleton. Context is everything that gives it meaning and keeps it safe:

- **Data Classification.** Every column tagged with its sensitivity level (public, internal, confidential, or restricted) as structured, machine-readable metadata that drives downstream policies.

- **Dynamic Data Masking.** Full masking, partial masking, or custom algorithms applied at query time based on who, or what, is asking. This is the enforcement boundary that stops sensitive data from leaking into an LLM's context window.

- **Access Control.** Fine-grained, role-based access beyond database-native GRANTs. Project-level scoping, environment-level restrictions, and just-in-time access with automatic expiration.

- **SQL Review Policies.** 200+ lint rules covering anti-pattern detection, naming conventions, and performance guardrails, serving as the automated reviewer when an agent generates or proposes SQL.

- **Change Workflows.** The process itself, codified: which changes need DBA approval, which environments need staged rollout, and what the rollback plan is. These workflows are what keep autonomous systems from making irreversible mistakes.

Audit trails (every query, every change, every access request) aren't something you codify. They fall out at runtime as a byproduct of enforcing the policies above, giving you accountability over what your agents actually did.
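As a rough illustration of how classification can drive masking at query time, the sketch below models the four sensitivity levels as an ordered enum and applies a masking rule whenever the caller's clearance is below the column's level. The `ColumnContext` type, the `mask_value` function, and the rule names `"none"`, `"partial"`, and `"full"` are assumptions made for this example, not the data model of any particular product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """The four levels named above, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class ColumnContext:
    name: str
    sensitivity: Sensitivity
    masking: str  # "none", "partial", or "full" (hypothetical rule names)


def mask_value(value: str, column: ColumnContext, clearance: Sensitivity) -> str:
    """Return the value as a caller with the given clearance should see it."""
    if clearance >= column.sensitivity or column.masking == "none":
        return value
    if column.masking == "partial":
        # Keep the last four characters, hide the rest.
        return "*" * max(len(value) - 4, 0) + value[-4:]
    return "******"  # full masking: reveal nothing, not even the length


ssn = ColumnContext("customer.ssn", Sensitivity.RESTRICTED, "partial")
salary = ColumnContext("hr.employees.salary", Sensitivity.CONFIDENTIAL, "full")

print(mask_value("123-45-6789", ssn, Sensitivity.INTERNAL))    # *******6789
print(mask_value("185000", salary, Sensitivity.INTERNAL))      # ******
print(mask_value("185000", salary, Sensitivity.CONFIDENTIAL))  # 185000
```

Because masking happens on the result rather than in the prompt, an agent with `INTERNAL` clearance never receives the raw value, so it cannot end up in the model's context window.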

## Codifying Context: Terraform, API, and Your Own Format

You can't hand an LLM a PDF of your security policies and expect it to comply. You need machine-readable, version-controlled policy definitions enforced programmatically.

Bytebase manages all the context layers above and exposes them through its [Terraform Provider](https://docs.bytebase.com/integrations/terraform/overview) and [API](https://docs.bytebase.com/integrations/api/overview), so the same CI/CD pipeline that provisions a database also provisions who can access it, what masking rules apply, and what review process governs changes. For teams that want to go further, the API lets you build your own context layer in whatever format your agents consume best: pull classification taxonomies as JSON, export masking policies as structured data, or serialize the whole thing as YAML, TOML, or Markdown.
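A sketch of what "your own format" could look like: the same context records serialized as JSON for tooling and as a Markdown table for a prompt. The `CONTEXT` records are hard-coded, hypothetical data. Fetching them from the Bytebase API is deliberately left out, since the exact endpoints and response shapes are not covered here.

```python
import json

# Hypothetical context records; in practice these would come from whatever
# system of record holds your classification and masking policies.
CONTEXT = [
    {"column": "hr.employees.salary", "classification": "confidential", "masking": "full"},
    {"column": "customer.ssn", "classification": "restricted", "masking": "partial"},
]


def to_json(records):
    """Serialize the context as stable, diff-friendly JSON."""
    return json.dumps(records, indent=2, sort_keys=True)


def to_markdown(records):
    """Render the same context as a Markdown table for inclusion in a prompt."""
    lines = [
        "| Column | Classification | Masking |",
        "| --- | --- | --- |",
    ]
    for r in records:
        lines.append(f"| `{r['column']}` | {r['classification']} | {r['masking']} |")
    return "\n".join(lines)


print(to_markdown(CONTEXT))
```

Sorting keys in the JSON output keeps diffs small when the file is committed alongside migrations, so policy changes get reviewed the same way schema changes do.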

When everything is codified, policies are either enforced at the platform level (masking at query time, access denied before the query runs) or available as structured metadata the agent reasons about before it acts.

Schema as Code got us version control. Schema as Context is what gets us AI-readiness. The schema dump told the model what your data looks like. The question now is whether you'll also tell it what your data is allowed to do.

## References

- [Evolutionary Database Design](https://martinfowler.com/articles/evodb.html) - Martin Fowler and Pramod Sadalage's foundational article on treating database changes as evolutionary, version-controlled migrations.
- [Spider 2.0](https://spider2-sql.github.io/) - A benchmark for evaluating LLMs on enterprise-level text-to-SQL tasks across real-world databases.

## The rest of the series

- [What is a Database MCP Server?](/blog/what-is-a-database-mcp-server) - the foundation: what a database MCP server is, and why a raw one is fine locally but bites in production.
- [Governed MCP vs. Raw MCP](/blog/governed-mcp-vs-raw-mcp) - the MCP server is the one place agent database access is governed or left wide open.
- [How to Govern AI Agent Database Access](/blog/how-to-govern-ai-agent-access-to-enterprise-data) - the read path: identity, authorization, masking, and audit for an agent that queries your data.
- [When the Agent Writes the Migration, Who Approves It?](/blog/governing-agent-authored-database-changes) - the write path: governing schema and data changes an agent authors.