Text-to-SQL has eaten the entire conversation around AI and databases. And fair enough: getting an agent to turn natural language into correct SQL is a genuinely hard problem. But it's only half the story, and to be honest, it's the half everyone is comfortable talking about.
Suppose that problem is solved. Your agent has a well-curated semantic layer and consistently writes accurate SQL. Here's the question nobody's asking: can that agent see only the data it's supposed to see?
A customer support agent gets asked "Show me billing details for user #123." It generates correct SQL, joining users, billing, and payments, and the result comes back with the user's unmasked SSN from the users table. The query was perfect. The data leak happened anyway.
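To make the failure concrete, here is a minimal, self-contained reproduction using Python's built-in `sqlite3` module. The schema (`users`, `billing`, `payments` and their columns) and the sample row are hypothetical stand-ins for the scenario. The point is only that a correct join still returns the raw `ssn` column when nothing sits between the agent and the data.

```python
import sqlite3

# Hypothetical schema and data standing in for the support scenario.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT);
    CREATE TABLE billing  (id INTEGER PRIMARY KEY, user_id INTEGER, plan TEXT);
    CREATE TABLE payments (id INTEGER PRIMARY KEY, billing_id INTEGER, amount REAL);
    INSERT INTO users    VALUES (123, 'Ada', '123-45-6789');
    INSERT INTO billing  VALUES (1, 123, 'pro');
    INSERT INTO payments VALUES (1, 1, 49.0);
""")

# A perfectly correct answer to "Show me billing details for user #123".
sql = """
    SELECT u.*, b.plan, p.amount
    FROM users u
    JOIN billing b  ON b.user_id = u.id
    JOIN payments p ON p.billing_id = b.id
    WHERE u.id = 123
"""
cur = conn.execute(sql)
columns = [d[0] for d in cur.description]
row = dict(zip(columns, cur.fetchone()))
print(row)
# {'id': 123, 'name': 'Ada', 'ssn': '123-45-6789', 'plan': 'pro', 'amount': 49.0}
```

Nothing about the query is wrong. `u.*` simply includes every column the connection is allowed to read, and here that is all of them.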
Generating correct SQL and governing data access are two different problems, and people keep conflating them:
| | Problem 1: Accuracy | Problem 2: Governance |
|---|---|---|
| Question | Is the SQL correct? | Can the agent see this data? |
| Domain | AI / Text-to-SQL | Security / Access Control |
| Failure mode | Wrong query results | Data leak with correct results |
| Example | SELECT * FROM uesrs (typo) | Agent returns unmasked SSN from users table |
Put simply: without governance, correct SQL becomes a liability. The better your text-to-SQL gets, the more exposed you are.
The Fundamentals Don't Change
Here's my first claim, and it's the easy one: an agent is just another principal hitting your database. It needs the same controls as any human who runs a query.
- Access Control. Least privilege, scoped to only the databases, schemas, and tables the agent actually needs. A sales analytics agent has no business touching `hr.payroll`.
- Data Masking. Sensitive columns (SSNs, credit cards, health records) should be masked before the results ever reach the agent. If a human analyst sees `***-**-1234`, so should the agent. No exceptions because it's a robot.
- Audit Logging. Every query recorded: what ran, when, by which agent, on behalf of which user. When something goes wrong, and it will, the audit trail is the only thing that lets you trace it back.
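The three controls can be sketched together in Python against SQLite. This is a minimal illustration under stated assumptions, not how Bytebase or any other product implements them. `AgentPolicy`, `run_governed_query`, `mask_value`, and the in-memory `AUDIT_LOG` are hypothetical names.

- Table-level least privilege uses the real `sqlite3` hook `Connection.set_authorizer`, whose callback can deny column reads before a statement is prepared.
- Masking is applied to result rows before they are returned to the agent.
- Every attempt, allowed or denied, is appended to the audit log with both the agent and the user it acted for.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical per-agent policy: readable tables and columns to mask."""
    agent_id: str
    allowed_tables: frozenset
    masked_columns: frozenset

AUDIT_LOG: list = []  # hypothetical stand-in for a durable audit store

def mask_value(value):
    # SSN-shaped mask: keep the last four characters, e.g. ***-**-6789.
    return "***-**-" + str(value)[-4:]

def run_governed_query(conn, policy, on_behalf_of, sql):
    def authorizer(action, table, column, db_name, trigger):
        # Deny any column read from a table outside the agent's allow-list.
        if action == sqlite3.SQLITE_READ and table not in policy.allowed_tables:
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    # Replaces any authorizer installed by a previous call on this connection.
    conn.set_authorizer(authorizer)
    rows, status = [], "ok"
    try:
        cur = conn.execute(sql)
        columns = [d[0] for d in cur.description]
        for raw in cur.fetchall():
            row = dict(zip(columns, raw))
            # Matches on result column names, so an alias (ssn AS x) would slip
            # past; a real masking engine tracks column lineage instead.
            for col in policy.masked_columns:
                if col in row:
                    row[col] = mask_value(row[col])
            rows.append(row)
    except sqlite3.DatabaseError as exc:
        status = f"denied: {exc}"
    # Record every attempt with both the agent and the user it acted for.
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "agent": policy.agent_id,
        "on_behalf_of": on_behalf_of,
        "sql": sql,
        "status": status,
    })
    return rows

# Usage with hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, name TEXT, ssn TEXT);
    CREATE TABLE payroll (user_id INTEGER, salary REAL);
    INSERT INTO users   VALUES (123, 'Ada', '123-45-6789');
    INSERT INTO payroll VALUES (123, 100000);
""")
support = AgentPolicy("support-agent", frozenset({"users"}), frozenset({"ssn"}))
masked = run_governed_query(conn, support, "alice@example.com", "SELECT name, ssn FROM users")
print(masked)  # [{'name': 'Ada', 'ssn': '***-**-6789'}]
denied = run_governed_query(conn, support, "alice@example.com", "SELECT * FROM payroll")
print(denied)  # []
print(AUDIT_LOG[-1]["status"])
```

The ordering is deliberate. Access is refused before any rows exist, masking happens before the agent sees a value, and the log entry is written whether or not the query succeeded.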
So the principles are old news. What's new is the blast radius. A misconfigured agent can exfiltrate more data in seconds than a human could in an afternoon, and it won't get tired or feel guilty about it.
What's Actually Different About Agents
Same principles, different operational model. This is where the interesting part lives.
Agents Are Ephemeral
An agent spins up, does one task, and vanishes, often inside a few seconds. Provisioning a long-lived database account and leaving it sitting there for months is the wrong shape entirely. Agents want credentials scoped to a single task and revoked the moment that task is done.
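One way to give credentials that shape is to mint them per task, with a short expiry and an explicit revoke. The sketch below is hypothetical: `TaskCredential`, `issue`, `revoke`, and `is_valid` are illustrative names, not a real API. A production system would sign tokens and store revocations centrally. The sketch takes `now` as a parameter so the expiry logic is easy to see and test.

```python
import secrets
from dataclasses import dataclass

@dataclass
class TaskCredential:
    token: str
    agent_id: str
    task_id: str
    scope: frozenset      # tables this one task may touch
    expires_at: float     # seconds on the caller's clock
    revoked: bool = False

def issue(agent_id, task_id, scope, now, ttl_seconds=60.0):
    """Mint a credential bound to a single task that also expires on its own."""
    return TaskCredential(
        token=secrets.token_urlsafe(16),
        agent_id=agent_id,
        task_id=task_id,
        scope=frozenset(scope),
        expires_at=now + ttl_seconds,
    )

def revoke(cred):
    """Called as soon as the task finishes; the TTL is only a backstop."""
    cred.revoked = True

def is_valid(cred, table, now):
    return not cred.revoked and now < cred.expires_at and table in cred.scope

# Usage: a support task that lives for a few seconds.
cred = issue("support-agent", "ticket-42", {"billing", "payments"}, now=0.0, ttl_seconds=30.0)
print(is_valid(cred, "billing", now=5.0))   # True: in scope, not expired
print(is_valid(cred, "users", now=5.0))     # False: outside the task's scope
revoke(cred)
print(is_valid(cred, "billing", now=6.0))   # False: revoked when the task ended
```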
Each Agent Needs Its Own Identity
The moment you let multiple agents share one database user, you've thrown away your visibility. Who ran that query? Some agent. Which one? No idea. Each agent type should carry a distinct identity with its own access policy; otherwise, access control and audit logging are just theater.
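A minimal sketch of the difference follows. The agent names, tables, and the `authorize` helper are all hypothetical. Policies are keyed by agent identity, an identity without a policy gets nothing, and every decision is recorded under the specific agent and the user it acted for. With one shared database user, the `agent_id` field would be identical on every record, which is exactly the lost visibility described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    agent_id: str
    on_behalf_of: str
    table: str
    allowed: bool

# One policy per agent identity (hypothetical names and tables).
POLICIES = {
    "sales-analytics-agent": frozenset({"sales.orders", "sales.accounts"}),
    "support-agent": frozenset({"billing", "payments"}),
}

DECISIONS: list = []

def authorize(agent_id, on_behalf_of, table):
    # Unknown identities fall back to an empty policy: deny by default.
    allowed = table in POLICIES.get(agent_id, frozenset())
    DECISIONS.append(Decision(agent_id, on_behalf_of, table, allowed))
    return allowed

print(authorize("sales-analytics-agent", "bob", "sales.orders"))  # True
print(authorize("sales-analytics-agent", "bob", "hr.payroll"))    # False
print(authorize("unregistered-agent", "bob", "billing"))          # False
# Every denial is attributable to one agent and one user.
print({(d.agent_id, d.on_behalf_of) for d in DECISIONS if not d.allowed})
```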
Just-in-Time Stops Being a Feature and Becomes the Default
For human users, just-in-time (JIT) database access is a best practice you have to argue people into. For agents it's simply the natural shape of the work: spin up, query, return results, done. There's no reason to hold access before the task or after it. With agents, JIT goes from a security upgrade you sell to the default architecture nobody has to think about.
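The spin up, query, return, done lifecycle maps naturally onto a Python context manager: grant on entry, revoke on exit, including when the task fails. This is a hypothetical sketch. `GrantStore` and `just_in_time` are illustrative names, and a real system would issue the grant and revocation against the database or an access broker rather than an in-memory set.

```python
from contextlib import contextmanager

class GrantStore:
    """Hypothetical in-memory record of which agent holds which grant."""
    def __init__(self):
        self.active = set()

    def grant(self, agent_id, table):
        self.active.add((agent_id, table))

    def revoke(self, agent_id, table):
        self.active.discard((agent_id, table))

    def has(self, agent_id, table):
        return (agent_id, table) in self.active

@contextmanager
def just_in_time(store, agent_id, tables):
    # Grant only what this task needs, and only for as long as it runs.
    for table in tables:
        store.grant(agent_id, table)
    try:
        yield
    finally:
        # Revoke even if the task raised, so no standing access is left behind.
        for table in tables:
            store.revoke(agent_id, table)

store = GrantStore()
with just_in_time(store, "support-agent", ["billing", "payments"]):
    print(store.has("support-agent", "billing"))  # True during the task
print(store.has("support-agent", "billing"))      # False once the task is done
```

The `finally` clause carries the design choice. Access held before or after the task is the thing JIT exists to eliminate, so revocation cannot depend on the task succeeding.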
How Bytebase Handles It
At Bytebase, we treat the agent as a first-class principal, not a special case bolted on. One unified governance layer sits in front of PostgreSQL, MySQL, SQL Server, Oracle, Snowflake, BigQuery, and the rest.
The governance machinery is the same machinery your humans already go through:
- Fine-grained access control at the database, schema, and table level
- Dynamic Data Masking applied at the column level
- Just-in-time access that grants and revokes permissions per task
- Audit logging with full agent and user attribution
And agent identity is built in, not improvised. Bytebase supports service accounts and workload identities, so each agent runs under its own access policy. Wire it up through the Bytebase MCP server, or hit the API directly to build your own agentic workflows with governance already baked in.
The race to ship text-to-SQL is mostly won by whoever generates the cleanest query. The race that actually matters is the one nobody's running yet: when your agent generates that perfect query, do you know exactly what it's allowed to see?
The rest of the series
- What is a Database MCP Server? - the foundation: what a database MCP server is, and why a raw one is fine locally but bites in production.
- Governed MCP vs. Raw MCP - the MCP server is the one place agent database access is governed or left wide open.
- From Schema as Code to Schema as Context - giving an agent the classification, ownership, and policy it needs to act correctly.
- When the Agent Writes the Migration, Who Approves It? - the write path: governing schema and data changes an agent authors.