DATA

Text-to-SQL for Enterprise Data: Beyond SELECT * FROM

NOV 03, 2025

15 MIN READ

298 Likes

Every Text-to-SQL demo shows the same thing: a simple natural language question gets translated to a clean SELECT statement with one or two joins, and the result is correct. The demo is not wrong — modern LLMs genuinely can translate natural language to SQL with impressive accuracy on simple queries. The demo is incomplete. It doesn't show what happens when a user asks about a concept that's spread across six tables, or when the schema has column names that contradict their semantic meaning, or when the query needs to enforce a business rule that's implicit in organizational context but absent from the schema definition.

Schema Context is Everything

The quality of Text-to-SQL output is almost entirely determined by the quality of schema context provided to the model. A schema dump of table names and column types is the minimum viable context and produces minimum viable results. What models actually need is: table descriptions explaining what each table represents and how it's used, column descriptions explaining what each column means (not just its type), foreign key relationships and their semantic meaning, common query patterns and their correct SQL formulations, and business rules that constrain valid queries.

In practice, this means creating a schema metadata layer alongside the actual database schema. Every table and column gets a human-readable description, maintained as part of the database migration process. The context provided to the LLM for query generation is assembled from this metadata layer, not from the raw INFORMATION_SCHEMA.

Query Validation Before Execution

Text-to-SQL produces SQL that must be executed against a live database. Executing LLM-generated SQL without validation is not acceptable in production — not because LLMs frequently generate dangerous queries, but because even rare errors have real consequences on production data.

Agentica's Text-to-SQL pipeline includes a validation layer that runs before any query is executed. This layer checks syntax (parsing the SQL to catch obvious errors), semantic validity (verifying that all referenced tables and columns exist), query classification (ensuring the query is a SELECT — write queries require explicit tool selection and separate authorization), and result size estimation (preventing runaway queries that would return millions of rows).

Queries that fail validation are returned to the LLM with a structured error message explaining what was wrong. The LLM typically corrects simple errors on the first retry. Queries that fail validation three times are escalated to a different resolution path — either asking the user for clarification or falling back to a more conservative query formulation.

Ambiguity Resolution

Many natural language questions are underspecified relative to the database schema. "Show me revenue by region" requires knowing which table contains revenue data, which column(s) constitute revenue (gross? net? recognized?), how regions are defined in this schema (is there a regions table or a region column on transactions?), and what time period to aggregate over. A good Text-to-SQL system asks for clarification rather than making silent assumptions.

The challenge is knowing when to ask versus when to infer. Agentica uses a two-pass approach: the first pass generates the SQL based on the most plausible interpretation and includes a structured commentary explaining the assumptions made. The user sees both the query and the assumptions, can verify or correct them, and the system learns which interpretations are preferred over time. This is more useful than either silently guessing or interrupting with clarifying questions for every query.

Result Interpretation

SQL returns a result set. Natural language questions expect a natural language answer. The gap between these two things — converting a table of numbers into a clear, accurate, appropriately caveated statement — is where Text-to-SQL systems frequently produce the most subtle errors.

The most common class of error is overconfident interpretation: the model states a finding as a definitive fact when the data actually admits multiple interpretations, or when the query missed a relevant filter condition that would change the result. Agentica's response generation step includes explicit prompting to identify and state the assumptions and limitations of the result — not just "Q3 revenue was $4.2M" but "Q3 revenue from recognized transactions in the US region was $4.2M, based on the revenue_transactions table. This excludes deferred revenue and non-US operations."

Deploy Strategic Intelligence

Schedule a technical briefing on multi-agent deployment patterns.

Contact Engineering

Similar Research

View All Logs

ENTERPRISE

Enterprise AI Deployment: The Data Sovereignty Question

Enterprise buyers increasingly ask the same question: where does our data go? The answer shapes architecture decisions, vendor selection, and organizational policy. Here's an honest accounting of the options and tradeoffs.

Analyze Report →

PRODUCT

What It Takes to Build an AI Research Assistant for Enterprise Data

We built Agentica to answer a question we couldn't stop asking: why can't AI actually work with our data? Two years in, here's what we've learned about the gap between demo-grade and production-grade AI research.

Analyze Report →