Why I Did Not Let an LLM Touch My Database
Why I avoided text-to-SQL, leaned into Convex, and designed a safer text → schema → code pipeline for grid data exploration.
Why I built this
I built this app to make grid data easier to explore through simple natural language questions. As a grid engineer, I spend a lot of time jumping between spreadsheets, portals, and scripts just to answer basic questions about projects, voltages, and capacity. I wanted something that lets me ask those questions directly, in plain English, and get reliable answers back.
This blog is a snapshot of how I am currently thinking about the system behind that idea. It explains the architecture choices I made, why I made them, and the trade-offs I ran into along the way. I am still learning and experimenting, so some of this will almost certainly change over time. This post is written for engineers who care about data integrity, not just for end users.
[ Architecture Overview ]
text → schema → code
The LLM never touches the database directly
The obvious approach: just let the LLM write SQL
The core idea behind the app is simple. I want users to be able to ask questions about grid data in plain English and get structured answers back. The most common way to build something like this today is to store your data in a SQL database, send the user’s question to a large language model, and ask it to generate a SQL query that runs against that database.
On paper, this sounds perfect: SQL already describes how to filter, join, and aggregate data, and modern language models are decent at writing it. If it works, you get a very flexible system where users can ask almost anything and the model figures out how to query the data.
In practice, this approach runs into several serious problems.
Why text-to-SQL breaks down in the real world
First, text-to-SQL produces an untyped string. Unless you add extra tooling, schema mistakes and permission issues surface at runtime instead of being caught by a typed API boundary.
Second, the quality of the generated SQL depends heavily on the model. Simple queries may work with small or open-source models, but as soon as the questions become more complex, accuracy drops quickly. To get consistently good results, you often need large, expensive models, which makes the system costly to run.
Third, security is hard to enforce. You can try to add guardrails in prompts, but they vary from model to model and can often be bypassed. A single bad or malicious query can expose data it should not, or perform actions you never intended.
Finally, language models hallucinate, especially when they do not have a perfectly accurate picture of the database schema. Prompts can help, but true reliability comes from deterministic code and strict validation outside the model, not from hoping the model behaves.
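To make "strict validation outside the model" concrete, here is a minimal sketch of what that deterministic layer can look like. The field names, voltage levels, and technology labels are all hypothetical, not the app's real schema; the point is that anything the model emits is checked against an allowlist before it goes anywhere near query logic:

```typescript
// Sketch: validate raw LLM output deterministically before it reaches any
// query logic. Field names and allowed values here are illustrative.
type GridQuery = { voltageKv: number; technology: string; location: string };

const ALLOWED_KEYS = new Set(["voltageKv", "technology", "location"]);
const ALLOWED_TECHNOLOGIES = new Set(["BESS", "Solar", "Wind"]);
const ALLOWED_VOLTAGES = new Set([11, 33, 66, 132]);

function parseGridQuery(raw: unknown): GridQuery | null {
  if (typeof raw !== "object" || raw === null) return null;
  const obj = raw as Record<string, unknown>;
  // Reject unknown keys outright: the model may only express allowlisted fields.
  for (const key of Object.keys(obj)) {
    if (!ALLOWED_KEYS.has(key)) return null;
  }
  if (typeof obj.voltageKv !== "number" || !ALLOWED_VOLTAGES.has(obj.voltageKv)) return null;
  if (typeof obj.technology !== "string" || !ALLOWED_TECHNOLOGIES.has(obj.technology)) return null;
  if (typeof obj.location !== "string") return null;
  return { voltageKv: obj.voltageKv, technology: obj.technology, location: obj.location };
}
```

If the model hallucinates a field or a value, the parse simply fails and the request is rejected; no prompt engineering is involved in that decision.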
Looking for a safer foundation
After this research, my main question was whether there was a way to limit what the language model can generate while still getting useful, flexible answers, without overcomplicating the architecture.
For example, if a user asks, “Get me all the BESS projects connected to the 33kV network,” I do not want the model guessing table names, column names, or technology labels in a SQL string. I want it to deterministically pick the correct table, apply the right voltage filter, and map “BESS” to a known, predefined technology type that matches how the data is actually stored. That way, even if a user says “battery” instead of “BESS,” the system still knows they mean the same thing and produces a correct query.
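That “battery means BESS” mapping does not need a model at all; it can be a plain deterministic lookup. A sketch, with hypothetical synonym and canonical labels:

```typescript
// Sketch: map loose user vocabulary to canonical technology types.
// The canonical labels are assumptions about how the data is stored.
const TECHNOLOGY_SYNONYMS: Record<string, string> = {
  battery: "BESS",
  "battery storage": "BESS",
  bess: "BESS",
  solar: "Solar",
  pv: "Solar",
  wind: "Wind",
};

function canonicalTechnology(term: string): string | undefined {
  // Normalise case and whitespace, then resolve against the known synonyms.
  return TECHNOLOGY_SYNONYMS[term.trim().toLowerCase()];
}
```

Unknown terms resolve to `undefined` rather than being guessed, so the system can ask the user to clarify instead of silently inventing a technology type.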
What I was really looking for was a way to force the model to operate inside a strict schema, so that it could express intent but not invent structure.
This question is what led me to Convex. I started digging into its design, its primitives, and how it approaches data access. Convex acts as a separate backend where you define your database schema, queries, and mutations directly in TypeScript. Instead of raw SQL, you write strongly typed functions that control exactly how data can be read or changed. Additionally, Convex guarantees real-time synchronization between the database, backend, and frontend, meaning that changes to database tables are automatically reflected on the frontend without the need to manually configure WebSocket connections. It also provides excellent third-party integrations for implementing authentication and AI features in apps.
In my head, this immediately felt like a better foundation for what I was trying to build. With schemas, queries, and mutations all living in TypeScript (with runtime validation), I could imagine an architecture where the language model is constrained to a known set of types and fields, rather than being free to generate arbitrary database logic. That made it feel possible to build a safer text-to-data system without overcomplicating the backend.
From text to schema to code
Even with Convex, I was not comfortable letting a language model generate database queries directly. That would bring back many of the same problems I had with text-to-SQL, just in a different form. I still needed a strong safety boundary between what the model produces and what actually runs against the database. For instance, if a malicious user requests a query to retrieve sensitive data or make unauthorized changes to the database, the system should prevent execution. This safety layer should be built using deterministic code and strict system prompt guardrails.
The pattern I ended up with is what I call text → schema → code.
Instead of asking the model to write queries, I ask it to convert the user’s question into a small, structured object that follows a strict schema. This object describes what the user is asking for in a controlled way, using fields and values that I define ahead of time. The model never sees table names, query functions, or database internals. This keeps the architecture simple while still giving users a natural language way to explore grid data.
Once that schema object is created, trusted backend code takes over. A fixed Convex function reads the object and turns it into a database query using normal, type-safe query primitives and auth checks. The language model is only responsible for expressing intent. My code is responsible for deciding how that intent is executed.
For example, if a user asks, “Show me BESS projects at 33kV near Leeds,” the model produces something like { voltageKv: 33, location: "Leeds", technology: "BESS" }. That object is validated against a TypeScript schema, and then a Convex query function uses it to fetch the correct rows. The model never touches the database, and it cannot invent new fields or bypass access rules.
[ Schema Boundary ]
the model suggests · the system decides

The model can only produce a typed object. No database access. No query execution.

"Show me BESS projects at 33kV near Leeds"

```typescript
{
  voltageKv: 33,
  technology: "BESS",
  location: "Leeds"
}
```

Deterministic code. Permission checks. Typed queries. Full control.

```typescript
query({
  args: schema,
  handler: async (ctx) => {
    // Permission check
    // Index lookup
    // Filter & return
  }
})
```

The LLM is trapped inside a strict, typed cage.
Strong safety, limited flexibility
The language model can only produce values that fit inside a schema I control. All of the real logic lives in deterministic TypeScript code, with the same type checks and permission rules as the rest of the backend.
The downside is that flexibility moves out of the model and into the code. Every field in the schema and every way it can be combined has to be handled explicitly by the query-building logic. When the data model is simple, this works well. As the number of columns, filters, and nested objects grows, the translation layer becomes harder to maintain.
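To show why the translation layer grows with the schema, here is a stripped-down sketch of what that explicit query-building logic looks like in plain TypeScript (the row shape and fields are illustrative, not the real ECR columns):

```typescript
// Sketch of the translation layer: every schema field maps to an explicit,
// hand-written predicate. Each new field means another branch here.
type Row = { technology: string; voltageKv: number; location: string };
type GridQuery = Partial<Row>;

function applyFilters(rows: Row[], q: GridQuery): Row[] {
  return rows.filter(
    (row) =>
      (q.technology === undefined || row.technology === q.technology) &&
      (q.voltageKv === undefined || row.voltageKv === q.voltageKv) &&
      (q.location === undefined || row.location === q.location)
  );
}
```

Three optional fields are trivial to handle. Thirty fields, range filters, and nested per-technology conditions are where this deterministic style starts to strain.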
There are also entire classes of questions that this approach struggles with. For example, something like “Get me all co-located solar and storage projects where the solar capacity is between 20 and 40 MW and already connected, but the storage capacity is between 50 and 100 MW and only accepted to connect” requires different filters on different technologies inside the same project. Expressing that cleanly in a fixed schema quickly becomes awkward.
If I let the language model generate Convex queries directly, it might be able to express that logic. But then I would be back to trusting the model to do the right thing instead of enforcing rules in code. That is a trade-off I am not ready to make yet.
Letting users do the last mile
To support a broader set of queries, the translation layer needs to be updated, and in some cases the TypeScript schema must be expanded as well. Schema changes tend to be especially painful, since they require database migrations and careful handling of backwards compatibility. As a result, every new kind of question risks turning into a code and data change, rather than simply an improvement to the prompt.
Instead of giving more control to the LLM, I decided to give more control to the user. The text → schema → code pipeline is used to get you close to what you asked for, in a safe and predictable way. Once the data is fetched and shown in a table, you can refine it directly using filters, sorting, and other tools in the interface.
The system also shows a clear explanation of how your question was interpreted. That way you can see which voltage level, technology type, and location the model picked before the query ran. If something is off, you can adjust it directly instead of having to fight with a prompt. This creates a middle ground. The language model is kept inside a strict, typed boundary, while the user still gets a flexible way to explore and slice the data.
The hidden cost of scaling this model
This approach works well today because the app is still relatively simple. Right now everything runs against a single table, the ECR dataset, which makes the text → schema → code layer manageable. There is only one schema to maintain and one query function to keep in sync.
As soon as more tables are added, that changes. Each new dataset needs its own schema, its own translation logic, and its own permission rules. Over time, this creates more and more moving parts that all have to stay aligned with the underlying database.
The rigidity becomes even more obvious when you start thinking about cross-table questions. Something like “Show me solar projects connected to primary substations with more than 50 MW of headroom” would require combining information from multiple datasets. With the current approach, I would need to explicitly model those relationships and all of the allowed query patterns ahead of time.
The more relational and exploratory the questions become, the more this architecture starts to feel constrained. What began as a safety boundary slowly turns into a bottleneck.
A more powerful, but riskier, idea
Lately I have been thinking about a very different approach, inspired by how Convex’s own query system and Model Context Protocol servers work under the hood.
Instead of converting user questions into a fixed schema and then into hand-written code, the idea is to let the system generate small pieces of query code on the fly. That code would only ever run in a tightly sandboxed, read-only environment with access to a very limited subset of the database.
The pattern would look like this: user text → generated query code → run that code in a safe runtime that can only read, never write, and can only see the tables I explicitly expose. Conceptually, it is like giving the model a calculator over part of the data, not the keys to the whole warehouse.
Any attempt to do more than read would be blocked by the environment itself, not just by prompts or best-effort guardrails. The sandbox becomes the real security boundary.
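As a rough illustration of the "environment as the boundary" idea, here is a sketch of exposing only an allowlisted, deep-frozen copy of some tables. This is a toy, not a real sandbox: actual isolation would need a separate runtime (an isolated worker or similar), and the table names here are made up.

```typescript
// Toy sketch of a read-only data view: clone only allowlisted tables and
// deep-freeze the clone so generated code can read but never mutate it.
// This illustrates the shape of the idea; it is NOT a security boundary
// on its own.
function deepFreeze<T>(value: T): T {
  if (value !== null && typeof value === "object") {
    for (const child of Object.values(value as object)) deepFreeze(child);
    Object.freeze(value);
  }
  return value;
}

function exposeReadOnly<T extends Record<string, unknown[]>>(
  tables: T,
  allowlist: (keyof T)[]
): Partial<T> {
  const view: Partial<T> = {};
  for (const name of allowlist) {
    // structuredClone detaches the view from live data; freezing blocks writes.
    view[name] = deepFreeze(structuredClone(tables[name]));
  }
  return view;
}
```

Generated query code would receive only this view, never a database handle, so "write to the database" is not something it can even express.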
Why even read-only code needs real security
Even in a read-only sandbox, security still matters. It is not enough to say that the model cannot write to the database. You also have to control what it is allowed to see.
Every query still needs to respect the same access rules as the rest of the app. That includes which rows a user is allowed to view, which columns are considered sensitive, and which combinations of data should never be returned together. A poorly designed read-only environment can still leak information if it is not carefully constrained.
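One concrete piece of that boundary is a per-role column allowlist, applied after the query runs so a generated query can never return fields the caller is not allowed to see. The roles and column names below are hypothetical:

```typescript
// Sketch: deterministic column-level redaction behind the query layer.
// Roles and column names are illustrative, not the app's real model.
type Role = "viewer" | "analyst";

const VISIBLE_COLUMNS: Record<Role, string[]> = {
  viewer: ["projectName", "technology", "voltageKv"],
  analyst: ["projectName", "technology", "voltageKv", "capacityMw", "contactEmail"],
};

function redactRow(
  row: Record<string, unknown>,
  role: Role
): Record<string, unknown> {
  const allowed = new Set(VISIBLE_COLUMNS[role]);
  // Keep only the columns this role may see; everything else is dropped.
  return Object.fromEntries(Object.entries(row).filter(([key]) => allowed.has(key)));
}
```

Because the redaction runs in audited, deterministic code, it holds regardless of what the generated query tried to select.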
That means any sandboxed query layer would have to sit behind a clear permission boundary. The generated code might be dynamic, but the data it can touch would be governed by deterministic, audited rules.
As exciting as this idea is, I am not ready to put it into production. Designing a sandboxed query runtime with the right permission model is a hard problem, and I am still not convinced it is even feasible with Convex in its current form.
For now, this approach lives firmly in the “future experiments” category. The existing text → schema → code pipeline is much easier to reason about, easier to secure, and easier to debug. It may be more rigid, but it behaves in predictable ways, which matters a lot when you are dealing with real infrastructure data.
Why I still chose Convex over Postgres and an ORM
In theory, this same text → schema → code architecture could be built on top of Postgres with an ORM and a custom backend. But in that world, I would have to wire everything together myself. I would need to build the API layer, manage real-time updates, and integrate the AI orchestration, then keep all of it in sync as the system evolves.
Convex gives me most of that out of the box in a single, type-safe stack. I get a database, a backend runtime, real-time sync, and AI agent orchestration primitives that all work together. That lets me focus more on the query model and the user experience, instead of spending months on infrastructure glue.
The limits I hit with Convex
Convex is not perfect for this use case. In my current setup, I am working within a few real constraints. There is no built-in fuzzy search. There are limits on how much data a single query can return. And only one search index can be used per query.
To work around the last point, I ended up combining text from multiple columns into a single searchable field and indexing that instead. It is not ideal, but it gets me most of the way there without breaking the rest of the architecture.
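The workaround itself is simple: at write time, concatenate the searchable columns into one derived field and put the single search index on that. A sketch, with illustrative column names rather than the real ECR fields:

```typescript
// Sketch of the single-search-index workaround: fold several columns into
// one derived text field at write time, and index only that field.
// Column names are illustrative.
type EcrRow = { projectName: string; technology: string; location: string };

function buildSearchText(row: EcrRow): string {
  // Lowercase + join so one full-text index effectively covers all columns.
  return [row.projectName, row.technology, row.location].join(" ").toLowerCase();
}
```

The derived field has to be recomputed whenever any source column changes, which is a small maintenance cost in exchange for staying within the one-index-per-query limit.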
I plan to write a deeper dive on the pros and cons of Convex itself at some point. For now, these trade-offs feel acceptable. Every limitation I have hit so far has had a reasonable workaround, and none of them have blocked the core idea of the app. Only time will tell whether I picked the right foundation for this app.
Where this leaves the project
This blog is a snapshot of how I am currently thinking about text-to-data for grid data. Some of these ideas will probably change as the app grows and as the tools around it evolve. That is fine. The goal is not to find a perfect architecture on day one, but to keep moving toward something that is safer, more flexible, and more useful over time.
For now, the text → schema → code approach gives me a balance I am happy with. It keeps the language model inside a strict, typed boundary, while still letting users explore complex grid data in a natural way.
If you spot mistakes, see better patterns, or have strong opinions about Convex versus a more traditional Postgres stack, I would genuinely love to hear them. You can reach out to me on LinkedIn or by email if you want to chat, suggest improvements, or just nerd out about grid data and AI tooling.
This project is very much a work in progress, and this post reflects my current thinking, not a final answer. If I look back in a year and disagree with half of it, that will probably mean I have learned something, which is exactly the point.