QueryGPT: NL-to-SQL Pipeline

A multi-agent NL-to-SQL pipeline of four staged agents (intent, table selection, column pruning, SQL generation) that progressively narrows schema context to cut token usage. It uses pydantic-ai structured outputs with automatic retries, retrieves similar query examples through llama-index and a Neo4j hybrid vector store (RAG), and validates generated SQL end to end.

2025

LLMs, Structured outputs, RAG, llama-index, pydantic-ai, Neo4j

The problem

NL-to-SQL systems often fail in predictable ways:

Selecting the wrong tables
Over-selecting columns (token bloat)
Generating syntactically valid SQL that’s semantically wrong
Breaking as schemas change

What I built

A pipeline of four staged agents that decomposes the task into smaller, testable steps:

Intent classification
Table selection
Column pruning
SQL generation

Progressively narrowing the schema context at each stage cuts token usage and leaves later stages fewer ways to fail. Each step emits structured output via pydantic-ai instead of free-form text, with automatic retries, which makes the whole pipeline easier to debug, cheaper to run, and more reliable.
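To make the narrowing and retry ideas concrete, here is a minimal sketch using only the Python standard library. It is not the project's code: a dataclass stands in for a pydantic-ai output model, `fake_table_agent` stands in for an LLM call, and `SCHEMA`, `TableSelection`, `run_with_retries`, and `narrow_schema` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy schema (table -> columns); not taken from the project.
SCHEMA = {
    "orders": ["id", "customer_id", "total", "created_at"],
    "customers": ["id", "name", "country"],
    "products": ["id", "title", "price"],
}


@dataclass(frozen=True)
class TableSelection:
    """Typed output of the table-selection stage (stand-in for a pydantic model)."""
    tables: tuple

    def __post_init__(self):
        # Reject empty selections and tables that are not in the schema.
        unknown = [t for t in self.tables if t not in SCHEMA]
        if not self.tables or unknown:
            raise ValueError(f"unknown or empty table selection: {unknown}")


def run_with_retries(stage: Callable[[Optional[str]], list],
                     max_attempts: int = 3) -> TableSelection:
    """Call a stage until its output validates, feeding the last error back in."""
    feedback = None
    for _ in range(max_attempts):
        raw = stage(feedback)
        try:
            return TableSelection(tuple(raw))
        except ValueError as exc:
            feedback = str(exc)
    raise RuntimeError(f"stage failed after {max_attempts} attempts: {feedback}")


def narrow_schema(selection: TableSelection) -> dict:
    """Keep only the selected tables so later stages see a smaller context."""
    return {t: SCHEMA[t] for t in selection.tables}


# Stand-in for an LLM call: wrong on the first try, corrected after feedback.
calls = []


def fake_table_agent(feedback: Optional[str]) -> list:
    calls.append(feedback)
    return ["order"] if feedback is None else ["orders", "customers"]


selection = run_with_retries(fake_table_agent)
narrowed = narrow_schema(selection)
print(sorted(narrowed))  # ['customers', 'orders']
```

The first attempt names a table that does not exist, so validation rejects it and the error message is passed back as feedback. The second attempt validates, and the column-pruning stage would then receive only two of the three tables.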

The stack uses llama-index and a Neo4j hybrid vector store to retrieve similar query examples (RAG), and validates generated SQL end-to-end so errors surface immediately.
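One way the validation step could work is to compile each generated query against an empty copy of the schema before running it on real data. The sketch below assumes SQLite and a made-up two-table DDL purely for illustration; the write-up does not say which database or validation method the project uses, and `validate_sql` is a hypothetical helper.

```python
import sqlite3

# Hypothetical DDL used only for this sketch.
DDL = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
"""


def validate_sql(sql: str, ddl: str = DDL):
    """Return (ok, message) after compiling the query against an empty schema."""
    # Only read-only queries are accepted in this sketch.
    if not sql.lstrip().lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        # EXPLAIN makes SQLite parse and plan the statement, so syntax errors
        # and unknown tables or columns raise here.
        conn.execute("EXPLAIN " + sql)
        return True, "ok"
    except (sqlite3.Error, sqlite3.Warning) as exc:
        return False, str(exc)
    finally:
        conn.close()


print(validate_sql("SELECT name FROM customers WHERE country = 'DE'"))
print(validate_sql("SELECT nam FROM customers"))
print(validate_sql("SELECT * FROM invoices"))
```

A check like this catches syntax errors and references to tables or columns that do not exist. It cannot tell whether a well-formed query actually answers the user's question, so semantic correctness still needs example-based tests or review.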

Why it’s interesting

It demonstrates “LLM engineering as software engineering”: interfaces, validation, and failure containment.
It’s a realistic data product: schema-aware, maintainable, and designed for iteration.