
QueryGPT: NL-to-SQL Pipeline

A multi-agent NL-to-SQL pipeline of four staged agents (intent classification, table selection, column pruning, SQL generation) that progressively narrows the schema context to cut token usage. It uses pydantic-ai structured outputs with automatic retries, RAG over a llama-index and Neo4j hybrid vector store, and end-to-end SQL validation.

2025
LLMs, Structured outputs, RAG, llama-index, pydantic-ai, Neo4j

The problem

NL-to-SQL systems often fail in predictable ways:

  • Selecting the wrong tables
  • Over-selecting columns (token bloat)
  • Generating syntactically valid SQL that’s semantically wrong
  • Breaking as schemas change

What I built

A pipeline of four staged agents that decomposes the task into smaller, testable steps:

  • Intent classification
  • Table selection
  • Column pruning
  • SQL generation

Progressively narrowing the schema context at each stage cuts token usage and gives later stages fewer irrelevant tables and columns to get wrong. Each stage emits structured output via pydantic-ai, with automatic retries, rather than free-form text, which makes the whole pipeline easier to debug, cheaper to run, and more reliable.
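To make the narrowing-plus-retry idea concrete, here is a minimal sketch using only the Python standard library. It is not the project's code and does not call pydantic-ai: the toy `SCHEMA`, the `TableSelection` type, and the helpers `check_tables`, `run_with_retries`, `narrow_schema`, and `fake_table_selector` are all hypothetical names, and the "model" is a stub that gets the table name wrong on its first attempt. The pattern it illustrates is the one described above: a stage returns a typed result, the result is checked against the schema, the error is fed back on retry, and only the selected tables are passed on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, TypeVar

T = TypeVar("T")
Schema = Dict[str, List[str]]

# Hypothetical toy schema: table name -> column names.
SCHEMA: Schema = {
    "orders": ["id", "customer_id", "total", "created_at"],
    "customers": ["id", "name", "email"],
    "audit_log": ["id", "event", "payload"],
}


@dataclass
class TableSelection:
    """Structured output of the (hypothetical) table-selection stage."""
    tables: List[str]


def check_tables(selection: TableSelection, schema: Schema) -> None:
    """Raise ValueError if the selection is empty or names unknown tables."""
    if not selection.tables:
        raise ValueError("no tables selected")
    unknown = [t for t in selection.tables if t not in schema]
    if unknown:
        raise ValueError(f"unknown tables: {unknown}; valid: {sorted(schema)}")


def run_with_retries(
    step: Callable[[Optional[str]], T],
    check: Callable[[T], None],
    max_retries: int = 2,
) -> T:
    """Call a stage, validate its output, and feed the error back on retry."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        result = step(feedback)
        try:
            check(result)
            return result
        except ValueError as exc:
            feedback = str(exc)  # the next attempt sees what went wrong
    raise RuntimeError(f"stage failed after {max_retries + 1} attempts: {feedback}")


def narrow_schema(schema: Schema, selection: TableSelection) -> Schema:
    """Keep only the selected tables, so later stages see less context."""
    return {table: schema[table] for table in selection.tables}


# Stub standing in for an LLM call: wrong on the first try, right after feedback.
calls: List[Optional[str]] = []


def fake_table_selector(feedback: Optional[str]) -> TableSelection:
    calls.append(feedback)
    if feedback is None:
        return TableSelection(["order"])  # misspelled table name
    return TableSelection(["orders", "customers"])


selection = run_with_retries(fake_table_selector, lambda s: check_tables(s, SCHEMA))
narrowed = narrow_schema(SCHEMA, selection)
print(sorted(narrowed))
```

In this sketch the column-pruning stage would receive `narrowed` rather than `SCHEMA`, so `audit_log` never reaches the SQL-generation prompt.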

The stack uses llama-index and a Neo4j hybrid vector store to retrieve similar query examples (RAG), and validates generated SQL end-to-end so errors surface immediately.
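One way to surface SQL errors immediately is to compile the generated query against an empty copy of the schema before running it anywhere real. The sketch below assumes a SQLite dialect and uses Python's built-in `sqlite3` module; the `DDL` string and the `validate_sql` helper are hypothetical and not taken from the project, whose actual validation approach and target database are not described here.

```python
import sqlite3
from typing import Optional

# Hypothetical schema used only for validation; no rows are needed.
DDL = """
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT,
    email TEXT
);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    total REAL,
    created_at TEXT
);
"""


def validate_sql(sql: str, ddl: str = DDL) -> Optional[str]:
    """Return None if the statement compiles against the schema, else the error text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        # EXPLAIN makes SQLite parse and plan the statement without running it,
        # so syntax errors and unknown tables or columns are reported here.
        conn.execute(f"EXPLAIN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


print(validate_sql("SELECT name FROM customers"))
print(validate_sql("SELECT nmae FROM customers"))
```

A check like this catches syntax errors and references to tables or columns that do not exist. It does not catch a query that compiles but answers the wrong question, which is why the retrieved examples and the earlier stages still matter. The returned error string can also be fed back to the SQL-generation stage as retry context.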

Why it’s interesting

  • It demonstrates “LLM engineering as software engineering”: interfaces, validation, and failure containment.
  • It’s a realistic data product: schema-aware, maintainable, and designed for iteration.