QueryGPT: NL-to-SQL Pipeline

A modular, multi-agent NL-to-SQL pipeline whose steps exchange structured intermediate outputs, reducing token usage and limiting failures as schemas evolve.

2025
LLMs · Structured outputs · RAG · Pydantic · Data products

The problem

NL-to-SQL systems often fail in predictable ways:

  • Selecting the wrong tables
  • Over-selecting columns (token bloat)
  • Generating syntactically valid SQL that’s semantically wrong
  • Breaking as schemas change

What I built

A pipeline architecture that decomposes the task into smaller, testable steps:

  • Intent detection
  • Table selection
  • Column pruning
  • SQL generation
  • (Optional) retrieval-augmented hints via a vector store

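Each step emits structured outputs rather than free-form text, so every output can be validated and tested on its own. This makes the pipeline easier to debug, cheaper to run (later steps only see the tables and columns selected earlier), and more resilient to schema changes.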
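To make the idea of structured step outputs concrete, here is a minimal sketch, not the project's actual code. It uses standard-library dataclasses as a stand-in for Pydantic models, and the schema, the class names (TableSelection, PrunedSchema), and the keyword-matching logic are all hypothetical placeholders for what would be LLM calls in a real pipeline.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration: table name -> column names.
SCHEMA = {
    "orders": ["id", "customer_id", "total", "created_at"],
    "customers": ["id", "name", "email", "country"],
    "products": ["id", "title", "price"],
}


@dataclass(frozen=True)
class TableSelection:
    """Typed output of the table-selection step."""
    tables: list[str]


@dataclass(frozen=True)
class PrunedSchema:
    """Typed output of the column-pruning step: table -> kept columns."""
    columns: dict[str, list[str]]


def select_tables(question: str) -> TableSelection:
    # Stand-in for an LLM call: keep tables whose singular name
    # appears in the question.
    text = question.lower()
    return TableSelection([t for t in SCHEMA if t.rstrip("s") in text])


def prune_columns(question: str, selection: TableSelection) -> PrunedSchema:
    # Stand-in for an LLM call: always keep "id", plus any column whose
    # first name segment appears in the question.
    text = question.lower()
    pruned = {}
    for table in selection.tables:
        pruned[table] = [
            c for c in SCHEMA[table] if c == "id" or c.split("_")[0] in text
        ]
    return PrunedSchema(pruned)


def schema_prompt(pruned: PrunedSchema) -> str:
    # Only the pruned schema is rendered into the SQL-generation prompt,
    # which is where the token savings come from.
    return "\n".join(
        f"{table}({', '.join(cols)})" for table, cols in pruned.columns.items()
    )


question = "What is the total of each order by customer?"
selection = select_tables(question)
pruned = prune_columns(question, selection)
prompt = schema_prompt(pruned)
```

Because each step returns a typed object instead of prose, the steps can be unit-tested in isolation, and the SQL-generation prompt contains only the tables and columns that survived the earlier steps.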

Why it’s interesting

  • It demonstrates “LLM engineering as software engineering”: interfaces, validation, and failure containment.
  • It’s a realistic data product: schema-aware, maintainable, and designed for iteration.
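One way to picture validation and failure containment is a check that runs between steps, sketched below under stated assumptions: the function name validate_selection, the exception StepValidationError, and the example schema are hypothetical and not taken from the project. The check compares a step's output against the current schema, so a renamed column is caught at that step rather than surfacing later as broken SQL.

```python
class StepValidationError(ValueError):
    """Raised when a step's output fails validation (hypothetical name)."""

    def __init__(self, step: str, problems: list[str]):
        super().__init__(f"{step}: {'; '.join(problems)}")
        self.step = step
        self.problems = problems


def validate_selection(
    selected: dict[str, list[str]],
    schema: dict[str, list[str]],
    step: str = "column_pruning",
) -> None:
    """Check that every selected table and column exists in the schema."""
    problems = []
    for table, cols in selected.items():
        if table not in schema:
            problems.append(f"unknown table {table!r}")
            continue
        for col in cols:
            if col not in schema[table]:
                problems.append(f"unknown column {table}.{col}")
    if problems:
        # Fail at the offending step so nothing downstream runs on bad input.
        raise StepValidationError(step, problems)


# Hypothetical schema after a migration renamed customers.email.
current_schema = {"customers": ["id", "name", "email_address"]}

# A stale step output that still refers to the old column and a missing table.
stale_output = {"customers": ["id", "email"], "invoices": ["id"]}

try:
    validate_selection(stale_output, current_schema)
    caught = None
except StepValidationError as err:
    caught = err
```

The error carries the step name and a list of specific problems, which a caller could log, use to retry that single step, or return to the user, instead of letting an invalid selection reach SQL generation.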