Adaptive Database Optimizer

A closed-loop system that observes SQL workloads and incrementally optimizes Parquet physical layouts (partitioning and sorting), validating each improvement with automated DuckDB benchmarks.

2025

Reinforcement Learning, Adaptive, Routing, Automation, Partitioning

The problem

Teams often store data in Parquet and query it with engines like DuckDB. The same dataset can perform very differently depending on how it’s physically laid out (partitioning, sorting, row groups). But workloads change over time, and hand-tuning layouts doesn’t scale.

What I built

An adaptive feedback loop that:

Logs query workload features (tables, predicates, group-bys, joins, projections).
Proposes candidate layouts (e.g., partition by X, sort by Y).
Rewrites datasets into those layouts.
Benchmarks performance on a representative workload and keeps winners.
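
The first three steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `QueryFeatures` and `Layout` types, the "most-filtered columns" heuristic, and the function names are all assumptions made for the example. The generated statement uses DuckDB's `COPY ... TO` with the `FORMAT PARQUET` and `PARTITION_BY` options; column names are interpolated unquoted, so a real version would need to validate or quote them.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryFeatures:
    """Features logged for one query (hypothetical schema)."""
    table: str
    predicate_cols: List[str] = field(default_factory=list)
    group_by_cols: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class Layout:
    """A candidate physical layout: at most one partition and one sort column."""
    partition_by: Optional[str]
    sort_by: Optional[str]


def propose_layouts(log: List[QueryFeatures], table: str, top_k: int = 2) -> List[Layout]:
    """Assumed heuristic: the most-filtered columns become candidates.

    A real system would also check cardinality before partitioning,
    since partitioning on a high-cardinality column creates many tiny files.
    """
    counts = Counter(
        col for q in log if q.table == table for col in q.predicate_cols
    )
    hot = [col for col, _ in counts.most_common(top_k)]
    candidates = [Layout(None, None)]  # keep the unmodified layout as a baseline
    for col in hot:
        candidates.append(Layout(partition_by=col, sort_by=None))
        candidates.append(Layout(partition_by=None, sort_by=col))
    if len(hot) == 2:
        candidates.append(Layout(partition_by=hot[0], sort_by=hot[1]))
    return candidates


def rewrite_sql(src_glob: str, dst_path: str, layout: Layout) -> str:
    """Build a DuckDB COPY statement that rewrites the data into `layout`."""
    order = f" ORDER BY {layout.sort_by}" if layout.sort_by else ""
    options = ["FORMAT PARQUET"]
    if layout.partition_by:
        options.append(f"PARTITION_BY ({layout.partition_by})")
    return (
        f"COPY (SELECT * FROM read_parquet('{src_glob}'){order}) "
        f"TO '{dst_path}' ({', '.join(options)})"
    )
```

Each string returned by `rewrite_sql` could then be passed to a DuckDB connection to materialize the candidate before benchmarking it.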

Why it’s interesting

Layout optimization is a classic “systems meets data” problem: you need practical heuristics, measurable improvements, and careful evaluation.
The system is designed to be dataset-agnostic: it is not tied to one specific dataset.
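
To make "careful evaluation" concrete, here is one way the keep-or-discard decision might look. It is a sketch under assumptions, not the project's method: the warm-up run, the use of the median, the 10% minimum speedup, and the names `median_runtime` and `pick_winner` are all illustrative choices. `run_workload` stands for any callable that executes the representative queries against one layout.

```python
import statistics
import time
from typing import Callable, Dict


def median_runtime(run_workload: Callable[[], None], repeats: int = 5, warmup: int = 1) -> float:
    """Time a workload several times and return the median in seconds.

    Warm-up runs are discarded so cold caches do not skew the comparison.
    """
    for _ in range(warmup):
        run_workload()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def pick_winner(timings: Dict[str, float], baseline: str, min_speedup: float = 1.10) -> str:
    """Return the fastest layout only if it beats the baseline by `min_speedup`.

    The threshold (an assumed 10% here) keeps measurement noise from
    triggering a costly rewrite for a negligible gain.
    """
    best = min(timings, key=timings.get)
    if timings[baseline] >= min_speedup * timings[best]:
        return best
    return baseline
```

With timings of 1.0 s for the current layout and 0.95 s for a candidate, `pick_winner` keeps the current layout; at 0.5 s the candidate wins.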