
Adaptive Database Optimizer

A self-tuning system that watches SQL workloads and automatically retunes how Parquet data is partitioned and sorted. It routes traffic across layout versions with multi-armed and contextual bandits (UCB1, Thompson sampling) and keeps only statistically confident improvements.

2025
Bandits, Adaptive Routing, Automation, Partitioning, DuckDB

The problem

Teams often store data in Parquet and query it with engines like DuckDB. The same dataset can perform very differently depending on how it’s physically laid out (partitioning, sorting, row groups). But workloads change over time, and hand-tuning layouts doesn’t scale.

What I built

A self-tuning feedback loop that:

  1. Logs each query’s access patterns (tables, predicates, group-bys, joins, projections).
  2. Proposes candidate layouts from workload analysis (e.g., partition by X, sort by Y).
  3. Rewrites datasets into those layouts.
  4. Routes traffic across layout versions with multi-armed and contextual bandits (UCB1, Thompson sampling), learning which layouts win as access patterns shift.
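Steps 1 and 2 above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `QueryRecord` and `LayoutCandidate` types, the `propose_layout` function, and the heuristic itself (partition on the most frequently equality-filtered column, sort on the most frequently range-filtered one) are hypothetical and not taken from the project.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueryRecord:
    """One logged query (hypothetical schema, not the project's log format)."""
    table: str
    equality_filters: List[str] = field(default_factory=list)  # columns used as `col = value`
    range_filters: List[str] = field(default_factory=list)     # columns used as `col < value`, BETWEEN, ...


@dataclass(frozen=True)
class LayoutCandidate:
    """A proposed physical layout for one table."""
    table: str
    partition_by: Optional[str]
    sort_by: Optional[str]


def propose_layout(log: List[QueryRecord], table: str) -> LayoutCandidate:
    """Assumed heuristic: partition on the most common equality-filter column,
    sort on the most common range-filter column that is not the partition column."""
    equality_counts: Counter = Counter()
    range_counts: Counter = Counter()
    for record in log:
        if record.table != table:
            continue
        equality_counts.update(record.equality_filters)
        range_counts.update(record.range_filters)

    partition_by = equality_counts.most_common(1)[0][0] if equality_counts else None
    sort_options = [col for col, _ in range_counts.most_common() if col != partition_by]
    sort_by = sort_options[0] if sort_options else None
    return LayoutCandidate(table, partition_by, sort_by)
```

A real proposer would likely also weigh column cardinality and query cost, which this sketch ignores.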

Built in Python with DuckDB and PyArrow, with statistically gated rewards so only confident improvements are kept.
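One way to read "statistically gated" is that a candidate layout is accepted only if its measured latencies beat the baseline with high confidence. The sketch below is an assumption, not the project's actual test: it uses a one-sided percentile bootstrap on the difference in mean latency, and the function name `confident_improvement` and its parameters are hypothetical.

```python
import random
import statistics
from typing import Sequence


def confident_improvement(
    baseline_ms: Sequence[float],
    candidate_ms: Sequence[float],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> bool:
    """Return True if the candidate is faster with roughly (1 - alpha) confidence.

    Resamples both latency samples with replacement and checks that the
    (1 - alpha) quantile of mean(candidate) - mean(baseline) is below zero.
    """
    if len(baseline_ms) < 2 or len(candidate_ms) < 2:
        return False  # not enough evidence to gate on

    rng = random.Random(seed)  # fixed seed keeps the decision reproducible
    diffs = []
    for _ in range(n_boot):
        base = rng.choices(baseline_ms, k=len(baseline_ms))
        cand = rng.choices(candidate_ms, k=len(candidate_ms))
        diffs.append(statistics.fmean(cand) - statistics.fmean(base))

    diffs.sort()
    upper_bound = diffs[int((1 - alpha) * n_boot) - 1]
    return upper_bound < 0.0
```

With a gate like this, a layout that only looks faster because of a few lucky measurements is not promoted.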

Why it’s interesting

  • Layout optimization is a classic “systems meets data” problem: you need practical heuristics, measurable improvements, and careful evaluation.
  • The bandit-based routing turns layout selection into an online learning problem that adapts to changing workloads.
  • The system is designed to be dataset-agnostic rather than tuned to one specific dataset.
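To make the bandit-based routing concrete, here is a minimal sketch of UCB1 choosing between layout versions. The class name `UCB1Router`, the layout names, and the convention that rewards are pre-scaled to [0, 1] (for example, derived from query latency) are assumptions for illustration; the project's actual reward definition is not described here.

```python
import math
from typing import Dict, Iterable


class UCB1Router:
    """Routes each query to one layout version using the UCB1 rule."""

    def __init__(self, layouts: Iterable[str]) -> None:
        self.layouts = list(layouts)
        self.counts: Dict[str, int] = {name: 0 for name in self.layouts}
        self.means: Dict[str, float] = {name: 0.0 for name in self.layouts}
        self.total = 0

    def choose(self) -> str:
        # Try every layout once before applying the confidence bound.
        for name in self.layouts:
            if self.counts[name] == 0:
                return name

        def upper_bound(name: str) -> float:
            bonus = math.sqrt(2.0 * math.log(self.total) / self.counts[name])
            return self.means[name] + bonus

        return max(self.layouts, key=upper_bound)

    def update(self, layout: str, reward: float) -> None:
        """Record a reward in [0, 1] observed after routing a query to `layout`."""
        if not 0.0 <= reward <= 1.0:
            raise ValueError("UCB1 assumes rewards scaled to [0, 1]")
        self.total += 1
        self.counts[layout] += 1
        # Incremental update of the running mean reward.
        self.means[layout] += (reward - self.means[layout]) / self.counts[layout]


# Usage with hypothetical layouts and fixed rewards.
router = UCB1Router(["by_date", "by_user", "unsorted"])
observed = {"by_date": 0.9, "by_user": 0.2, "unsorted": 0.1}
for _ in range(200):
    layout = router.choose()
    router.update(layout, observed[layout])
```

Plain UCB1 as written assumes rewards are stationary; tracking a shifting workload would need something like a sliding window or discounting, which this sketch leaves out.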