Skip to main content

Inside a query engine: parse, plan, optimize, execute

SQL is declarative: you describe the result you want, never the steps to get it. SELECT name FROM users WHERE country = 'DE' says nothing about whether to scan the whole table or use an index, in what order to read files, or how to join. Some piece of software has to decide all of that — and that software is the query engine. This lesson opens the hood. Once you can picture the four stages a query passes through, slow queries stop being mysterious and start being readable.

The core idea: a query is compiled, like a program

A query engine is a compiler and runtime for SQL. It takes your declarative text and translates it, through several stages, into a concrete sequence of operations that read bytes and produce rows. The same SQL string can be executed in wildly different ways; the engine's job is to pick a fast way.

SQL textParser→ syntax treeLogical plan<i>what</i> tocomputeOptimizerrewrite + choosePhysical plan<i>how</i> tocomputeExecutionread bytes → rows

Stage 1 — Parse: text → syntax tree

The parser checks that your SQL is grammatically valid and converts the text into a tree structure (an abstract syntax tree). A typo like SELCT or a missing parenthesis fails here. The parser also resolves names — does table orders exist? does column amount exist on it? — against the catalog (the engine's metadata about tables, columns, and types). Output: a validated tree that mirrors your query's structure.

Stage 2 — Logical plan: what must be computed

The engine converts the tree into a logical plan: a tree of relational operatorsScan, Filter, Project, Join, Aggregate, Sort — that expresses what the query computes, independent of how. Take a representative ShopFlow query (ShopFlow — see Meet ShopFlow) — revenue per product, which joins orders to its line items and groups:

SELECT oi.product_id, SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status <> 'cancelled'
GROUP BY oi.product_id;

Its logical plan, as a standard algebra:

Aggregate(sum(quantity*unit_price) by product_id)
└─ Join(orders.order_id = order_items.order_id)
├─ Filter(status <> 'cancelled')
│ └─ Scan(orders)
└─ Scan(order_items)

A logical plan says "scan orders, filter, join to order_items, then aggregate." It does not say which join algorithm, which file order, or how many machines — those are physical decisions. Crucially, many different physical plans can implement the same logical plan, and they can differ in speed by orders of magnitude.

Stage 3 — Optimize: rewrite and choose

The optimizer is the brain. It transforms the logical plan into a better but equivalent one, then picks the physical operators. Optimization comes in two flavors, and modern engines use both:

  • Rule-based optimization (RBO) applies fixed, always-safe rewrite rules regardless of the data — e.g. "push filters down toward the scan," "drop unused columns," "fold constants." Rules never make a plan worse, so they always fire.
  • Cost-based optimization (CBO) uses statistics about the data — row counts, distinct values, min/max, null fractions, data size — to estimate the cost of competing plans and pick the cheapest. CBO is what chooses join order and join algorithm (next lesson), because those decisions depend entirely on how big each input is.

Statistics are the fuel for CBO: summaries the engine keeps (or samples) about each table. If statistics are missing or stale, the optimizer flies blind and routinely picks bad plans — which is why "refresh table statistics / run ANALYZE" is a real and common fix for a suddenly-slow query.

Logical planRule-based rewrites(always-safe:pushdown, pruning)Cost-based choices(needs statistics:join order +algorithm)("Statisticsrow counts, NDV,min/max, size")Physical plan

The optimizations that matter most

A handful of rewrites do most of the heavy lifting, and they're the ones you'll see named in EXPLAIN plans and interviews:

  • Predicate pushdown — move filters (WHERE) as close to the data source as possible, so rows are discarded before they're read up the plan or even before they leave storage. Pushing amount > 100 into the Parquet scan of orders means the engine can skip whole row-groups whose max is ≤ 100 using the file's footer statistics (Chapter 2). Less data read = faster, cheaper query.
  • Projection pruning (column pruning) — read only the columns the query references. On columnar storage this is enormous: SELECT customer_id, amount from the wide orders table reads 2 columns' worth of bytes, not all of them. This is the entire payoff of the columnar formats from Chapter 2.
  • Partition pruning — skip whole partitions (folders/files) that can't match the filter. ShopFlow's orders are partitioned by order date, so WHERE order_date = '2026-01-15' reads one day's files and ignores the rest. Pruning a partition is the cheapest possible "read" — you never touch the bytes.
  • Dynamic filtering (runtime filtering) — when joining a huge fact table to a small filtered dimension, the engine first evaluates the small side, then builds a filter from its actual keys and pushes it into the big scan at runtime. So joining orders to a filtered customers (… JOIN customers c ON c.customer_id = o.customer_id WHERE c.region = 'US-West') can skip most of the orders scan even though the region filter wasn't written against orders directly. This is one of the biggest wins in modern engines (Trino, Spark, Snowflake).

Notice the theme: the fastest data to process is the data you never read. Pushdown and pruning are all variations of "read less."

Stage 4 — Physical plan & execution

The physical plan is the logical plan with every choice made concrete: each Join is assigned an algorithm (hash, sort-merge, broadcast — next lesson), each Scan knows which files and columns to read, and in a distributed engine the plan is split into stages connected by shuffles (Chapter 3.5). The execution engine then runs it, typically streaming rows through operators in a pipeline so it never has to materialize the whole intermediate result in memory.

Reading an EXPLAIN plan

EXPLAIN <query> prints the plan the optimizer chose without running it. EXPLAIN ANALYZE runs the query and annotates each operator with actual rows and time. This is the single most important diagnostic skill in the chapter — it's how you find which step is slow.

Read a plan tree bottom-up: leaves (scans) read data, and rows flow upward through filters, joins, and aggregates to the root. A simplified plan:

EXPLAIN
SELECT customer_id, SUM(amount)
FROM orders
WHERE amount > 100
GROUP BY customer_id;

Aggregate [group by: customer_id, sum(amount)]
└─ Exchange (repartition by customer_id) ← a shuffle: data moves between workers
└─ Filter [amount > 100]
└─ TableScan orders
columns: [customer_id, amount] ← projection pruning (only 2 columns)
pushed filter: amount > 100 ← predicate pushdown into the scan

Three things to read off it immediately: the scan reads only two columns (projection pruning worked), the filter was pushed into the scan (predicate pushdown worked), and there's an Exchange — a shuffle that moves data across workers, the most expensive thing in a distributed plan.

With EXPLAIN ANALYZE, each line gains real numbers — estimated vs actual rows, time, and bytes:

Aggregate actual rows=2 time=4ms
└─ Exchange actual rows=812k time=3.1s ← 3 seconds shuffling 812k rows!
└─ Filter est rows=10k actual rows=812k ← estimate was 80× too low
└─ TableScan actual rows=2.0M

That gap between estimated and actual rows is the master clue. Here the optimizer expected 10k rows after the filter but got 812k — its statistics were stale, so it under-shuffled the cost and picked a plan that spends 3 seconds in the exchange. The fix flows directly from the plan: refresh statistics so the estimate is right, or reduce the rows reaching the shuffle. You cannot diagnose a slow query you can't see — and EXPLAIN ANALYZE is how you see it.

:::tip How to read any plan, on any engine

  1. Read bottom-up: scans first, root last. 2. At each scan, confirm projection pruning (few columns) and partition/predicate pushdown (a WHERE reached the source). 3. Find the joins and note their algorithm and which side is built/broadcast. 4. Find every Exchange/shuffle — those are your expensive steps. 5. With ANALYZE, hunt the biggest estimated-vs-actual row gap — that's where the optimizer was wrong. The keywords differ per engine; the shape never does. :::

Why it matters

Every query you run is compiled: parsed, planned, optimized, and executed. The optimizer makes or breaks performance, and it makes good decisions only when it can (a) apply pushdown and pruning to read less, and (b) trust statistics to choose join order and algorithm. When a query is slow, the cause is almost always visible in its plan — a missing pushdown, a full scan that should have pruned, a giant shuffle, or an estimate that's wildly off. Reading EXPLAIN is the skill that turns "the query is slow" into "the filter isn't pushing down because the column is wrapped in a function — here's the fix." That diagnostic ability is what the rest of the chapter sharpens.

Common pitfalls

  • Stale or missing statistics. The optimizer's worst plans come from bad row-count estimates. ANALYZE/refresh stats after big loads.
  • Defeating pushdown by wrapping the column. WHERE CAST(order_ts AS DATE) = '2026-01-15' or WHERE UPPER(status) = 'PAID' often can't push down or prune, because the filter is on a computed value, not the stored column. Filter on the raw column (e.g. the order_date partition column) where possible.
  • SELECT * in a pipeline. It defeats projection pruning — you read every column off columnar storage. Select only what you need.
  • Reading the plan top-down. Data flows bottom-up; read it that way or you'll misattribute the cost.
  • Trusting EXPLAIN estimates as truth. They're predictions. Use EXPLAIN ANALYZE to get real rows/time before concluding anything.
  • Builds on: Storage & file formats — footer statistics and partition layout are what pushdown and pruning exploit.
  • Next: Join algorithms → — the most consequential physical choice the optimizer makes, and the one CBO spends the most effort on.
  • Then: Distributed execution — what the Exchange/shuffle in the plan actually does across machines.

Checkpoint

  1. Put these in order and say what each produces: optimizer, parser, execution, logical plan, physical plan.
  2. A query filters WHERE order_date = '2026-01-15' on the date-partitioned orders table but still reads every file. Which optimization failed, and name one reason it might not fire.
  3. In EXPLAIN ANALYZE, an operator shows est rows=1k, actual rows=900k. What does that gap most likely indicate, and what's the standard fix?
Answers
  1. Parser (text → validated syntax tree) → logical plan (what to compute, as relational operators) → optimizer (rewrite + choose, using rules and statistics) → physical plan (concrete operators/algorithms) → execution (read bytes, produce rows).
  2. Partition pruning failed. Common causes: the filter is wrapped in a function (CAST(order_ts AS DATE) = …) so it's on a computed value not the partition column; the column isn't actually the partition key; or statistics/metadata are stale.
  3. The optimizer's statistics are stale or missing, so its cost estimate is wrong and it likely chose a bad plan (e.g. wrong join algorithm or under-sized shuffle). Fix: refresh statistics (ANALYZE), and/or reduce rows reaching that step.