Skip to main content

Data layout on disk: rows vs columns

A table in your head is a grid — rows down, columns across, two dimensions. A disk (or an object in a bucket) is not a grid. It is a one-dimensional stream of bytes, read start to finish. So the very first thing any storage format must do is answer an unavoidable question: in what order do I write the cells of a two-dimensional table into a one-dimensional line of bytes?

That single choice — the physical layout — is the root of nearly everything in this chapter. Get it right for your workload and queries fly; get it wrong and the same query reads orders of magnitude more data than it needs.

The grid has to be flattened

Take a tiny slice of ShopFlow's orders table (our running example — see Meet ShopFlow): three rows, three of its columns.

order_idstatusamount
1paid50
2paid90
3cancelled20

There are two natural ways to flatten it into a byte stream.

Row-major (row-oriented) writes each row completely before moving to the next. The bytes on disk read:

1, paid, 50 | 2, paid, 90 | 3, cancelled, 20

Column-major (columnar) writes each column completely before moving to the next. The bytes on disk read:

1, 2, 3 | paid, paid, cancelled | 50, 90, 20

Same data, same number of bytes, utterly different order. The word for this byte order is the physical layout. The table you think about — the logical grid — is identical either way; only the arrangement on disk changes. And that arrangement decides which queries are cheap.

:::note Layout is invisible until it isn't SELECT * FROM orders returns the same answer regardless of layout. Layout never changes what you can ask — only how much data the machine must drag off disk to answer. That cost difference is the entire reason this distinction matters. :::

Two different jobs, two different layouts

Why two layouts? Because two very different kinds of workload want opposite things. (We met these in Chapter 1 as OLTP and OLAP.)

Transactional work (OLTP) touches whole records, a few at a time: "fetch order #4471 and update its status to shipped." It wants every field of one row sitting together so it can read or write that record in one go — exactly what ShopFlow's app database does on every checkout. Row-major is perfect for this — the whole row is contiguous.

Analytic work (OLAP) does the opposite: it touches a few columns across millions of rows: "what's the total amount across every order ShopFlow has ever taken?" It needs only status and amount — never customer_id, order_ts, or the other columns — but across every single order. Column-major is perfect for this, and we're about to see exactly why.

Row-major (OLTP)order 1 · order 2 · order 3 …\nwhole records contiguousColumn-major (OLAP)all order_ids · all statuses · all amounts\neach column contiguous

Why columnar wins for analytics

A columnar layout gives an analytic engine three compounding advantages. They are the three reasons "use a columnar format" is repeated everywhere — so let's actually earn each one.

1. Column pruning (read only the columns you need)

This is the headline. ShopFlow's source orders table is narrow, but by the time it lands in the analytics lake it's been enriched — joined to customer region, product category, channel, shipping, marketing attribution — into a wide ~40-column orders table. Your query is SELECT SUM(amount) FROM orders WHERE status = 'paid'. That query references exactly two columns: status and amount.

  • In a row-major file, the values of all 40 columns are interleaved on disk. To get at amount, the reader has to stream past the other 39 columns of every order — it physically cannot read one column without reading the bytes of the others around it. You read ~100% of the file to use ~5% of it.
  • In a columnar file, status and amount each live in their own contiguous run of bytes. The reader seeks straight to those two runs and skips the other 38 columns entirely — they are never read off disk.

Skipping the columns a query doesn't reference is called column pruning (or projection pushdown — the engine "pushes" the list of needed columns down to the reader so unneeded ones are never loaded). On a wide table this alone is often a 10–40× reduction in bytes read. It is the single biggest reason analytics is columnar.

2. Better compression

Compression shrinks data by exploiting repetition and similarity. Here's the key insight: a single column is far more self-similar than a single row. ShopFlow's status column is a long run of paid, paid, paid, shipped, paid… — a handful of distinct values repeating. The order_ts column is a run of timestamps that barely change from one order to the next. A row, by contrast, mixes a bigint order_id, a string status, and a decimal amount — different types, little similarity.

Because a columnar layout puts like values next to like values, the compressor has far more to work with. The same data often compresses 2–4× smaller columnar than row-major. (The encoding tricks that make this so good — dictionary, run-length, delta — get their own lesson next chapter-section: see Compression & encoding.) Smaller files mean less disk, less network transfer, and — since I/O is usually the bottleneck — faster queries.

3. Vectorized reads

Modern CPUs go fastest when they apply one operation to many values of the same type in a tight loop — a style called vectorized execution (the CPU's SIMD units literally process several values per instruction). A columnar layout hands the CPU exactly that: a contiguous array of, say, ten thousand order amount values, all the same type, ready to be summed in one tight loop. A row layout would force the CPU to hop over other fields between every value, defeating vectorization. So columnar isn't just less I/O — it's also faster compute once the data is in memory.

:::tip The one-sentence rule of thumb Few columns, many rows → columnar. Whole rows, few at a time → row-based. Analytics is the former, so the analytics world (Parquet, ORC, every columnar warehouse) is columnar. Transactional databases and streaming transport are the latter, so they stay row-based. :::

A traced example

Let's make the win concrete. ShopFlow's enriched orders table: 50 columns, 100 million orders, ~200 GB on disk. Query:

SELECT status, SUM(amount)
FROM orders
WHERE order_year = 2026
GROUP BY status;

It references three columns: status, amount, order_year.

  • Row-major: to read those three columns the engine must stream essentially the whole ~200 GB off disk, because the columns are interleaved order by order. Bytes read ≈ 200 GB.
  • Columnar: the engine reads only the status, amount, and order_year column runs — 3 of 50 columns, so roughly 6% of the data, and that 6% is more compressible because each column is self-similar. Bytes read ≈ 8–12 GB.

Same answer; the columnar engine moved roughly 20× less data. At cloud scale, where you often pay per byte scanned, that 20× is also a 20× bill reduction. This is why the choice of layout is a load-bearing decision, not a detail.

When row-based is still right

Columnar is not universally better — it's better for analytics. Reach for a row-based layout when:

  • You read or write whole records, one at a time (an application's transactional database — that's OLTP, and it stays row-based).
  • Data is in flight, being produced and consumed record by record — e.g. events streaming through Kafka, where each message is one whole record. The standard row format here is Avro, covered in Choosing a format.
  • You're appending constantly and rarely scanning. Columnar formats are written in batches (you'll see why in Inside Parquet); they're poor at one-row-at-a-time appends.

The mature pattern most platforms use: data arrives row-based (Avro over a stream), then gets rewritten into columnar (Parquet on the lake) for analytics. Right tool, right stage.

Why it matters

A disk stores bytes in one dimension; a table is two-dimensional; so a storage format must choose an order. Row-major keeps each record contiguous and serves transactional, whole-row access. Columnar keeps each column contiguous and wins decisively for analytics through three compounding effects: column pruning (read only the columns a query references), better compression (like values sit together, so they shrink more), and vectorized reads (the CPU sums a contiguous typed array in a tight loop). For wide tables this is routinely a 10–40× reduction in data read — and at pay-per-scan cloud pricing, the same reduction in cost. Row-based still wins for OLTP and for data in flight. With why columnar wins established, the next lesson opens up the most important columnar format — Parquet — to show the machinery that turns "read fewer columns" into "also read fewer rows."

Next: Inside Parquet: row groups, pages & footer stats →