Skip to main content

The lake, the warehouse, and the divide that collapsed

Before you can appreciate what a table format does, you need to feel the pain it removes. That pain comes from a choice the industry was stuck with for twenty years: warehouse or lake. Each gave you something the other couldn't. This lesson explains both sides, why object storage can't bridge them by itself, and why that finally changed.

Two ways to store analytic data

A data warehouse (Chapter 4) is a managed system — Snowflake, BigQuery, Redshift — that stores your tables in its own internal format and serves SQL over them. You load data in, and in return you get transactions (changes are all-or-nothing), schema enforcement (bad rows are rejected), fast queries, and a tidy table abstraction. The price: your data lives inside the vendor's engine, in a format only that engine reads, and you pay that vendor for storage and every query.

A data lake is the opposite philosophy: dump your raw files — Parquet, JSON, CSV — into cheap object storage (Chapter 2; think Amazon S3, Google Cloud Storage, Azure Blob) and point whatever engine you like at them. Object storage is astonishingly cheap and effectively infinite, and the files are open — Spark, Trino, DuckDB, a Python script, anything can read them. No vendor owns your bytes.

So why not just use the lake for everything? Because a bare lake throws away almost every guarantee the warehouse gave you.

Why object storage can't give you ACID on its own

Here is the crux of the whole chapter. Object storage stores objects (files), not tables. It has no concept of "this group of files is one table, and a query should see a consistent version of it." A table in a lake is just a convention: "all the Parquet files under s3://shopflow-lake/fact_sales/ are ShopFlow's fact_sales table (ShopFlow — see Meet ShopFlow)." Object storage knows nothing about that convention, and so it can give you none of the following.

ACID is the four-letter promise that makes a database trustworthy:

  • Atomicity — a change happens completely or not at all; never half.
  • Consistency — the table is never left in a broken in-between state.
  • Isolation — readers and writers don't see each other's half-finished work.
  • Durability — once committed, a change survives crashes.

Object storage gives you durability and not much else. Watch what breaks:

  • No atomic commits. Say a job needs to replace yesterday's fact_sales partition with a corrected version: delete 200 old files, write 200 new ones. There is no way to make object storage do all 400 operations as one indivisible step. If the job crashes halfway, fact_sales is now a mangled mix of old and new order-line files. Half-written tables are the default failure mode of a raw lake.
  • No consistent reads (no isolation). While that job is mid-write, the daily revenue query runs over fact_sales. It sees some new files and some old ones — a table that never actually existed at any single point in time, so the day's line_revenue total is wrong. There's no notion of "read the table as it was at a consistent moment."
  • No schema enforcement. Object storage will happily accept a fact_sales file where line_revenue changed type or a required field is missing. Nothing checks. The corruption is discovered later, by a query that returns garbage or crashes.
  • Expensive, slow "list" operations. To even know which files make up fact_sales, an engine must list the bucket — ask object storage to enumerate every object under the fact_sales/ prefix. For a table of millions of files this is slow and, on cloud object stores, billed per request. And listing tells you the files exist; it can't tell you which ones belong to a consistent version of the table.

:::note Object storage is not a file system A subtle source of these problems: cloud object stores are key-value stores of immutable objects, not POSIX file systems. You can't append to an object or rename a "directory" atomically — a rename is a copy-then-delete of every object. Tricks that worked on a local disk (write to a temp dir, then mv it into place atomically) simply don't exist here. The table format has to provide atomicity itself, in metadata, because the storage layer won't. :::

So the early data lake delivered on cost and openness but, lacking ACID, became infamous as a "data swamp": a place where data went in, consistency was never guaranteed, and nobody fully trusted what came out.

Warehouse+ ACID transactions+ Schema enforcement+ Fast SQL- Proprietary format / lock-in- $$ storage + computeBare data lake+ Cheap object storage+ Open files / any engine- No atomic commits- No consistent reads- No schema enforcement

The lakehouse: have both

For years the answer was to run both and copy data between them: land ShopFlow's raw order files in the lake, then load the cleaned fact_sales into a warehouse for anything that needed reliability and speed. Two systems, two copies of the data, a pipeline gluing them, and double the cost.

The lakehouse is the architecture that collapses that into one. The idea: keep fact_sales in the lake — cheap object storage, open file formats — but add a thin metadata layer on top that supplies the missing guarantees. That metadata layer is the open table format. With it, the same Parquet files in the same bucket gain atomic commits, consistent reads, schema enforcement, and more — without moving the data into anyone's proprietary engine and without giving up the ability for many tools (the daily revenue query, a Spark backfill, a BI dashboard) to read it.

A lakehouse is, in one sentence: a data lake with a table format on top, giving warehouse-grade reliability over open files on cheap storage.

fact_sales Parquetfiles\nin objectstorage\n(cheap,Table formatmetadata\n(Iceberg /Delta / Hudi)\n→Many enginesread/write\nSpark ·Trino · Flink

We'll spend the rest of the chapter on how that metadata layer pulls this off — but the shape of the win is already clear: one copy of your data, openly stored, behaving like a warehouse table.

:::tip Durable vs dated The durable idea is the architecture: one open copy of data on cheap storage, made reliable by a metadata layer, readable by many engines. The dated parts are which table format you pick, which cloud you're on, and which query engines are fastest this year. Learn the architecture; the products are interchangeable details. :::

Why it matters

A warehouse gives you ACID, schema, and speed at the cost of lock-in and money; a bare data lake gives you cheap, open files but no transactions, no consistent reads, and no schema enforcement — because object storage stores files, not tables, and can't make multi-file changes atomic. That gap is what made early lakes into untrustworthy "swamps." The lakehouse resolves the dilemma by adding an open table format — a metadata layer over the same cheap, open files — that restores warehouse-grade guarantees without moving the data into a proprietary engine. The next lesson opens up that metadata layer: what a table format actually is, and why it is emphatically not a file format.

Next: What a table format actually is →