ETL vs ELT, and the modern data stack
Before we touch a single dbt model, you need the idea that makes dbt make sense. It's a tiny change — the order of three letters — but it reorganized the entire data industry. This lesson explains the shift from ETL to ELT, why it happened, and the stack of tools ("the modern data stack") that grew up around it.
The three jobs: extract, load, transform
Every analytics pipeline does three things, and the names are exactly what they sound like. We'll name them on ShopFlow (ShopFlow — see Meet ShopFlow), the running example whose data this whole chapter transforms:
- Extract (E) — pull raw data out of a source system (here, ShopFlow's app database tables
orders,customers,products,order_items; in general a production database, a SaaS API like Stripe or Salesforce, a pile of files). - Load (L) — write that data into your analytics store (a warehouse like Snowflake, BigQuery, or Databricks) — ShopFlow's land as
raw.orders,raw.customers,raw.products,raw.order_items. - Transform (T) — reshape the raw data into clean, modeled, business-ready tables (join, clean, deduplicate, aggregate, rename, type-cast) — for ShopFlow that means
stg_ordersand friends, thenfact_salesand its dimensions.
The only question is what order you do them in. That order is the whole story.
ETL: the old order (transform before loading)
ETL stands for Extract → Transform → Load. For decades this was the only sensible order, and the reason was money. Storage and compute in the data warehouse were expensive — you paid a lot per gigabyte and per CPU-hour, and the warehouse was a fixed-size box you couldn't easily grow. So you did not dare load raw, messy data into it. Instead:
- Extract the raw data to a separate, cheaper place.
- Transform it there — on a dedicated transformation server running a specialized ETL tool (Informatica, Talend, SSIS) — cleaning and shaping it down to just the modeled tables you needed.
- Load only the finished, clean result into the precious warehouse.
The warehouse only ever held tidy, final tables. That sounds efficient, but it had deep problems, which the next section explains by contrast.
ELT: the new order (load first, transform in place)
ELT stands for Extract → Load → Transform. You load the raw data into the warehouse first, untouched, and then transform it using the warehouse's own compute — by running SQL inside the warehouse to build new tables from the raw ones.
Why did the order flip? Because the constraint that justified ETL disappeared. The cloud data warehouse (Chapter 4) changed three things at once:
- Storage got cheap. Loading all your raw data costs almost nothing now, so there's no reason to throw any of it away before loading.
- Compute got elastic and powerful. The warehouse can run enormous transformations itself, scaling up for a big job and back down after — you no longer need a separate transformation server.
- Compute and storage decoupled. You pay for transformation only when it runs, not for a fixed box sitting idle.
Once the warehouse is cheap to fill and powerful enough to transform, loading raw and transforming in place is simply better. ELT won.
Why ELT is better (not just different)
The reordering isn't cosmetic; it unlocks real advantages:
- Raw data is preserved. Because you load everything untouched, the raw data is always there. If you discover a bug in your transformation logic — or a new question you didn't anticipate (say ShopFlow now wants revenue split by
dim_storechannel you never modeled) — you can rebuild fromraw.orders/raw.order_items. In ETL, the raw data was transformed away before it ever landed; if you needed it later, it was gone. - Transformation becomes "just SQL." Instead of a proprietary ETL tool with a drag-and-drop GUI, transformation is now SQL
SELECTstatements that any analyst can read and write. This is the door dbt walks through. - One engine, one place. Storage, transformation, and querying all happen in the warehouse. Fewer moving parts, one security boundary, one place to look.
- Elastic cost. The transformation runs on warehouse compute that scales with the work and bills by usage.
The trade-off to name honestly: because you load raw data first, you do store more (cheap, but not free), and a runaway transformation can run up a real compute bill. ELT moves the cost from "a fixed transformation server" to "metered warehouse compute" — usually a win, but one you now have to watch.
:::tip Durable vs dated Durable: the principle that cheap elastic warehouse compute makes "load raw, transform in place" the right default. Dated: which warehouse, the exact per-credit price, and the specific EL connector you use. The price of compute will keep falling; the logic of ELT will not change. :::
The modern data stack
ELT didn't arrive alone. It came as part of a now-standard assembly of tools called the modern data stack — a loosely-coupled set of best-in-class, mostly-managed services, each doing one job, wired together around the warehouse:
- Managed EL (Extract + Load) — a tool like Fivetran or Airbyte that connects to a source and replicates raw data into the warehouse, handling the tedious connector plumbing for you. (This is the ingestion work from Chapter 6.)
- The cloud warehouse — Snowflake, BigQuery, or Databricks — the central store where raw data lands and transformation runs.
- In-warehouse transformation — dbt, which orchestrates the T in ELT as version-controlled, tested SQL. This chapter is mostly about this box.
- BI and activation — dashboards (Looker, Tableau, Power BI) and "reverse ETL" tools that push modeled data back out to operational systems.
The thesis of the modern data stack is managed EL + cloud warehouse + dbt for in-warehouse ELT + BI — four interchangeable layers, each replaceable without rewriting the others. The roles are durable; the vendor in each box is dated.
Where this leaves us
ELT flipped the order — load raw into the warehouse first, then transform in place with SQL — because cheap, elastic, decoupled cloud-warehouse compute removed the constraint that made ETL necessary. That single change preserves raw data, turns transformation into readable SQL, and consolidates everything into one engine. The modern data stack wraps that idea in four swappable layers around the warehouse. The T in ELT — the in-warehouse transformation step — is now almost universally done with one tool, and understanding it starts with one deceptively simple idea: a transformation is just a SELECT.
Next: dbt's mental model →