2026 Edition · From your first pipeline to a lakehouse

How data is
actually moved.

A complete path into data engineering. Start at "what even is a columnar file?" and end able to ingest, store, model, transform, orchestrate and serve data at scale — with the durable concepts foregrounded so the skills outlast any one tool. Every term is defined the first time it appears.

12
Chapters
Batch · Stream · Lakehouse
The whole pipeline
Jun ’26
Last reviewed
$0
Free · source-available
orders.sql
-- Transform raw events into a trusted, modeled table.
SELECT date_trunc('day', ts) AS day,
       count(*) AS orders
FROM bronze.events
WHERE event = 'order_placed'
GROUP BY 1;

# → Materialized gold.daily_orders · 1.2M rows scanned
declarative · versioned · tested like any code
Start here

Two ground-truth facts before chapter 1.

01

Data engineering is plumbing — move data from where it's made to where it's trusted.

Every tool, whether a warehouse, Spark, an orchestrator, or a streaming bus, exists to serve one part of that journey. Learn which job each tool does and the crowded vendor landscape collapses into a handful of roles you can hold in your head.

02

Learn the durable concept, not the dated vendor.

Engines get renamed and relaunched every year, but columnar storage, partitioning, idempotency, and dimensional modeling have been stable for decades. This guide teaches the concept first, then shows it in today's tools.

Who it’s for

Meets you wherever you are.

“I can write code and SQL…”
…but I’ve never built a data pipeline, and words like "lakehouse" and "CDC" are a blur. Start at file formats — no prior data-platform experience assumed.
“I run production pipelines.”
…and want a sharp 2026 refresh on the lakehouse, streaming, orchestration, data quality and the decision rules that actually hold up.
Ready?

What happens between a raw event and a trusted table?

The whole guide fans out from that one question. Twenty minutes from now you’ll know exactly what columnar storage, partitioning, and the OLTP/OLAP split are — and why every later decision rests on them.