Chapter 12 checkpoint
You can now turn the whole guide into judgment — design a platform, make and document the decisions, control the cost, operate it with rigor, build for machine consumers, and navigate the career. Recall the chapter, then prove it.
The throughline
- Design the spine for your scale. Every platform is the same five-stage spine (ingest → store → process → serve, wrapped in orchestration/quality/governance), implemented as the simplest thing that meets the requirement: tiny at solo, the modern stack at startup, governed and distributed at enterprise. The senior interview and the job follow the same sequence: clarify requirements → estimate capacity (events/sec at peak, bytes/day) → sketch the spine → handle the cross-cutting concerns, above all idempotency (re-runnable steps, so retries and backfills never double-count) and failure handling.
- Decide, and write it down. Batch vs streaming, warehouse vs lakehouse, build vs buy, managed vs OSS each default to the simpler/cheaper option; deviate only on a stated trigger (a real latency/scale/cost need). Prefer boring, proven tech; record the call in an ADR (context, decision, trade-off, trigger-to-revisit).
- Cost is performance. Reduce bytes-touched and idle-time and you cut cost and latency together. Track cost-per-query / cost-per-pipeline; partition (prune), cluster (skip within partitions), cache/materialize, process incrementally, and right-size + auto-suspend + spot. Run a see → optimize → govern loop.
- DataOps separates senior from junior. Everything in Git, CI/CD that tests pipeline changes before prod, dev/staging/prod environments, IaC (Terraform) and containers. Treat data as a product with SLAs/SLOs, monitor freshness/quality, run incident response + blameless postmortems, and do security basics — secrets out of code, least-privilege IAM, encryption, closed networks.
- The 2026 shift + the career. Data engineering now feeds machine consumers: RAG = ETL + embeddings + a fresh vector store; feature pipelines prevent train/serve skew. The role has split into analytics, data, and platform engineer. Certs signal; the end-to-end portfolio (with an ADR) proves. Invest in the durable (SQL, modeling, partitioning, idempotency, decisions, communication) over the dated (this year's tool).
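To make the first bullet concrete, here is a minimal sketch of a back-of-envelope capacity estimate and an idempotent load step. Everything named here is an assumption for illustration and does not come from the guide: the `estimate_capacity` and `load_batch` helpers, the `orders` table, the sample traffic figures (86.4 million events/day, 500 bytes each, a 3x peak factor), and the use of SQLite's `INSERT OR REPLACE` as a stand-in for whatever keyed upsert or merge your warehouse provides.

```python
import sqlite3

SECONDS_PER_DAY = 86_400


def estimate_capacity(events_per_day, avg_event_bytes, peak_factor):
    """Back-of-envelope sizing: average and peak events/sec, plus raw bytes/day."""
    avg_eps = events_per_day / SECONDS_PER_DAY
    return {
        "avg_events_per_sec": avg_eps,
        "peak_events_per_sec": avg_eps * peak_factor,  # assumed peak-to-average ratio
        "bytes_per_day": events_per_day * avg_event_bytes,
    }


def load_batch(conn, rows):
    """Idempotent load: write keyed on event_id, so a re-run replaces rows instead of appending."""
    conn.executemany(
        "INSERT OR REPLACE INTO orders (event_id, amount) VALUES (?, ?)",
        rows,
    )
    conn.commit()


# Hypothetical workload: 86.4M events/day at 500 bytes each, peaking at 3x the average.
sizing = estimate_capacity(86_400_000, 500, 3)

# Hypothetical target table with a primary key that makes the load re-runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (event_id TEXT PRIMARY KEY, amount INTEGER)")

batch = [("e1", 10), ("e2", 25)]
load_batch(conn, batch)
load_batch(conn, batch)  # simulated retry or backfill of the same batch

total = conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
print(sizing)
print(total)  # (2, 35): the retry did not double-count
```

With a plain append-only `INSERT` and no key, the same retry would have left four rows and a doubled sum, which is exactly the failure mode idempotency is meant to rule out.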
Quiz
Chapter 12 — Scale, Decisions & Career
That completes the guide. You can now move data from raw source to trusted product across batch and streaming, store it in warehouses and lakehouses, model and transform it, and orchestrate and quality-gate it. In this chapter you also learned to design the whole platform, make and write down the recurring decisions, control its cost, operate it with DataOps rigor, build it for machine consumers, and navigate the career that does all of it. The throughline from page one holds: invest in the durable ideas, hold the dated tools loosely, and match the architecture to the actual scale, latency, team, and budget.