Skip to main content

Meet ShopFlow — our running example

Most guides teach each topic with a fresh, throwaway example — taxis here, sensors there, weather data on the next page. This one doesn't. From here to the final chapter we follow one company and one dataset as it travels the entire data lifecycle: born in an app, landed in storage, queried, modeled, reshaped in batch and in real time, governed, and paid for.

That company is ShopFlow, a fictional online store. This page defines its data once. Every later chapter reaches for these exact table and column names, so when Chapter 7 builds a stg_orders model on top of the raw.orders table that Chapter 6 ingested from the orders source table you meet below, you're watching the same row move — not learning a new example.

In plain English

Think of this page as the cast list for a story told across twelve chapters. ShopFlow sells things online; its customers place orders; each order contains line items for products. That's the whole world. Keep this page open in a tab — whenever a later chapter shows orders or fact_sales, it means these tables.

The business in one paragraph

ShopFlow runs a typical online store. Customers sign up, browse products, and place orders; each order is paid, shipped, and occasionally cancelled. The app that runs the store records every one of these facts in its own operational database — and the business wants to learn from all of it: revenue by day, top products, repeat-customer rate, carts abandoned. Turning that scattered operational data into trustworthy answers is the data engineering you'll build, chapter by chapter, on ShopFlow's data.

1. The source systems — where ShopFlow's data is born

ShopFlow's web app writes to an OLTP database (an operational, row-oriented database tuned for the app's own reads and writes — see OLTP vs OLAP). These are the source tables. Data engineers read from them; they never own them.

customers — one row per registered customer.

columntypenotes
customer_idbigintprimary key
nametext
emailtextPII
citytextcan change over time (drives the SCD example in Ch. 4)
regiontexte.g. US-West
signup_datedate
updated_attimestamplast change — drives incremental extraction (Ch. 6)

products — one row per sellable product.

columntypenotes
product_idbigintprimary key
nametext
categorytexte.g. Books, Electronics
unit_pricedecimal(10,2)current list price

orders — one row per placed order (the order header).

columntypenotes
order_idbigintprimary key
customer_idbigintcustomers.customer_id
order_tstimestampwhen the order was placed
statustextplaced · paid · shipped · cancelled
amountdecimal(12,2)order total

order_items — one row per product line within an order (the order detail). This is the finest grain in the system.

columntypenotes
order_idbigintorders.order_id
product_idbigintproducts.product_id
quantityint
unit_pricedecimal(10,2)price captured at sale time (may differ from current products.unit_price)

Primary key is the pair (order_id, product_id).

The event stream

Separately, ShopFlow's app emits a real-time event stream (Chapter 9) — every meaningful action as it happens, onto a topic called order_events:

fieldtypenotes
event_iduuidunique per event
order_idbigintorders.order_id
event_typetextplaced · paid · shipped · cancelled
tstimestampevent time (the basis for watermarks in Ch. 9)

2. The destination — the ShopFlow warehouse (star schema)

After ingestion and modeling, ShopFlow's data lands in an analytics warehouse as a star schema (taught in Chapter 4): one central fact table of measurable events, surrounded by dimension tables of descriptive context.

  • fact_sales — grain: one order line item. Measures: quantity, unit_price, line_revenue (= quantity × unit_price). Foreign keys: customer_key, product_key, date_key, order_id (degenerate dimension).
  • dim_customer — one row per customer version. Tracked as SCD Type 2 (Ch. 4) so a customer moving from Seattle to Austin keeps history. Carries lifetime_value downstream.
  • dim_product — product name, category, current price.
  • dim_date — one row per calendar day (the canonical date dimension).
  • dim_store — channel/store the order came through (web, mobile, marketplace).

3. The journey — where each chapter acts on ShopFlow's data

Here is the whole guide, as one pipeline over ShopFlow's data. Each chapter does one thing to the same dataset:

Source OLTPcustomers · products· orders ·order_itemsraw.orders,raw.customers, …(landed, untouched)Parquet filespartitioned by orderdatestg_orders ·stg_customers ·stg_order_itemsfact_sales +dim_customer/product/date/storeSELECT … joins,query plansdaily revenueaggregatesper-minute revenue(windowed,exactly-once)fact_sales asIceberg/Delta(ACID, time travel,schema evolution)DAG: ingest_orders →dbt_run →quality_checktests · contracts ·lineage · PII onemailcost per query,build-vs-buyCh.6 ingest / CDCCh.2 stored asCh.7 dbt stagingCh.4 + Ch.7 modelingCh.3 queryCh.5 Spark batchCh.9 order_eventsstreamCh.10 lakehouseCh.8 orchestrationCh.11quality/governanceCh.12 scale & cost

4. The canonical names — the contract every chapter uses

So the dataset stays recognizable end to end, these names are fixed. When you see them later, they always mean what's defined here.

StageObjects
Source (OLTP)customers, products, orders, order_items
Streamorder_events (topic)
Raw landing (Ch. 6)raw.customers, raw.products, raw.orders, raw.order_items
Staging (Ch. 7, dbt)stg_customers, stg_products, stg_orders, stg_order_items
Marts / warehouse (Ch. 4 + 7)fact_sales, dim_customer, dim_product, dim_date, dim_store
Streaming output (Ch. 9)fact_revenue_1m (per-minute revenue)
Pipeline (Ch. 8)DAG shopflow_daily: ingest_orders → dbt_run → quality_check

:::note Durable vs dated ShopFlow's shape — sources → raw → staging → star schema, with a parallel event stream — is durable; it's how analytics data is built almost everywhere. The specific engines we run it on (a particular warehouse, Spark version, or dbt syntax) are dated examples, flagged as such where they appear. :::


→ Next: OLTP vs OLAP — why ShopFlow's app database and its analytics warehouse are built on opposite trade-offs, and why that gap is the reason your job exists.