Fact table types & additive measures
In the last lesson, every fact table recorded an event — a sale happened, a ticket was filed. But not every business question is about events. Some are about states over time ("how much inventory did we hold each day?") and some are about processes with a lifecycle ("how long did this order take from placed to delivered?"). Kimball identified three fact-table types that cover these cases. Picking the right one is part of declaring the grain — and using the wrong one makes whole classes of question awkward or impossible.
Then we'll cover a quieter but equally dangerous topic: which measures you're allowed to add up, across which dimensions. Summing the wrong measure produces numbers that look fine and are completely wrong.
The three fact-table types
1. Transaction fact table
The default. One row per event, at the moment it happens. A sale, a click, a payment, a support ticket. This is ShopFlow's fact_sales from the last lesson — one row per order line item. Transaction facts are the most flexible and atomic — they record what happened at the finest grain — and most of your fact tables will be this type.
- Grain: one row per transaction (or transaction line).
- Grows: continuously, one row per event, forever.
- Answers: "How many sales? What revenue? When, where, by whom?"
2. Periodic snapshot fact table
Some things aren't events — they're levels you sample on a regular cadence. Account balance, inventory on hand, number of active subscribers. There's no single "event" for "the balance is $500"; the balance simply is $500 right now. So you take a snapshot at a fixed interval (every day, every month) and record the state.
- Grain: one row per thing per period (e.g. one row per account per day).
- Grows: predictably — every account gets a new row every day, whether or not anything changed.
- Answers: "What was inventory on hand each day? Month-end balance per account?"
The key insight: a periodic snapshot answers trend-over-time questions that a transaction table answers only with painful running-sum gymnastics. If you constantly need "balance as of date X," store the snapshot.
3. Accumulating snapshot fact table
Some business processes have a defined lifecycle with milestones: a ShopFlow order is placed, then paid, then shipped (the status values you met in orders). An accumulating snapshot has one row per process instance (one row per order_id), and that row is updated in place as the order moves through its milestones. The table typically has multiple date columns — placed_date, paid_date, shipped_date, delivered_date — that start null and fill in over time, plus lag measures like days_to_ship.
- Grain: one row per process instance (one row per order across its whole life).
- Grows: one row per process; rows are updated, not just inserted.
- Answers: "How long between paid and shipped? What's our average fulfilment time? How many orders are stuck at 'paid' but not 'shipped'?"
This is the one type that updates existing rows rather than only appending — which matters for the loading patterns in lesson 4.6.
:::tip How to choose Ask what the question is about. An event happened? → transaction. A level at a point in time, sampled regularly? → periodic snapshot. A multi-step process you want to measure milestone-to-milestone durations on? → accumulating snapshot. Many warehouses use all three for the same business — e.g. transaction-level sales, daily inventory snapshots, and an accumulating order-fulfilment table. :::
Additivity: the measure trap
Now the subtle, dangerous part. A measure is a numeric column in a fact table. The whole point of a fact table is that you SUM measures across dimensions. But not every measure can be summed across every dimension. Measures come in three flavors of additivity.
Additive measures — sum across anything
An additive measure can be summed across every dimension. ShopFlow's line_revenue and quantity are additive: total revenue across all dates, all products, all stores, all customers is just SUM(line_revenue), and it's meaningful no matter how you slice. Additive measures are the easy, safe ones, and good fact design tries to make as many measures additive as possible.
Semi-additive measures — sum across some dimensions, NOT time
A semi-additive measure can be summed across some dimensions but not across time. The classic example is a balance (or any "level" measure in a periodic snapshot): an account balance, inventory on hand, headcount.
Trace the trap. Your daily inventory snapshot says you held 100 units on Monday, 100 on Tuesday, 100 on Wednesday (you sold and restocked evenly). The true inventory level is 100 the whole week. But SUM(inventory_on_hand) across those three days gives 300 — a number that is physically meaningless. You never had 300 units.
Why? Because a balance isn't a flow you accumulate — it's a snapshot of a state. You can add inventory across products at one point in time ("total units across all SKUs today"), but you cannot add it across time. Across the time dimension you must use AVG, or take the last value (end-of-period balance) — never SUM.
:::warning Semi-additive measures are a silent wrong-number generator
Summing a balance or any "on hand / level / count-at-a-moment" measure across the date dimension produces a number that looks legitimate and is nonsense. Whenever a measure answers "how much is there right now" (rather than "how much happened"), flag it as semi-additive: additive across other dimensions, but AVG or last-value across time. This bug ships to dashboards constantly because nothing errors — the total is just wrong.
:::
Non-additive measures — can't sum across anything
A non-additive measure can't be meaningfully summed across any dimension. Ratios and percentages are the usual culprits: profit_margin_pct, conversion_rate, unit_price. Summing percentages is obvious nonsense (a 20% margin plus a 30% margin is not a 50% margin).
The fix is a discipline that pays off everywhere: store the additive components, compute the ratio at query time. Don't store profit_margin_pct; store additive revenue and additive cost, then compute SUM(revenue - cost) / SUM(revenue) in the query (or, better, in the semantic layer — lesson 4.6). The components add up correctly across every dimension; the ratio is derived last. This single habit prevents a huge fraction of "the rolled-up percentage is wrong" bugs.
| Measure type | Sum across non-time dims? | Sum across time? | Examples |
|---|---|---|---|
| Additive | Yes | Yes | revenue, quantity, cost |
| Semi-additive | Yes | No (use AVG / last) | inventory on hand, account balance, headcount |
| Non-additive | No | No | margin %, conversion rate, unit price |
Why it matters
There are three fact-table types, and choosing the right one is part of declaring the grain: a transaction fact records one row per event (the flexible default); a periodic snapshot records one row per thing per period for "level over time" questions; an accumulating snapshot records one row per multi-step process and updates it in place as milestones complete (the one type that updates rows). Independently, every measure has an additivity: additive measures (revenue, quantity) sum safely across all dimensions; semi-additive measures (balances, inventory, headcount) sum across everything except time — summing them across dates is a classic silent wrong-number bug, use AVG or last-value instead; non-additive measures (ratios, percentages) can't be summed at all, so you store the additive components and compute the ratio at query time. Knowing which measure you can add up, across which dimension, is the difference between a dashboard people trust and one that quietly lies. Next: what happens when the dimensions themselves change over time — slowly changing dimensions.