
Three traditions: Kimball, Inmon & Data Vault

You now know how to build a star schema. But step back: how should an entire enterprise warehouse be organized? The answer is genuinely contested, and three traditions have dominated for decades. You'll see all three named in job descriptions and architecture docs — Data Vault in particular shows up constantly in enterprise and regulated industries, and most tutorials pretend it doesn't exist. This lesson fixes that. To compare them fairly, we first need the idea they disagree about: normalization.

Normalization vs denormalization

Normalization is organizing data to eliminate redundancy by splitting it into many small, related tables, so every fact is stored exactly once. Its canonical target is third normal form (3NF) — the design philosophy of operational app databases (Chapter 1's OLTP). If a customer's city is stored in one customers table and referenced by ID everywhere else, then when they move you update one row and the whole system is instantly consistent. Normalization optimizes for write correctness and integrity — exactly what an app needs.

Denormalization is the opposite: deliberately duplicating data so it's pre-joined and ready to read. A star-schema dimension that repeats category_name on every product row is denormalized. You accept redundancy and harder writes in exchange for fewer joins and faster, simpler reads — exactly what analytics needs.
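Here is a minimal sketch of the trade-off using Python's built-in sqlite3 module. The tables (customers, orders, orders_wide) and the data are hypothetical. The same customer move is applied to a normalized design and to a pre-joined, denormalized copy, and then the same total is read from each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized (3NF-style): the customer's city is stored in exactly one row.
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(customer_id),
                     amount REAL);
INSERT INTO customers VALUES (1, 'Ada', 'London');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 1, 15.0);

-- Denormalized: customer attributes copied onto every order row (pre-joined).
CREATE TABLE orders_wide AS
SELECT o.order_id, o.customer_id, o.amount, c.name, c.city
FROM orders o JOIN customers c ON c.customer_id = o.customer_id;
""")

# The customer moves. Normalized: one row changes, and every join sees it.
normalized_rows = conn.execute(
    "UPDATE customers SET city = 'Paris' WHERE customer_id = 1").rowcount

# Denormalized: every duplicated copy has to be found and updated.
denormalized_rows = conn.execute(
    "UPDATE orders_wide SET city = 'Paris' WHERE customer_id = 1").rowcount

# Reads are the mirror image: the wide table answers without a join.
wide_total = conn.execute(
    "SELECT SUM(amount) FROM orders_wide WHERE city = 'Paris'").fetchone()[0]
joined_total = conn.execute("""
    SELECT SUM(o.amount) FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.city = 'Paris'""").fetchone()[0]

print(normalized_rows, denormalized_rows, wide_total, joined_total)
```

The normalized update touches one row; the wide table needs one update per duplicated copy but answers the read with no join. That is the integrity-versus-query trade-off in miniature.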

The whole debate below is really where in the warehouse you draw the line between normalized (integrity-first) and denormalized (query-first) — and how much you privilege long-term flexibility over immediate usability.

Kimball — dimensional, bottom-up

Ralph Kimball's method is the one you've been learning: model the warehouse as star schemas — facts and denormalized dimensions — organized by business process, tied together by conformed dimensions. It's bottom-up: you deliver one business process's star (e.g. sales) quickly, then another (e.g. inventory), and conformed dimensions stitch them into a coherent whole (Kimball calls this the bus architecture).

  • Optimizes for: query simplicity and speed; fast time-to-value; analyst- and BI-friendliness.
  • Strength: business users understand it; you ship value early; it's the native shape for dashboards.
  • Weakness: without disciplined conformance, independently built stars can drift apart; less of a single integrated "system of record" than Inmon.
  • You'll see it: nearly everywhere analysts and BI tools touch data — it's the default for the serving layer of a warehouse.
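A minimal sketch of the bus idea, again with sqlite3 and hypothetical names (dim_product, fact_sales, fact_inventory). Two stars for two business processes share one conformed product dimension. Because they share it, you can "drill across": aggregate each fact table on its own, then line the results up on a shared dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One conformed dimension, built once and shared by every star.
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,  -- surrogate key
    product_name  TEXT,
    category_name TEXT                  -- denormalized onto each product row
);
-- Star 1: the sales business process.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    sale_date TEXT, quantity_sold INTEGER);
-- Star 2: the inventory business process, delivered later.
CREATE TABLE fact_inventory (
    product_key INTEGER REFERENCES dim_product(product_key),
    snapshot_date TEXT, quantity_on_hand INTEGER);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'),
                               (3, 'Manual', 'Books');
INSERT INTO fact_sales VALUES (1, '2024-01-01', 5), (2, '2024-01-01', 3),
                              (3, '2024-01-01', 2);
INSERT INTO fact_inventory VALUES (1, '2024-01-01', 50), (2, '2024-01-01', 20),
                                  (3, '2024-01-01', 7);
""")

# Drill across: aggregate each star separately, then join the two result sets
# on the conformed attribute. This only works because both facts share dim_product.
rows = conn.execute("""
WITH s AS (SELECT d.category_name, SUM(f.quantity_sold) AS sold
           FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
           GROUP BY d.category_name),
     i AS (SELECT d.category_name, SUM(f.quantity_on_hand) AS on_hand
           FROM fact_inventory f JOIN dim_product d ON d.product_key = f.product_key
           GROUP BY d.category_name)
SELECT s.category_name, s.sold, i.on_hand
FROM s JOIN i ON i.category_name = s.category_name
ORDER BY s.category_name
""").fetchall()
print(rows)
```

If the inventory team had built its own product table with different category names, this join would silently drop or mismatch rows; that is the drift the conformance discipline guards against.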

Inmon — normalized, top-down (the Corporate Information Factory)

Bill Inmon's method goes the other way. First, build a single enterprise-wide, normalized (3NF), integrated repository that is the organization's system of record: every subject area integrated, deduplicated, and consistent. Then spin dimensional data marts (Kimball-style stars) off that central repository to serve specific departments. Inmon calls the overall architecture built around this central warehouse, including the marts fed from it, the Corporate Information Factory (CIF).

  • Optimizes for: enterprise-wide integration, consistency, and a single normalized source of truth.
  • Strength: one integrated, non-redundant core; strong governance; data marts derive from a consistent base, so they can't disagree.
  • Weakness: top-down and slow to deliver value — you must integrate broadly before serving anyone; more upfront design; the normalized core isn't directly query-friendly.
  • You'll see it: large, mature enterprises that prize a governed central source of truth over speed.
[Diagram: Kimball, Sources → conformed star schemas (bus architecture); Inmon, Sources → 3NF enterprise core (Corporate Information Factory).]

The classic framing: Kimball is bottom-up and serve-first; Inmon is top-down and integrate-first. They are not really enemies — many shops use an Inmon-style integrated core feeding Kimball-style marts. The marts users actually query are dimensional either way.
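The "Inmon core feeding Kimball marts" layering can be sketched the same way. The schema below is a hypothetical, heavily simplified 3NF subject area. The mart table is derived from it by one denormalizing join, so it cannot disagree with the core it was built from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Integrated 3NF core (hypothetical subject area): each fact stored once.
CREATE TABLE region   (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE city     (city_id INTEGER PRIMARY KEY, city_name TEXT,
                       region_id INTEGER REFERENCES region(region_id));
CREATE TABLE segment  (segment_id INTEGER PRIMARY KEY, segment_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT,
                       city_id INTEGER REFERENCES city(city_id),
                       segment_id INTEGER REFERENCES segment(segment_id));

INSERT INTO region   VALUES (1, 'Europe');
INSERT INTO city     VALUES (1, 'London', 1), (2, 'Paris', 1);
INSERT INTO segment  VALUES (1, 'Retail'), (2, 'Enterprise');
INSERT INTO customer VALUES (1, 'Ada', 1, 1), (2, 'Blaise', 2, 2);

-- Dimensional mart derived from the core: one wide, denormalized row per
-- customer, so analysts never have to repeat these joins themselves.
CREATE TABLE mart_dim_customer AS
SELECT c.customer_id, c.customer_name, ci.city_name, r.region_name, s.segment_name
FROM customer c
JOIN city ci   ON ci.city_id   = c.city_id
JOIN region r  ON r.region_id  = ci.region_id
JOIN segment s ON s.segment_id = c.segment_id;
""")

dim_rows = conn.execute(
    "SELECT * FROM mart_dim_customer ORDER BY customer_id").fetchall()

# 'Europe' is stored once in the core but repeated on every matching mart row.
core_copies = conn.execute(
    "SELECT COUNT(*) FROM region WHERE region_name = 'Europe'").fetchone()[0]
mart_copies = conn.execute(
    "SELECT COUNT(*) FROM mart_dim_customer WHERE region_name = 'Europe'").fetchone()[0]
print(dim_rows, core_copies, mart_copies)
```

In practice the mart would be rebuilt or incrementally refreshed from the core on a schedule; the point is only that the redundancy lives in the mart, never in the system of record.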

Data Vault — built for change and auditability

Data Vault (Dan Linstedt) is a third pattern designed for a different priority: agility, auditability, and surviving constant change in source systems. Instead of facts/dimensions or pure 3NF, it decomposes everything into three building blocks:

  • Hubs — a list of business keys for a core concept, and almost nothing else. hub_customer holds the customer business key (e.g. customer number) + a surrogate key + load metadata. A hub answers "which customers exist?" and never changes its meaning.
  • Links: relationships between hubs. link_order connects hub_customer, hub_product, and hub_store; it records that a relationship occurred, as its own row.
  • Satellites — the descriptive attributes and their history, hanging off a hub or link. sat_customer_details holds the customer's name, city, segment, with load timestamps, and stores every version over time (history is inherent, like a built-in SCD Type 2). Attributes that change at different rates get different satellites.
[Diagram: hubs hub_customer (business key only) and hub_product, related by link_order; satellites sat_customer_details (name, city + full history) and sat_product_details.]
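Below is a minimal sketch of the three building blocks in sqlite3. The column names, the MD5 hash keys (a common Data Vault 2.0 convention, assumed here rather than taken from this lesson), and the load_order / load_customer_details helpers are all hypothetical, and hub_store is omitted for brevity. Hubs and links are insert-if-new; the satellite appends a new version only when the attributes actually change.

```python
import hashlib
import sqlite3

def hash_key(*business_keys: str) -> str:
    """Deterministic surrogate key: MD5 of the normalized business key(s).
    Assumed convention (Data Vault 2.0 style), not prescribed by the lesson."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: business key + surrogate key + load metadata, nothing else.
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_number TEXT,
                           load_dts TEXT, record_source TEXT);
CREATE TABLE hub_product  (product_hk TEXT PRIMARY KEY, product_code TEXT,
                           load_dts TEXT, record_source TEXT);
-- Link: one row per relationship occurrence between hubs.
CREATE TABLE link_order (order_hk TEXT PRIMARY KEY,
                         customer_hk TEXT REFERENCES hub_customer(customer_hk),
                         product_hk  TEXT REFERENCES hub_product(product_hk),
                         load_dts TEXT, record_source TEXT);
-- Satellite: descriptive attributes, one insert-only row per version.
CREATE TABLE sat_customer_details (
    customer_hk TEXT REFERENCES hub_customer(customer_hk),
    load_dts TEXT, record_source TEXT, name TEXT, city TEXT, segment TEXT,
    PRIMARY KEY (customer_hk, load_dts));
""")

def load_order(order_no, customer_no, product_code, load_dts, source):
    c_hk, p_hk = hash_key(customer_no), hash_key(product_code)
    # INSERT OR IGNORE: a business key already present in a hub is never re-added.
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (c_hk, customer_no, load_dts, source))
    conn.execute("INSERT OR IGNORE INTO hub_product VALUES (?, ?, ?, ?)",
                 (p_hk, product_code, load_dts, source))
    conn.execute("INSERT OR IGNORE INTO link_order VALUES (?, ?, ?, ?, ?)",
                 (hash_key(order_no, customer_no, product_code),
                  c_hk, p_hk, load_dts, source))

def load_customer_details(customer_no, name, city, segment, load_dts, source):
    hk = hash_key(customer_no)
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (hk, customer_no, load_dts, source))
    latest = conn.execute("""SELECT name, city, segment FROM sat_customer_details
                             WHERE customer_hk = ? ORDER BY load_dts DESC LIMIT 1""",
                          (hk,)).fetchone()
    if latest != (name, city, segment):  # only a change creates a new version
        conn.execute("INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?, ?)",
                     (hk, load_dts, source, name, city, segment))

load_order("O-1", "C-100", "P-7", "2024-01-01", "crm")
load_order("O-2", "C-100", "P-8", "2024-01-02", "webshop")
load_customer_details("C-100", "Ada", "London", "Retail", "2024-01-01", "crm")
load_customer_details("C-100", "Ada", "London", "Retail", "2024-01-02", "crm")  # unchanged
load_customer_details("C-100", "Ada", "Paris", "Retail", "2024-02-01", "crm")   # moved

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("hub_customer", "hub_product", "link_order", "sat_customer_details")}
print(counts)
```

One customer, two products, two order relationships, and two satellite versions: the repeated load with unchanged attributes adds nothing, while the move appends history instead of overwriting it.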

Why anyone accepts this extra complexity:

  • Auditability by construction. Every record carries its load source and timestamp; satellites keep all history. You can always answer "what did we know, from which source, as of when?" — gold for regulated industries (finance, healthcare, insurance).
  • Resilience to source change. Adding a new source or attribute means adding a new satellite or link, not redesigning existing tables. The model absorbs change without breaking what's already there.
  • Parallel loading. Hubs, links, and satellites can be loaded independently, which suits large, fast-changing source landscapes.
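Because satellites are insert-only and every row carries load_dts and record_source, "what did we know, from which source, as of when?" becomes a plain query. A minimal sketch with hypothetical data and a hypothetical as_of helper; it then adds a second satellite for a new source to show that the existing table's structure is left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sat_customer_details (
    customer_hk TEXT, load_dts TEXT, record_source TEXT, city TEXT, segment TEXT,
    PRIMARY KEY (customer_hk, load_dts));
-- Insert-only history: rows are never updated in place.
INSERT INTO sat_customer_details VALUES
    ('hk_ada', '2024-01-01', 'crm',     'London', 'Retail'),
    ('hk_ada', '2024-03-15', 'crm',     'Paris',  'Retail'),
    ('hk_ada', '2024-06-01', 'billing', 'Paris',  'Enterprise');
""")

def as_of(customer_hk, as_of_dts):
    """Latest version loaded on or before as_of_dts, with its source (None if none)."""
    return conn.execute("""
        SELECT city, segment, record_source, load_dts
        FROM sat_customer_details
        WHERE customer_hk = ? AND load_dts <= ?
        ORDER BY load_dts DESC LIMIT 1""", (customer_hk, as_of_dts)).fetchone()

print(as_of("hk_ada", "2024-02-01"))  # ('London', 'Retail', 'crm', '2024-01-01')
print(as_of("hk_ada", "2024-12-31"))  # ('Paris', 'Enterprise', 'billing', '2024-06-01')

# A new source arrives: add a new satellite instead of altering existing tables.
cols_before = [row[1] for row in conn.execute("PRAGMA table_info(sat_customer_details)")]
conn.execute("""CREATE TABLE sat_customer_loyalty (
    customer_hk TEXT, load_dts TEXT, record_source TEXT, loyalty_tier TEXT,
    PRIMARY KEY (customer_hk, load_dts))""")
cols_after = [row[1] for row in conn.execute("PRAGMA table_info(sat_customer_details)")]
```

The ISO-8601 date strings compare correctly as text, which is why the sketch can use `<=` on them; a real implementation would more likely use proper timestamp types.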

The cost is real: far more tables and joins, and it is not what users query directly. Data Vault is a raw integration layer; you still build dimensional (Kimball) marts on top of it for actual analytics. Think of it as a highly auditable, change-resilient core — a modern alternative to Inmon's 3NF core — that feeds star schemas.
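Since users do not query the vault directly, a dimension is usually derived from it. A minimal sketch with hypothetical hub_customer and sat_customer_details tables: the view keeps only the latest satellite version per business key, giving a current-state dim_customer that a star schema can join to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_number TEXT,
                           load_dts TEXT, record_source TEXT);
CREATE TABLE sat_customer_details (customer_hk TEXT, load_dts TEXT, record_source TEXT,
                                   name TEXT, city TEXT, segment TEXT,
                                   PRIMARY KEY (customer_hk, load_dts));
INSERT INTO hub_customer VALUES ('hk_ada', 'C-100', '2024-01-01', 'crm'),
                                ('hk_bob', 'C-200', '2024-01-05', 'crm');
INSERT INTO sat_customer_details VALUES
    ('hk_ada', '2024-01-01', 'crm', 'Ada', 'London', 'Retail'),
    ('hk_ada', '2024-03-15', 'crm', 'Ada', 'Paris',  'Retail'),
    ('hk_bob', '2024-01-05', 'crm', 'Bob', 'Leeds',  'Enterprise');

-- Kimball-style dimension on top of the vault: one current row per business key.
CREATE VIEW dim_customer AS
SELECT h.customer_number, s.name, s.city, s.segment
FROM hub_customer h
JOIN sat_customer_details s ON s.customer_hk = h.customer_hk
WHERE s.load_dts = (SELECT MAX(s2.load_dts) FROM sat_customer_details s2
                    WHERE s2.customer_hk = h.customer_hk);
""")

dim_rows = conn.execute("SELECT * FROM dim_customer ORDER BY customer_number").fetchall()
print(dim_rows)
```

Exposing every satellite version with valid-from/valid-to dates, rather than only the latest one, would instead yield an SCD Type 2 dimension; which one to build depends on what the mart needs.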

:::note Data Vault isn't optional knowledge
Many guides skip Data Vault, then candidates get blindsided when an enterprise role lists it as a requirement. You don't need to love it, but you must be able to say what hubs, links, and satellites are and why a regulated enterprise chooses it: auditability and resilience to source change, at the cost of complexity, as an integration core that still feeds Kimball marts.
:::

So which one?

They are not mutually exclusive, and the modern answer is usually layered:

| Tradition | Core shape | Optimizes for | Typical home in a modern stack |
|---|---|---|---|
| Kimball | Star schemas, conformed dims | Query speed, usability, fast value | The serving / mart layer users query |
| Inmon | 3NF enterprise core → marts | Enterprise integration & consistency | A governed integrated core feeding marts |
| Data Vault | Hubs / links / satellites | Auditability, change-resilience | A raw integration core feeding marts |

The thing users actually query is almost always a Kimball-style star (or a wide table — next lesson), regardless of what feeds it. Inmon and Data Vault are competing philosophies for the integration core beneath that serving layer. In the cloud-warehouse + dbt world (Chapter 7), this layering is expressed as the medallion architecture — the subject of the next lesson, which also revisits whether the star schema is even the right serving shape anymore.

Why it matters

The three traditions disagree about where to draw the line between normalized (integrity-first) and denormalized (query-first) data, and how much to privilege flexibility over immediate usability. Kimball models the warehouse as denormalized star schemas tied together by conformed dimensions: bottom-up, serve-first, and the default for the layer users query. Inmon first builds a normalized 3NF enterprise core, the heart of his Corporate Information Factory, then spins dimensional marts off it: top-down, integrate-first, consistency over speed. Data Vault decomposes data into hubs (business keys), links (relationships), and satellites (attributes + full history) for maximum auditability and resilience to source change; it is heavy on tables and joins and serves as an integration core in regulated enterprises. Critically, they layer rather than compete: an Inmon or Data Vault core feeds Kimball marts, and what users query is almost always a star schema anyway. Next, we map all of this onto the modern cloud stack: medallion layers, the One-Big-Table debate, semantic layers, and idempotent loading.
