Asset-based vs task-based orchestration
There are two fundamentally different ways to think about a pipeline, and the gap between them is the biggest conceptual shift in orchestration over the last few years. Most older guides teach only the first and never mention the second — so let's close that gap head-on. The two models are task-centric and asset-centric, and understanding both makes you fluent across Airflow, Dagster, and Prefect.
Two ways to describe the same pipeline
Picture ShopFlow's goal (see Meet ShopFlow): the fact_sales table, built from raw.orders, which ingest_orders lands from the source database.
Task-centric (the classic model, e.g. Airflow). You describe the verbs — the steps to perform and their order. This is exactly the shopflow_daily DAG from the last lessons:
"Run ingest_orders, then dbt_run, then quality_check."
The orchestrator's job is to run those steps in order. The tables those steps produce — fact_sales, dim_customer — are an implicit side effect; nowhere does the system have a first-class concept of "the fact_sales table." If someone asks "what produces fact_sales and is it fresh?", the system can't answer directly; it only knows about tasks.
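To make the limitation concrete, here is a minimal, tool-agnostic sketch of the task-centric model in plain Python. It is not Airflow's API: the task bodies are hypothetical stand-ins that only record that they ran, and `run_pipeline` and `known_names` are names invented for this illustration.

```python
from typing import Callable

# Hypothetical stand-ins for the real tasks: each one only records that it ran.
run_log: list[str] = []

def ingest_orders() -> None:
    run_log.append("ingest_orders")  # would land raw.orders

def dbt_run() -> None:
    run_log.append("dbt_run")  # would build fact_sales and dim_customer

def quality_check() -> None:
    run_log.append("quality_check")

# The pipeline is a hand-wired, ordered list of steps (the verbs).
SHOPFLOW_DAILY: list[Callable[[], None]] = [ingest_orders, dbt_run, quality_check]

def run_pipeline(tasks: list[Callable[[], None]]) -> None:
    """Run each task in the order it was listed."""
    for task in tasks:
        task()

def known_names(tasks: list[Callable[[], None]]) -> set[str]:
    """Everything this runner knows about: task names, and nothing else."""
    return {task.__name__ for task in tasks}
```

Calling `run_pipeline(SHOPFLOW_DAILY)` runs the three steps in order, but `"fact_sales" in known_names(SHOPFLOW_DAILY)` is `False`: the table exists only as a comment inside a task, so the runner has nothing to say about it.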
Asset-centric (the newer model, pioneered by Dagster). You describe the nouns — the data objects you want to exist and what each depends on:
"fact_sales is an asset built from the stg_orders asset, built from the raw.orders asset, which is loaded from the source database."
A software-defined asset (SDA) is a declarative definition of a data object — a table, a file, an ML model — together with the code that produces it. You declare the asset and its upstream assets; the orchestrator derives the DAG from those dependencies.
Notice the inversion. In the task model you write the steps (ingest_orders → dbt_run → quality_check) and ShopFlow's tables are a byproduct. In the asset model you declare the tables — fact_sales and dim_customer as assets — and the steps (and their order) are derived. The DAG still exists — it's computed from asset dependencies instead of hand-wired.
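The inversion can be sketched in a few lines of plain Python. This is not Dagster's API, only an illustration of the idea: the dependency mapping below is the single thing declared, and the run order is computed from it with the standard library's `graphlib`. The names `ASSET_DEPS` and `derive_run_order` are assumptions made for this example.

```python
from graphlib import TopologicalSorter

# Declare the nouns: each asset maps to the upstream assets it is built from.
ASSET_DEPS: dict[str, set[str]] = {
    "raw.orders": set(),           # loaded from the source database
    "stg_orders": {"raw.orders"},
    "fact_sales": {"stg_orders"},
}

def derive_run_order(deps: dict[str, set[str]]) -> list[str]:
    """Compute a valid build order in which upstream assets come first."""
    return list(TopologicalSorter(deps).static_order())

order = derive_run_order(ASSET_DEPS)
print(order)  # ['raw.orders', 'stg_orders', 'fact_sales']
```

Nothing in the declaration says "first, then, then". The order falls out of the dependencies, and adding a new asset means adding one entry rather than rewiring a sequence.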
Why the asset model caught on
Declaring data objects instead of steps unlocks things that are awkward in the pure task world:
- Lineage for free. Because every asset declares its upstream assets, the system knows the data lineage: "fact_sales comes from stg_orders comes from raw.orders comes from the source database." You get a live dependency graph of ShopFlow's data, not just its jobs, which is gold for debugging and impact analysis.
- "Is this table fresh?" is a first-class question. The system tracks each asset's last materialization, so freshness, staleness, and "what needs rebuilding?" are native concepts: "when was fact_sales last rebuilt?" is answerable directly.
- Data-aware scheduling falls out naturally. If you've declared that fact_sales depends on raw.orders, then "rebuild fact_sales when raw.orders updates" is the obvious behavior: scheduling driven by data availability, not just the clock.
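The "lineage for free" point can be illustrated with a small sketch, again in plain Python rather than any orchestrator's API. Given only the declared dependencies, both directions of the graph can be walked: upstream for "where does this come from?" and downstream for impact analysis. The function names are hypothetical.

```python
# Declared dependencies: asset -> direct upstream assets.
ASSET_DEPS: dict[str, list[str]] = {
    "raw.orders": [],
    "stg_orders": ["raw.orders"],
    "fact_sales": ["stg_orders"],
}

def upstream_lineage(asset: str, deps: dict[str, list[str]]) -> list[str]:
    """Every asset that `asset` transitively depends on, nearest first."""
    lineage: list[str] = []
    queue = list(deps.get(asset, []))
    while queue:
        parent = queue.pop(0)
        if parent not in lineage:
            lineage.append(parent)
            queue.extend(deps.get(parent, []))
    return lineage

def downstream_impact(asset: str, deps: dict[str, list[str]]) -> list[str]:
    """Every asset that would be affected if `asset` changed, nearest first."""
    impacted: list[str] = []
    queue = [asset]
    while queue:
        current = queue.pop(0)
        for child, parents in deps.items():
            if current in parents and child not in impacted:
                impacted.append(child)
                queue.append(child)
    return impacted

print(upstream_lineage("fact_sales", ASSET_DEPS))   # ['stg_orders', 'raw.orders']
print(downstream_impact("raw.orders", ASSET_DEPS))  # ['stg_orders', 'fact_sales']
```

A task-only system would need this mapping maintained separately by hand. Here it is the same declaration that drives the build.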
Materialize is the asset-world verb for "run the code that produces this asset's data," the asset analog of "run this task."
Asset-based (data-aware) scheduling triggers work when an upstream data object updates, rather than purely on a time schedule. It's the model behind Dagster assets and, increasingly, Airflow 3's assets/datasets.
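As a rough sketch of how materialization records make freshness answerable, the plain-Python example below stores a timestamp per asset and calls an asset stale when it has never been built or when a direct upstream was materialized more recently. Real orchestrators use richer rules. The functions, the direct-parent-only check, and the timestamps are all assumptions made for illustration.

```python
from datetime import datetime

ASSET_DEPS: dict[str, list[str]] = {
    "raw.orders": [],
    "stg_orders": ["raw.orders"],
    "fact_sales": ["stg_orders"],
}

def materialize(asset: str, at: datetime, timestamps: dict[str, datetime]) -> None:
    """Record that the code producing `asset` ran at time `at`."""
    timestamps[asset] = at

def is_stale(
    asset: str,
    deps: dict[str, list[str]],
    timestamps: dict[str, datetime],
) -> bool:
    """Stale = never built, or a direct upstream was rebuilt after this asset."""
    built_at = timestamps.get(asset)
    if built_at is None:
        return True
    for parent in deps.get(asset, []):
        parent_built_at = timestamps.get(parent)
        if parent_built_at is not None and parent_built_at > built_at:
            return True
    return False
```

For example, if raw.orders is materialized again after stg_orders was built, `is_stale("stg_orders", ...)` becomes `True`, which is exactly the signal a data-aware scheduler needs.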
Time-based vs data-aware scheduling
This is the practical heart of the shift, so make it concrete:
- Time-based: "Run dbt_run at 2 a.m." But what if ingest_orders is late landing raw.orders? You either build fact_sales on stale data or pad with a guessed gap (the fragile cron pattern from lesson 8.1).
- Data-aware: "Rebuild fact_sales whenever raw.orders has new data." No guessing about timing: the downstream reacts to the upstream actually being ready.
Data-aware scheduling directly fixes the late-upstream problem that time gaps only paper over. This is the 2025–2026 industry direction: Dagster was asset-first from the start, and Airflow, which introduced data-aware scheduling with datasets in 2.4, made assets first-class in Airflow 3 (lesson 8.4) precisely to offer it. Even dbt fits naturally here: a dbt project is essentially a graph of data assets (models), which is why dbt and asset-based orchestrators pair so well.
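A data-aware trigger can be sketched as an event handler: instead of a clock firing at 2 a.m., an "asset updated" event rebuilds whatever sits downstream, in dependency order. The plain-Python sketch below is not how Dagster or Airflow implement it. `on_asset_updated` and the `rebuild` callback are hypothetical names, and a real scheduler would also handle failures, batching, and concurrent updates.

```python
from graphlib import TopologicalSorter
from typing import Callable

ASSET_DEPS: dict[str, set[str]] = {
    "raw.orders": set(),
    "stg_orders": {"raw.orders"},
    "fact_sales": {"stg_orders"},
}

def on_asset_updated(
    updated: str,
    deps: dict[str, set[str]],
    rebuild: Callable[[str], None],
) -> list[str]:
    """Rebuild every asset downstream of `updated`, upstream-first."""
    # 1. Collect everything that transitively depends on the updated asset.
    affected: set[str] = set()
    frontier = [updated]
    while frontier:
        current = frontier.pop()
        for child, parents in deps.items():
            if current in parents and child not in affected:
                affected.add(child)
                frontier.append(child)
    # 2. Rebuild the affected assets in a dependency-respecting order.
    order = [a for a in TopologicalSorter(deps).static_order() if a in affected]
    for asset in order:
        rebuild(asset)
    return order

rebuilt: list[str] = []
on_asset_updated("raw.orders", ASSET_DEPS, rebuilt.append)
print(rebuilt)  # ['stg_orders', 'fact_sales']
```

If ingest_orders lands raw.orders an hour late, the event simply fires an hour late and the rebuild follows it. There is no padded gap to tune.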
When each model fits
Neither is universally "better" — they suit different shapes of work:
| Lean task-centric (Airflow-style) when… | Lean asset-centric (Dagster-style) when… |
|---|---|
| The pipeline is a sequence of operational actions (trigger a job, call an API, move a file) where the output isn't a neat table | The pipeline's whole purpose is producing and keeping data assets fresh (tables, files, features, models) |
| You have a large existing Airflow estate and ecosystem of operators | You want built-in lineage, freshness, and data-aware scheduling from day one |
| Steps don't map cleanly to "one task = one data object" | Work maps cleanly to "this code produces this dataset," and you value strong typing/testing of those datasets |
:::tip The convergence
The lines are blurring fast. Airflow 3 brought assets to the task world; Dagster can still run plain ops/tasks when an action isn't a data object; Prefect stays flexible Python. So the real skill isn't picking a camp; it's thinking in both: "what data objects must exist (assets)?" and "what operational steps produce them (tasks)?" Fluency in both vocabularies is what makes you portable across tools.
:::
Why it matters
Task-centric orchestration describes the verbs — the steps and their order — and treats the data tables as a byproduct; it's the classic Airflow model and fits operational, action-shaped pipelines. Asset-centric orchestration (Dagster's software-defined assets, now echoed by Airflow 3's assets) describes the nouns — the data objects you want to exist and their dependencies — and derives the DAG, giving you lineage, freshness, and data-aware scheduling (rebuild when upstream data updates) for free. The 2025–2026 shift is decisively toward data-aware/asset thinking, and the two models are converging, so the durable skill is to reason in both. Next, we close the loop on running pipelines reliably: retries, SLAs, smart triggering, and keeping heavy compute out of the orchestrator.