Skip to main content

Time and watermarks: event time, windows, and late data

Walk me through it

Picture a teacher collecting a timed quiz. Each answer sheet is stamped with when the student finished (say, 10:05). But the sheets don't reach the teacher's desk instantly — students hand them forward, some get stuck behind others, one falls under a desk and surfaces five minutes late. So there are two different times for every sheet: the time written on it (when the student finished) and the time the teacher actually receives it. If the teacher tallies "everyone who finished by 10:10" based on when sheets arrive at the desk, the answer is wrong — the late sheet finished at 10:08 but arrived at 10:15.

This gap — between when an event happened and when you process it — is the single biggest source of streaming bugs. This lesson names the different clocks, shows why you almost always want the "written-on-the-sheet" time, and introduces the clever trick — the watermark — that lets a system decide "okay, I've probably got all the 10:00–10:10 sheets now; I can publish that tally."

Three clocks, and why it matters which you use

Every event in a stream is associated with up to three different times:

  • Event timewhen the event actually happened, recorded inside the event by the source (the 10:08 stamped on the quiz). This is the truth about the real world.
  • Ingestion timewhen the event entered the streaming system (when the broker received it). Assigned by Kafka, not the source.
  • Processing timewhen your stream operator actually handled the event (wall-clock "now" inside the job). The 10:15 when the teacher picks up the sheet.

In a perfect world all three are equal. In reality, networks delay, mobile devices buffer events offline, partitions backlog, and consumers restart — so events arrive late (event time far behind processing time) and out of order (a 10:08 event arriving after a 10:09 event).

Which clock you compute on changes the answer. ShopFlow's "paid revenue per 10-minute window," where each paid event carries its order_events.ts:

  • By processing time, a paid event whose ts is 10:08 but that arrived at 10:15 gets counted in the 10:10–10:20 window — wrong bucket. Worse, the same input replayed later (after a backfill or a restart) produces different results, because processing time depends on when you happened to run.
  • By event time (the ts in the event), that paid counts in the 10:00–10:10 window where it belongs, and it does so no matter when you process it — reprocessing yesterday's data gives the same answer as the live run.

Almost always compute on event time. It's the only clock that reflects reality and the only one that makes results reproducible (the same events → the same answer, live or replayed). Processing time is acceptable only when you genuinely don't care when things happened (e.g. "how many events am I handling per second right now?").

:::warning The classic streaming bug A team writes a "sales in the last hour" job using processing time. It looks right in the demo. In production, a mobile client buffers events during a tunnel and flushes them 20 minutes later; those sales land in the wrong hour, and a backfill produces numbers that don't match the original run. The fix is almost always: switch to event time. If a guide teaches windowing without first nailing event vs processing time, watermarks will look like magic — this is the lesson that makes them make sense. :::

Windows: turning an endless stream into countable buckets

You can't SUM an infinite stream — there's no end (lesson 9.1). A window carves the unbounded stream into finite slices you can aggregate. Three shapes cover most needs:

  • Tumbling window — fixed size, non-overlapping, contiguous. [10:00–10:10), [10:10–10:20), …. Every event lands in exactly one window. Use for regular, non-overlapping reports ("revenue per 10 minutes").
events: e e e e e e e e e
time → |----------|----------|----------|
10:00 10:10 10:20 10:30
window 1 window 2 window 3 (each event in exactly one)
  • Sliding (hopping) window — fixed size that advances by a smaller step, so windows overlap and an event can fall in several. A 10-minute window that hops every 5 minutes → [10:00–10:10), [10:05–10:15), …. Use for smoothed, rolling metrics ("average over the last 10 minutes, updated every 5").
window A: [10:00 ----------- 10:10)
window B: [10:05 ----------- 10:15)
window C: [10:10 ----------- 10:20)
↑ an event at 10:07 falls in BOTH A and B
  • Session windowdynamic size defined by a gap of inactivity. Events for one key are grouped together as long as they're less than gap apart; a quiet stretch longer than the gap closes the session. Use for activity bursts ("one browsing session = clicks until 30 minutes of inactivity"). Session length isn't fixed — it's however long the burst lasted.
key=user-7: c c c c c c
|--session 1--| (gap > 30m) |--session 2--|

Windows are almost always keyed (lesson 9.2): "revenue per 10 minutes per store," "session per user." The engine maintains one window's running aggregate per key.

Watermarks: deciding when a window is "done"

Here's the puzzle windows create. A tumbling [10:00–10:10) window needs to emit a final sum. But because events arrive late and out of order, how does the system know it has seen all the 10:00–10:10 events and can safely publish the total? It can't wait forever (the stream never ends), and it can't trust processing-time "now" (that's the bug above).

The answer is the watermark. A watermark is a special marker that flows through the stream and asserts:

"Event time has now advanced to T. I do not expect any more events with a timestamp earlier than T."

When the watermark passes the end of a window, the system concludes "all events for this window have probably arrived" and emits the window's result. Watermarks are how event-time windows fire despite an unbounded, out-of-order stream.

How is the watermark computed? Typically: watermark = (largest event time seen so far) − (allowed out-of-orderness). That subtraction is a deliberate grace margin. If you've seen an event stamped 10:12 and you allow 2 minutes of out-of-orderness, the watermark is 10:10 — you're asserting "I've seen 10:12, so I'm now confident everything up to 10:10 has arrived." Set the margin to how late your data realistically gets.

The fundamental tradeoff:

  • A large margin = patient = you wait longer, so fewer events are "late," but every window's result is delayed by that margin.
  • A small margin = eager = you emit windows fast, but more genuinely-late events miss the cut.

Watermarks let you choose your point on the latency-vs-completeness dial.

Events arrive\n(outof order, by eventtime)Watermarkgenerator\nWM =max(event_time) −Windowoperator\n[10:00–10:10)Emit window resultWM passes 10:10 ⇒

A traced example: an event-time window with a late arrival

ShopFlow (see Meet ShopFlow) aggregates paid revenue into tumbling 10-minute windows, keyed on order_events.ts (the event time), allowed out-of-orderness = 2 minutes. paid events stream in (showing the ts carried in each event / arrival/processing order):

arrivests (event time)what the system does
1st10:01window [10:00–10:10) running revenue starts; max-seen = 10:01 → watermark 09:59
2nd10:06added to [10:00–10:10); max-seen = 10:06 → watermark 10:04
3rd10:09added to [10:00–10:10); max-seen = 10:09 → watermark 10:07
4th10:13starts window [10:10–10:20); max-seen = 10:13 → watermark 10:11 — passes 10:10!emit [10:00–10:10) result (sum of the three orders' amounts)
5th10:08late!a paid event whose ts 10:08 belongs to the already-emitted [10:00–10:10) window

The 5th event is late: a paid event that was delayed (a mobile checkout flushed after a dead-zone) so its window already fired. What happens to it depends on policy (next section). Notice the watermark did its job — the 10:13 event pushed event-time confidence past 10:10, which triggered the window. Processing-time "now" never entered into it, so replaying these same five events in any arrival order yields the same window contents (except the late one, governed by the same rules). That reproducibility is the whole point of event time — and it's exactly what makes ShopFlow's fact_revenue_1m output match whether computed live or backfilled.

Handling late and out-of-order data

When an event arrives after its window's watermark has passed, you have three options — and choosing wrong either loses data or never lets a window close:

  1. Drop it. Simplest; acceptable when a tiny fraction of stragglers won't change the decision. But silent dropping is dangerous — measure how much you drop.
  2. Allowed lateness — a grace period after the watermark during which a late event still updates the already-emitted window, producing a corrected, restated result. The window fires once at the watermark, then re-fires with an update if a late event lands within the grace period, then is finally purged. Use when correctness matters more than a single immutable answer (most accounting/analytics).
  3. Side output (a.k.a. late data stream) — route events later than the allowed lateness to a separate output channel instead of dropping them, so you can inspect, store, or reconcile them later. The combination — allowed lateness for the near-late, a side output for the very-late — is the production-grade pattern: nothing is silently lost.

Allowed lateness handles the merely-late by restating the window; a side output catches the too-late so it's never silently dropped. Together they make a streaming aggregate both timely and eventually correct.

:::tip A clean default Emit results at the watermark for timeliness, set allowed lateness to cover your realistic tail (so common stragglers restate the window), and attach a side output for anything beyond that (so true outliers are captured, not lost). Then alert on side-output volume — a spike means your watermark margin is too tight or a source is misbehaving. :::

Why it matters

Event time vs processing time is the concept that separates engineers who can build correct streaming pipelines from those who ship subtly-wrong ones. Compute on event time and your results are correct and reproducible; compute on processing time and the same data can yield different answers depending on when it happened to be processed. Windows make an unbounded stream aggregable; watermarks decide when a window is complete enough to emit, trading latency against completeness; and allowed lateness + side outputs turn "late data" from a silent corruptor into a managed, restated, never-lost flow. Every stream-processing engine in lesson 9.4 implements exactly these ideas.

Common pitfalls

  • Using processing time without realizing it. The default in some quick setups. It looks fine until late or replayed data exposes wrong buckets and non-reproducible results. Be explicit: assign event time from the event's own timestamp.
  • No watermark, or an unrealistic one. Too tight a margin drops a flood of "late" events; too loose and every window's result is needlessly delayed. Size the margin to observed lateness, and monitor it.
  • Silently dropping late data. The default for many pipelines, and invisible until an auditor finds missing revenue. Use allowed lateness and/or a side output, and measure the late volume.
  • Forgetting windows are keyed. "Sessions" or "per-10-min" almost always mean per entity; an unkeyed window mashes all keys together.
  • Sliding-window double counting. Overlapping windows intentionally place an event in several windows — correct for rolling averages, wrong if you treat each window's output as disjoint and sum them.

Checkpoint

Required checkpoint

9.3 Time & watermarks

Pass to unlock the Next button below

Going deeper: windows and watermarks are maintained as keyed state that must survive failures — that's Stateful processing. Correct, reproducible event-time results are also a precondition for exactly-once.

Foundations: windowing is keyed by the same keys you learned to partition on in Kafka architecture.

Next: Stateful processing →