The Kafka Mental Model: The Log
Topics as partitioned append-only logs, offsets, per-partition ordering, retention, non-destructive reads, and independent consumer groups.
This module builds the single most important idea in Kafka: a topic is a partitioned, append-only, immutable log. Get this right and everything later, from ordering to consumer groups to exactly-once, follows naturally. It stays conceptual on purpose; you write code against these ideas starting in the Building section.
What you’ll be able to do after this module
- Explain what an append-only log is and why Kafka is built on one.
- Describe what a partition is and why Kafka splits a topic into partitions.
- Define an offset and explain what ordering guarantee Kafka actually provides.
- Explain retention and why reading does not delete events.
- Explain how two consumer groups can read the same topic independently.
1. A topic is an append-only log
Start with the simplest version: ignore partitions for a moment and picture a topic as a single log. Producers only ever append to the end. Existing entries are never changed and never reordered.
flowchart LR
p["Producer"]
subgraph log [orders topic - one log]
direction LR
e0["offset 0"]
e1["offset 1"]
e2["offset 2"]
e3["offset 3"]
e0 --> e1 --> e2 --> e3
end
p -->|append| e3
Each entry gets a monotonically increasing number as it lands. That number is its offset: its fixed position in the log. Offset 3 is always offset 3; it does not move. This immutability is what makes Kafka a reliable source of truth: once an event is written, every reader sees the same sequence.
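To make the idea concrete, here is a minimal sketch of an append-only log as a toy in-memory class. It is not Kafka's API: the class `AppendOnlyLog` and its methods are hypothetical names chosen for illustration, and a Python list stands in for the broker's storage.

```python
class AppendOnlyLog:
    """Toy in-memory log: entries can be appended and read, never changed."""

    def __init__(self):
        self._entries = []

    def append(self, event):
        """Add an event at the end and return the offset it was assigned."""
        self._entries.append(event)
        return len(self._entries) - 1

    def read(self, offset):
        """Return the event at a fixed offset; reading removes nothing."""
        return self._entries[offset]

    def end_offset(self):
        """Offset that the next appended event will receive."""
        return len(self._entries)


log = AppendOnlyLog()
offsets = [log.append(e) for e in ["OrderCreated", "OrderPaid", "OrderShipped"]]

first_read = log.read(1)
second_read = log.read(1)  # same offset, same event, on every read
```

Notice there is no `update` or `delete` method. The only way to change what readers see is to append a new event.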
2. Partitions: how one topic scales
A single log lives on a single machine, which caps throughput and storage. Kafka solves this by splitting a topic into partitions. Each partition is itself an independent append-only log, and different partitions can live on different brokers.
flowchart TD
subgraph topic [orders topic]
subgraph p0 [partition 0]
a0["0"] --> a1["1"] --> a2["2"]
end
subgraph p1 [partition 1]
b0["0"] --> b1["1"] --> b2["2"]
end
subgraph p2 [partition 2]
c0["0"] --> c1["1"]
end
end
Two consequences matter immediately:
- Parallelism: with more partitions, more consumers can read the topic at the same time, each working on its own partitions. This is how Kafka scales horizontally.
- Offsets are per partition: notice each partition has its own offset 0, 1, 2. An offset is only meaningful together with its partition. “orders, partition 1, offset 2” identifies exactly one event.
How does Kafka decide which partition an event goes to? By default, by the event’s key: the producer hashes the key to pick a partition, so all events with the same key (for example, the same orderId) go to the same partition as long as the partition count stays the same. Events with no key are spread across partitions. You explore keys and partitioning in depth in Producing Deeper.
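The key-to-partition mapping can be sketched as a stable hash taken modulo the partition count. This is an illustration only: the function `choose_partition` is a hypothetical name, and CRC32 is a stand-in for the hash (Kafka's Java client uses murmur2 on the serialized key).

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for the example

first = choose_partition("order-42", NUM_PARTITIONS)
again = choose_partition("order-42", NUM_PARTITIONS)  # same key, same partition
```

Because the result depends on the partition count, adding partitions to an existing topic can send a key to a different partition than before.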
3. Ordering: guaranteed per partition, not per topic
This is the rule that surprises people, so state it precisely:
Kafka guarantees ordering within a single partition, not across a whole topic.
flowchart LR
subgraph p0 [partition 0]
x0["orderId=A<br/>Created"] --> x1["orderId=A<br/>Paid"]
end
subgraph p1 [partition 1]
y0["orderId=B<br/>Created"] --> y1["orderId=B<br/>Paid"]
end
Because all events for orderId=A share a key, they land in the same partition in the order they were produced, so a consumer always sees Created before Paid for order A. But there is no global ordering guarantee between order A (partition 0) and order B (partition 1); they are independent logs.
The practical rule: choose a key that groups the events that must stay ordered. If per-order ordering matters, key by orderId. Ordering, keys, and partition count are covered again in Idempotent Consumers, Ordering, and Duplicates.
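A small simulation shows why keying preserves per-order ordering. It is a sketch under assumptions: lists stand in for partitions, `choose_partition` is a hypothetical helper using CRC32 in place of Kafka's real hash, and in this toy run keys A and B may or may not land in different partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 2  # assumed partition count for the example


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Stable stand-in hash: the same key always maps to the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = defaultdict(list)  # partition number -> list of (key, event)

# Events for two orders, produced interleaved.
produced = [("A", "Created"), ("B", "Created"), ("A", "Paid"), ("B", "Paid")]
for key, event in produced:
    partitions[choose_partition(key)].append((key, event))


def events_for(key):
    """Events for one key, in the order they sit in that key's partition."""
    return [e for k, e in partitions[choose_partition(key)] if k == key]
```

Each order's events come back in production order because they share one log. Nothing in the model says how order A's events interleave with order B's when they sit in different partitions.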
4. Retention: reading does not delete
In a traditional queue, a message is removed once it is consumed. Kafka is different: events stay in the log for a configured retention period, regardless of who has read them.
Retention is configured per topic, by time or by size:
- Time-based: keep events for 7 days (Kafka’s default), then delete the oldest.
- Size-based: keep up to N gigabytes per partition, then delete the oldest.
flowchart LR
subgraph before [Retention window]
o0["offset 0<br/>(old)"] --> o1["offset 1"] --> o2["offset 2"] --> o3["offset 3<br/>(newest)"]
end
note["Events age out from the oldest end<br/>after the retention period"]
Because reading is non-destructive, Kafka is a durable record of what happened, not just a transient pipe. A new service can join later and read history from the start, and an existing consumer can re-read events it already processed. This is the foundation of replay.
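The following sketch models time-based retention on a toy log. The class `RetainedLog` and its methods are hypothetical, timestamps are plain numbers passed in by the caller, and entries expire one at a time, whereas a real broker removes whole log segments.

```python
class RetainedLog:
    """Toy log with time-based retention; offsets are never renumbered."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._entries = []      # (timestamp, event) pairs, oldest first
        self._start_offset = 0  # offset of the oldest retained entry

    def append(self, event, now):
        """Append an event stamped with `now` and return its offset."""
        self._entries.append((now, event))
        return self._start_offset + len(self._entries) - 1

    def read(self, offset):
        """Return the event at `offset`; reading never removes it."""
        index = offset - self._start_offset
        if index < 0 or index >= len(self._entries):
            raise IndexError(f"offset {offset} is not in the log")
        return self._entries[index][1]

    def expire(self, now):
        """Drop entries older than the retention period, oldest first."""
        while self._entries and now - self._entries[0][0] > self.retention_seconds:
            self._entries.pop(0)
            self._start_offset += 1

    def start_offset(self):
        return self._start_offset


log = RetainedLog(retention_seconds=100)
log.append("e0", now=0)
log.append("e1", now=50)
log.append("e2", now=90)

reads = [log.read(1), log.read(1)]  # reading twice deletes nothing
log.expire(now=120)                 # only e0 is older than 100 seconds
```

After `expire`, offset 1 still refers to the same event as before. Only age removed an entry, never a read.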
5. Consumer position: the offset pointer
If events are not deleted on read, how does a consumer know where it left off? It tracks its own committed offset: a pointer into each partition marking the next event to read.
sequenceDiagram
participant C as Payment consumer
participant P as orders partition 0
P-->>C: read offset 5
C->>C: process event
C->>P: commit offset 6 (next to read)
Note over C,P: On restart, the consumer resumes at offset 6
The consumer, not the broker, drives progress by moving this pointer forward. If it crashes and restarts, it resumes from the last committed offset. When you commit relative to processing decides the delivery behavior: committing before processing risks losing an event (at-most-once), while committing after processing risks repeating one (at-least-once). You study both in Consumer Groups and Offsets and Delivery Guarantees.
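Here is a toy simulation of how commit timing changes what a restart does. Everything in it is hypothetical: a list stands in for the partition, a dict holds the committed offset, and `crash_at` simulates a crash between the two steps for one offset.

```python
PARTITION = ["e0", "e1", "e2", "e3"]


def run(partition, store, commit_first, crash_at=None):
    """Consume from the committed offset; optionally crash between the two steps."""
    processed = []
    offset = store["committed"]
    while offset < len(partition):
        if commit_first:
            store["committed"] = offset + 1      # commit, then process
            if offset == crash_at:
                return processed                 # crash: event never processed
            processed.append(partition[offset])
        else:
            processed.append(partition[offset])  # process, then commit
            if offset == crash_at:
                return processed                 # crash: commit never happened
            store["committed"] = offset + 1
        offset += 1
    return processed


def run_with_one_crash(commit_first):
    """Run once with a crash at offset 1, then restart from the committed offset."""
    store = {"committed": 0}
    before = run(PARTITION, store, commit_first, crash_at=1)
    after = run(PARTITION, store, commit_first)
    return before + after


commit_after_processing = run_with_one_crash(commit_first=False)
commit_before_processing = run_with_one_crash(commit_first=True)
```

With the commit after processing, `e1` is handled twice across the crash. With the commit before processing, `e1` is never handled. Neither run skips or repeats any other event.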
6. Consumer groups: independent readers vs shared work
Two different needs are both solved by the consumer group:
- Fan-out across teams: each application uses its own group id. Every group gets its own set of offsets, so the Payment group and the Notification group each read every event independently.
- Sharing work within an application: within one group, Kafka assigns each partition to exactly one consumer instance, so adding instances spreads the load.
flowchart TD
subgraph topic [orders topic]
pa["partition 0"]
pb["partition 1"]
end
subgraph gp [Payment group]
pc1["instance 1"]
pc2["instance 2"]
end
subgraph gn [Notification group]
nc1["instance 1"]
end
pa --> pc1
pb --> pc2
pa --> nc1
pb --> nc1
In the diagram, the Payment group has two instances, so the two partitions are split between them (parallel work). The Notification group has one instance, so it reads both partitions itself. Both groups see every event, because each group tracks its own offsets. This is the mechanism behind both fan-out and horizontal scaling, and you return to it in Consumer Groups and Offsets.
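The two roles of a consumer group can be sketched in a few lines. This is a simplified model, not Kafka's assignment protocol: `assign` is a hypothetical round-robin helper, the instance names are made up, and committed offsets live in a plain dict keyed by group and partition.

```python
def assign(partitions, instances):
    """Spread partitions over a group's instances; each partition gets one owner."""
    assignment = {instance: [] for instance in instances}
    for i, partition in enumerate(partitions):
        owner = instances[i % len(instances)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1]

# Shared work: two Payment instances split the partitions between them.
payment = assign(partitions, ["payment-1", "payment-2"])

# A single Notification instance owns both partitions.
notification = assign(partitions, ["notification-1"])

# Fan-out: committed offsets are stored per (group, partition),
# so one group's progress never moves another group's pointer.
committed = {}


def commit(group, partition, next_offset):
    committed[(group, partition)] = next_offset


commit("payment", 0, 6)
commit("notification", 0, 2)
```

The Payment group is at offset 6 on partition 0 while the Notification group is still at offset 2 on the same partition. Both positions coexist because the log itself is unchanged by either reader.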
7. Putting the model together
Hold these five facts and the rest of Kafka becomes predictable:
- A topic is an append-only, immutable log, split into partitions.
- An offset identifies an event’s position within one partition.
- Ordering is guaranteed per partition only, driven by the event key.
- Retention keeps events after they are read; reading is non-destructive.
- Consumer groups track their own offsets, giving both fan-out and shared work.
Checkpoint
You should now be able to:
- Explain what an append-only, immutable log is and why offsets never change.
- Describe why a topic is split into partitions and what that buys you.
- State Kafka’s ordering guarantee precisely and explain the role of the key.
- Explain retention and why reading does not delete an event.
- Explain how two consumer groups read the same topic independently, and how instances within one group share the work.
Next: Core Concepts, where these ideas get their precise names and a glossary that maps each to Spring for Apache Kafka.