Core Concepts and How Kafka Differs from a Queue
Broker, topic, partition, offset, replication, consumer groups, the log-vs-queue distinction, at-least-once delivery, and a Spring for Apache Kafka glossary.
This module gives the mental model its precise vocabulary. It is the glossary you return to for the rest of the course, and it maps every core Kafka term to its Spring for Apache Kafka equivalent so the later code modules feel familiar.
What you’ll be able to do after this module
- Define broker, topic, partition, offset, producer, consumer, and consumer group precisely.
- Explain replication, leaders and followers, and what a replication factor buys you.
- Explain how a Kafka log differs from a traditional message queue.
- Explain why Kafka delivers at least once by default and what that demands of your code.
- Map each core Kafka term to its Spring for Apache Kafka equivalent.
1. The cluster hierarchy
Kafka’s structure is a small nesting of concepts. A cluster holds brokers; brokers host partitions; partitions make up topics; each partition is replicated for safety.
```mermaid
flowchart TD
    subgraph cluster [Kafka Cluster]
        subgraph b1 [Broker 1]
            t0p0["orders p0 (leader)"]
            t0p1f["orders p1 (follower)"]
        end
        subgraph b2 [Broker 2]
            t0p1["orders p1 (leader)"]
            t0p0f["orders p0 (follower)"]
        end
    end
```
- Cluster: the whole set of Kafka brokers working together.
- Broker: a single Kafka server. It stores partition data and serves producers and consumers.
- Topic: a named, logical stream of events, such as orders.
- Partition: an ordered, append-only slice of a topic; the unit of parallelism and ordering.
- Replica: a copy of a partition on another broker for fault tolerance.
You go deeper on broker internals, leaders, and replicas in Cluster Anatomy.
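The nesting above can be modeled as plain data. The following Java sketch is a hypothetical toy, not Kafka's API. PartitionId and ClusterLayout are names invented for illustration. It encodes the two-broker layout from the diagram and assumes the first broker in each replica list is the leader. That mirrors Kafka's preferred-leader convention but ignores leader changes after failures.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model of the cluster hierarchy; not Kafka's API.
// A topic is just a name; its partitions are what brokers actually host.
record PartitionId(String topic, int partition) {
    @Override
    public String toString() {
        return topic + " p" + partition;
    }
}

// Maps each partition to the brokers holding its replicas.
// Assumption: the first broker in each list is the leader (Kafka's
// preferred-leader convention); real leadership can move after failures.
record ClusterLayout(Map<PartitionId, List<Integer>> replicaAssignment) {

    int leaderOf(PartitionId p) {
        return replicaAssignment.get(p).get(0);
    }

    // Replication factor = number of copies = length of the replica list.
    int replicationFactor(PartitionId p) {
        return replicaAssignment.get(p).size();
    }

    // Every replica a broker hosts, labeled "leader" or "follower".
    Map<String, String> rolesOn(int brokerId) {
        Map<String, String> roles = new TreeMap<>();
        replicaAssignment.forEach((p, brokers) -> {
            if (brokers.contains(brokerId)) {
                roles.put(p.toString(), leaderOf(p) == brokerId ? "leader" : "follower");
            }
        });
        return roles;
    }
}
```

With the diagram's layout (orders p0 on brokers [1, 2], orders p1 on brokers [2, 1]), rolesOn(1) returns {orders p0=leader, orders p1=follower}. Each broker leads one partition and follows the other, so losing either broker still leaves a copy of both partitions.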
2. Offsets, producers, and consumers
- Offset: the position of an event within a partition. It is a per-partition, monotonically increasing number.
- Producer: code that publishes (appends) events to a topic.
- Consumer: code that reads events from a topic’s partitions.
- Consumer group: a set of consumer instances that share the reading of a topic; each partition is assigned to exactly one instance in the group, and each group tracks its own offsets.
The key behavior to remember from the mental model: producers append, consumers read by moving an offset pointer forward, and reading never deletes.
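A small in-memory sketch makes the append, read, and offset mechanics concrete. PartitionLog below is a hypothetical class standing in for a single partition; it is not a Kafka or Spring type. The sketch makes two simplifying assumptions:

- A group with no committed offset starts at 0, like auto.offset.reset=earliest.
- poll commits immediately. A real consumer commits separately, and that gap is exactly what section 5 examines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for ONE partition; not a Kafka or Spring type.
final class PartitionLog {
    private final List<String> events = new ArrayList<>();
    // Each consumer group tracks its own committed offset: the NEXT offset it will read.
    private final Map<String, Long> committed = new HashMap<>();

    // Producer side: append to the end and return the offset the event received.
    long append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    // Consumer side: read up to max events from the group's committed offset,
    // then commit the position just past the last event read.
    // Assumption: a group with no committed offset starts at 0 (like "earliest").
    List<String> poll(String groupId, int max) {
        int from = (int) Math.min(committed.getOrDefault(groupId, 0L), events.size());
        int to = Math.min(events.size(), from + max);
        List<String> batch = List.copyOf(events.subList(from, to));
        committed.put(groupId, (long) to);
        return batch;
    }

    // Replay: move a group's offset back. Nothing was deleted, so the events are still there.
    void seek(String groupId, long offset) {
        committed.put(groupId, offset);
    }

    int size() {
        return events.size();
    }
}
```

Two groups polling the same log each see every event, and the log size never shrinks. seek rewinds a group so it can replay from an earlier offset.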
3. Replication: leaders, followers, and the ISR
Every partition has one leader replica and zero or more follower replicas on other brokers. By default, all reads and writes for a partition go through its leader; followers copy the leader’s log to stay current.
```mermaid
flowchart LR
    Producer["Producer"]
    subgraph part [orders partition 0]
        leader["Leader (Broker 1)"]
        f1["Follower (Broker 2)"]
        f2["Follower (Broker 3)"]
    end
    Producer -->|write| leader
    leader -->|replicate| f1
    leader -->|replicate| f2
```
- Replication factor: how many copies of each partition exist. A factor of 3 means one leader and two followers on three different brokers, so the partition's data can survive the loss of up to two of those brokers.
- In-sync replica set (ISR): the replicas that are fully caught up with the leader. If the leader fails, a new leader is elected from the ISR. Records the producer wrote with acks=all, which waits for every ISR member, are therefore not lost.
This is the foundation of Kafka’s durability, and it connects directly to producer acks and min.insync.replicas, which you tune in Reliable Producing. Full treatment is in Cluster Anatomy.
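The ISR rule and its interaction with acks=all and min.insync.replicas can be sketched as follows. ReplicaState is a hypothetical class, and it simplifies the real in-sync test. Here a follower counts as in sync only if its log end offset equals the leader's. Real brokers instead use a time-based lag limit (replica.lag.time.max.ms).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of one partition's replicas; not Kafka's API.
// Simplification: a replica is "in sync" only if its log end offset equals the
// leader's. Real brokers use a time-based lag limit (replica.lag.time.max.ms).
final class ReplicaState {
    private final int leader;
    private final Map<Integer, Long> logEndOffsets; // brokerId -> log end offset

    ReplicaState(int leader, Map<Integer, Long> logEndOffsets) {
        this.leader = leader;
        this.logEndOffsets = logEndOffsets;
    }

    // The leader is always in its own ISR; followers count only when caught up.
    List<Integer> inSyncReplicas() {
        long leaderEnd = logEndOffsets.get(leader);
        List<Integer> isr = new ArrayList<>();
        logEndOffsets.forEach((broker, end) -> {
            if (end == leaderEnd) {
                isr.add(broker);
            }
        });
        isr.sort(null); // ascending broker id, for predictable output
        return isr;
    }

    // acks=all waits for every ISR member, and the broker rejects the write
    // (NotEnoughReplicas) if the ISR is smaller than min.insync.replicas.
    boolean acceptsAcksAllWrite(int minInsyncReplicas) {
        return inSyncReplicas().size() >= minInsyncReplicas;
    }
}
```

Take replication factor 3 with min.insync.replicas=2. The partition keeps accepting acks=all writes while one replica is out of sync or down. Once only the leader remains, it rejects those writes rather than accept data that exists in a single copy.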
4. How Kafka differs from a traditional queue
If you have used RabbitMQ or SQS, this table is the mental switch to make.
| Aspect | Traditional queue | Kafka |
|---|---|---|
| On read | Message is removed | Event stays; the consumer advances an offset |
| Replay | Not built in | Any consumer can re-read from an earlier offset |
| Fan-out | Usually needs one queue per consumer | Every consumer group reads the same log independently |
| Ordering | Often global per queue | Per partition only, driven by the key |
| Scaling reads | Add consumers to one queue | Add partitions and consumers to one group |
| Retention | Until consumed | Time or size based, independent of consumption |
The shift in one sentence: Kafka is a durable log you read from, not a mailbox you drain.
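The "per partition, driven by the key" row is the one that most often surprises people coming from queues. The sketch below shows the idea with a hypothetical KeyPartitioner and KeyedEvent: hash the key bytes and take the result modulo the partition count. It is a deliberate simplification. Kafka's default partitioner hashes the serialized key with murmur2, while this sketch uses Arrays.hashCode to stay short, and it covers only non-null keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical keyed record for the sketch.
record KeyedEvent(String key, String value) {}

// Hypothetical key-based partitioner; NOT Kafka's implementation.
// Kafka's default partitioner applies murmur2 to the serialized key bytes;
// Arrays.hashCode is used here only to keep the sketch short.
final class KeyPartitioner {

    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        // Clear the sign bit so the modulo is never negative (Kafka masks murmur2's result the same way).
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Appends each event to its key's partition, preserving send order within a partition.
    static Map<Integer, List<String>> route(List<KeyedEvent> events, int numPartitions) {
        Map<Integer, List<String>> partitions = new TreeMap<>();
        for (KeyedEvent e : events) {
            partitions.computeIfAbsent(partitionFor(e.key(), numPartitions), p -> new ArrayList<>())
                      .add(e.value());
        }
        return partitions;
    }
}
```

Every customer-42 event hashes to the same partition, so its created, paid, and shipped events keep their order. Events for other keys may land elsewhere and have no ordering relative to them. The same arithmetic explains why adding partitions to an existing topic changes which partition a given key maps to.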
5. Delivery is at least once by default
This is the most important behavioral fact for your code. By default, Kafka and its consumers are configured so an event is processed at least once. A consumer can therefore see the same event more than once, for example if it processes an event but crashes before committing its offset.
```mermaid
sequenceDiagram
    participant K as orders partition 0
    participant C as Inventory consumer
    K->>C: deliver event at offset 7
    C->>C: reserve stock (succeeds)
    Note over C: crashes before committing offset 8
    K->>C: redeliver event at offset 7
    Note over C: same event processed twice
```
The implication is a design rule you will meet repeatedly: consumer handlers must be idempotent. Processing the same event twice must not reserve stock twice or charge a customer twice. Kafka does not solve this for you; it is a property of your application code. Exactly-once semantics (Kafka transactions) cover the case where a consumer reads from Kafka and writes its results back to Kafka. They do not extend to external side effects such as a database update or a payment call, so idempotency remains the dependable strategy. Strategies are covered in Idempotent Consumers, Ordering, and Duplicates, and the guarantees themselves in Delivery Guarantees.
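One common way to make a handler idempotent is to remember which events it has already applied and skip repeats. The Java sketch below is hypothetical: OrderPlaced, InventoryHandler, and the eventId field are invented for illustration, and the handler keeps its state in memory. It assumes each event carries a unique ID assigned by the producer. A real service would record the processed ID in the same database transaction as the stock change, so a crash cannot separate the two.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical event type; assumes the producer gives each event a unique eventId.
record OrderPlaced(String eventId, String sku, int quantity) {}

// Hypothetical idempotent handler; not a Spring or Kafka type.
final class InventoryHandler {
    private final Map<String, Integer> reserved = new HashMap<>();
    // In-memory for the sketch; a real service would persist this with the stock change.
    private final Set<String> processedEventIds = new HashSet<>();

    // Returns true if stock was reserved, false if the event was a duplicate.
    boolean handle(OrderPlaced event) {
        if (!processedEventIds.add(event.eventId())) {
            return false; // seen before: a redelivery, so do nothing
        }
        reserved.merge(event.sku(), event.quantity(), Integer::sum);
        return true;
    }

    int reservedFor(String sku) {
        return reserved.getOrDefault(sku, 0);
    }
}
```

Calling handle twice with the same event, as in the redelivery of offset 7 in the diagram, reserves stock once and returns false the second time.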
6. The core terms, mapped to Spring for Apache Kafka
This glossary is your quick reference. Each term links to the module where it is covered in depth.
| Concept | What it means | Spring for Apache Kafka equivalent |
|---|---|---|
| Producer | Code that publishes events. | KafkaTemplate |
| Consumer | Code that reads and processes events. | @KafkaListener method |
| Topic | A named, partitioned log of events. More detail in the Mental Model. | referenced by name; created via NewTopic / KafkaAdmin |
| Partition | An ordered slice of a topic; the unit of ordering and parallelism. | container concurrency maps consumer threads to partitions |
| Offset | An event’s position within a partition. | committed via AckMode; auto.offset.reset sets where to start |
| Key | Determines which partition an event lands in. | key argument to kafkaTemplate.send(topic, key, value) |
| Consumer group | Instances sharing a topic’s partitions, with their own offsets. | spring.kafka.consumer.group-id or @KafkaListener(groupId=...) |
| Replication factor | Number of copies of each partition. | topic config (set on NewTopic or the broker) |
| In-sync replicas (ISR) | Replicas caught up with the leader. | pairs with producer acks=all and min.insync.replicas |
| Serializer / Deserializer | Turns objects into bytes and back. | JsonSerializer / JsonDeserializer, ErrorHandlingDeserializer |
| Broker | A single Kafka server storing partitions. | spring.kafka.bootstrap-servers |
| Dead letter topic (DLT) | Where un-processable records are routed. | @RetryableTopic / DeadLetterPublishingRecoverer |
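The Consumer group and Partition rows combine into one rule: inside a group, each partition is assigned to exactly one consumer, so any consumers beyond the partition count sit idle. In Spring for Apache Kafka, a listener container with concurrency N runs N consumers in the same group. The hypothetical GroupAssignment below uses simple round-robin to show the effect. Kafka ships several assignors (range, round-robin, sticky, cooperative-sticky), and which one applies depends on configuration and client version. All of them give each partition a single owner.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment within ONE consumer group; not Kafka's assignor code.
final class GroupAssignment {

    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String m : members) {
            result.put(m, new ArrayList<>()); // every member appears, even with no partitions
        }
        // Each partition is handed to exactly one member, cycling through them.
        for (int p = 0; p < numPartitions; p++) {
            result.get(members.get(p % members.size())).add(p);
        }
        return result;
    }
}
```

Four consumers over three partitions leave one consumer with nothing to read. The partition count therefore caps a group's read parallelism.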
7. A first look at the configuration
You have not written code yet, but here is the smallest Spring Boot configuration so the vocabulary maps to something concrete. You build on this in First Producer and Consumer.
```yaml
spring:
  kafka:
    bootstrap-servers: localhost:9092   # broker address
    consumer:
      group-id: payment-service         # this app's consumer group
      auto-offset-reset: earliest       # where to start with no committed offset
```
Every key here is a core concept: the broker address, the consumer group that owns offsets, and where to begin reading. Nothing new, just the glossary in YAML form.
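For reference, the sketch below expresses the same settings as raw Kafka client properties; Spring Boot binds the spring.kafka.* keys onto these client keys. ConsumerProps is a hypothetical helper. The two deserializer entries and enable.auto.commit do not appear in the YAML. They are included as assumptions about the defaults: Spring Boot defaults consumer deserializers to StringDeserializer, and recent Spring for Apache Kafka versions set enable.auto.commit to false so the listener container commits offsets itself.

```java
import java.util.Properties;

// Hypothetical helper: the module's YAML as raw Kafka consumer properties.
// In kafka-clients these keys also exist as ConsumerConfig constants,
// e.g. ConsumerConfig.GROUP_ID_CONFIG is "group.id".
final class ConsumerProps {

    static Properties fromModuleConfig() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");  // spring.kafka.bootstrap-servers
        props.setProperty("group.id", "payment-service");          // spring.kafka.consumer.group-id
        props.setProperty("auto.offset.reset", "earliest");        // spring.kafka.consumer.auto-offset-reset

        // Assumed defaults, not in the YAML: Spring Boot's consumer deserializers
        // and Spring for Apache Kafka's choice to let the container commit offsets.
        props.setProperty("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("enable.auto.commit", "false");
        return props;
    }
}
```

These properties are roughly what Spring Boot assembles on your behalf. With Spring Boot you never build this object by hand.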
Checkpoint
You should now be able to:
- Define broker, topic, partition, offset, producer, consumer, and consumer group.
- Explain replication factor, leader vs follower, and the ISR in one or two sentences each.
- Explain the core difference between a Kafka log and a traditional queue.
- Explain why Kafka is at least once by default and why that forces idempotent consumers.
- Map producer, consumer, topic, offset, and consumer group to their Spring for Apache Kafka equivalents.
Next: Section 1, Cluster Anatomy, where brokers, partitions, and replicas become a working architecture. (Coming next in the build.)