Kafka Core Concepts: Vocabulary and the Log-vs-Queue Model

This module gives the mental model its precise vocabulary. It is the glossary you return to for the rest of the course, and it maps every core Kafka term to its Spring for Apache Kafka equivalent so the later code modules feel familiar.

What you’ll be able to do after this module

Define broker, topic, partition, offset, producer, consumer, and consumer group precisely.
Explain replication, leaders and followers, and what a replication factor buys you.
Explain how a Kafka log differs from a traditional message queue.
Explain why Kafka delivers at least once by default and what that demands of your code.
Map each core Kafka term to its Spring for Apache Kafka equivalent.

1. The cluster hierarchy

Kafka’s structure is a small nesting of concepts. A cluster holds brokers; brokers host partitions; partitions make up topics; each partition is replicated for safety.

flowchart TD
    subgraph cluster [Kafka Cluster]
        subgraph b1 [Broker 1]
            t0p0["orders p0 (leader)"]
            t0p1f["orders p1 (follower)"]
        end
        subgraph b2 [Broker 2]
            t0p1["orders p1 (leader)"]
            t0p0f["orders p0 (follower)"]
        end
    end

Cluster: the whole set of Kafka brokers working together.
Broker: a single Kafka server. It stores partition data and serves producers and consumers.
Topic: a named, logical stream of events, such as orders.
Partition: an ordered, append-only slice of a topic; the unit of parallelism and ordering.
Replica: a copy of a partition stored on a broker. Each partition has one leader replica and, for fault tolerance, follower replicas on other brokers.

You go deeper on broker internals, leaders, and replicas in Cluster Anatomy.

2. Offsets, producers, and consumers

Offset: the position of an event within a partition. It is a per-partition, monotonically increasing number.
Producer: code that publishes (appends) events to a topic.
Consumer: code that reads events from a topic’s partitions.
Consumer group: a set of consumer instances that share the reading of a topic; each partition is assigned to exactly one instance in the group, and each group tracks its own offsets.

The key behavior to remember from the mental model: producers append, consumers read by moving an offset pointer forward, and reading never deletes.
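To make the consumer group rule concrete, here is a minimal sketch of sharing partitions among group members. It is a toy model, not Kafka's real assignment logic: the class name `GroupAssignmentSketch`, the method `assign`, and the simple deal-in-turn strategy are all assumptions made for illustration. The only thing it is meant to show is the invariant that every partition has exactly one owner within a group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of how one consumer group shares a topic's partitions (hypothetical helper). */
final class GroupAssignmentSketch {

    private GroupAssignmentSketch() {}

    /**
     * Deals partitions 0..partitionCount-1 out to the members in turn.
     * Kafka's real assignors are more sophisticated; this only preserves the
     * rule that each partition is owned by exactly one member of the group.
     */
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With three partitions and two members, the first member owns partitions 0 and 2 and the second owns partition 1. With more members than partitions, the extra members receive nothing, which is why adding consumers beyond the partition count does not add read throughput.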

3. Replication: leaders, followers, and the ISR

Every partition has one leader replica and zero or more follower replicas on other brokers. Producers always write to a partition’s leader, and by default consumers read from it too; followers copy the leader’s log to stay current.

flowchart LR
    Producer["Producer"]
    subgraph part [orders partition 0]
        leader["Leader (Broker 1)"]
        f1["Follower (Broker 2)"]
        f2["Follower (Broker 3)"]
    end
    Producer -->|write| leader
    leader -->|replicate| f1
    leader -->|replicate| f2

Replication factor: how many copies of each partition exist. A factor of 3 means one leader and two followers, so the partition’s data survives the loss of up to two brokers.
In-sync replica set (ISR): the replicas, the leader included, that are keeping up with the leader’s log. If the leader fails, a replica from the ISR is promoted to leader, so writes that were acknowledged with acks=all are not lost.

This is the foundation of Kafka’s durability, and it connects directly to producer acks and min.insync.replicas, which you tune in Reliable Producing. Full treatment is in Cluster Anatomy.
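The interaction between the ISR, acks=all, and min.insync.replicas comes down to a little arithmetic, sketched below. This is a simplified model under stated assumptions, not broker code: the class `IsrSketch` and its method names are hypothetical, and it ignores timing, leader election, and every other acks setting.

```java
/** Toy arithmetic for replication settings (hypothetical helper, not a Kafka API). */
final class IsrSketch {

    private IsrSketch() {}

    /** With acks=all, a write is accepted only while the ISR has at least min.insync.replicas members. */
    static boolean acceptsAcksAllWrite(int inSyncReplicas, int minInSyncReplicas) {
        return inSyncReplicas >= minInSyncReplicas;
    }

    /** Brokers holding a replica that can be lost before the last copy of the data is gone. */
    static int brokerLossesBeforeDataLoss(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be at least 1");
        }
        return replicationFactor - 1;
    }

    /** Replicas that can drop out of the ISR while acks=all writes are still accepted. */
    static int replicaLossesBeforeWritesStop(int replicationFactor, int minInSyncReplicas) {
        if (minInSyncReplicas < 1 || minInSyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("need 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInSyncReplicas;
    }
}
```

For a replication factor of 3 with min.insync.replicas set to 2, the data survives two broker losses, but acks=all writes keep flowing through only one. That gap is the durability-versus-availability trade-off these settings control.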

4. How Kafka differs from a traditional queue

If you have used RabbitMQ or SQS, this table is the mental switch to make.

Aspect	Traditional queue	Kafka
On read	Message is removed	Event stays; the consumer advances an offset
Replay	Not built in	Any consumer can re-read from an earlier offset
Fan-out	Usually needs one queue per consumer	Every consumer group reads the same log independently
Ordering	Often global per queue	Per partition only, driven by the key
Scaling reads	Add consumers to one queue	Add partitions and consumers to one group
Retention	Until consumed	Time or size based, independent of consumption

The shift in one sentence: Kafka is a durable log you read from, not a mailbox you drain.
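The first three rows of the table can be seen side by side in a few lines of plain Java. This is an in-memory sketch, not a Kafka client: `LogVsQueueSketch`, `PartitionLog`, and the group names are hypothetical, and a single list stands in for one partition.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Contrasts a drain-on-read queue with an offset-based log (illustrative only). */
final class LogVsQueueSketch {

    /** One partition: an append-only list plus one read position per consumer group. */
    static final class PartitionLog {
        private final List<String> events = new ArrayList<>();
        private final Map<String, Integer> nextOffsetByGroup = new HashMap<>();

        /** Appends an event and returns the offset it was stored at. */
        int append(String event) {
            events.add(event);
            return events.size() - 1;
        }

        /** Returns the group's next event and advances its offset, or null when caught up. */
        String poll(String groupId) {
            int offset = nextOffsetByGroup.getOrDefault(groupId, 0);
            if (offset >= events.size()) {
                return null;
            }
            nextOffsetByGroup.put(groupId, offset + 1);
            return events.get(offset);
        }

        /** Moves a group's read position, for example back to 0 to replay. */
        void seek(String groupId, int offset) {
            if (offset < 0 || offset > events.size()) {
                throw new IllegalArgumentException("offset out of range: " + offset);
            }
            nextOffsetByGroup.put(groupId, offset);
        }

        int size() {
            return events.size();
        }
    }

    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("order-1");
        queue.poll(); // reading removes the message from the queue
        System.out.println("queue size after read: " + queue.size());

        PartitionLog log = new PartitionLog();
        log.append("order-1");
        log.poll("inventory"); // reading only advances inventory's offset
        System.out.println("log size after read: " + log.size());
        System.out.println("billing still sees: " + log.poll("billing"));
    }
}
```

The queue is empty after one read, while the log still holds the event and a second group reads it independently. Calling `seek("inventory", 0)` would let the first group replay from the start, which a drained queue cannot offer.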

5. Delivery is at least once by default

This is the most important behavioral fact for your code. By default, Kafka and its consumers are configured so an event is processed at least once. A consumer can therefore see the same event more than once, for example if it processes an event but crashes before committing its offset.

sequenceDiagram
    participant K as orders partition 0
    participant C as Inventory consumer

    K->>C: deliver event at offset 7
    C->>C: reserve stock (succeeds)
    Note over C: crashes before committing offset 8
    K->>C: redeliver event at offset 7
    Note over C: same event processed twice

The implication is a design rule you will meet repeatedly: consumer handlers must be idempotent. Processing the same event twice must not reserve stock twice or charge a customer twice. Kafka does not solve this for you; it is a property of your application code. Exactly-once semantics narrow the window for the Kafka-to-Kafka case, but idempotency remains the durable strategy. Strategies are covered in Idempotent Consumers, Ordering, and Duplicates, and the guarantees themselves in Delivery Guarantees.
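As a first taste of what idempotent means in code, here is a minimal in-memory sketch of the inventory handler from the diagram. Everything in it is an assumption made for illustration: the class `IdempotentInventoryHandler`, the idea that each event carries a unique ID such as `evt-7`, and the use of an in-memory set to remember processed IDs. In a real service the processed-ID record and the stock update would need to be stored together atomically, for example in one database transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Reserves stock at most once per event ID, even if the event is delivered repeatedly. */
final class IdempotentInventoryHandler {
    private final Map<String, Integer> stockBySku = new HashMap<>();
    private final Set<String> processedEventIds = new HashSet<>();

    IdempotentInventoryHandler(Map<String, Integer> initialStock) {
        stockBySku.putAll(initialStock);
    }

    /**
     * Handles one delivery of an order event.
     * Returns true if stock was reserved, false if this event ID was already handled.
     */
    boolean onOrderPlaced(String eventId, String sku, int quantity) {
        // Set.add returns false when the ID is already present, which marks a redelivery.
        if (!processedEventIds.add(eventId)) {
            return false;
        }
        stockBySku.merge(sku, -quantity, Integer::sum);
        return true;
    }

    int availableStock(String sku) {
        return stockBySku.getOrDefault(sku, 0);
    }
}
```

If the event at offset 7 is delivered twice, the second call returns false and the stock level is reduced only once. Deduplicating by event ID is one strategy among several; the later module on idempotent consumers covers the alternatives.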

6. The core terms, mapped to Spring for Apache Kafka

This glossary is your quick reference. Each term links to the module where it is covered in depth.

Concept	What it means	Spring for Apache Kafka equivalent
Producer	Code that publishes events.	`KafkaTemplate`
Consumer	Code that reads and processes events.	`@KafkaListener` method
Topic	A named, partitioned log of events. More detail in the Mental Model.	referenced by name; created via `NewTopic` / `KafkaAdmin`
Partition	An ordered slice of a topic; the unit of ordering and parallelism.	container `concurrency` maps consumer threads to partitions
Offset	An event’s position within a partition.	committed via `AckMode`; `auto.offset.reset` sets where to start
Key	Determines which partition an event lands in.	key argument to `kafkaTemplate.send(topic, key, value)`
Consumer group	Instances sharing a topic’s partitions, with their own offsets.	`spring.kafka.consumer.group-id` or `@KafkaListener(groupId=...)`
Replication factor	Number of copies of each partition.	topic config (set on `NewTopic` or the broker)
In-sync replicas (ISR)	Replicas caught up with the leader.	pairs with producer `acks=all` and `min.insync.replicas`
Serializer / Deserializer	Turns objects into bytes and back.	`JsonSerializer` / `JsonDeserializer`, `ErrorHandlingDeserializer`
Broker	A single Kafka server storing partitions.	`spring.kafka.bootstrap-servers`
Dead letter topic (DLT)	Where un-processable records are routed.	`@RetryableTopic` / `DeadLetterPublishingRecoverer`
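The Key row deserves a quick illustration, because it is what makes per-partition ordering useful. The sketch below shows the general idea of hashing a key to pick a partition. It is deliberately not Kafka's algorithm: Kafka's default partitioner hashes the serialized key bytes with murmur2, whereas this stand-in uses `String.hashCode()`, and the class `KeyPartitionSketch` is hypothetical.

```java
/** Illustrates key-to-partition mapping with a stand-in hash (not Kafka's murmur2). */
final class KeyPartitionSketch {

    private KeyPartitionSketch() {}

    /** Maps a key to a partition index in the range 0..partitionCount-1. */
    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

The property that matters is determinism: the same key always maps to the same partition, so all events for one order or one customer stay in order relative to each other. With any hash-modulo scheme like this one, changing the partition count can move existing keys to different partitions.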

7. A first look at the configuration

You have not written code yet, but here is the smallest Spring Boot configuration so the vocabulary maps to something concrete. You build on this in First Producer and Consumer.

spring:
  kafka:
    bootstrap-servers: localhost:9092   # broker address
    consumer:
      group-id: payment-service         # this app's consumer group
      auto-offset-reset: earliest       # where to start with no committed offset

Every key here is a core concept: the broker address, the consumer group that owns offsets, and where to begin reading. Nothing new, just the glossary in YAML form.
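The last key is the one people most often misread, so here is a small sketch of the decision it controls. It is a simplified model, not consumer code: `OffsetResetSketch`, `ResetPolicy`, and `startingOffset` are hypothetical names, and the third real setting, `none`, is left out.

```java
import java.util.OptionalLong;

/** Models where a consumer group starts reading a partition (illustrative only). */
final class OffsetResetSketch {

    /** The two most common auto-offset-reset values. */
    enum ResetPolicy { EARLIEST, LATEST }

    private OffsetResetSketch() {}

    /**
     * A committed offset always wins. The reset policy is consulted only when the
     * group has no committed offset for the partition.
     */
    static long startingOffset(OptionalLong committedOffset,
                               ResetPolicy policy,
                               long logStartOffset,
                               long logEndOffset) {
        if (committedOffset.isPresent()) {
            return committedOffset.getAsLong();
        }
        return policy == ResetPolicy.EARLIEST ? logStartOffset : logEndOffset;
    }
}
```

A brand-new group with `earliest` begins at the oldest retained event, while the same group with `latest` sees only events written after it starts. Once the group has committed an offset, the setting no longer affects where it resumes.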

Checkpoint

You should now be able to:

Define broker, topic, partition, offset, producer, consumer, and consumer group.
Explain replication factor, leader vs follower, and the ISR in one or two sentences each.
Explain the core difference between a Kafka log and a traditional queue.
Explain why Kafka is at least once by default and why that forces idempotent consumers.
Map producer, consumer, topic, offset, and consumer group to their Spring for Apache Kafka equivalents.

Next: Section 1, Cluster Anatomy, where brokers, partitions, and replicas become a working architecture.