
Core Concepts and How Kafka Differs from a Queue

Broker, topic, partition, offset, replication, consumer groups, the log-vs-queue distinction, at-least-once delivery, and a Spring for Apache Kafka glossary.

This module gives the mental model its precise vocabulary. It is the glossary you return to for the rest of the course, and it maps every core Kafka term to its Spring for Apache Kafka equivalent so the later code modules feel familiar.


What you’ll be able to do after this module

  • Define broker, topic, partition, offset, producer, consumer, and consumer group precisely.
  • Explain replication, leaders and followers, and what a replication factor buys you.
  • Explain how a Kafka log differs from a traditional message queue.
  • Explain why Kafka delivers at least once by default and what that demands of your code.
  • Map each core Kafka term to its Spring for Apache Kafka equivalent.

1. The cluster hierarchy

Kafka’s structure is a small nesting of concepts. A cluster holds brokers; brokers host partitions; partitions make up topics; each partition is replicated for safety.

flowchart TD
    subgraph cluster [Kafka Cluster]
        subgraph b1 [Broker 1]
            t0p0["orders p0 (leader)"]
            t0p1f["orders p1 (follower)"]
        end
        subgraph b2 [Broker 2]
            t0p1["orders p1 (leader)"]
            t0p0f["orders p0 (follower)"]
        end
    end
  • Cluster: the whole set of Kafka brokers working together.
  • Broker: a single Kafka server. It stores partition data and serves producers and consumers.
  • Topic: a named, logical stream of events, such as orders.
  • Partition: an ordered, append-only slice of a topic; the unit of parallelism and ordering.
  • Replica: a copy of a partition on another broker for fault tolerance.
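
A small in-memory model makes the nesting concrete. This is a minimal sketch that assumes a simple round-robin placement: the leader of partition p goes on broker (p mod N) and the followers go on the brokers after it. Kafka's real replica assignment also randomizes the starting broker and can be rack-aware, so ReplicaPlacement and its methods are hypothetical illustrations, not a Kafka API. With two brokers, two partitions, and two copies of each, it reproduces the layout in the diagram above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the hierarchy: cluster -> brokers -> partition replicas.
// Simplified round-robin placement; NOT Kafka's actual assignment algorithm.
class ReplicaPlacement {

    // One copy of one partition, hosted on one broker.
    record Replica(String topic, int partition, int brokerId, boolean leader) {}

    // Place every partition's replicas across brokers numbered 1..brokerCount.
    static List<Replica> place(String topic, int partitions, int replicationFactor, int brokerCount) {
        if (replicationFactor > brokerCount) {
            throw new IllegalArgumentException("replication factor cannot exceed broker count");
        }
        List<Replica> replicas = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            for (int r = 0; r < replicationFactor; r++) {
                // Leader (r == 0) on broker (p mod N) + 1; each follower on the next broker along.
                int brokerId = (p + r) % brokerCount + 1;
                replicas.add(new Replica(topic, p, brokerId, r == 0));
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        // Two brokers, two partitions, two copies each: the layout from the diagram.
        for (Replica replica : place("orders", 2, 2, 2)) {
            System.out.printf("broker %d: %s p%d (%s)%n", replica.brokerId(), replica.topic(),
                    replica.partition(), replica.leader() ? "leader" : "follower");
        }
    }
}
```

Because the replication factor may not exceed the broker count, the copies of one partition always land on different brokers; two copies on the same broker would not protect against losing that broker.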

You go deeper on broker internals, leaders, and replicas in Cluster Anatomy.


2. Offsets, producers, and consumers

  • Offset: the position of an event within a partition. It is a per-partition, monotonically increasing number.
  • Producer: code that publishes (appends) events to a topic.
  • Consumer: code that reads events from a topic’s partitions.
  • Consumer group: a set of consumer instances that share the reading of a topic; each partition is assigned to exactly one instance in the group, and each group tracks its own offsets.
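
The one-owner-per-partition rule can be sketched as a round-robin assignment. GroupAssignment is a hypothetical helper, not one of Kafka's built-in assignors (such as the range or cooperative-sticky assignors), which also handle multiple topics and rebalances; it only shows the shape of the rule for one group and one topic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment of one topic's partitions to the members of ONE group.
// Each partition goes to exactly one member; a member may own several partitions, or none.
class GroupAssignment {

    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int p = 0; p < partitionCount; p++) {
            // Partition p goes to member (p mod memberCount).
            String owner = members.get(p % members.size());
            assignment.get(owner).add(p);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Three partitions, four instances: one instance receives nothing.
        System.out.println(assign(List.of("c1", "c2", "c3", "c4"), 3)); // prints {c1=[0], c2=[1], c3=[2], c4=[]}
        // Three partitions, two instances: one instance reads two partitions.
        System.out.println(assign(List.of("c1", "c2"), 3));             // prints {c1=[0, 2], c2=[1]}
    }
}
```

A second consumer group would run its own assignment over the same partitions, independently of the first. The first call also shows why instances beyond the partition count add no read parallelism within one group.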

The key behavior to remember from the mental model: producers append, consumers read by moving an offset pointer forward, and reading never deletes.
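
A minimal in-memory model of one partition shows all three behaviors. PartitionLog is hypothetical (it is not the Kafka client API): append returns the new record's offset, each consumer group keeps its own committed offset, and poll copies records without removing them. Starting a group with no committed offset at 0 is an assumption standing in for auto.offset.reset=earliest.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of ONE partition: an append-only list plus per-group committed offsets.
class PartitionLog {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Long> committed = new HashMap<>();

    // Producer side: append to the end; the returned offset is the record's position.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Consumer side: read up to max records from the group's committed offset (0 if none).
    // Nothing is removed, so every other group still sees every record.
    List<String> poll(String groupId, int max) {
        int from = committed.getOrDefault(groupId, 0L).intValue();
        int to = Math.min(from + max, records.size());
        return List.copyOf(records.subList(from, to));
    }

    // A commit records the NEXT offset to read for this group: last processed offset + 1.
    void commit(String groupId, long nextOffset) {
        committed.put(groupId, nextOffset);
    }

    long committedOffset(String groupId) {
        return committed.getOrDefault(groupId, 0L);
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLog orders = new PartitionLog();
        orders.append("order-1");
        orders.append("order-2");
        orders.append("order-3");

        List<String> billing = orders.poll("billing", 2);     // [order-1, order-2]
        orders.commit("billing", 2);                           // billing processed offsets 0 and 1
        List<String> shipping = orders.poll("shipping", 10);  // [order-1, order-2, order-3]

        System.out.println(billing + " " + shipping + " size=" + orders.size());
    }
}
```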


3. Replication: leaders, followers, and the ISR

Every partition has one leader replica and zero or more follower replicas on other brokers. By default, all reads and writes for a partition go through its leader; followers copy the leader’s log to stay current.

flowchart LR
    Producer["Producer"]
    subgraph part [orders partition 0]
        leader["Leader (Broker 1)"]
        f1["Follower (Broker 2)"]
        f2["Follower (Broker 3)"]
    end
    Producer -->|write| leader
    leader -->|replicate| f1
    leader -->|replicate| f2
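  • Replication factor: how many copies of each partition exist, counting the leader. A factor of 3 means one leader and two followers, so the partition’s data can survive the loss of up to two of those brokers, provided a surviving copy is still in sync.
  • In-sync replica set (ISR): the replicas, leader included, that are keeping up with the leader’s log. If the leader fails, a replica from the ISR is elected as the new leader, so no write acknowledged with acks=all is lost (as long as unclean leader election stays disabled, which is the default).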
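
The failover rule can be sketched in a few lines. PartitionReplicas is a hypothetical model that uses broker ids to stand for replicas and assumes unclean leader election is disabled (Kafka's default), so only an in-sync replica may become leader. In a real cluster the controller runs the election, which this sketch does not model.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

// Hypothetical model of one partition's replicas; broker ids stand in for replicas.
class PartitionReplicas {
    private final Set<Integer> isr;  // in-sync replicas (leader included), in preference order
    private Integer leader;          // null when no in-sync replica is left to lead

    PartitionReplicas(List<Integer> replicaBrokers) {
        this.isr = new LinkedHashSet<>(replicaBrokers);
        this.leader = replicaBrokers.get(0);  // first replica starts as leader
    }

    // A follower falls too far behind: it stays a replica but is no longer eligible to lead.
    void dropFromIsr(int brokerId) {
        if (!Objects.equals(leader, brokerId)) {
            isr.remove(brokerId);
        }
    }

    // A broker dies: it leaves the ISR, and if it was the leader, the next in-sync replica takes over.
    void failBroker(int brokerId) {
        isr.remove(brokerId);
        if (Objects.equals(leader, brokerId)) {
            // Unclean election disabled: only an in-sync replica may become leader.
            leader = isr.isEmpty() ? null : isr.iterator().next();
        }
    }

    Optional<Integer> leader() {
        return Optional.ofNullable(leader);
    }

    Set<Integer> isr() {
        return Set.copyOf(isr);
    }

    public static void main(String[] args) {
        PartitionReplicas p0 = new PartitionReplicas(List.of(1, 2, 3));  // replication factor 3
        p0.dropFromIsr(2);                // broker 2 falls behind
        p0.failBroker(1);                 // the leader dies
        System.out.println(p0.leader());  // Optional[3]: broker 2 was not eligible
        p0.failBroker(3);
        System.out.println(p0.leader());  // Optional.empty: offline rather than elect a stale replica
    }
}
```

The last step is the durability trade-off in action: broker 2 still holds some of the data, but because it fell out of sync, electing it could silently drop acknowledged writes, so the partition stays unavailable instead.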

This is the foundation of Kafka’s durability, and it connects directly to producer acks and min.insync.replicas, which you tune in Reliable Producing. Full treatment is in Cluster Anatomy.
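
The interaction between acks=all and min.insync.replicas comes down to a comparison, sketched below as a hypothetical acceptsWrite check. It reflects the documented broker behavior: with acks=all, the leader rejects a write (NotEnoughReplicas) when the ISR has fewer members than min.insync.replicas, while with acks=0 or acks=1 the setting is not consulted. The method itself is illustrative, not a Kafka API.

```java
// Hypothetical check mirroring how a partition leader treats a produce request.
class AcksCheck {

    // acks is "all" (or its alias "-1"), "1", or "0"; isrSize counts the leader itself.
    static boolean acceptsWrite(String acks, int isrSize, int minInsyncReplicas) {
        if (acks.equals("all") || acks.equals("-1")) {
            // Too few in-sync copies to honor the durability promise: reject the write.
            return isrSize >= minInsyncReplicas;
        }
        // acks=0 and acks=1 only need a live leader; min.insync.replicas does not apply.
        return isrSize >= 1;
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int minInsync = 2;
        for (int brokersDown = 0; brokersDown < replicationFactor; brokersDown++) {
            int isrSize = replicationFactor - brokersDown;
            System.out.printf("brokers down=%d, acks=all accepted=%b%n",
                    brokersDown, acceptsWrite("all", isrSize, minInsync));
        }
    }
}
```

With replication factor 3 and min.insync.replicas=2, acks=all writes keep flowing with one broker down and are refused with two down: availability is traded for the guarantee that every acknowledged write sits on at least two brokers.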


4. How Kafka differs from a traditional queue

If you have used RabbitMQ or SQS, this table is the mental switch to make.

| Aspect | Traditional queue | Kafka |
| --- | --- | --- |
| On read | Message is removed | Event stays; the consumer advances an offset |
| Replay | Not built in | Any consumer can re-read from an earlier offset |
| Fan-out | Usually needs one queue per consumer | Every consumer group reads the same log independently |
| Ordering | Often global per queue | Per partition only, driven by the key |
| Scaling reads | Add consumers to one queue | Add partitions and consumers to one group |
| Retention | Until consumed | Time or size based, independent of consumption |

The shift in one sentence: Kafka is a durable log you read from, not a mailbox you drain.
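
The first two rows of the table fit in a few lines of plain Java. An ArrayDeque stands in for a traditional queue and an immutable List read by offset stands in for a Kafka partition; QueueVsLog is an illustrative sketch, not RabbitMQ, SQS, or Kafka client code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative contrast: draining a mailbox vs reading a log by offset.
class QueueVsLog {

    // Queue: each read removes the message, so a second reader finds nothing.
    static List<String> drain(Deque<String> queue) {
        List<String> seen = new ArrayList<>();
        while (!queue.isEmpty()) {
            seen.add(queue.poll());
        }
        return seen;
    }

    // Log: a read is "everything from this offset onward"; the log itself is unchanged.
    static List<String> readFrom(List<String> log, int offset) {
        return List.copyOf(log.subList(offset, log.size()));
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>(List.of("e0", "e1", "e2"));
        System.out.println(drain(queue));      // [e0, e1, e2]
        System.out.println(drain(queue));      // [] (the messages are gone)

        List<String> log = List.of("e0", "e1", "e2");
        System.out.println(readFrom(log, 0));  // [e0, e1, e2]
        System.out.println(readFrom(log, 0));  // [e0, e1, e2] (replay: nothing was removed)
    }
}
```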


5. Delivery is at least once by default

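This is the most important behavioral fact for your code. By default, consumers commit their offset after processing records, not before, so every event is processed at least once. A consumer can therefore see the same event more than once, for example if it processes an event but crashes before committing its offset. Because a committed offset names the next record to read, finishing offset 7 means committing 8, which is exactly the step lost in the crash below.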

sequenceDiagram
    participant K as orders partition 0
    participant C as Inventory consumer

    K->>C: deliver event at offset 7
    C->>C: reserve stock (succeeds)
    Note over C: crashes before committing offset 8
    K->>C: redeliver event at offset 7
    Note over C: same event processed twice

The implication is a design rule you will meet repeatedly: consumer handlers must be idempotent. Processing the same event twice must not reserve stock twice or charge a customer twice. Kafka does not solve this for you; it is a property of your application code. Kafka’s exactly-once semantics (built on transactions) cover read-process-write pipelines that stay within Kafka, but they do not extend to side effects in external systems such as a database or a payment API, so idempotency remains the durable strategy. Strategies are covered in Idempotent Consumers, Ordering, and Duplicates, and the guarantees themselves in Delivery Guarantees.
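
A minimal sketch of an idempotent handler, replaying the redelivery from the diagram. It assumes each event carries a unique ID (the hypothetical eventId field) and keeps processed IDs in an in-memory set. InventoryHandler and OrderPlaced are hypothetical names. A real service would store the IDs durably, ideally in the same transaction as the side effect, because an in-memory set would itself be lost in a crash; this sketch does not show that part.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical idempotent consumer: a redelivered event must not reserve stock twice.
class InventoryHandler {

    record OrderPlaced(String eventId, String sku, int quantity) {}

    private final Set<String> processedEventIds = new HashSet<>();  // in-memory only: an assumption
    private final Map<String, Integer> reserved = new HashMap<>();

    // Returns true if the event changed state, false if it was a duplicate and was skipped.
    boolean handle(OrderPlaced event) {
        if (!processedEventIds.add(event.eventId())) {
            return false;  // seen before: at-least-once delivery sent it again
        }
        reserved.merge(event.sku(), event.quantity(), Integer::sum);
        return true;
    }

    int reservedFor(String sku) {
        return reserved.getOrDefault(sku, 0);
    }

    public static void main(String[] args) {
        InventoryHandler handler = new InventoryHandler();
        OrderPlaced atOffset7 = new OrderPlaced("evt-42", "SKU-1", 2);

        handler.handle(atOffset7);  // first delivery: reserves 2
        // Simulated redelivery of offset 7 (offset 8 was never committed).
        handler.handle(atOffset7);  // duplicate: skipped

        System.out.println("reserved SKU-1 = " + handler.reservedFor("SKU-1"));  // 2, not 4
    }
}
```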


6. The core terms, mapped to Spring for Apache Kafka

This glossary is your quick reference. Each term links to the module where it is covered in depth.

| Concept | What it means | Spring for Apache Kafka equivalent |
| --- | --- | --- |
| Producer | Code that publishes events. | KafkaTemplate |
| Consumer | Code that reads and processes events. | @KafkaListener method |
| Topic | A named, partitioned log of events. More detail in the Mental Model. | referenced by name; created via NewTopic / KafkaAdmin |
| Partition | An ordered slice of a topic; the unit of ordering and parallelism. | container concurrency maps consumer threads to partitions |
| Offset | An event’s position within a partition. | committed via AckMode; auto.offset.reset sets where to start |
| Key | Determines which partition an event lands in. | key argument to kafkaTemplate.send(topic, key, value) |
| Consumer group | Instances sharing a topic’s partitions, with their own offsets. | spring.kafka.consumer.group-id or @KafkaListener(groupId=...) |
| Replication factor | Number of copies of each partition. | topic config (set on NewTopic or the broker) |
| In-sync replicas (ISR) | Replicas caught up with the leader. | pairs with producer acks=all and min.insync.replicas |
| Serializer / Deserializer | Turns objects into bytes and back. | JsonSerializer / JsonDeserializer, ErrorHandlingDeserializer |
| Broker | A single Kafka server storing partitions. | spring.kafka.bootstrap-servers |
| Dead letter topic (DLT) | Where un-processable records are routed. | @RetryableTopic / DeadLetterPublishingRecoverer |
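
The Key row deserves a closer look, because the key is what gives you per-partition ordering. Kafka's default partitioner hashes the serialized key with murmur2 and takes the result modulo the partition count. The sketch below keeps that shape but substitutes String.hashCode for murmur2, so the partition numbers it produces will not match a real cluster; KeyPartitioning.partitionFor is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical key-to-partition mapping: same shape as Kafka's default partitioner
// (hash(key) mod partitionCount), but String.hashCode stands in for murmur2.
class KeyPartitioning {

    static int partitionFor(String key, int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        // Clear the sign bit so the hash is non-negative before taking the modulo.
        return (key.hashCode() & 0x7fffffff) % partitionCount;
    }

    public static void main(String[] args) {
        int partitions = 3;
        // Events for one customer share a key, so they share a partition and keep their order.
        String[][] events = {
            {"customer-7", "created"}, {"customer-9", "created"},
            {"customer-7", "paid"}, {"customer-7", "shipped"}
        };
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String[] event : events) {
            int p = partitionFor(event[0], partitions);
            byPartition.computeIfAbsent(p, k -> new ArrayList<>()).add(event[0] + ":" + event[1]);
        }
        System.out.println(byPartition);
    }
}
```

Because the modulo depends on the partition count, adding partitions later can move an existing key to a different partition, which is one reason the count is worth choosing up front.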

7. A first look at the configuration

You have not written code yet, but here is the smallest Spring Boot configuration so the vocabulary maps to something concrete. You build on this in First Producer and Consumer.

spring:
  kafka:
    bootstrap-servers: localhost:9092   # broker address
    consumer:
      group-id: payment-service         # this app's consumer group
      auto-offset-reset: earliest        # where to start with no committed offset

Every key here is a core concept: the broker address, the consumer group that owns offsets, and where to begin reading. Nothing new, just the glossary in YAML form.
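
For comparison, Spring Boot hands these YAML values to the Kafka consumer under the client's native property names: bootstrap.servers, group.id, and auto.offset.reset. The sketch builds those properties with plain java.util.Properties and adds a hypothetical startingOffset helper that models what auto.offset.reset decides: it matters only when the group has no committed offset for a partition. The real setting also applies when a committed offset has been removed by retention; the helper ignores that case and is not the client's actual code.

```java
import java.util.OptionalLong;
import java.util.Properties;

// Plain-Java view of the YAML above, plus a hypothetical model of the auto.offset.reset rule.
class ConsumerStartup {

    // Native Kafka client property names that the Spring Boot keys map to.
    static Properties consumerProperties() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // spring.kafka.bootstrap-servers
        props.put("group.id", "payment-service");          // spring.kafka.consumer.group-id
        props.put("auto.offset.reset", "earliest");        // spring.kafka.consumer.auto-offset-reset
        return props;
    }

    // logStart is the oldest retained offset; logEnd is the offset the next write will get.
    static long startingOffset(OptionalLong committed, String autoOffsetReset, long logStart, long logEnd) {
        if (committed.isPresent()) {
            return committed.getAsLong();  // a committed offset always wins
        }
        return switch (autoOffsetReset) {
            case "earliest" -> logStart;   // read everything still retained
            case "latest" -> logEnd;       // only events produced from now on
            // "none" (or anything else): the real client raises an error; modeled as an exception.
            default -> throw new IllegalStateException("no committed offset and reset policy " + autoOffsetReset);
        };
    }

    public static void main(String[] args) {
        String reset = consumerProperties().getProperty("auto.offset.reset");
        System.out.println(startingOffset(OptionalLong.empty(), reset, 0, 120));     // 0
        System.out.println(startingOffset(OptionalLong.of(57), reset, 0, 120));     // 57
        System.out.println(startingOffset(OptionalLong.empty(), "latest", 0, 120)); // 120
    }
}
```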


Checkpoint

You should now be able to:

  • Define broker, topic, partition, offset, producer, consumer, and consumer group.
  • Explain replication factor, leader vs follower, and the ISR in one or two sentences each.
  • Explain the core difference between a Kafka log and a traditional queue.
  • Explain why Kafka is at least once by default and why that forces idempotent consumers.
  • Map producer, consumer, topic, offset, and consumer group to their Spring for Apache Kafka equivalents.

Next: Section 1, Cluster Anatomy, where brokers, partitions, and replicas become a working architecture.