Read time: ~

Cluster Anatomy: Brokers, Partitions, Replicas

Brokers and the controller, partition leaders and followers, the in-sync replica set, replication factor, the high watermark, min.insync.replicas, and leader failover.

Section 0 gave you the log mental model and the core vocabulary. This section opens the box and shows how Kafka actually stores and protects your data across machines. Start here with the cluster: how brokers cooperate, how a partition is replicated, and what happens when a broker fails.


What you’ll be able to do after this module

  • Explain what a broker and the controller each do in a cluster.
  • Describe the leader and follower roles for a partition, and why all reads and writes go through the leader.
  • Define the in-sync replica set (ISR) and the high watermark, and explain what each protects.
  • Explain how replication factor and min.insync.replicas together set your durability floor.
  • Explain what happens during a leader failover, and the risk of unclean leader election.

1. Brokers and the controller

A broker is a single Kafka server. A cluster is a set of brokers that together store all the topic partitions and serve producers and consumers. Each broker owns some partitions (as leader) and holds copies of others (as follower).

One broker also acts as the controller. The controller is a coordination role, not a separate machine: it manages cluster metadata, tracks which brokers are alive, assigns partition leaders, and reacts when a broker joins or fails. Exactly how the controller is elected and where metadata lives differs between the ZooKeeper and KRaft models, which is the subject of the next module, The Control Plane.

flowchart TD
    subgraph cluster [Kafka Cluster]
        b1["Broker 1<br/>(also controller)"]
        b2["Broker 2"]
        b3["Broker 3"]
    end
    Producer["Producer"] --> b1
    Producer --> b2
    Consumer["Consumer group"] --> b1
    Consumer --> b3

2. Leaders and followers

Recall from Core Concepts that every partition is replicated. Of its replicas, exactly one is the leader and the rest are followers.

  • All produces and consumes for a partition go through its leader.
  • Followers do one job: continuously fetch from the leader to keep an identical copy of the log.

Leadership is spread across brokers so the load is balanced. For a topic with several partitions, one broker might lead partition 0 while following partition 1, and vice versa.

flowchart TD
    subgraph b1 [Broker 1]
        p0L["orders p0<br/>LEADER"]
        p1F["orders p1<br/>follower"]
    end
    subgraph b2 [Broker 2]
        p1L["orders p1<br/>LEADER"]
        p2F["orders p2<br/>follower"]
    end
    subgraph b3 [Broker 3]
        p2L["orders p2<br/>LEADER"]
        p0F["orders p0<br/>follower"]
    end
    p0L -->|replicate| p0F
    p1L -->|replicate| p1F
    p2L -->|replicate| p2F

This diagram shows the orders topic with 3 partitions and replication factor 2. Each partition has its leader on one broker and its single follower on another, and leadership is spread so no single broker carries every leader.


3. The in-sync replica set (ISR)

Not every follower is guaranteed to be caught up at any instant. A slow or restarting follower can fall behind the leader. Kafka tracks the subset of replicas that are fully caught up: the in-sync replica set (ISR).

  • A replica is in the ISR if it has fetched all records up to the leader’s latest offset within a time bound (replica.lag.time.max.ms).
  • The leader is always in its own ISR.
  • If a follower falls behind, the leader removes it from the ISR until it catches up again.

The ISR matters because Kafka only promotes an in-sync replica to leader when the current leader fails. That is how it avoids losing acknowledged data: a replica that was caught up has all the acknowledged records.

flowchart LR
    leader["Leader<br/>offset 100"]
    f1["Follower A<br/>offset 100<br/>(in ISR)"]
    f2["Follower B<br/>offset 100<br/>(in ISR)"]
    f3["Follower C<br/>offset 62<br/>(lagging, out of ISR)"]
    leader --> f1
    leader --> f2
    leader --> f3

4. The high watermark

If followers can lag, which records is a consumer allowed to read? Only the ones that are safely replicated. The high watermark is the highest offset that has been replicated to all replicas in the ISR. Consumers can read up to, but not beyond, the high watermark.

flowchart LR
    subgraph log [Leader log for orders p0]
        direction LR
        r97["97"] --> r98["98"] --> r99["99"] --> r100["100"]
    end
    hw["High watermark = 99<br/>(replicated to all ISR)"]
    le["Log end offset = 100<br/>(written, not yet fully replicated)"]

Record 100 exists on the leader but has not yet been confirmed by every ISR member, so it sits above the high watermark and is invisible to consumers. Once the ISR catches up, the high watermark advances to 100. This is what prevents a consumer from reading a record that could still be lost if the leader failed right now.


5. Replication factor and min.insync.replicas

Two settings decide your durability floor, and they work as a pair.

  • Replication factor: how many copies of each partition exist, set per topic. Factor 3 means one leader and two followers. Higher factor tolerates more broker failures at the cost of more storage and network.
  • min.insync.replicas: the minimum number of in-sync replicas that must acknowledge a write before it is considered successful, when the producer uses acks=all.

These only deliver durability together with the producer’s acks setting. The combination that most production teams use:

SettingCommon production valueEffect
Replication factor3Three copies; survives losing 2 brokers for storage
min.insync.replicas2A write needs at least 2 in-sync copies to succeed
Producer acksallProducer waits for all in-sync replicas

With RF 3 and min.insync.replicas=2, you can lose one broker and still accept writes. If a second broker goes down and only one in-sync replica remains, the leader rejects new writes (NotEnoughReplicasException) rather than risk data that exists on only one machine.

You tune the producer side of this pairing in Reliable Producing: Idempotence, Acks, Retries.


6. Leader failover

When the broker holding a partition leader fails, the controller promotes an in-sync replica to be the new leader, and clients transparently redirect to it.

sequenceDiagram
    participant Ctrl as Controller
    participant B1 as Broker 1 (leader p0)
    participant B2 as Broker 2 (ISR follower p0)
    participant Prod as Producer

    Note over B1: Broker 1 crashes
    Ctrl->>Ctrl: detect Broker 1 is gone
    Ctrl->>B2: promote to leader for p0
    B2-->>Ctrl: now leader
    Ctrl-->>Prod: metadata update (p0 leader = Broker 2)
    Prod->>B2: produce to p0 (resumes)

Because the new leader came from the ISR, it already holds every acknowledged record up to the old high watermark, so no acknowledged data is lost. Producers and consumers briefly see a leadership change in their metadata and then continue against the new leader. The operational side of this event, and how it looks in production, is covered in Broker Down, Controller Failover, KRaft Quorum Loss.


7. Unclean leader election

There is a dangerous edge case. What if the leader fails and there are no in-sync replicas left to promote, only a stale follower that fell behind?

  • With unclean.leader.election.enable=false (the safe default), Kafka refuses to elect the stale replica. The partition goes offline until an in-sync replica returns. You choose consistency over availability.
  • With unclean.leader.election.enable=true, Kafka promotes the stale replica to restore availability, but any records the old leader had beyond the stale replica’s log are lost. You choose availability over consistency.
flowchart TD
    fail["Leader fails, no ISR follower available"]
    q{"unclean.leader.election.enable?"}
    safe["false (default):<br/>partition offline,<br/>wait for in-sync replica.<br/>No data loss."]
    risky["true:<br/>promote stale replica.<br/>Available, but data loss."]
    fail --> q
    q -->|false| safe
    q -->|true| risky

For the Order and Payment services, where losing an acknowledged payment event is unacceptable, keep the safe default. Prefer a brief partition outage over silent data loss.


Checkpoint

You should now be able to:

  • Explain what a broker does and what extra role the controller plays.
  • Describe why all reads and writes for a partition go through its leader, and what followers do.
  • Define the ISR and the high watermark, and explain what each one protects against.
  • Explain how replication factor, min.insync.replicas, and producer acks combine to set a durability floor.
  • Describe a leader failover and explain why unclean leader election trades consistency for availability.

Next:The Control Plane: KRaft and ZooKeeper, where you see how the controller is elected and where cluster metadata actually lives.