Kafka Performance Tuning: Throughput, Latency, and Partitions

Kafka is fast by default, but a production workload usually needs deliberate tuning to hit its throughput or latency target without sacrificing durability. This module is a holistic guide: it ties together the producer and consumer settings you already met and shows how to reason about the trade-offs rather than cargo-culting values.

What you’ll be able to do after this module

Choose a partition count and understand its trade-offs.
Tune producer batching and compression for throughput.
Tune consumer fetch and poll settings.
Reason about the durability versus performance trade-off.
Balance throughput against latency for a workload.

1. The core trade-off: throughput vs latency

Almost every Kafka tuning knob trades throughput against latency. Waiting to fill a batch raises throughput but adds latency. Requiring more replicas to acknowledge raises durability but adds latency. There is no universally fast configuration, only one matched to your goal.

So the first step is to state the goal. High throughput for a bulk pipeline and low latency for an interactive flow lead to opposite choices on the same settings.

flowchart TD
    goal{"What matters most?"}
    goal -->|throughput| tp["larger batches, more linger,<br/>compression, bigger fetches"]
    goal -->|latency| lat["small linger, small fetch wait,<br/>fewer in-flight waits"]
    goal -->|durability| dur["acks=all, min.insync.replicas,<br/>accept some latency"]

2. Partition count

Partitions are the unit of parallelism: within a consumer group each partition is read by at most one consumer, so partition count sets the group's throughput ceiling, as established in Consuming Deeper. More partitions allow more parallel consumers, but they are not free.

Pro/Con	Effect of more partitions
Pro	Higher consumer parallelism and total throughput
Con	More open file handles and memory on brokers
Con	Longer leader election and recovery times
Con	Can add end-to-end latency, since each broker has more partitions to replicate before records become visible to consumers

A practical approach: estimate the target throughput, divide it by the throughput a single partition can sustain, and add headroom. Remember from Idempotent Consumers, Ordering, and Duplicates that raising the partition count later changes which partition a key maps to, so size with growth in mind.
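
The sizing rule above can be sketched as a few lines of arithmetic. This is a minimal illustration, not a benchmark: the class name, the method name, and all figures in `main` are hypothetical, and the per-partition figure is assumed to be whichever of the producer or consumer side is slower in your own measurements.

```java
// Sketch: partition-count estimate from throughput figures.
// All numbers here are made-up inputs, not measured values.
final class PartitionSizing {

    /**
     * Partitions needed to carry targetMBps when one partition sustains
     * perPartitionMBps, scaled by a headroom factor (1.5 means 50% spare).
     */
    static int estimatePartitions(double targetMBps, double perPartitionMBps, double headroom) {
        if (targetMBps <= 0 || perPartitionMBps <= 0 || headroom < 1.0) {
            throw new IllegalArgumentException("throughputs must be positive and headroom >= 1.0");
        }
        // Round up: a fractional partition is not possible.
        return (int) Math.ceil(targetMBps / perPartitionMBps * headroom);
    }

    public static void main(String[] args) {
        // Hypothetical workload: 120 MB/s target, 10 MB/s per partition, 50% headroom.
        int partitions = estimatePartitions(120.0, 10.0, 1.5);
        System.out.println("suggested partitions: " + partitions); // prints 18
    }
}
```

Treat the result as a starting point to validate under load, since the sustainable per-partition rate depends on message size, compression, and consumer processing cost.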

3. Producer tuning

The producer settings from Producing Deeper are the main throughput levers. For a throughput-oriented producer:

spring:
  kafka:
    producer:
      batch-size: 65536        # larger batches (64 KB)
      compression-type: lz4    # compress each batch
      properties:
        linger.ms: 20          # wait longer to fill them; no dedicated Spring Boot key, so it is passed through
        max.in.flight.requests.per.connection: 5

Larger batch.size and higher linger.ms mean bigger, fewer requests, which raises throughput at the cost of a little latency.
Compression shrinks network and disk use, and larger batches compress better, so batching and compression reinforce each other.
For a latency-oriented producer, do the opposite: small linger.ms so records are sent promptly.
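
For code that builds the client configuration directly, the two profiles might look like the sketch below. The property keys are standard Kafka producer config names, but the class and method names are hypothetical and the values are illustrative starting points rather than recommendations.

```java
import java.util.Properties;

// Sketch: throughput-oriented and latency-oriented producer profiles
// expressed as raw Kafka client properties. Values are illustrative.
final class ProducerProfiles {

    /** Throughput-oriented: larger batches, some linger, compression. */
    static Properties throughput() {
        Properties p = new Properties();
        p.setProperty("batch.size", "65536");      // 64 KB batches
        p.setProperty("linger.ms", "20");          // wait up to 20 ms to fill a batch
        p.setProperty("compression.type", "lz4");  // compress each batch
        p.setProperty("max.in.flight.requests.per.connection", "5");
        return p;
    }

    /** Latency-oriented: send promptly; other settings stay at client defaults. */
    static Properties latency() {
        Properties p = new Properties();
        p.setProperty("linger.ms", "0");           // do not wait for more records
        return p;
    }

    public static void main(String[] args) {
        System.out.println("throughput profile: " + throughput());
        System.out.println("latency profile:    " + latency());
    }
}
```

Either `Properties` object would still need bootstrap servers and serializers before it could be handed to a producer; those are omitted to keep the focus on the tuning keys.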

4. Consumer tuning

On the read side, the fetch settings control how the consumer trades fewer, larger round trips against added latency.

Setting	Effect
`fetch.min.bytes`	Broker waits until this much data is ready before responding; higher means fewer, larger fetches
`fetch.max.wait.ms`	Cap on how long the broker waits for `fetch.min.bytes`
`max.poll.records`	Max records returned per poll; bounds per-batch processing time

spring:
  kafka:
    consumer:
      fetch-min-size: 65536      # wait for 64 KB
      fetch-max-wait: 100        # but no longer than 100 ms
      max-poll-records: 500

Raising fetch.min.bytes improves throughput by batching fetches, at the cost of latency bounded by fetch.max.wait.ms. Keep max.poll.records low enough that a batch is processed well within max.poll.interval.ms, or you risk the rebalance storm from Rebalancing and Consumer Group Stability.
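
The max.poll.records constraint above is simple arithmetic, sketched below. The class, the method names, the 200 ms per-record cost, and the 50% safety fraction are all assumptions for illustration; 300000 ms is the Kafka client default for max.poll.interval.ms.

```java
// Sketch: check that one poll batch fits inside max.poll.interval.ms.
// Per-record cost and the safety fraction are hypothetical inputs.
final class PollBudget {

    /** Worst-case time to process one full poll batch, in milliseconds. */
    static long worstCaseBatchMs(int maxPollRecords, long perRecordMs) {
        return (long) maxPollRecords * perRecordMs;
    }

    /** Largest max.poll.records that keeps a batch within a fraction of the poll interval. */
    static int safeMaxPollRecords(long maxPollIntervalMs, long perRecordMs, double safetyFraction) {
        if (perRecordMs <= 0 || safetyFraction <= 0 || safetyFraction > 1) {
            throw new IllegalArgumentException("perRecordMs must be > 0 and safetyFraction in (0, 1]");
        }
        long budgetMs = (long) (maxPollIntervalMs * safetyFraction);
        return (int) Math.max(1, budgetMs / perRecordMs); // never suggest fewer than 1 record
    }

    public static void main(String[] args) {
        long intervalMs = 300_000;  // client default for max.poll.interval.ms (5 minutes)
        long perRecordMs = 200;     // hypothetical slow handler
        System.out.println("500 records take up to " + worstCaseBatchMs(500, perRecordMs) + " ms");
        System.out.println("safe max.poll.records: " + safeMaxPollRecords(intervalMs, perRecordMs, 0.5));
    }
}
```

With these inputs a 500-record batch takes up to 100000 ms, which fits within the interval, and the suggested cap is 750 records. Measure the real per-record cost, including slow downstream calls, before relying on such a figure.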

5. Durability vs performance

Durability settings have a performance cost, and this is where you must not blindly optimize. acks=all with min.insync.replicas=2, from Reliable Producing, adds latency because the leader waits for the in-sync replicas to acknowledge each write before responding, but that wait is what makes an acknowledged write safe against the loss of a single broker.

The honest guidance: do not trade away durability for throughput on business-critical data. Tune batching, compression, and partitioning first, which raise throughput without weakening guarantees. Only relax acks for data where loss is genuinely acceptable, such as high-volume metrics.
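
To make concrete what the durability settings buy, the sketch below works through the arithmetic for one topic. It assumes a replication factor of 3, which this module does not state, and the class and method names are hypothetical.

```java
// Sketch: what acks=all plus min.insync.replicas implies for a topic.
// The replication factor of 3 used in main is an assumption for illustration.
final class DurabilityMath {

    /** Replicas that can be out of sync while acks=all writes are still accepted. */
    static int tolerableFailuresForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("need 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    /** Fewest replicas that hold a record by the time acks=all acknowledges it. */
    static int minCopiesOfAckedRecord(int minInsyncReplicas) {
        return minInsyncReplicas;
    }

    public static void main(String[] args) {
        int rf = 3;      // assumed replication factor
        int minIsr = 2;  // min.insync.replicas from the text above
        System.out.println("writes continue with " + tolerableFailuresForWrites(rf, minIsr) + " replica(s) down");
        System.out.println("an acked record is on at least " + minCopiesOfAckedRecord(minIsr) + " replicas");
    }
}
```

Under those assumptions, writes keep succeeding with one replica down, and every acknowledged record exists on at least two replicas. With acks=1 only the leader is guaranteed to hold the record when the producer is told it succeeded.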

6. A tuning method

Tune with measurement, not guesswork, using the signals from Observability.

State the goal: throughput, latency, or durability first.
Measure the baseline: throughput, end-to-end latency, and consumer lag.
Change one thing at a time (for example linger.ms), then re-measure.
Watch for the trade-off you accepted (for example latency rising as throughput improves).
Stop when the goal is met; do not over-tune.

flowchart TD
    s1["state goal"] --> s2["measure baseline"]
    s2 --> s3["change one setting"]
    s3 --> s4["re-measure"]
    s4 --> s5{"goal met?"}
    s5 -->|no| s3
    s5 -->|yes| done["stop"]
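
The loop above can be sketched as a small harness that varies one setting and stops at the first value meeting the goal. Everything here is hypothetical: the class and method names, the candidate values, and especially `fakeMeasure`, which stands in for a real benchmark run and returns invented numbers.

```java
import java.util.function.IntToDoubleFunction;

// Sketch: change one setting at a time, re-measure, stop when the goal is met.
final class TuningLoop {

    /**
     * Tries candidate values for a single setting in order, measuring after each
     * change. Returns the first value whose measurement meets the goal, or -1.
     */
    static int firstValueMeetingGoal(int[] candidates, IntToDoubleFunction measure, double goal) {
        for (int value : candidates) {
            double measured = measure.applyAsDouble(value);
            System.out.println("value=" + value + " measured=" + measured);
            if (measured >= goal) {
                return value; // goal met: stop, do not over-tune
            }
        }
        return -1; // no candidate met the goal; revisit the goal or try another setting
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a benchmark: invented MB/s as a function of linger.ms.
        IntToDoubleFunction fakeMeasure = lingerMs -> 40.0 + 2.0 * lingerMs;
        int chosen = firstValueMeetingGoal(new int[] {0, 5, 10, 20}, fakeMeasure, 60.0);
        System.out.println("chosen linger.ms: " + chosen);
    }
}
```

In a real run the measurement function would drive a load test and also record latency and consumer lag, so that the accepted trade-off stays visible.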

7. Guided practical

Run this against the local lab.

Measure baseline produce throughput with default settings using a simple loop or kafka-producer-perf-test.sh.
Raise batch.size and linger.ms, add lz4 compression, and re-measure throughput.
Raise fetch.min.bytes and observe consumer throughput and latency change.
Lower max.poll.records and confirm per-batch processing time drops.
Compare acks=1 vs acks=all throughput, and note the durability you would give up.

Next: Testing Kafka Applications, the last production-readiness module, where you test producers, consumers, and streams reliably.