Performance Tuning and Throughput
Partition count trade-offs, producer batching and compression, consumer fetch and poll settings, durability trade-offs, and how to reason about throughput versus latency.
Kafka is fast by default, but a production workload usually needs deliberate tuning to hit its throughput or latency target without sacrificing durability. This module is a holistic guide: it ties together the producer and consumer settings you already met and shows how to reason about the trade-offs rather than cargo-culting values.
What you’ll be able to do after this module
- Choose a partition count and understand its trade-offs.
- Tune producer batching and compression for throughput.
- Tune consumer fetch and poll settings.
- Reason about the durability versus performance trade-off.
- Balance throughput against latency for a workload.
1. The core trade-off: throughput vs latency
Almost every Kafka tuning knob trades throughput against latency. Waiting to fill a batch raises throughput but adds latency. Requiring more replicas to acknowledge raises durability but adds latency. There is no universally fast configuration, only one matched to your goal.
So the first step is to state the goal. High throughput for a bulk pipeline and low latency for an interactive flow lead to opposite choices on the same settings.
```mermaid
flowchart TD
    goal{"What matters most?"}
    goal -->|throughput| tp["larger batches, more linger,<br/>compression, bigger fetches"]
    goal -->|latency| lat["small linger, small fetch wait,<br/>fewer in-flight waits"]
    goal -->|durability| dur["acks=all, min.insync.replicas,<br/>accept some latency"]
```
2. Partition count
Partitions are the unit of parallelism, so partition count sets the throughput ceiling for a consumer group, as established in Consuming Deeper. More partitions allow more parallel consumers, but they are not free.
| Pro or con | Effect of more partitions |
|---|---|
| Pro | Higher consumer parallelism and total throughput |
| Con | More open file handles and memory on brokers |
| Con | Longer leader election and recovery times |
| Con | More end-to-end latency for some patterns |
A practical approach: estimate target throughput, divide by the per-partition throughput you can sustain, and add headroom. Remember from Idempotent Consumers, Ordering, and Duplicates that raising partition count later disrupts key-to-partition mapping, so size with growth in mind.
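The sizing rule above can be written as a small calculation. This is a minimal sketch, not a definitive formula: the class name `PartitionSizing`, the method name, and all numbers are illustrative assumptions, and the per-partition figure has to come from your own measurements.

```java
// Sketch: estimate a partition count from a throughput target.
// All names and numbers here are illustrative assumptions, not measurements.
final class PartitionSizing {

    /**
     * @param targetMBps       throughput the topic must sustain
     * @param perPartitionMBps throughput one partition sustained in your own tests
     * @param headroomFactor   growth multiplier, for example 1.5 for 50% headroom
     * @return the estimated partition count, rounded up
     */
    static int partitionsFor(double targetMBps, double perPartitionMBps, double headroomFactor) {
        if (targetMBps <= 0 || perPartitionMBps <= 0 || headroomFactor < 1.0) {
            throw new IllegalArgumentException("throughputs must be positive and headroom >= 1.0");
        }
        // Divide the target by what one partition can sustain, then add headroom.
        return (int) Math.ceil(targetMBps / perPartitionMBps * headroomFactor);
    }

    public static void main(String[] args) {
        // Assumed inputs: 100 MB/s target, 10 MB/s per partition, 50% headroom.
        System.out.println(partitionsFor(100, 10, 1.5)); // prints 15
    }
}
```

Rounding up and sizing for growth reflects the point that raising the partition count later disrupts key-to-partition mapping.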
3. Producer tuning
The producer settings from Producing Deeper are the main throughput levers. For a throughput-oriented producer:
```yaml
spring:
  kafka:
    producer:
      batch-size: 65536        # larger batches (64 KB)
      compression-type: lz4    # compress each batch
      properties:
        linger.ms: 20          # wait longer to fill them (set as a raw client property)
        max.in.flight.requests.per.connection: 5
```
- Larger `batch.size` and higher `linger.ms` mean fewer, bigger requests, which raises throughput at the cost of a little latency.
- Compression shrinks network and disk use, and larger batches compress better, so batching and compression reinforce each other.
- For a latency-oriented producer, do the opposite: small `linger.ms` so records are sent promptly.
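To make the contrast concrete, the two profiles can be sketched as plain Kafka client properties. The class and method names are hypothetical. The throughput values mirror the YAML above, and the latency profile only shows the one change the text describes, leaving everything else at the client defaults.

```java
import java.util.Properties;

// Sketch: two producer profiles expressed as raw Kafka client property names.
// The class and method names are hypothetical; the values are illustrative.
final class ProducerProfiles {

    /** Bigger batches, a short wait to fill them, and compression per batch. */
    static Properties throughputOriented() {
        Properties p = new Properties();
        p.setProperty("batch.size", "65536");     // 64 KB batches
        p.setProperty("linger.ms", "20");         // wait up to 20 ms to fill a batch
        p.setProperty("compression.type", "lz4"); // compress each batch
        return p;
    }

    /** The opposite choice: do not wait to fill a batch. */
    static Properties latencyOriented() {
        Properties p = new Properties();
        p.setProperty("linger.ms", "0");          // send records promptly
        return p;
    }

    public static void main(String[] args) {
        System.out.println("throughput: " + throughputOriented());
        System.out.println("latency:    " + latencyOriented());
    }
}
```

In a real application these properties would be merged with `bootstrap.servers` and the serializer settings before being passed to the producer constructor.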
4. Consumer tuning
On the read side, the fetch settings control how the consumer trades round trips for latency.
| Setting | Effect |
|---|---|
| `fetch.min.bytes` | Broker waits until this much data is ready before responding; higher means fewer, larger fetches |
| `fetch.max.wait.ms` | Cap on how long the broker waits for `fetch.min.bytes` |
| `max.poll.records` | Max records returned per poll; bounds per-batch processing time |
```yaml
spring:
  kafka:
    consumer:
      fetch-min-size: 65536    # wait for 64 KB
      fetch-max-wait: 100      # but no longer than 100 ms
      max-poll-records: 500
```
Raising fetch.min.bytes improves throughput by batching fetches, at the cost of latency bounded by fetch.max.wait.ms. Keep max.poll.records low enough that a batch is processed well within max.poll.interval.ms, or you risk the rebalance storm from Rebalancing and Consumer Group Stability.
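One way to reason about "low enough" is to budget only part of `max.poll.interval.ms` for processing a batch. The following is a rough sketch under stated assumptions: the class name, the method name, and the idea of a fixed budget fraction are illustrative, and the per-record time must come from your own measurements.

```java
// Sketch: derive an upper bound for max.poll.records from a processing-time budget.
// Names and numbers are illustrative assumptions.
final class PollSizing {

    /**
     * @param maxPollIntervalMs the consumer's max.poll.interval.ms
     * @param perRecordMs       measured average processing time per record
     * @param budgetFraction    share of the interval to spend on one batch, e.g. 0.5
     * @return the largest batch size that fits the budget, never less than 1
     */
    static int maxPollRecordsFor(long maxPollIntervalMs, double perRecordMs, double budgetFraction) {
        if (maxPollIntervalMs <= 0 || perRecordMs <= 0 || budgetFraction <= 0 || budgetFraction > 1) {
            throw new IllegalArgumentException("inputs must be positive and budgetFraction in (0, 1]");
        }
        double budgetMs = maxPollIntervalMs * budgetFraction;
        return (int) Math.max(1.0, Math.floor(budgetMs / perRecordMs));
    }

    public static void main(String[] args) {
        // Assumed inputs: 300000 ms poll interval, 50 ms per record, half the interval as budget.
        System.out.println(maxPollRecordsFor(300_000, 50.0, 0.5)); // prints 3000
    }
}
```

The result is a ceiling, not a target: a smaller `max.poll.records` leaves more margin for slow records.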
5. Durability vs performance
Durability settings have a performance cost, and this is where you must not blindly optimize. acks=all with min.insync.replicas=2, from Reliable Producing, adds latency because the leader waits for replicas, but it is what makes an acknowledged write safe.
The honest guidance: do not trade away durability for throughput on business-critical data. Tune batching, compression, and partitioning first, which raise throughput without weakening guarantees. Only relax acks for data where loss is genuinely acceptable, such as high-volume metrics.
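As a minimal sketch of that guidance, the `acks` choice can be tied to an explicit classification of the data rather than to a throughput wish. The enum and method names below are hypothetical.

```java
import java.util.Properties;

// Sketch: choose the producer acks setting from a data classification.
// DataClass and producerAcks are hypothetical names used for illustration.
final class DurabilityProfiles {

    enum DataClass { BUSINESS_CRITICAL, LOSS_TOLERANT }

    static Properties producerAcks(DataClass dataClass) {
        Properties p = new Properties();
        // acks=all waits for the in-sync replicas; acks=1 waits for the leader only.
        // min.insync.replicas is a topic or broker setting, so it is not set here.
        p.setProperty("acks", dataClass == DataClass.BUSINESS_CRITICAL ? "all" : "1");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerAcks(DataClass.BUSINESS_CRITICAL));
        System.out.println(producerAcks(DataClass.LOSS_TOLERANT));
    }
}
```

Making the classification explicit keeps a relaxed `acks` value from spreading to topics where loss is not acceptable.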
6. A tuning method
Tune with measurement, not guesswork, using the signals from Observability.
- State the goal: throughput, latency, or durability first.
- Measure the baseline: throughput, end-to-end latency, and consumer lag.
- Change one thing at a time (for example `linger.ms`), then re-measure.
- Watch for the trade-off you accepted (for example latency rising as throughput improves).
- Stop when the goal is met; do not over-tune.
```mermaid
flowchart TD
    s1["state goal"] --> s2["measure baseline"]
    s2 --> s3["change one setting"]
    s3 --> s4["re-measure"]
    s4 --> s5{"goal met?"}
    s5 -->|no| s3
    s5 -->|yes| done["stop"]
```
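The loop can be sketched in code as a search over candidate values for a single setting. Everything here is an assumption for illustration: the class name, the method name, and especially the measurement function, which in practice would run a real benchmark and read the throughput from your metrics.

```java
import java.util.List;
import java.util.OptionalInt;
import java.util.function.IntToDoubleFunction;

// Sketch of the tuning loop: vary one setting, re-measure, stop when the goal is met.
// The measurement function is a hypothetical stand-in for a real benchmark run.
final class TuningLoop {

    /**
     * Tries candidate values in order and returns the first one whose measured
     * throughput reaches the goal, or empty if none of them does.
     */
    static OptionalInt firstValueMeetingGoal(List<Integer> candidates,
                                             IntToDoubleFunction measureThroughput,
                                             double goal) {
        for (int candidate : candidates) {
            double measured = measureThroughput.applyAsDouble(candidate);
            if (measured >= goal) {
                return OptionalInt.of(candidate); // goal met: stop, do not over-tune
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Made-up linear model standing in for real measurements of linger.ms values.
        IntToDoubleFunction fakeBenchmark = lingerMs -> 100.0 + 10.0 * lingerMs;
        OptionalInt chosen = firstValueMeetingGoal(List.of(0, 5, 10, 20, 50), fakeBenchmark, 250.0);
        System.out.println(chosen);
    }
}
```

Ordering the candidates from least to most aggressive means the loop stops at the mildest value that meets the goal, which limits the latency given up along the way.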
7. Guided practical
Run this against the local lab.
- Measure baseline produce throughput with default settings using a simple loop or `kafka-producer-perf-test.sh`.
- Raise `batch.size` and `linger.ms`, add `lz4` compression, and re-measure throughput.
- Raise `fetch.min.bytes` and observe how consumer throughput and latency change.
- Lower `max.poll.records` and confirm per-batch processing time drops.
- Compare `acks=1` vs `acks=all` throughput, and note the durability you would give up.
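For the "simple loop" option, a timing harness along these lines may be enough. It is a sketch with hypothetical names: `send` and `flush` are stand-ins you would wire to your own producer (for example a lambda that calls the producer's send method, and its flush method), and the `main` method uses no-op versions so the code runs without a broker.

```java
import java.util.function.Consumer;

// Sketch: time a produce loop and report records per second.
// send and flush are hypothetical hooks; wire them to your own producer.
final class ThroughputProbe {

    /** Converts a count and an elapsed time in nanoseconds to a per-second rate. */
    static double perSecond(long count, long elapsedNanos) {
        return count * 1e9 / Math.max(1L, elapsedNanos);
    }

    static double measureRecordsPerSecond(int recordCount, int recordSizeBytes,
                                          Consumer<byte[]> send, Runnable flush) {
        byte[] payload = new byte[recordSizeBytes];
        long start = System.nanoTime();
        for (int i = 0; i < recordCount; i++) {
            send.accept(payload);
        }
        // Producer sends are typically asynchronous, so wait for outstanding
        // records before stopping the clock.
        flush.run();
        return perSecond(recordCount, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // No-op stand-ins: this measures loop overhead only, not Kafka.
        double rate = measureRecordsPerSecond(100_000, 1024, payload -> { }, () -> { });
        System.out.printf("%.0f records/s (no-op sender)%n", rate);
    }
}
```

Run the same harness before and after each change so the numbers stay comparable.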
Next: Testing Kafka Applications, the last production-readiness module, where you test producers, consumers, and streams reliably.