
Performance Tuning and Throughput

A holistic tuning guide: prefetch and concurrency, connection and channel sizing, persistence trade-offs, batching, and publisher throughput, driven by measurement.

Prerequisite: Consumer Acknowledgements and Prefetch, and Observability
You’ll need: A workload you can measure and the metrics pipeline from the previous module.

Performance work is not a list of magic settings; it is a loop of measure, find the bottleneck, change one thing, and measure again. This module pulls the levers together into a decision guide. It references the prefetch and concurrency mechanics from module 7 rather than repeating them.


What you’ll be able to do after this module

  • Find the real bottleneck before changing settings.
  • Balance prefetch and consumer concurrency for your workload.
  • Size connections and channels without leaking them.
  • Weigh persistence and durability against speed.
  • Use batching and the right converter to raise publisher throughput.

1. Measure first, then tune

Every change below is a trade-off, so never guess. Start from the metrics in Observability and let the numbers point to the bottleneck.

flowchart TD
    Start["Throughput too low or latency too high"]
    Q{"Queue depth rising?"}
    Slow{"Handler time high?"}
    Pub{"Publisher confirms slow?"}
    Start --> Q
    Q -->|Yes| Slow
    Q -->|No| Pub
    Slow -->|Yes| Fix1["Optimize handler / add concurrency"]
    Slow -->|No| Fix2["Raise prefetch / consumer count"]
    Pub -->|Yes| Fix3["Batch, async confirms, more channels"]
    Pub -->|No| Fix4["Bottleneck is downstream (DB, network)"]

A rising queue with idle consumers is a prefetch problem; a rising queue with busy consumers is a handler problem. These need opposite fixes, which is why you measure before touching config.
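The decision flow above can be sketched as a small function. This is an illustrative sketch rather than part of any RabbitMQ or Spring API: the class name `BottleneckTriage`, the `Action` enum, and the three boolean inputs are hypothetical, and you would derive the booleans from your own metrics and thresholds.

```java
/** Hypothetical helper that mirrors the triage flow: backlog first, then publisher, then downstream. */
final class BottleneckTriage {

    enum Action {
        OPTIMIZE_HANDLER_OR_ADD_CONCURRENCY,
        RAISE_PREFETCH_OR_CONSUMER_COUNT,
        BATCH_ASYNC_CONFIRMS_MORE_CHANNELS,
        LOOK_DOWNSTREAM
    }

    static Action triage(boolean queueDepthRising, boolean handlerTimeHigh, boolean confirmsSlow) {
        if (queueDepthRising) {
            // A growing backlog means the consumer side cannot keep up.
            return handlerTimeHigh
                    ? Action.OPTIMIZE_HANDLER_OR_ADD_CONCURRENCY   // busy consumers: handler problem
                    : Action.RAISE_PREFETCH_OR_CONSUMER_COUNT;     // idle consumers: prefetch problem
        }
        // No backlog: check the publisher path before blaming downstream systems.
        return confirmsSlow
                ? Action.BATCH_ASYNC_CONFIRMS_MORE_CHANNELS
                : Action.LOOK_DOWNSTREAM;
    }

    public static void main(String[] args) {
        // Rising queue, fast handlers, confirms fine: consumers are starved, not slow.
        System.out.println(triage(true, false, false));
    }
}
```

Keeping the triage in one place makes the "opposite fixes" point concrete: the same rising queue maps to two different actions depending on handler time.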


2. Consumer throughput: prefetch and concurrency

The two biggest consumer levers are prefetch (how many unacked messages a consumer holds) and concurrency (how many consumer threads run). The mechanics live in Consumer Acknowledgements and Prefetch; here is how to tune them.

spring:
  rabbitmq:
    listener:
      simple:
        prefetch: 20          # start modest, raise while latency is stable
        concurrency: 4        # threads per listener
        max-concurrency: 16   # scale up under load

  • Prefetch too low: consumers idle between acks waiting for the next message, wasting the network round trip.
  • Prefetch too high: one consumer hoards the backlog, hurting fair distribution and memory. A value in the low tens per consumer is a common sweet spot.
  • Concurrency: raise it for I/O-bound handlers (database, HTTP calls). For CPU-bound work, more threads than cores just adds contention.

Tune prefetch and concurrency together, because raising one shifts the ideal value of the other.
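One way to pick starting values before measuring is a back-of-the-envelope estimate. The sketch below is an assumption-laden heuristic, not a formula from this course or from RabbitMQ: `threadsNeeded` applies Little's law (busy threads = arrival rate × handler time), and `prefetchStartingPoint` assumes you want enough buffered messages to cover one broker round trip. The class and method names are hypothetical, and the results are only a first guess to refine with measurement.

```java
/** Hypothetical sizing helpers; treat the outputs as starting points, not targets. */
final class ConsumerSizing {

    /** Little's law: average busy threads = messages per second x seconds per message. */
    static int threadsNeeded(double messagesPerSecond, double handlerMillis) {
        if (messagesPerSecond < 0 || handlerMillis <= 0) {
            throw new IllegalArgumentException("rate must be >= 0 and handler time > 0");
        }
        double busyThreads = messagesPerSecond * handlerMillis / 1000.0;
        return (int) Math.max(1, Math.ceil(busyThreads));
    }

    /** Assumed heuristic: hold enough messages to stay busy for one round trip, plus one. */
    static int prefetchStartingPoint(double roundTripMillis, double handlerMillis) {
        if (roundTripMillis < 0 || handlerMillis <= 0) {
            throw new IllegalArgumentException("round trip must be >= 0 and handler time > 0");
        }
        double handledDuringRoundTrip = roundTripMillis / handlerMillis;
        return (int) Math.max(1, Math.ceil(handledDuringRoundTrip) + 1);
    }

    public static void main(String[] args) {
        System.out.println("threads:  " + threadsNeeded(200, 40));
        System.out.println("prefetch: " + prefetchStartingPoint(10, 2));
    }
}
```

For 200 messages per second and a 40 ms handler, the first estimate is 8 busy threads; for a 10 ms round trip and a 2 ms handler, the second is a prefetch of 6. Because handler time appears in both estimates, a change that speeds up the handler shifts both values at once.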


3. Connections and channels

A connection is a TCP socket and is expensive; a channel is a lightweight virtual stream multiplexed over it. The rule: share one connection, use many channels, and never share a channel across threads.

flowchart LR
    App["Spring app"]
    subgraph conn ["1 CachingConnectionFactory connection"]
        ch1["channel (publish)"]
        ch2["channel (consumer 1)"]
        ch3["channel (consumer 2)"]
    end
    App --> conn
    conn --> Broker["RabbitMQ"]

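CachingConnectionFactory caches channels for you. Size the cache to your peak concurrency so channels are reused instead of being opened and closed under load; by default the cache size is not a hard limit, and callers only block waiting for a free channel when a channel checkout timeout is also configured: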

spring:
  rabbitmq:
    cache:
      channel:
        size: 25            # >= peak concurrent publishers/consumers
    publisher-confirm-type: correlated

Watch the connection and channel gauges from module 18. A steadily climbing count means a leak, usually a channel opened per request instead of reused.
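The "steadily climbing" test can be made mechanical. The following is a minimal sketch, assuming you can sample the channel (or connection) gauge at regular intervals into an array; `GaugeTrend`, `steadilyClimbing`, and the `minRise` threshold are hypothetical names chosen for illustration, and a real alert would more likely live in your metrics system.

```java
/** Hypothetical check for a gauge that only ever goes up across a sampling window. */
final class GaugeTrend {

    /**
     * Returns true when no sample is lower than the one before it
     * and the total rise over the window is at least minRise.
     */
    static boolean steadilyClimbing(long[] samples, long minRise) {
        if (samples.length < 2) {
            return false; // not enough data to call it a trend
        }
        for (int i = 1; i < samples.length; i++) {
            if (samples[i] < samples[i - 1]) {
                return false; // a drop means channels are being released again
            }
        }
        return samples[samples.length - 1] - samples[0] >= minRise;
    }

    public static void main(String[] args) {
        long[] leaking = {25, 25, 26, 30, 41};
        long[] healthy = {25, 40, 25, 38, 26};
        System.out.println(steadilyClimbing(leaking, 10));
        System.out.println(steadilyClimbing(healthy, 10));
    }
}
```

A healthy pool rises and falls with load, so any drop in the window rules out the leak pattern; a count that never comes back down is the signal worth investigating.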


4. Persistence and durability trade-offs

Durability is the classic speed-versus-safety dial. Persistent messages on durable queues survive a broker restart but pay a disk write on every message.

Choice                                     Faster            Safer
Transient message on durable queue         yes               lost on restart
Persistent message, durable quorum queue   no                survives restart and node loss
Publisher confirms on                      slightly slower   know the broker got it

Do not blanket-disable persistence to chase numbers. Instead, match durability to the message: a payment event must be persistent, while a live dashboard tick can be transient. Quorum queues (the course default) add replication cost for their safety, which is usually the right trade for business events.
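Matching durability to the message can be expressed as a small per-message policy. This is a sketch under stated assumptions: the `DurabilityPolicy` class and the `MessageKind` values are hypothetical stand-ins for your own message types. The numeric values are the AMQP 0-9-1 delivery modes (1 = non-persistent, 2 = persistent).

```java
/** Hypothetical policy: choose the AMQP delivery mode from the kind of message. */
final class DurabilityPolicy {

    /** Illustrative message kinds taken from the examples in the text. */
    enum MessageKind { PAYMENT_EVENT, DASHBOARD_TICK }

    static final int NON_PERSISTENT = 1; // AMQP 0-9-1 delivery-mode 1
    static final int PERSISTENT = 2;     // AMQP 0-9-1 delivery-mode 2

    static int deliveryModeFor(MessageKind kind) {
        switch (kind) {
            case DASHBOARD_TICK:
                return NON_PERSISTENT; // stale within seconds, cheap to lose
            default:
                return PERSISTENT;     // safe default for anything not explicitly exempted
        }
    }

    public static void main(String[] args) {
        for (MessageKind kind : MessageKind.values()) {
            System.out.println(kind + " -> delivery mode " + deliveryModeFor(kind));
        }
    }
}
```

Defaulting to persistent means a newly added message type is safe until someone deliberately opts it out. In a Spring AMQP application the chosen mode would typically be applied through the message properties when publishing.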


5. Batching and the publisher path

If the bottleneck is the publisher, reduce per-message overhead.

  • Batch on the client. BatchingRabbitTemplate groups several messages into a single AMQP message, cutting round trips for high-volume, small messages.
  • Use asynchronous confirms. Waiting synchronously for each confirm serializes publishing. Correlated async confirms let you keep publishing while acknowledgements stream back (see Publisher Confirms).
  • Keep payloads lean. JSON is convenient, but large or deeply nested payloads cost serialization and bandwidth. Send identifiers and let consumers fetch details when the payload would be heavy.
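  • Reuse the template and converter. Recreating a RabbitTemplate or converter per call throws away pooling.

// Flush after 100 messages, 16 KiB buffered, or 200 ms without a new message, whichever comes first.
BatchingStrategy strategy = new SimpleBatchingStrategy(100, 16_384, 200);
// scheduler (a TaskScheduler) drives the timeout flush; connectionFactory is assumed to exist.
BatchingRabbitTemplate template =
        new BatchingRabbitTemplate(strategy, scheduler);
template.setConnectionFactory(connectionFactory);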

Batching trades a little latency (messages wait to fill a batch) for a large throughput gain, so use it for volume, not for latency-critical single messages.
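To see where that latency comes from, it helps to model the buffer by hand. The class below is a simplified, plain-Java model of count- and byte-bounded batching, written for illustration only; it is not Spring's `SimpleBatchingStrategy` implementation, and `BatchBuffer` and its methods are hypothetical. The timeout is modelled by the caller invoking `flush()` when a timer fires.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of a batching buffer; messages wait here until a limit is hit. */
final class BatchBuffer {
    private final int batchSize;    // max messages per batch
    private final int bufferLimit;  // max buffered bytes per batch
    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes;

    BatchBuffer(int batchSize, int bufferLimit) {
        this.batchSize = batchSize;
        this.bufferLimit = bufferLimit;
    }

    /** Adds a message and returns the batch released by this call, or an empty list. */
    List<byte[]> add(byte[] body) {
        List<byte[]> released = new ArrayList<>();
        if (!pending.isEmpty() && pendingBytes + body.length > bufferLimit) {
            released = flush(); // byte limit would be exceeded: release what is buffered
        }
        pending.add(body);
        pendingBytes += body.length;
        if (pending.size() >= batchSize) {
            released = flush(); // count limit reached: release including this message
        }
        return released;
    }

    /** Releases whatever is buffered; a timer would call this for partial batches. */
    List<byte[]> flush() {
        List<byte[]> batch = new ArrayList<>(pending);
        pending.clear();
        pendingBytes = 0;
        return batch;
    }

    public static void main(String[] args) {
        BatchBuffer buffer = new BatchBuffer(3, 100);
        for (int i = 1; i <= 4; i++) {
            int released = buffer.add(new byte[10]).size();
            System.out.println("message " + i + " released " + released);
        }
    }
}
```

In this model the first messages of a batch are held until a later message or the timer releases them, which is the extra latency that batching introduces.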


6. A tuning checklist

Work top-down, re-measuring after each change:

  1. Confirm the bottleneck is RabbitMQ and not a downstream database or API.
  2. Fix slow handlers first; no broker setting beats a faster consumer.
  3. Tune prefetch, then concurrency, to keep consumers busy but fair.
  4. Size the channel cache to peak concurrency; hunt connection leaks.
  5. Match persistence and confirms to each message’s importance.
  6. Batch and go async on the publisher only if it is the proven limit.

Checkpoint

You should now be able to:

  • Use metrics to locate the real bottleneck before tuning.
  • Balance prefetch and concurrency for I/O vs CPU-bound handlers.
  • Share a connection, pool channels, and detect leaks.
  • Choose persistence per message instead of globally.
  • Apply batching and async confirms where they actually help.

Next: Testing RabbitMQ Apps.