RabbitMQ Tooling: Management UI, rabbitmqctl & Diagnostics

Prerequisite: AWS Architecture
You’ll need: the Docker container from Environment Setup running, and terminal access.

What you’ll be able to do after this module

Navigate the Management UI to check node health, queue depth, connections, and consumers.
Run the core rabbitmqctl / rabbitmq-diagnostics commands and tell healthy output from alerting output.
Know which CloudWatch metric to check for a given symptom, and cross-reference it with Spring Boot Actuator.

1. The Management UI tour

Open localhost:15672 (from the Environment Setup container) and log in (guest/guest).

Tab	What it shows	What to look for when triaging
Overview	Cluster-wide message rates, node list, alarms	Any node not `green`/`running`; any active resource alarm (memory/disk) banner at the top
Connections	Every open TCP connection, from which app, since when	Sudden spike in connection count (churn); connections stuck in a state other than `running` (e.g., `blocked` or `blocking`)
Channels	Every open channel, its consumer count, unacked messages, prefetch	Channels with a huge “unacked” count (consumer stuck or crashed mid-processing)
Exchanges	All exchanges, message-in/out rates	Confirms whether a producer is actually publishing (rate > 0)
Queues	Every queue: ready/unacked/total messages, consumer count, message rates	This is the #1 tab you’ll live in. Ready count climbing = backlog. Consumers = 0 = nobody’s listening.
Admin	Users, vhosts, policies	Rarely touched by support tier: usually read-only access here

Click into a specific queue (e.g., orders.created.queue from First Producer and Consumer) and note the fields:

Ready: messages waiting to be delivered to a consumer.
Unacked: messages delivered to a consumer but not yet acknowledged (i.e., currently “in flight” / being processed).
Total: Ready + Unacked.
Consumers: how many active consumers are subscribed to this queue right now (a single connection can host more than one).
Message rates: publish/deliver/ack rates over time, graphed.

Healthy pattern: Ready hovers near 0, Unacked briefly spikes then drops, consumer count matches your expected deployed instance count. Alerting pattern: Ready climbs steadily and doesn’t recover, or Consumers = 0 while Ready > 0.

2. CLI tools: `rabbitmqctl` and `rabbitmq-diagnostics`

Exec into the running container to try these (in production you’d use SSM Session Manager instead of docker exec, but the commands themselves are identical):

docker exec -it rabbitmq-crashcourse bash

Cluster health

rabbitmq-diagnostics status

Healthy: shows Status of node rabbit@<hostname> ... with no errors, lists enabled plugins, memory/disk watermarks not exceeded.

rabbitmq-diagnostics cluster_status

Healthy: lists all expected nodes under Running Nodes, with none under Nodes Not Running. Alerting: a node is missing from Running Nodes. This is your first signal for Playbook 03, Node Down.

rabbitmq-diagnostics check_running
rabbitmq-diagnostics check_local_alarms

Healthy: check_local_alarms exits successfully and reports no local alarms. Alerting: it exits with an error and reports a resource alarm for memory or disk. This node has hit a watermark and is now blocking publishers. Go straight to Playbook 02.
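Both checks above signal failure through a non-zero exit status, so they can be wrapped in a small script. The following is a minimal sketch, not an official tool: the function name `run_health_checks` and the `DIAG` variable are hypothetical, and `-q` is used only to suppress informational output.

```bash
#!/usr/bin/env bash
# DIAG is an assumption of this sketch: the command prefix used to reach
# rabbitmq-diagnostics. Override it to run from the host, for example:
#   DIAG="docker exec rabbitmq-crashcourse rabbitmq-diagnostics"
DIAG="${DIAG:-rabbitmq-diagnostics}"

# Runs each read-only check and prints OK/FAIL based on its exit status.
# Returns 1 if any check failed, 0 otherwise.
run_health_checks() {
  local failed=0 check
  for check in check_running check_local_alarms; do
    # $DIAG is intentionally unquoted so a multi-word prefix splits into words.
    if $DIAG -q "$check" >/dev/null 2>&1; then
      echo "OK   $check"
    else
      echo "FAIL $check"
      failed=1
    fi
  done
  return "$failed"
}
```

Calling `run_health_checks` inside the container prints one line per check. A FAIL line means you should re-run that check by hand to read its full message.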

Queues

rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers

Healthy example output:

name                    messages_ready  messages_unacknowledged  consumers
orders.created.queue    0               0                        2

Alerting example output:

name                    messages_ready  messages_unacknowledged  consumers
orders.created.queue    48213           0                        0

consumers = 0 with a large and growing messages_ready is the single most common alert pattern you’ll triage. It means: messages are arriving, nothing is picking them up.
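To spot that pattern across many queues, the list_queues output can be filtered with awk. This is a rough sketch under two assumptions: the columns are requested in the order shown above (name, messages_ready, messages_unacknowledged, consumers), and queue names contain no whitespace. The function name `flag_stuck_queues` is hypothetical.

```bash
#!/usr/bin/env bash
# Reads `rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers`
# output on stdin and prints one ALERT line per queue that has ready messages
# but no consumers. Header and banner lines are skipped because their second
# field is not a plain integer.
flag_stuck_queues() {
  awk 'NF >= 4 && $2 ~ /^[0-9]+$/ && $2 > 0 && $4 == 0 {
         printf "ALERT %s ready=%s consumers=0\n", $1, $2
       }'
}
```

Usage, from inside the container: `rabbitmqctl list_queues name messages_ready messages_unacknowledged consumers | flag_stuck_queues`. No output means no queue currently matches the pattern.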

Connections and channels

rabbitmqctl list_connections name peer_host state
rabbitmqctl list_channels connection_details consumer_count messages_unacknowledged

Use these to identify which application instance owns a problematic connection or channel. This is critical when escalating to an app team, since you can tell them exactly which pod/instance to look at instead of “something’s wrong with your service.”

Users and permissions (read-only checks during an auth incident)

rabbitmqctl list_users
rabbitmqctl list_permissions -p /

Useful for confirming “does this user actually have publish/consume rights on this vhost” before assuming it’s a network problem.

⚠️ CAUTION: Commands like rabbitmqctl delete_queue, purge_queue, forget_cluster_node, or reset are destructive and can cause data loss or cluster damage. None of the commands above modify anything; they are all safe, read-only diagnostics. Anything that changes broker state requires the approval/escalation path in Escalation and Communication.

3. CloudWatch metrics reference

Metric	Meaning	Healthy range (example)	Alert threshold (example)
`QueueDepth` / `MessageReadyCount`	Messages waiting for a consumer	Near 0, or draining quickly	Sustained growth over N minutes
`ConsumerCount`	Active consumers attached to a queue	Matches expected deployed instance count	0 while messages are arriving
`NodeMemoryUsage` (or `mem_used` via plugin)	Broker memory usage vs. configured high watermark	< 60% of watermark	> 90% of watermark (publishers are blocked once the watermark is reached)
`DiskFreeLimitAlarm`	Whether the disk-space alarm is active	Not triggered	Triggered (publishers blocked cluster-wide)
`FileDescriptorsUsed`	OS file descriptors in use by the broker process	Well below the ulimit	Approaching the ulimit (connection/channel exhaustion)
`ConnectionCount`	Total open AMQP connections	Stable, matches expected app instance count × pool size	Rapid, continuous growth (churn/leak)
`PublishRate` / `DeliverRate` / `AckRate`	Messages/sec published, delivered, acknowledged	Deliver ≈ Publish over time; Ack ≈ Deliver	Ack rate persistently lower than Deliver rate (processing is failing or hanging)
`EBSVolumeQueueLength` / `EBSReadWriteOps` (AWS-layer)	Disk I/O saturation on the underlying EBS volume	Below provisioned IOPS/throughput limit	Sustained saturation → broker-level latency even though CPU looks fine
`CPUCreditBalance` (if burstable instance type)	Remaining CPU burst credits	Stable or replenishing	Steadily depleting toward 0 → imminent throttling
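The “sustained growth over N minutes” threshold in the table can be made concrete with a small check. This is a sketch under stated assumptions: samples arrive one per line, oldest first (for example, one messages_ready reading per minute), and “sustained” means every sample is strictly larger than the previous one. The function name `is_sustained_growth` and the default of 5 samples are hypothetical choices, not values from any alarm configuration.

```bash
#!/usr/bin/env bash
# Reads queue-depth samples on stdin, one integer per line, oldest first.
# Prints "sustained-growth" when there are at least MIN samples and each one
# is strictly greater than the one before it; prints "ok" otherwise.
is_sustained_growth() {
  local min="${1:-5}"
  awk -v min="$min" '
    BEGIN { growing = 1 }
    NR > 1 && ($1 + 0) <= prev { growing = 0 }   # a flat or falling step breaks the run
    { prev = $1 + 0; n++ }
    END { if (n >= min && growing) print "sustained-growth"; else print "ok" }
  '
}
```

A queue that dips even once inside the window reports "ok", which matches the “draining quickly” healthy case. Real alarms usually tolerate some noise, so treat this as an illustration of the idea rather than a tuned rule.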

4. Cross-referencing with the Spring Boot side

Infra metrics only tell half the story. Always correlate with the application side:

Spring Boot Actuator health: if you have spring-boot-starter-actuator + spring-boot-starter-amqp, hit /actuator/health. With health details enabled (management.endpoint.health.show-details), a healthy RabbitMQ connection shows:
```
{ "components": { "rabbit": { "status": "UP", "details": { "version": "3.13.x" } } } }
```
A broken connection shows "status": "DOWN" with an exception message. This is often faster to check than digging through broker logs.
Application logs to grep for:

Log signature	Usually means
AmqpConnectException / Connection refused	Broker unreachable: network/SG issue or broker down
PossibleAuthenticationFailureException	Credentials wrong: check Playbook 05
ListenerExecutionFailedException	Your @RabbitListener method threw an exception: this is an app-code bug, not a broker problem
SSLHandshakeException	Certificate issue: check Playbook 08
Consumer thread silent, no errors, but queue growing	Listener likely blocked/hung (e.g., waiting on a slow downstream DB call): check thread dumps, not broker logs
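The exception names listed above can be counted in one pass over an application log. The following is a minimal sketch: the function name `summarize_amqp_errors` is hypothetical, and it assumes the exception class names appear verbatim in the log text.

```bash
#!/usr/bin/env bash
# Reads application log text on stdin and prints "<signature> <count>" for
# each exception name, where count is the number of log lines containing it.
summarize_amqp_errors() {
  local log sig
  log="$(cat)"
  for sig in AmqpConnectException PossibleAuthenticationFailureException \
             ListenerExecutionFailedException SSLHandshakeException; do
    # grep -c prints 0 (and exits non-zero) when nothing matches; only the
    # printed count is used here.
    printf '%s %s\n' "$sig" "$(printf '%s\n' "$log" | grep -c -- "$sig")"
  done
}
```

For example, `summarize_amqp_errors < app.log` (where `app.log` is a placeholder for your service’s log file) shows at a glance whether the failures are connectivity, credentials, listener code, or TLS. The last row of the list, a silent consumer with a growing queue, produces no signature at all, so all-zero counts alongside a backlog point toward a hung listener.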

Rule of thumb: if the Management UI shows healthy broker-side metrics (low ready count, consumers attached, no alarms) but the business symptom persists (e.g., orders not shipping), the problem is almost certainly in application code, not RabbitMQ. If the Management UI itself shows the anomaly (growing ready count, 0 consumers, active alarms), start with the broker/infra side.

Practical: diagnose a live backlog using only the CLI

Step 1: Using your producer/consumer app from First Producer and Consumer, stop the consumer (comment out @Component again, or just stop the Spring Boot app).

Step 2: Publish 20 messages in a loop:

for i in $(seq 1 20); do
  curl -s -X POST localhost:8080/orders -H "Content-Type: application/json" -d "{\"id\":$i}"
done

Step 3: Without opening the Management UI, use only rabbitmqctl (via docker exec) to answer:

How many messages are ready in orders.created.queue?
How many consumers are attached?
Based on that alone, what’s your diagnosis?

Step 4: Restart the consumer app and re-run the same list_queues command to confirm the backlog drains and messages_ready returns to 0.
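Rather than re-running list_queues by hand in Step 4, the drain can be polled. This is a sketch only: `wait_for_drain` is a hypothetical helper that takes a number of attempts, a delay in seconds, and any command that prints the current messages_ready count.

```bash
#!/usr/bin/env bash
# Usage: wait_for_drain ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly; COMMAND must print the current ready count.
# Returns 0 as soon as the count is 0, or 1 if it never reaches 0.
wait_for_drain() {
  local attempts="$1" delay="$2"
  shift 2
  local i ready
  for ((i = 1; i <= attempts; i++)); do
    ready="$("$@")"
    echo "attempt $i: messages_ready=$ready"
    if [ "$ready" = "0" ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

One way to supply the count, assuming the container and queue names used earlier in this module, is a small wrapper such as `ready_count() { docker exec rabbitmq-crashcourse rabbitmqctl -q list_queues name messages_ready | awk '$1 == "orders.created.queue" { print $2 }'; }`, followed by `wait_for_drain 10 2 ready_count`.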

✅ Checkpoint

You should now be able to:

Name the four Management UI tabs you’d check first during an incident, in priority order.
Run rabbitmqctl list_queues and rabbitmq-diagnostics check_local_alarms from memory.
Explain the rule of thumb for deciding “broker problem” vs. “app problem” based on what the Management UI shows.

Next: Alert Playbooks