Alert Playbooks
Nine independent playbooks for recurring RabbitMQ incidents: symptom, diagnosis, remediation, escalation.
Prerequisite: Tooling Walkthrough
This module is split into 9 short, independent playbooks, one per recurring incident type. Each is designed to be read and practiced in 15-20 minutes, so you can work through them one at a time (e.g., one per day) rather than in a single long session.
Do them in order the first time through, since later playbooks occasionally build on concepts from earlier ones (e.g., quorum queues from Playbook 03 come up again in Playbook 09). After that, use them as standalone reference material during real incidents.
Common structure
Every playbook follows the same shape, so once you know it you can navigate any of them under time pressure:
- Symptom: what the alert/ticket actually says.
- Likely Causes: broker-side causes and application-side (Spring Boot) causes, listed separately.
- Diagnostic Steps: an ordered checklist, cheapest/fastest checks first.
- Safe Remediations: what you can do yourself, with ⚠️ CAUTION callouts on anything risky.
- Escalation Trigger: the specific condition that means “stop, page on-call engineering.”
- Relevant Commands/Queries: copy-pasteable commands referenced in the steps above.
- Mini Practical: a small, safe exercise to reproduce a scaled-down version of the incident locally.
Playbooks
| # | Playbook | Core skill it builds |
|---|---|---|
| 1 | Queue Depth Growing / Consumer Lag | Reading queue metrics; spotting stuck/absent consumers |
| 2 | Memory/Disk Alarm & Blocked Publishers | Understanding broker self-protection mechanisms |
| 3 | Node Down / Cluster Partition | Quorum queue behavior under node failure |
| 4 | Connection/Channel Exhaustion | Spotting connection/channel leaks in app code |
| 5 | Auth Failures After Credential Rotation | Secrets lifecycle vs. long-lived app connections |
| 6 | Poison Messages & DLQ | Retry/dead-lettering interplay with Spring Retry |
| 7 | AWS-Layer Connectivity Issues | Distinguishing network/infra symptoms from broker symptoms |
| 8 | TLS/Certificate Expiry | Reading SSL handshake failures end-to-end |
| 9 | Latency Spikes & Ordering/Duplicate Surprises | GC/CPU-credit correlation; idempotent consumer design |