Observability: Metrics, Tracing and Health
Micrometer and Actuator metrics, the broker and consumer signals that matter, Prometheus and Grafana, distributed tracing, and health checks.
Prerequisite: Consumer Acknowledgements and Prefetch
You’ll need: The Spring AMQP services from earlier modules and, optionally, Prometheus and Grafana.
You cannot tune or troubleshoot what you cannot see. Observability answers three questions in production: is the system healthy right now, is it keeping up with load, and where did a slow or failed request actually spend its time. This module wires metrics, tracing, and health into the Spring services you have already built.
What you’ll be able to do after this module
- Expose RabbitMQ and application metrics through Actuator and Micrometer.
- Identify the broker and consumer signals worth alerting on.
- Scrape metrics into Prometheus and chart them in Grafana.
- Propagate a trace across a publish and consume hop.
- Add a broker health check to your readiness probe.
1. The three pillars for a messaging system
Metrics, traces, and health each answer a different question. Together they give you a full picture.
flowchart LR
    App["Spring Boot service"]
    App -->|"Micrometer"| Prom["Prometheus"]
    Prom --> Graf["Grafana dashboards & alerts"]
    App -->|"OpenTelemetry"| Trace["Tracing backend (Tempo / Zipkin)"]
    App -->|"Actuator /health"| Probe["Orchestrator readiness probe"]
Metrics tell you what is happening at scale, traces tell you why a specific request was slow, and health tells an orchestrator whether to send traffic at all.
2. Metrics with Micrometer and Actuator
Spring Boot auto-instruments Spring AMQP. Add Actuator and a registry, and RabbitTemplate and @RabbitListener containers publish timers and counters with no extra code.
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus,metrics
  metrics:
    tags:
      application: order-service
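This exposes the RabbitMQ client's rabbitmq.published counter, the spring.rabbitmq.listener timers, and connection and channel gauges at /actuator/prometheus. The Prometheus registry rewrites dots as underscores, so rabbitmq.published is scraped as rabbitmq_published_total. The application tag lets Grafana separate the Order, Inventory, and Notification services.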
3. The signals that matter
Do not chart everything. These few signals catch most incidents.
| Signal | Where | Why it matters |
|---|---|---|
| Queue depth (messages_ready) | Broker | Rising depth means consumers are falling behind |
| Consumer ack rate vs publish rate | Broker | Publish outpacing ack is a leading lag indicator |
| Unacked messages | Broker | Stuck or slow consumers holding prefetch |
| Listener processing time | App (Micrometer) | Slow handlers, the root cause of lag |
| Redelivery / DLQ rate | Broker | Poison messages or failing handlers |
| Connection and channel count | Both | Leaks from unclosed resources |
Broker-level numbers come from the rabbitmq_prometheus plugin (enable it with rabbitmq-plugins enable rabbitmq_prometheus), while handler timing comes from your app. You need both, because a healthy broker can still hide a slow consumer.
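Listener processing time is usually more useful as a percentile than as an average. The fragment below is a minimal sketch of how histogram buckets could be turned on for the listener timer, assuming the timer keeps Spring AMQP's default name spring.rabbitmq.listener and that the extra time series are acceptable for your Prometheus instance.

```yaml
# application.yml (sketch; assumes the default timer name spring.rabbitmq.listener)
management:
  metrics:
    distribution:
      percentiles-histogram:
        # publish histogram buckets so Prometheus can compute quantiles
        spring.rabbitmq.listener: true
```

With buckets exported, a Grafana panel could chart the 95th percentile with a query along the lines of histogram_quantile(0.95, sum by (le) (rate(spring_rabbitmq_listener_seconds_bucket[5m]))). Check the exact series name on your own /actuator/prometheus output before relying on it.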
4. Prometheus and Grafana
Prometheus scrapes each service and the broker on a fixed interval; Grafana queries Prometheus for dashboards and alerts.
flowchart LR
    OS["order-service :8080/actuator/prometheus"]
    IS["inventory-service"]
    RB["rabbitmq :15692/metrics"]
    Prom["Prometheus"]
    Graf["Grafana"]
    OS --> Prom
    IS --> Prom
    RB --> Prom
    Prom --> Graf
# prometheus.yml
scrape_configs:
  - job_name: 'order-service'
    metrics_path: /actuator/prometheus
    static_configs:
      - targets: ['order-service:8080']
  - job_name: 'rabbitmq'
    static_configs:
      - targets: ['rabbitmq:15692']
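A good starting alert: warn when a queue’s messages_ready (exported by the broker as rabbitmq_queue_messages_ready) stays above a threshold for several minutes, which is the metric version of the Queue Depth and Consumer Lag playbook. By default the plugin aggregates values across queues, so a per-queue alert needs per-object metrics, for example by scraping /metrics/per-object instead of /metrics.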
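The queue-depth alert can be written as a Prometheus alerting rule. This is a sketch under stated assumptions: the queue name orders.created, the threshold of 1000 messages, and the five-minute window are all hypothetical placeholders, and the queue label is only present when the broker is scraped with per-object metrics.

```yaml
# rabbitmq-alerts.yml (sketch; queue name, threshold, and duration are illustrative)
groups:
  - name: rabbitmq-queues
    rules:
      - alert: QueueDepthHigh
        # ready messages waiting in one queue (requires per-object metrics)
        expr: rabbitmq_queue_messages_ready{queue="orders.created"} > 1000
        # fire only if the condition holds continuously for 5 minutes
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Queue {{ $labels.queue }} has held more than 1000 ready messages for 5 minutes"
```

Prometheus loads the file once it is listed under rule_files in prometheus.yml. The for clause is what turns a momentary spike into a sustained-lag signal, so tune it to how long your consumers normally take to drain a burst.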
5. Distributed tracing
A message breaks the call stack: the publisher returns immediately and the consumer runs later on another thread or host. Tracing stitches these back together by carrying a trace context in the message headers.
sequenceDiagram
    participant O as Order Service
    participant B as RabbitMQ
    participant I as Inventory Service
    O->>O: start span "POST /orders" (traceId=abc)
    O->>B: publish OrderCreated (headers carry traceId=abc)
    B->>I: deliver
    I->>I: continue span (same traceId=abc)
    Note over O,I: One trace spans both services
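With Micrometer Tracing plus an OpenTelemetry bridge, Spring can propagate the context across the publish and consume boundary. Spring AMQP only does so once observation is enabled on both sides, by calling setObservationEnabled(true) on the RabbitTemplate and on the listener container factory (Spring AMQP 3.0 and later). Without that, the consumer starts a new trace instead of continuing the publisher's.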
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-otlp</artifactId>
</dependency>
Now a single traceId links the HTTP request, the publish, and the consumer’s processing, so you can see exactly where an order slowed down.
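The dependencies alone do not decide how many traces are recorded or where they are sent. The fragment below is a minimal sketch of the Spring Boot 3 settings involved, assuming an OTLP/HTTP collector on localhost:4318 (a hypothetical address; substitute your Tempo or collector endpoint).

```yaml
# application.yml (sketch; the endpoint is an assumed local collector)
management:
  tracing:
    sampling:
      # record every request while learning; Spring Boot's default is 0.1 (10%)
      probability: 1.0
  otlp:
    tracing:
      # where finished spans are exported over OTLP/HTTP
      endpoint: http://localhost:4318/v1/traces
```

Sampling everything is convenient in development, but in production a lower probability keeps tracing overhead and storage in check.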
6. Health checks
Actuator ships a RabbitHealthIndicator that checks broker connectivity. It is included automatically and appears in /actuator/health.
{
  "status": "UP",
  "components": {
    "rabbit": { "status": "UP", "details": { "version": "3.13.0" } }
  }
}
Split liveness from readiness so a broker blip does not kill your pod:
management:
  endpoint:
    health:
      probes:
        enabled: true
      group:
        readiness:
          # keep readinessState so the application's own availability state still counts
          include: readinessState,rabbit,db
Map the orchestrator’s readiness probe to /actuator/health/readiness. When the broker is unreachable the pod is pulled from rotation but not restarted, and it recovers automatically per Connection Recovery.
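On Kubernetes, the mapping could look like the container spec fragment below. It is a sketch that assumes the service listens on port 8080, as in the scrape configuration earlier, and the period and threshold values are illustrative rather than recommended.

```yaml
# Deployment container spec fragment (sketch; timings are illustrative)
livenessProbe:
  httpGet:
    path: /actuator/health/liveness   # does not include the broker check
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /actuator/health/readiness  # fails while the broker is unreachable
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```

Keeping the broker out of the liveness path is the design choice that matters here: a failed readiness probe only removes the pod from service endpoints, while a failed liveness probe restarts the container.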
Checkpoint
You should now be able to:
- Expose Prometheus metrics from a Spring AMQP service.
- Name the handful of signals worth alerting on.
- Scrape services and the broker into Prometheus and Grafana.
- Explain how a trace propagates across a publish and consume hop.
- Wire the broker health check into a readiness probe.