Offset Problems
Diagnose OffsetOutOfRangeException, unwanted auto.offset.reset replays, __consumer_offsets issues, and how to reset offsets safely.
1. Symptom
A consumer suddenly reprocesses a huge amount of old data (a replay), or skips ahead and misses records, or logs OffsetOutOfRangeException. Often this surfaces after a deploy, a long downtime, or an offset reset that went wrong. The blast radius can be large: a full-topic replay can double-charge customers or flood downstream systems.
The goal is to understand what the committed offset is doing relative to the log, and to reset offsets safely when you must, using the offset mechanics from Consuming Deeper.
2. Likely causes
| Cause | How it manifests |
|---|---|
| auto.offset.reset=earliest with no committed offset | A new group (or expired offsets) replays the whole topic |
| auto.offset.reset=latest with no committed offset | A new group skips everything produced before it started |
| Committed offset older than retention | OffsetOutOfRangeException: the offset points at deleted data |
| A bad manual offset reset | Offsets moved to the wrong place, causing replay or skip |
| Group id changed accidentally | A new group id means no committed offsets, triggering the reset policy |
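The rows above all reduce to one decision: is there a committed offset, and does it still fall inside the retained log? The following is a minimal sketch of that decision as a toy model, not Kafka client code. The class name `StartPositionSketch`, the method `resolve`, and the enum are hypothetical names introduced for illustration; the real client raises its own exceptions when the policy is `none`, which is only approximated here with `IllegalStateException`.

```java
import java.util.OptionalLong;

/** Toy model of how a consumer's starting position is chosen; illustrative only. */
final class StartPositionSketch {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    /**
     * @param committed the group's committed offset, empty if none exists
     * @param logStart  first offset still retained in the partition
     * @param logEnd    offset the next produced record will receive
     * @param policy    the consumer's auto.offset.reset setting
     */
    static long resolve(OptionalLong committed, long logStart, long logEnd, ResetPolicy policy) {
        if (committed.isPresent()) {
            long offset = committed.getAsLong();
            if (offset >= logStart && offset <= logEnd) {
                // Valid committed offset: the reset policy is never consulted.
                return offset;
            }
        }
        // No committed offset, or it points outside the retained log.
        switch (policy) {
            case EARLIEST:
                return logStart; // replay everything still retained
            case LATEST:
                return logEnd;   // skip everything already in the log
            default:
                throw new IllegalStateException("no valid offset and reset policy is none");
        }
    }
}
```

With a retained log of offsets 100 to 500, a committed offset of 50 (older than retention) resolves to 100 under `EARLIEST` and to 500 under `LATEST`, while a committed offset of 300 is returned unchanged under either policy.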
3. How it manifests to the Spring app
| Cause | What the service sees |
|---|---|
| Earliest with no offset | Listener floods with old events on first start |
| Latest with no offset | Listener silently misses records produced during downtime |
| Offset out of range | OffsetOutOfRangeException, then the reset policy kicks in |
| Wrong manual reset | Sudden replay or gap right after an operational change |
4. Diagnostic steps
- Describe the group with kafka-consumer-groups --describe. Compare current offset, log end offset, and lag. A current offset far below the end after a change signals a replay.
- Check auto.offset.reset for the consumer. This determines behavior only when there is no valid committed offset, but that is exactly when incidents happen.
- Check whether the group id changed. An accidental new group id explains a full replay or skip.
- Check retention vs offset age. If the group was down longer than retention, its offsets point at deleted data, causing OffsetOutOfRangeException.
- Reconstruct recent operational changes. Deploys, group renames, and manual resets are the usual triggers.
| Step | Question it answers | Time cost |
|---|---|---|
| 1. Describe group | Where is the offset vs the log? | 1 min |
| 2. Reset policy | What happens with no offset? | 1 min |
| 3. Group id | Did the group identity change? | 1-2 min |
| 4. Retention vs age | Are offsets pointing at deleted data? | 2-3 min |
| 5. Change history | What operational change triggered it? | 2-3 min |
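Steps 1 and 4 involve simple arithmetic that is easy to get wrong under pressure. The sketch below shows both calculations, assuming you have already read the offsets from the describe output and know the topic's retention period. `OffsetTriageSketch` and its method names are hypothetical, and the retention check is a heuristic: it flags the risk but cannot confirm that the data at the committed offset was actually deleted.

```java
import java.time.Duration;

/** Back-of-the-envelope checks for diagnostic steps 1 and 4; illustrative only. */
final class OffsetTriageSketch {

    /** Records between the group's current offset and the log end, never negative. */
    static long lag(long currentOffset, long logEndOffset) {
        return Math.max(0, logEndOffset - currentOffset);
    }

    /**
     * Heuristic for step 4: if the group was down at least as long as the topic
     * retains data, its committed offsets may point at deleted records.
     */
    static boolean offsetsLikelyDeleted(Duration downtime, Duration topicRetention) {
        return downtime.compareTo(topicRetention) >= 0;
    }
}
```

For example, a group that was stopped for eight days against a topic with seven days of retention is flagged, while a two-day outage is not.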
5. Safe remediations
| Situation | Safe action |
|---|---|
| Need to reset offsets | Always --dry-run first, with the group stopped, then --execute to a specific target (--to-datetime, --to-offset) |
| Accidental group rename | Restore the correct group id so committed offsets are used again |
| Offset out of range | Decide intent: --to-earliest (reprocess) or --to-latest (skip gap), with owner sign-off |
| Unwanted replay in progress | Stop the consumer, reset to the correct offset, then restart |
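When choosing a --to-datetime target, it helps to know how a timestamp maps to an offset: a timestamp lookup resolves to the earliest offset whose record timestamp is at or after the target. The sketch below models that rule over an in-memory partition, which can help you sanity-check a dry-run preview. `DatetimeResetSketch` and `firstOffsetAtOrAfter` are hypothetical names, and what the CLI does when no record matches is not modeled here.

```java
import java.util.OptionalLong;

/** Toy model of resolving a timestamp to an offset; illustrative only. */
final class DatetimeResetSketch {

    /**
     * @param offsets      record offsets in ascending order
     * @param timestampsMs record timestamps (epoch millis), same length as offsets
     * @param targetMs     the reset target as epoch millis
     * @return the earliest offset whose timestamp is >= targetMs, or empty if none
     */
    static OptionalLong firstOffsetAtOrAfter(long[] offsets, long[] timestampsMs, long targetMs) {
        if (offsets.length != timestampsMs.length) {
            throw new IllegalArgumentException("offsets and timestamps must have the same length");
        }
        // Linear scan: timestamps are not guaranteed to increase with offsets.
        for (int i = 0; i < offsets.length; i++) {
            if (timestampsMs[i] >= targetMs) {
                return OptionalLong.of(offsets[i]);
            }
        }
        return OptionalLong.empty();
    }
}
```

A target that falls between two records resolves to the later record, so the consumer resumes at the first record produced at or after the chosen time rather than the last one before it.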
6. Escalation trigger
Page on-call engineering if:
- A full-topic replay is already in progress and downstream effects (charges, notifications) are firing.
- The correct offset target is unclear and a wrong reset would cause data loss or duplication.
- OffsetOutOfRangeException stems from retention that other teams control.
- __consumer_offsets itself appears unhealthy (a cluster-level issue).
7. Relevant commands and exhibits
# Offset out of range: committed offset points at deleted data
org.apache.kafka.clients.consumer.OffsetOutOfRangeException:
Fetch position FetchPosition{offset=1200} is out of range for partition orders-0
# ...auto.offset.reset then decides earliest/latest (with none, the exception reaches the application)
# ALWAYS preview first (group must be stopped)
kafka-consumer-groups.sh --bootstrap-server $BROKER --group payment-service \
  --topic orders --reset-offsets --to-datetime 2026-07-05T00:00:00.000 --dry-run
# Apply only after confirming the preview
kafka-consumer-groups.sh --bootstrap-server $BROKER --group payment-service \
  --topic orders --reset-offsets --to-datetime 2026-07-05T00:00:00.000 --execute
# The setting that governs no-committed-offset behavior
spring:
  kafka:
    consumer:
      auto-offset-reset: latest   # or earliest; choose deliberately
      group-id: payment-service   # keep stable; a rename resets everything
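For services that build consumer properties in code rather than YAML, the same two settings map to the client keys `group.id` and `auto.offset.reset`. The sketch below builds and validates them with the JDK only. `ConsumerPropsSketch` and `build` are hypothetical names, and the accepted policy set covers only the three classic values (`earliest`, `latest`, `none`), which is an assumption about the client version in use.

```java
import java.util.Properties;
import java.util.Set;

/** Builds the two offset-related consumer properties with basic validation; illustrative only. */
final class ConsumerPropsSketch {

    // Assumption: only the three classic reset policies are accepted.
    private static final Set<String> POLICIES = Set.of("earliest", "latest", "none");

    static Properties build(String groupId, String resetPolicy) {
        if (groupId == null || groupId.isBlank()) {
            // A missing or changing group id is one of the causes of an unintended reset.
            throw new IllegalArgumentException("group id must be set explicitly");
        }
        if (!POLICIES.contains(resetPolicy)) {
            throw new IllegalArgumentException("unknown reset policy: " + resetPolicy);
        }
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        props.setProperty("auto.offset.reset", resetPolicy);
        return props;
    }
}
```

Failing fast on a blank group id or a misspelled policy surfaces a configuration mistake at startup instead of as a replay or a gap in production.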
8. Guided practical
Reproduce offset behavior in the local lab.
- Produce records to orders, then start a consumer with a brand-new group id and auto-offset-reset: earliest; it replays everything.
- Repeat with a different new group id and latest; it skips the existing records.
- Stop the group and run a --reset-offsets --to-earliest --dry-run, read the preview, then --execute and watch the replay.
- Explain why the group must be stopped for the reset to apply.