Offset Problems

Diagnose OffsetOutOfRangeException, unwanted auto.offset.reset replays, __consumer_offsets issues, and how to reset offsets safely.


1. Symptom

A consumer suddenly reprocesses a huge amount of old data (a replay), or skips ahead and misses records, or logs OffsetOutOfRangeException. Often this surfaces after a deploy, a long downtime, or an offset reset that went wrong. The blast radius can be large: a full-topic replay can double-charge customers or flood downstream systems.

The goal is to establish where the committed offset sits relative to the log (its start and end offsets), and to reset offsets safely when you must, using the offset mechanics from Consuming Deeper.


2. Likely causes

Cause | How it manifests
auto.offset.reset=earliest with no committed offset | A new group (or expired offsets) replays the whole topic
auto.offset.reset=latest with no committed offset | A new group skips everything produced before it started
Committed offset older than retention | OffsetOutOfRangeException: the offset points at deleted data
A bad manual offset reset | Offsets moved to the wrong place, causing replay or skip
Group id changed accidentally | A new group id means no committed offsets, triggering the reset policy
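
The interaction between the committed offset, the log boundaries, and the reset policy can be sketched as a small decision function. This is an illustrative model only, not Kafka client code: the function name start_position and its argument order are hypothetical, and it includes the none policy (which raises an error instead of resetting) alongside the two policies discussed here.

```shell
# Illustrative model only: this is not Kafka client code.
# Args: committed offset ("" if none), log start offset, log end offset,
#       and the auto.offset.reset policy (earliest | latest | none).
# Prints the offset the consumer would start from, or "error" when the
# policy is none and there is no usable committed offset.
start_position() {
  committed="$1"; log_start="$2"; log_end="$3"; policy="$4"

  # A committed offset inside [log_start, log_end] is used as-is.
  if [ -n "$committed" ] && [ "$committed" -ge "$log_start" ] \
      && [ "$committed" -le "$log_end" ]; then
    echo "$committed"
    return 0
  fi

  # Otherwise (no commit, or the commit is out of range) the policy decides.
  case "$policy" in
    earliest) echo "$log_start" ;;
    latest)   echo "$log_end" ;;
    *)        echo "error"; return 1 ;;
  esac
}

start_position ""   500  9000 earliest   # new group, earliest: prints 500
start_position ""   500  9000 latest     # new group, latest: prints 9000
start_position 1200 5000 9000 latest     # commit below log start: prints 9000
start_position 7000 5000 9000 earliest   # valid commit: prints 7000
```

The third call is the retention case from the table: the committed offset 1200 is below the log start, so the policy decides, and latest silently skips everything between 5000 and 9000.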

3. How it manifests to the Spring app

Cause | What the service sees
Earliest with no offset | Listener floods with old events on first start
Latest with no offset | Listener silently misses records produced during downtime
Offset out of range | OffsetOutOfRangeException, then the reset policy kicks in
Wrong manual reset | Sudden replay or gap right after an operational change

4. Diagnostic steps

  1. Describe the group with kafka-consumer-groups --describe. Compare current offset, log end offset, and lag. A current offset far below the end after a change signals a replay.
  2. Check auto.offset.reset for the consumer. This determines behavior only when there is no valid committed offset, but that is exactly when incidents happen.
  3. Check whether the group id changed. An accidental new group id explains a full replay or skip.
  4. Check retention vs offset age. If the group was down longer than retention, its offsets point at deleted data, causing OffsetOutOfRangeException.
  5. Reconstruct recent operational changes. Deploys, group renames, and manual resets are the usual triggers.

Step | Question it answers | Time cost
1. Describe group | Where is the offset vs the log? | 1 min
2. Reset policy | What happens with no offset? | 1 min
3. Group id | Did the group identity change? | 1-2 min
4. Retention vs age | Are offsets pointing at deleted data? | 2-3 min
5. Change history | What operational change triggered it? | 2-3 min
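
Step 1 is easier to read when the partitions with large lag are filtered out of the --describe output. The helper below is a minimal sketch: flag_lagging_partitions is a hypothetical name, and it assumes the column order GROUP, TOPIC, PARTITION, CURRENT-OFFSET, LOG-END-OFFSET, LAG printed by recent Kafka versions. Check the header line of your own output before relying on the column positions.

```shell
# Hypothetical helper: reads --describe output on stdin and prints every
# partition whose LAG column exceeds the given threshold.
# Assumed columns: GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG ...
flag_lagging_partitions() {
  awk -v threshold="$1" '
    # The header row and rows with a "-" lag are skipped by the numeric check.
    $6 ~ /^[0-9]+$/ && $6 + 0 > threshold + 0 {
      printf "%s-%s current=%s end=%s lag=%s\n", $2, $3, $4, $5, $6
    }'
}

# Intended use against a real cluster (BROKER is assumed to be set):
#   kafka-consumer-groups.sh --bootstrap-server "$BROKER" \
#     --describe --group payment-service | flag_lagging_partitions 10000
```

A partition reported here right after a deploy or a reset, with a current offset far below the log end, is the replay signal described in step 1.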

5. Safe remediations

Situation | Safe action
Need to reset offsets | Always --dry-run first, with the group stopped, then --execute to a specific target (--to-datetime, --to-offset)
Accidental group rename | Restore the correct group id so committed offsets are used again
Offset out of range | Decide intent: --to-earliest (reprocess) or --to-latest (skip gap), with owner sign-off
Unwanted replay in progress | Stop the consumer, reset to the correct offset, then restart
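
The preview-then-apply discipline can be enforced with a small wrapper, so that --execute is never run without a preceding --dry-run against the same target. This is a sketch under stated assumptions: safe_reset and the KAFKA_CG override variable are hypothetical, BROKER must already be set, and the wrapper does not stop the consumers for you.

```shell
# Hypothetical wrapper: previews a reset, then applies it only on an explicit "yes".
# KAFKA_CG is an assumed override for the CLI path; BROKER must be set.
safe_reset() {
  group="$1"; topic="$2"; shift 2   # remaining args: the target, e.g. --to-offset 4200
  cli="${KAFKA_CG:-kafka-consumer-groups.sh}"

  # Step 1: preview. --dry-run changes nothing.
  "$cli" --bootstrap-server "$BROKER" --group "$group" --topic "$topic" \
    --reset-offsets "$@" --dry-run || return 1

  # Step 2: require an explicit confirmation before applying.
  printf 'Apply this reset to group %s? Type yes to continue: ' "$group" >&2
  read -r answer || answer=""
  [ "$answer" = "yes" ] || { echo "Aborted; offsets unchanged." >&2; return 1; }

  # Step 3: apply exactly the same target that was previewed.
  "$cli" --bootstrap-server "$BROKER" --group "$group" --topic "$topic" \
    --reset-offsets "$@" --execute
}

# Example call (group stopped first):
#   safe_reset payment-service orders --to-datetime 2026-07-05T00:00:00.000
```

Passing the target through unchanged from the preview to the apply step is the point of the design: it rules out previewing one target and executing another.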

6. Escalation trigger

Page on-call engineering if:

  • A full-topic replay is already in progress and downstream effects (charges, notifications) are firing.
  • The correct offset target is unclear and a wrong reset would cause data loss or duplication.
  • OffsetOutOfRangeException stems from retention that other teams control.
  • __consumer_offsets itself appears unhealthy (a cluster-level issue).

7. Relevant commands and exhibits

# Offset out of range: committed offset points at deleted data
org.apache.kafka.clients.consumer.OffsetOutOfRangeException:
  Fetch position FetchPosition{offset=1200} is out of range for partition orders-0
# ...auto.offset.reset then decides earliest/latest

# ALWAYS preview first (group must be stopped)
kafka-consumer-groups.sh --bootstrap-server $BROKER --group payment-service \
  --topic orders --reset-offsets --to-datetime 2026-07-05T00:00:00.000 --dry-run

# Apply only after confirming the preview
kafka-consumer-groups.sh --bootstrap-server $BROKER --group payment-service \
  --topic orders --reset-offsets --to-datetime 2026-07-05T00:00:00.000 --execute

# The setting that governs no-committed-offset behavior
spring:
  kafka:
    consumer:
      auto-offset-reset: latest   # or earliest; choose deliberately
      group-id: payment-service   # keep stable; a rename resets everything

8. Guided practical

Reproduce offset behavior in the local lab.

  1. Produce records to orders, then start a consumer with a brand-new group id and auto-offset-reset set to earliest. It replays everything.
  2. Repeat with a different new group id and auto-offset-reset set to latest. It skips the existing records.
  3. Stop the group, run --reset-offsets --to-earliest --dry-run, and read the preview. Then rerun with --execute, restart the consumer, and watch the replay.
  4. Explain why the group must be stopped for the reset to apply.

Next: Schema Registry Incompatibility in Production.