Poison Messages, Deserialization Errors, and the DLT

Diagnose SerializationException and un-deserializable records that block a partition, and triage the dead letter topic (DLT) in a Spring Boot consumer.


1. Symptom

One partition of orders stops making progress: lag climbs on that partition while the others are fine, and the Payment consumer logs the same SerializationException or handler error over and over on the same offset. Either the consumer is stuck retrying a record it can never process (a poison message), or such records are piling up in the dead letter topic (DLT).

The goal is to unblock the partition without losing good data, and to triage what landed in the DLT, building on Retry and Error Handling.
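
The repeating-offset signature can be checked mechanically against a log excerpt. The following is a minimal sketch, assuming the log lines follow the format shown in the exhibits ("... for partition orders-1 at offset 4471"); other client versions may word the message differently. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PoisonSignature {

    // Matches e.g. "... for partition orders-1 at offset 4471".
    // The greedy \S+ backtracks to the last hyphen, so "my-topic-3" also parses.
    private static final Pattern FAILED_RECORD =
            Pattern.compile("partition (\\S+)-(\\d+) at offset (\\d+)");

    /** Counts how often each "topic-partition@offset" appears in the given log lines. */
    static Map<String, Integer> countFailures(List<String> logLines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = FAILED_RECORD.matcher(line);
            if (m.find()) {
                String key = m.group(1) + "-" + m.group(2) + "@" + m.group(3);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

A single key whose count keeps growing suggests a poison message. Many distinct offsets failing at once suggest a broader problem, such as a schema or producer change.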


2. Likely causes

Cause | How it manifests
Un-deserializable record (bad bytes, wrong type) | SerializationException on the same offset, partition stuck
Schema mismatch | Deserializer cannot resolve the record’s schema (see Schema Incompatibility)
Handler bug on specific data | Non-transient exception every time for one record
Missing ErrorHandlingDeserializer | A deserialization failure kills the container instead of being routed

A true poison message fails deterministically, so blocking retries never succeed and the partition halts behind it.


3. How it manifests to the Spring app

Cause | What the service sees
Poison record, no error handler | Same offset retried forever; partition frozen; other partitions fine
With DefaultErrorHandler + DLT | Record retried a fixed number of times, then published to orders.DLT and skipped
Deserialization failure without ErrorHandlingDeserializer | Listener container stops; consumer appears dead
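
The last two rows differ only in consumer configuration. The following is a minimal sketch of the deserializer wiring, written as a plain property map so that it compiles without Spring on the classpath. The keys are the string forms of Kafka's key.deserializer / value.deserializer settings and of Spring Kafka's delegate-class properties. The helper name and the choice of delegates are assumptions about the Payment consumer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: builds consumer properties that wrap the real
// deserializers in Spring Kafka's ErrorHandlingDeserializer.
final class PoisonSafeConsumerProps {

    static final String ERROR_HANDLING_DESERIALIZER =
            "org.springframework.kafka.support.serializer.ErrorHandlingDeserializer";

    static Map<String, Object> build(String keyDelegate, String valueDelegate) {
        Map<String, Object> props = new HashMap<>();
        // Kafka instantiates the wrapper rather than the real deserializer.
        props.put("key.deserializer", ERROR_HANDLING_DESERIALIZER);
        props.put("value.deserializer", ERROR_HANDLING_DESERIALIZER);
        // The wrapper delegates to these classes and captures any exception
        // they throw, so poll() still returns and the container's error
        // handler can deal with the record.
        props.put("spring.deserializer.key.delegate.class", keyDelegate);
        props.put("spring.deserializer.value.delegate.class", valueDelegate);
        return props;
    }
}
```

The resulting map could be merged into the properties passed to the consumer factory. In a Spring Boot application the same keys can be set under spring.kafka.consumer in application.yml instead.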

4. Diagnostic steps

  1. Confirm the scope. Is lag isolated to one partition with a repeating error on one offset? That is the poison signature.
  2. Read the exception. SerializationException points at deserialization/schema; a handler exception points at app logic on that payload.
  3. Inspect the offending record with the console consumer at that partition and offset (--partition, --offset) to see the raw bytes/value.
  4. Check error-handling config. Is ErrorHandlingDeserializer in place, and is a DefaultErrorHandler with a DLT configured? Without them, one bad record blocks everything.
  5. Check the DLT for accumulating records and read them to understand what failed.

Step | Question it answers | Time cost
1. Scope | Poison message vs broad failure? | 1 min
2. Exception | Deserialization or handler bug? | 1 min
3. Inspect record | What is actually wrong? | 2-3 min
4. Error config | Is bad data routed or blocking? | 1-2 min
5. DLT contents | What has been dead-lettered? | 2-3 min
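
For step 3, an "Unknown magic byte!" message usually means the payload was not written in the Confluent Schema Registry wire format. In that format, byte 0 is the magic byte 0x00 and the next four bytes are the schema ID in big-endian order. The following is a minimal sketch for checking raw bytes copied from the offending record, assuming the consumer uses a Confluent Schema Registry deserializer. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.OptionalInt;

final class WireFormatProbe {

    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 1 + Integer.BYTES; // magic byte + schema ID

    /** Returns the schema ID if the payload starts with a valid header, else empty. */
    static OptionalInt schemaId(byte[] payload) {
        if (payload == null || payload.length < HEADER_LENGTH || payload[0] != MAGIC_BYTE) {
            return OptionalInt.empty();
        }
        // ByteBuffer reads big-endian by default, which matches the wire format.
        return OptionalInt.of(ByteBuffer.wrap(payload, 1, Integer.BYTES).getInt());
    }
}
```

A plain-text or JSON payload starts with a printable character such as '{' (0x7B), so the probe returns empty. This is the case the guided practical reproduces.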

5. Safe remediations

Situation | Safe action
No error handler, partition blocked | Deploy ErrorHandlingDeserializer + DefaultErrorHandler with a DLT so bad records route instead of blocking
Poison record blocking now, fix not yet deployed | With sign-off, skip the single offset (seek past it) as a last resort; document the skipped record
Records in DLT | Triage: fix the producer/schema, then optionally replay valid ones from the DLT
Schema mismatch | Follow Schema Incompatibility
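
The last-resort skip can be sketched with the plain Kafka consumer API. This is an illustration under stated assumptions, not a vetted procedure. The group id "payment" is hypothetical, the partition and offset come from the exhibits, every consumer in the group is assumed to be stopped first, and the bootstrap server is passed as the first argument. The guard refuses to commit unless the group is parked exactly on the poison offset, so a mistyped offset cannot rewind or fast-forward the group.

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

final class SkipPoisonOffset {

    public static void main(String[] args) {
        String bootstrapServers = args[0];
        String groupId = "payment"; // assumption: replace with the real consumer group id
        TopicPartition tp = new TopicPartition("orders", 1);
        long poisonOffset = 4471L;

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, groupId);
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.assign(List.of(tp));

            // Only proceed if the group's committed position is the poison offset.
            OffsetAndMetadata current = consumer.committed(Set.of(tp)).get(tp);
            if (current == null || current.offset() != poisonOffset) {
                throw new IllegalStateException(
                        "Group is not parked on offset " + poisonOffset + ": " + current);
            }

            // A committed offset is the NEXT record to read, so
            // poisonOffset + 1 skips exactly one record.
            consumer.commitSync(Map.of(tp, new OffsetAndMetadata(poisonOffset + 1)));
        }
    }
}
```

Record the skipped topic, partition, and offset with the sign-off so the record can be recovered from the log later if it turns out to matter.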

6. Escalation trigger

Page on-call engineering if:

  • Unblocking requires skipping offsets on a production partition without a DLT in place.
  • The poison message is a symptom of a schema-registry incompatibility affecting many records.
  • The DLT is filling rapidly, indicating a systemic producer or contract problem, not one bad record.

7. Relevant commands and exhibits

# Poison signature: same offset, same error, repeating
Error deserializing key/value for partition orders-1 at offset 4471
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!

# Inspect the exact offending record
kafka-console-consumer.sh --bootstrap-server $BROKER --topic orders \
  --partition 1 --offset 4471 --max-messages 1 --property print.key=true

# Read the dead letter topic
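kafka-console-consumer.sh --bootstrap-server $BROKER --topic orders.DLT \
  --from-beginning --property print.key=true

// The config that turns a poison message from a partition-blocker into a DLT record
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

@Bean
DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
    var recoverer = new DeadLetterPublishingRecoverer(template);
    // Retry 3 times, 1000 ms apart (4 delivery attempts in total), then hand the record to the recoverer
    return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3));
}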

8. Guided practical

Reproduce a poison message in the local lab.

  1. Configure the Payment consumer with ErrorHandlingDeserializer and a DefaultErrorHandler + DeadLetterPublishingRecoverer as in Retry and Error Handling.
  2. Produce one malformed record to orders with the console producer (plain text where JSON/Avro is expected).
  3. Watch it fail, retry the configured number of times, then land in orders.DLT, while the good records keep flowing.
  4. Read the DLT to inspect the failed record.
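
When reading the DLT in step 4, the record headers show where the record came from and why it failed. The following is a minimal sketch for decoding them from raw header bytes. It assumes the header names match the string values of Spring Kafka's KafkaHeaders.DLT_* constants, and that DeadLetterPublishingRecoverer writes the partition as a 4-byte int and the offset as an 8-byte long. Both assumptions are worth verifying against the Spring Kafka version in use. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

final class DltHeaders {

    // Assumed to match the string values of Spring Kafka's KafkaHeaders.DLT_* constants.
    static final String ORIGINAL_TOPIC = "kafka_dlt-original-topic";
    static final String ORIGINAL_PARTITION = "kafka_dlt-original-partition";
    static final String ORIGINAL_OFFSET = "kafka_dlt-original-offset";
    static final String EXCEPTION_MESSAGE = "kafka_dlt-exception-message";

    /** Formats "topic-partition@offset: message"; assumes all four headers are present. */
    static String describe(Map<String, byte[]> headers) {
        String topic = new String(headers.get(ORIGINAL_TOPIC), StandardCharsets.UTF_8);
        int partition = ByteBuffer.wrap(headers.get(ORIGINAL_PARTITION)).getInt();
        long offset = ByteBuffer.wrap(headers.get(ORIGINAL_OFFSET)).getLong();
        String message = new String(headers.get(EXCEPTION_MESSAGE), StandardCharsets.UTF_8);
        return topic + "-" + partition + "@" + offset + ": " + message;
    }
}
```

For the malformed record produced in step 2, the decoded summary should point back at the orders partition and offset where the record was written.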

Next: Disk Pressure, Retention, and Segment Issues.