Poison Messages, Deserialization Errors, and the DLT
Diagnose SerializationException, un-deserializable records blocking a partition, and triage the dead letter topic in a Spring Boot consumer.
1. Symptom
One partition of the orders topic stops making progress: lag climbs on that partition while the others are fine, and the Payment consumer logs the same SerializationException or handler error over and over for the same offset. Either the consumer is stuck retrying a record it can never process (a poison message), or, once error handling routes such records away, they are piling up in the dead letter topic (DLT).
The goal is to unblock the partition without losing good data and to triage what landed in the DLT. This guide builds on Retry and Error Handling.
2. Likely causes
| Cause | How it manifests |
|---|---|
| Un-deserializable record (bad bytes, wrong type) | SerializationException on the same offset, partition stuck |
| Schema mismatch | Deserializer cannot resolve the record’s schema (see Schema Incompatibility) |
| Handler bug on specific data | Non-transient exception every time for one record |
| Missing ErrorHandlingDeserializer | A deserialization failure kills the container instead of being routed |
A true poison message fails deterministically, so blocking retries never succeed and the partition halts behind it.
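To see why, here is a plain-Java sketch with no Kafka or Spring involved: the same bounded retry loop applied to a deterministic failure and to a transient one. `RetrySketch`, `retry`, and both stand-in operations are hypothetical names for illustration; parsing a non-numeric string stands in for a payload that fails the same way on every attempt.

```java
import java.util.function.Supplier;

// Plain-Java sketch (no Kafka, no Spring): why retrying a poison record is pointless.
// A deterministic failure throws on every attempt; a transient one eventually succeeds.
class RetrySketch {

    // Result of a bounded retry loop: did it succeed, and after how many attempts?
    record Outcome(boolean succeeded, int attempts) {}

    // Try the operation up to maxAttempts times, stopping at the first success.
    static Outcome retry(Supplier<?> operation, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                operation.get();
                return new Outcome(true, attempt);
            } catch (RuntimeException e) {
                // Real code would back off here; the sketch simply tries again.
            }
        }
        return new Outcome(false, maxAttempts);
    }

    public static void main(String[] args) {
        // Stand-in for a payload that fails identically every time (a poison message).
        Supplier<Integer> poison = () -> Integer.parseInt("not-a-number");

        // Stand-in for a flaky dependency: fails twice, then succeeds.
        int[] calls = {0};
        Supplier<String> transientFailure = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("broker timeout");
            }
            return "ok";
        };

        System.out.println("poison:    " + retry(poison, 10));
        System.out.println("transient: " + retry(transientFailure, 10));
    }
}
```

The poison operation exhausts all ten attempts while the transient one succeeds on the third. Raising the attempt count only delays the outcome for the poison case, and in a consumer the partition position waits behind it the whole time.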
3. How it manifests to the Spring app
| Cause | What the service sees |
|---|---|
| Poison record, no error handler | Same offset retried forever; partition frozen; other partitions fine |
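| With DefaultErrorHandler + DLT | Failing record retried a fixed number of times (deserialization failures are classified as not retryable by default and skip the retries), then published to the <topic>.DLT topic (orders.DLT here) and skipped |
| Deserialization failure without ErrorHandlingDeserializer | Listener container stops; consumer appears dead |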
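The retry-then-dead-letter row can be simulated without a broker. This sketch mimics the shape of `DefaultErrorHandler` with `FixedBackOff(1000L, 3)` for a handler exception (three retries after the first attempt, so four attempts and about three seconds of backoff) followed by a recoverer. It is not Spring's implementation, and `DltSimulation`, `consume`, and the in-memory "partition" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java simulation of "retry a fixed number of times, then dead-letter and skip".
class DltSimulation {

    // Outcome of draining one partition: what was handled, which offsets went to
    // the DLT, where the consumer position ended, and total time spent backing off.
    record Result(List<String> processed, List<Long> deadLettered, long nextOffset, long backoffMillis) {}

    static Result consume(List<String> partition, Predicate<String> handler,
                          long intervalMs, int maxRetries) {
        List<String> processed = new ArrayList<>();
        List<Long> deadLettered = new ArrayList<>();
        long backoffMillis = 0;
        for (int offset = 0; offset < partition.size(); offset++) {
            String value = partition.get(offset);
            boolean ok = false;
            // First attempt plus up to maxRetries retries.
            for (int attempt = 0; attempt <= maxRetries && !ok; attempt++) {
                if (attempt > 0) {
                    backoffMillis += intervalMs; // blocking wait before each retry
                }
                ok = handler.test(value);
            }
            if (ok) {
                processed.add(value);
            } else {
                deadLettered.add((long) offset); // "recover": publish to the DLT, then move on
            }
        }
        return new Result(processed, deadLettered, partition.size(), backoffMillis);
    }

    public static void main(String[] args) {
        // Hypothetical partition: offset 1 holds a payload the handler always rejects.
        List<String> partition = List.of("{\"id\":1}", "not-json", "{\"id\":3}");
        // Stand-in for a handler that fails deterministically on one payload.
        Predicate<String> handler = value -> value.startsWith("{");
        System.out.println(consume(partition, handler, 1000L, 3));
    }
}
```

With the demo partition, both valid records are processed, offset 1 is dead-lettered after four failed attempts (3000 ms of backoff), and the position ends past the last record instead of freezing at offset 1.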
4. Diagnostic steps
- Confirm the scope. Is lag isolated to one partition with a repeating error on one offset? That is the poison signature.
- Read the exception. A `SerializationException` points at deserialization or the schema; a handler exception points at application logic on that payload.
- Inspect the offending record with the console consumer at that partition and offset (`--partition`, `--offset`) to see the raw bytes or value.
- Check the error-handling config. Is `ErrorHandlingDeserializer` in place, and is a `DefaultErrorHandler` with a DLT configured? Without them, one bad record blocks everything.
- Check the DLT for accumulating records and read them to understand what failed.
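Steps 1 and 2 can be partly automated against a captured consumer log. A minimal sketch, assuming the log lines contain the Kafka client message shown in section 7 (`Error deserializing key/value for partition orders-1 at offset 4471`); the class name and the repeat threshold are hypothetical choices, and the regex needs adjusting if your log format differs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Detects the poison signature: the same partition/offset failing again and again.
class PoisonSignature {

    // Matches "... for partition <topic-partition> at offset <n>" in a log line.
    private static final Pattern FAILURE =
        Pattern.compile("for partition (\\S+) at offset (\\d+)");

    // Returns "partition@offset" for the first location that failed at least
    // `threshold` times, or empty if no location repeats that often.
    static Optional<String> find(List<String> logLines, int threshold) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            Matcher m = FAILURE.matcher(line);
            if (m.find()) {
                String key = m.group(1) + "@" + m.group(2);
                int seen = counts.merge(key, 1, Integer::sum);
                if (seen >= threshold) {
                    return Optional.of(key);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String line = "Error deserializing key/value for partition orders-1 at offset 4471";
        List<String> log = List.of(line, line, line);
        System.out.println(find(log, 3).orElse("no poison signature"));
    }
}
```

Errors spread across many offsets never reach the threshold for a single key, which points at a broad failure (for example a schema change) rather than one poison record.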
| Step | Question it answers | Time cost |
|---|---|---|
| 1. Scope | Poison message vs broad failure? | 1 min |
| 2. Exception | Deserialization or handler bug? | 1 min |
| 3. Inspect record | What is actually wrong? | 2-3 min |
| 4. Error config | Is bad data routed or blocking? | 1-2 min |
| 5. DLT contents | What has been dead-lettered? | 2-3 min |
5. Safe remediations
| Situation | Safe action |
|---|---|
| No error handler, partition blocked | Deploy ErrorHandlingDeserializer + DefaultErrorHandler with a DLT so bad records route instead of blocking |
| Poison record blocking now, fix not yet deployed | With sign-off, skip the single offset (seek past it) as a last resort; document the skipped record |
| Records in DLT | Triage: fix the producer/schema, then optionally replay valid ones from the DLT |
| Schema mismatch | Follow Schema Incompatibility |
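For the last-resort skip, the stock `kafka-consumer-groups.sh` tool can move a group's committed offset on a single partition with `--reset-offsets`. The sketch below only assembles that command so the offset arithmetic (poison offset + 1) and the single-partition scoping are explicit, and it defaults to `--dry-run`. The group id is hypothetical, and the reset only succeeds when the group has no active members, so the Payment consumers must be stopped first.

```java
// Assembles (does not run) the offset-reset command for skipping one poison record.
class SkipOffsetCommand {

    static String build(String group, String topic, int partition, long poisonOffset, boolean execute) {
        long target = poisonOffset + 1; // first record AFTER the poison message
        return String.join(" ",
            "kafka-consumer-groups.sh", "--bootstrap-server", "$BROKER",
            "--group", group,
            "--topic", topic + ":" + partition, // restrict the reset to this one partition
            "--reset-offsets", "--to-offset", Long.toString(target),
            execute ? "--execute" : "--dry-run"); // review the dry run before executing
    }

    public static void main(String[] args) {
        // "payment-service" is a hypothetical group id; use your consumer's group.id.
        System.out.println(build("payment-service", "orders", 1, 4471, false));
    }
}
```

For the exhibit in section 7 this prints `kafka-consumer-groups.sh --bootstrap-server $BROKER --group payment-service --topic orders:1 --reset-offsets --to-offset 4472 --dry-run`. Record the skipped offset and its raw value before switching to `--execute`.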
6. Escalation trigger
Page on-call engineering if:
- Unblocking requires skipping offsets on a production partition without a DLT in place.
- The poison message is a symptom of a schema-registry incompatibility affecting many records.
- The DLT is filling rapidly, indicating a systemic producer or contract problem, not one bad record.
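To tell one bad record from a systemic problem, it helps to group DLT contents by failure cause. `DeadLetterPublishingRecoverer` adds diagnostic headers such as `kafka_dlt-exception-fqcn` and `kafka_dlt-original-offset` to each dead-lettered record. This sketch models each DLT record as a header map already decoded to strings so it runs without Kafka; the header to group by is a parameter, and the threshold in `looksSystemic` is a hypothetical escalation rule, not a Spring Kafka feature.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Triage sketch: count dead-lettered records per value of one header.
class DltTriage {

    // Records missing the header are counted under "unknown".
    static Map<String, Integer> countBy(List<Map<String, String>> dltHeaders, String header) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> headers : dltHeaders) {
            counts.merge(headers.getOrDefault(header, "unknown"), 1, Integer::sum);
        }
        return counts;
    }

    // Hypothetical rule: if any single cause accounts for `threshold` or more
    // records, treat it as a producer or contract problem and escalate.
    static boolean looksSystemic(Map<String, Integer> counts, int threshold) {
        return counts.values().stream().anyMatch(n -> n >= threshold);
    }

    public static void main(String[] args) {
        String fqcn = "kafka_dlt-exception-fqcn";
        String offset = "kafka_dlt-original-offset";
        // Hypothetical DLT contents: two records failed the same way, one differently.
        List<Map<String, String>> dlt = List.of(
            Map.of(fqcn, "org.springframework.kafka.support.serializer.DeserializationException", offset, "4471"),
            Map.of(fqcn, "org.springframework.kafka.support.serializer.DeserializationException", offset, "4490"),
            Map.of(fqcn, "java.lang.IllegalArgumentException", offset, "4502"));
        Map<String, Integer> counts = countBy(dlt, fqcn);
        System.out.println(counts);
        System.out.println("escalate? " + looksSystemic(counts, 2));
    }
}
```

If the exception header in your version holds a wrapper exception, grouping by a different header (or by the exception message) gives a sharper picture; the grouping logic stays the same.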
7. Relevant commands and exhibits
```text
# Poison signature: same offset, same error, repeating
Error deserializing key/value for partition orders-1 at offset 4471
org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
```

```shell
# Inspect the exact offending record
kafka-console-consumer.sh --bootstrap-server $BROKER --topic orders \
  --partition 1 --offset 4471 --max-messages 1 --property print.key=true

# Read the dead letter topic; print.headers shows the kafka_dlt-* headers
# (exception, original topic/partition/offset) added by DeadLetterPublishingRecoverer
kafka-console-consumer.sh --bootstrap-server $BROKER --topic orders.DLT \
  --from-beginning --property print.key=true --property print.headers=true
```
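```java
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.listener.DeadLetterPublishingRecoverer;
import org.springframework.kafka.listener.DefaultErrorHandler;
import org.springframework.util.backoff.FixedBackOff;

// In a @Configuration class: turns a poison message from a partition-blocker into a DLT record
@Bean
DefaultErrorHandler errorHandler(KafkaTemplate<Object, Object> template) {
    var recoverer = new DeadLetterPublishingRecoverer(template); // publishes to <topic>.DLT, same partition by default
    return new DefaultErrorHandler(recoverer, new FixedBackOff(1000L, 3)); // 1 s apart, 3 retries (4 attempts)
}
```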
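The `Unknown magic byte!` exhibit in this section comes from Confluent's Schema Registry deserializers, which expect each value to begin with a five-byte header: a magic byte of 0, then a big-endian four-byte schema ID. One way to confirm that diagnosis is to check the raw value bytes (for example, read with Kafka's `ByteArrayDeserializer`) against that framing. This is a minimal sketch that inspects only the header, not the Avro payload; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

// Checks whether raw record bytes carry the Schema Registry wire-format header.
class WireFormatCheck {

    // Returns the schema ID if the value starts with magic byte 0 followed by a
    // 4-byte big-endian schema ID; otherwise empty.
    static OptionalInt schemaId(byte[] value) {
        if (value == null || value.length < 5 || value[0] != 0) {
            // Not Schema Registry framing; a non-zero first byte is what the
            // Confluent deserializer reports as "Unknown magic byte!".
            return OptionalInt.empty();
        }
        // ByteBuffer reads big-endian by default, matching the wire format.
        return OptionalInt.of(ByteBuffer.wrap(value, 1, 4).getInt());
    }

    public static void main(String[] args) {
        byte[] plainText = "{\"id\":1}".getBytes(StandardCharsets.UTF_8);
        byte[] framed = {0, 0, 0, 0, 42, 2, 4}; // hypothetical: schema ID 42 + payload bytes
        System.out.println("plain text: " + schemaId(plainText));
        System.out.println("framed:     " + schemaId(framed));
    }
}
```

A plain-text or JSON value starts with a printable byte such as `{` (0x7B), so it carries no schema ID; that is exactly the kind of record the lab in section 8 produces.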
8. Guided practical
Reproduce a poison message in the local lab.
- Configure the Payment consumer with `ErrorHandlingDeserializer` and a `DefaultErrorHandler` + `DeadLetterPublishingRecoverer`, as in Retry and Error Handling.
- Produce one malformed record to `orders` with the console producer (plain text where JSON/Avro is expected).
- Watch it fail and land in `orders.DLT` while the good records keep flowing. Because `DefaultErrorHandler` classifies deserialization failures as not retryable by default, expect it to be dead-lettered without the configured retries; a handler exception would be retried first.
- Read the DLT to inspect the failed record.