Idempotent Consumers, Ordering, and Duplicates

Why at-least-once forces idempotent consumers, dedup strategies with an inbox table, per-partition ordering, and how key choice and rebalances create duplicates.

Kafka’s practical default is at-least-once, which means a record can be delivered more than once. Even the exactly-once semantics from Transactions and Exactly-Once Semantics only cover Kafka-to-Kafka work, so any consumer with an external side effect, like the Payment service charging a card, must assume duplicates will happen. The only robust defense is an idempotent consumer: one that produces the same result whether it sees a record once or five times.


What you’ll be able to do after this module

  • Explain why at-least-once forces idempotent consumers, even with EOS elsewhere.
  • Deduplicate with an idempotency key and an inbox table.
  • State Kafka’s ordering guarantee precisely and its consequences.
  • Explain how key choice and partition count affect ordering.
  • Explain how retries and rebalances create duplicate processing.

1. Why duplicates are guaranteed

A duplicate is not an edge case; it is part of the contract. Any of the following produces one:

  • A consumer processes a record, then crashes before committing the offset. On restart the record is redelivered.
  • A rebalance reassigns a partition after processing but before the commit, so the new owner reprocesses.
  • A producer retry that predates idempotence, or an upstream at-least-once producer, writes the same event twice.

Because the side effect (a card charge) lives outside Kafka, no broker feature can undo it. The consumer itself must recognize a record it has already handled and skip the effect.
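The first case in the list can be reproduced without a broker. The sketch below is a toy model, not Kafka client code: `Partition`, `consume`, and `crashBeforeCommitAt` are hypothetical names, and the "charge" is just a string appended to a list. It only illustrates that doing the work before committing the offset means a crash in between replays the record.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of at-least-once delivery; no Kafka client involved. */
class RedeliveryDemo {

    /** Stand-in for one partition plus the group's committed offset. */
    static final class Partition {
        final List<String> records;
        int committedOffset = 0; // next offset to read after a restart

        Partition(List<String> records) {
            this.records = records;
        }
    }

    /**
     * Processes records from the committed offset onward, committing after each one.
     * If crashBeforeCommitAt equals a record's offset, the consumer "dies" after
     * the side effect but before the commit. Pass -1 for no crash.
     */
    static void consume(Partition p, List<String> sideEffects, int crashBeforeCommitAt) {
        for (int offset = p.committedOffset; offset < p.records.size(); offset++) {
            sideEffects.add("charged " + p.records.get(offset)); // side effect happens first
            if (offset == crashBeforeCommitAt) {
                return; // crash: this offset is never committed
            }
            p.committedOffset = offset + 1; // commit only after the work is done
        }
    }

    /** Runs one crashing pass and one restart; returns every side effect performed. */
    static List<String> run() {
        Partition p = new Partition(List.of("order-41", "order-42", "order-43"));
        List<String> sideEffects = new ArrayList<>();
        consume(p, sideEffects, 1);  // crashes after charging order-42
        consume(p, sideEffects, -1); // restart resumes from the last committed offset
        return sideEffects;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In this model order-42 is charged twice: once before the crash and once after the restart. Nothing in the log itself can prevent that, which is why the check has to live in the consumer.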


2. The idempotency key

Idempotency needs a stable, unique identifier for the unit of work, one that is the same every time the event is delivered. In the running scenario the natural key is the business id, such as the orderId on an OrderCreated, or a dedicated event id carried in the payload or a header.

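Do not use anything tied to a particular copy or delivery of the event, such as the offset or a processing timestamp. When an upstream producer writes the same event twice, each copy is a separate record with its own offset, so an offset-based key would make the duplicate look new. The key must come from the event’s identity, not its delivery.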
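As a sketch of why the key has to come from the event itself, the example below models one OrderCreated that an upstream producer wrote twice, so the two copies sit at different offsets. The `Delivery` record, its fields, and both key formats are assumptions made for illustration (Java 16+ for `record`); they are not part of any Kafka API.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Compares two candidate dedup keys for the same event written twice upstream. */
class IdempotencyKeyDemo {

    /** Hypothetical view of a consumed record: its log position plus its payload id. */
    record Delivery(String topic, int partition, long offset, String orderId) {

        /** Key from the event's identity: the same for every copy of the event. */
        String identityKey() {
            return "OrderCreated:" + orderId;
        }

        /** Key from the record's position: differs for each copy in the log. */
        String positionKey() {
            return topic + "-" + partition + "@" + offset;
        }
    }

    /** Counts how many deliveries look "new" under the chosen key. */
    static int countTreatedAsNew(List<Delivery> deliveries, boolean useIdentityKey) {
        Set<String> seen = new HashSet<>();
        int treatedAsNew = 0;
        for (Delivery d : deliveries) {
            String key = useIdentityKey ? d.identityKey() : d.positionKey();
            if (seen.add(key)) { // add returns false when the key was already present
                treatedAsNew++;
            }
        }
        return treatedAsNew;
    }

    /** The same OrderCreated written twice, e.g. by a non-idempotent producer retry. */
    static List<Delivery> duplicatedEvent() {
        return List.of(
                new Delivery("orders", 0, 7L, "42"),
                new Delivery("orders", 0, 8L, "42"));
    }

    public static void main(String[] args) {
        System.out.println("identity key: " + countTreatedAsNew(duplicatedEvent(), true));
        System.out.println("position key: " + countTreatedAsNew(duplicatedEvent(), false));
    }
}
```

With the identity key only one of the two copies is treated as new; with the position key both are, and the card would be charged twice.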

sequenceDiagram
    participant K as Kafka (orders)
    participant C as Payment consumer
    participant DB as Inbox table

    K->>C: OrderCreated(orderId=42)
    C->>DB: insert orderId=42 (new)
    DB-->>C: inserted
    C->>C: charge card, publish PaymentSucceeded
    Note over K,C: later, redelivery of the same record
    K->>C: OrderCreated(orderId=42)
    C->>DB: insert orderId=42
    DB-->>C: duplicate key, already processed
    C->>C: no-op, skip the charge

3. Dedup with an inbox table

The most reliable strategy stores processed keys in the same database as the side effect, in one transaction. This is the inbox pattern: record the key and do the work atomically, so a crash cannot leave one without the other.

@KafkaListener(topics = "orders", groupId = "payment-service")
@Transactional
public void onOrderCreated(OrderCreated event) {
    if (inboxRepository.existsById(event.orderId())) {
        return; // already processed, skip the side effect
    }
    inboxRepository.save(new InboxEntry(event.orderId(), Instant.now()));
    paymentService.charge(event);          // side effect outside Kafka, recorded in the same database
    // on commit, the inbox key and the charge are persisted together
}

Two details make this correct. First, the existence check and the insert must run in the same transaction as the work. The check alone does not stop two concurrent redeliveries from both passing it, so the safest form relies on the primary key constraint: the second insert is rejected and its transaction rolls back. Second, the inbox table needs periodic cleanup, but keys must be kept for at least the topic’s retention window, because a record can be redelivered for as long as it is retained.
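Both details can be sketched in memory. The class below is a stand-in, not a database: `InboxSketch`, `handle`, and `purgeOlderThan` are hypothetical names, a `ConcurrentHashMap` plays the role of the inbox table, and its atomic `putIfAbsent` plays the role of the primary key constraint. It does not model transactions or rollback.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** In-memory stand-in for an inbox table whose primary key is the idempotency key. */
class InboxSketch {

    private final Map<String, Instant> inbox = new ConcurrentHashMap<>();
    private final AtomicInteger charges = new AtomicInteger();

    /**
     * Returns true if the charge ran, false if the key was already recorded.
     * putIfAbsent is atomic, so two concurrent calls cannot both "insert" the key.
     */
    boolean handle(String orderId, Instant now) {
        if (inbox.putIfAbsent(orderId, now) != null) {
            return false; // duplicate key: already processed, skip the side effect
        }
        charges.incrementAndGet(); // placeholder for the real charge
        return true;
    }

    /** Deletes keys recorded before the cutoff; returns how many were removed. */
    int purgeOlderThan(Instant cutoff) {
        int before = inbox.size();
        inbox.entrySet().removeIf(e -> e.getValue().isBefore(cutoff));
        return before - inbox.size();
    }

    int chargeCount() {
        return charges.get();
    }

    public static void main(String[] args) {
        InboxSketch inbox = new InboxSketch();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        System.out.println(inbox.handle("42", t0));                 // first delivery
        System.out.println(inbox.handle("42", t0.plusSeconds(5)));  // redelivery
        System.out.println("charges: " + inbox.chargeCount());
        // Assumed 7-day retention: a key recorded at t0 is only purged once it is older than that.
        System.out.println("purged: " + inbox.purgeOlderThan(t0.plus(Duration.ofDays(7)).plusSeconds(1)));
    }
}
```

Once a key has been purged, a later redelivery of the same orderId would be handled as new. That is the reason the cleanup window should not be shorter than the topic retention.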


4. Ordering is per partition only

Kafka guarantees order within a partition, and nowhere else. Two records on different partitions have no ordering relationship, even on the same topic. This has direct consequences for a consumer:

  • Records with the same key go to the same partition, so per-key order is preserved.
  • Records with different keys may be processed in any relative order, including in parallel by different consumer threads.

So if the Order service must deliver OrderCreated before OrderConfirmed for the same order, both must share a key (the orderId) so they land on the same partition in order.

flowchart TD
    subgraph t [orders topic]
        p0["partition 0<br/>order-42: Created, Confirmed"]
        p1["partition 1<br/>order-77: Created, Confirmed"]
    end
    p0 --> note0["order-42 events in order"]
    p1 --> note1["order-77 events in order"]
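The per-key guarantee can be illustrated with a toy router. This is a sketch under assumptions: `partitionFor` uses `String.hashCode()` as a simplified stand-in for the real partitioner's hash of the serialized key, and `Event`, `route`, and `typesInPartitionOrder` are hypothetical helpers (Java 16+ for `record`).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy router showing that records sharing a key keep their relative order. */
class PerKeyOrderingDemo {

    record Event(String key, String type) {}

    /** Simplified stand-in for a partitioner: hash of the key modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Appends each event to the log of the partition chosen by its key. */
    static Map<Integer, List<Event>> route(List<Event> produced, int numPartitions) {
        Map<Integer, List<Event>> partitions = new LinkedHashMap<>();
        for (Event e : produced) {
            partitions
                    .computeIfAbsent(partitionFor(e.key(), numPartitions), p -> new ArrayList<>())
                    .add(e);
        }
        return partitions;
    }

    /** Returns the event types for one key, in the order they sit in that key's partition. */
    static List<String> typesInPartitionOrder(Map<Integer, List<Event>> partitions,
                                              String key, int numPartitions) {
        List<String> types = new ArrayList<>();
        int partition = partitionFor(key, numPartitions);
        for (Event e : partitions.getOrDefault(partition, List.of())) {
            if (e.key().equals(key)) {
                types.add(e.type());
            }
        }
        return types;
    }

    public static void main(String[] args) {
        // Events for two orders, interleaved the way a producer might send them.
        List<Event> produced = List.of(
                new Event("order-42", "OrderCreated"),
                new Event("order-77", "OrderCreated"),
                new Event("order-77", "OrderConfirmed"),
                new Event("order-42", "OrderConfirmed"));
        Map<Integer, List<Event>> partitions = route(produced, 2);
        System.out.println("order-42: " + typesInPartitionOrder(partitions, "order-42", 2));
        System.out.println("order-77: " + typesInPartitionOrder(partitions, "order-77", 2));
    }
}
```

Whichever partition each order lands on, its own events stay in produced order. The sketch says nothing about how order-42 and order-77 relate to each other, which mirrors the guarantee.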

5. Key choice and partition count

Because ordering follows the key, key choice is a correctness decision, not just a load-balancing one.

  • Key by orderId: all events for one order are ordered; different orders spread across partitions for parallelism.
  • Key by customerId: all of a customer’s events are ordered, at the cost of a busy customer creating a hot partition.
  • No key: maximum spread, no ordering guarantees at all.

Partition count interacts with this. As noted in Producing Deeper, adding partitions later changes the key-to-partition mapping, so new events for a key can land on a different partition from its earlier events and lose their relative order. Decide the key and the partition count together, up front.
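A small calculation shows the remapping effect. The sketch assumes a hash-modulo mapping and again uses `String.hashCode()` as a simplified stand-in for the real partitioner's hash; `RepartitionDemo` and `movedKeys` are hypothetical names, and the counts 3 and 4 are arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows which keys map to a different partition after the partition count changes. */
class RepartitionDemo {

    /** Simplified stand-in for a partitioner: hash of the key modulo the partition count. */
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Returns the keys whose partition differs between oldCount and newCount. */
    static List<String> movedKeys(List<String> keys, int oldCount, int newCount) {
        List<String> moved = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, oldCount) != partitionFor(key, newCount)) {
                moved.add(key);
            }
        }
        return moved;
    }

    /** Builds the sample keys order-0 .. order-9. */
    static List<String> sampleKeys() {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            keys.add("order-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> keys = sampleKeys();
        List<String> moved = movedKeys(keys, 3, 4);
        System.out.println(moved.size() + " of " + keys.size() + " keys changed partition: " + moved);
    }
}
```

For every key in the moved list, events produced before the change sit on one partition and events produced after it sit on another, so a consumer has no ordering guarantee between them.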


6. Retries, rebalances, and duplicates

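Two mechanisms from earlier modules are the common duplicate sources, and the idempotent consumer neutralizes both:

  • Retries: a producer retry without idempotence writes the same event twice, and a consumer that fails before committing the offset processes the record again on restart.
  • Rebalances: a partition is reassigned after a record was processed but before its offset was committed, so the new owner reprocesses it.

In both cases the inbox check turns the second processing into a no-op, which is exactly the goal.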


7. Guided practical

Run this against the local lab with a database for the Payment service.

  1. Create an inbox table keyed by orderId, and implement the transactional listener above.
  2. Produce one OrderCreated and confirm the charge happens once and the key is stored.
  3. Produce the same OrderCreated again (same orderId) and confirm the handler skips the charge.
  4. Kill the consumer mid-processing before the commit, restart it, and confirm the redelivered record does not double-charge.
  5. Produce OrderCreated and OrderConfirmed for one order with the same key and confirm they are processed in order.

Next: Rebalancing and Consumer Group Stability, the last reliability module, where you keep consumer groups stable and avoid rebalance storms.