Read time: ~

Schema Registry and Data Contracts

Why long-lived topics need contracts, how Schema Registry stores subjects and versions, the wire format, Avro with Spring Kafka, and compatibility modes for safe evolution.

In Section 3 you serialized events as JSON with typed DTOs. That is enough for one team shipping producer and consumer together, but it does not scale to long-lived topics read by many services over months. A field rename by one team silently breaks another team’s consumer, and nothing stops the bad change from shipping.

Schema Registry fixes this by making the event a real contract. It stores each topic’s schema centrally, stamps every record with a schema id, and refuses to register a change that would break existing readers. This module upgrades the Order and Inventory services from JSON to Avro backed by the registry.


What you’ll be able to do after this module

  • Explain why long-lived, multi-team topics need a schema contract.
  • Describe subjects, versions, and the record wire format.
  • Serialize and deserialize Avro events with Spring for Apache Kafka.
  • Name where Protobuf and JSON Schema fit alongside Avro.
  • Choose a compatibility mode and evolve a schema without breaking consumers.

1. Why schemas matter

A topic often outlives the code that first wrote to it. New consumers appear, teams change, and the event shape needs to grow. Without a contract, three problems bite:

  • Decoupled deploys: the producer and each consumer ship on their own schedule, so a change must not require a coordinated release.
  • Multi-team ownership: several services read orders, and none of them should break because another team edited a DTO.
  • Long retention: a consumer replaying last month’s records must still be able to read them.

A schema registry solves all three by putting the contract in one place and enforcing rules on how it can change.


2. Add Schema Registry to the lab

Add a Schema Registry service to the docker-compose.yml from Local Lab, alongside the kafka and kafka-ui services.

  schema-registry:
    image: confluentinc/cp-schema-registry:7.7.0
    container_name: schema-registry
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:29092
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

The registry reaches the broker over the DOCKER listener at kafka:29092, the same internal address the Kafka UI uses. Apply it and confirm the registry is up:

docker compose up -d
curl http://localhost:8081/subjects

A fresh registry returns an empty list []. Your Spring apps, running on the host, will use http://localhost:8081.


3. Subjects, versions, and the wire format

The registry organizes schemas under subjects. With the default naming strategy, a topic named orders has a subject orders-value for its value schema (and orders-key if the key is schema-managed). Each subject holds an ordered list of versions, and every version has a globally unique schema id.

When a producer sends a record, the serializer does not write the whole schema into the record. It writes a small header plus the payload:

flowchart LR
    magic["magic byte 0x0"] --> id["4-byte schema id"] --> payload["Avro-encoded payload"]
  • Magic byte: a single 0x0 marking the Confluent wire format.
  • Schema id: four bytes identifying which registered schema was used.
  • Payload: the record serialized against that schema, with no field names repeated.

The consumer reads the schema id, fetches that schema from the registry (caching it), and uses it to decode the payload. This keeps records small while guaranteeing both sides agree on the shape.

sequenceDiagram
    participant P as Order Service
    participant R as Schema Registry
    participant K as Kafka (orders)
    participant C as Inventory Service

    P->>R: register orders-value schema (first send)
    R-->>P: schema id
    P->>K: record = magic + id + Avro payload
    K->>C: deliver record
    C->>R: fetch schema by id (cached after first)
    R-->>C: schema
    C->>C: decode payload to OrderCreated

4. Avro with Spring for Apache Kafka

Avro is the most common format with Schema Registry. You define the schema in an .avsc file, and the Avro tooling generates a Java class from it. Define OrderCreated to match the fields from First Producer and Consumer.

{
  "type": "record",
  "name": "OrderCreated",
  "namespace": "com.example.orders.events",
  "fields": [
    { "name": "orderId", "type": "long" },
    { "name": "item", "type": "string" },
    { "name": "amount", "type": "string" }
  ]
}

Add the Confluent serializer dependency and the Confluent Maven repository, since these artifacts are not on Maven Central.

<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-avro-serializer</artifactId>
    <version>7.7.0</version>
</dependency>
<repositories>
    <repository>
        <id>confluent</id>
        <url>https://packages.confluent.io/maven/</url>
    </repository>
</repositories>

Swap the JSON serializers from Section 3 for the Avro ones and point both apps at the registry.

spring:
  kafka:
    bootstrap-servers: localhost:9092
    properties:
      schema.registry.url: http://localhost:8081
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
    consumer:
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
      properties:
        specific.avro.reader: true

With specific.avro.reader: true, the deserializer builds the generated OrderCreated class rather than a generic record. The producing and consuming code is unchanged from Section 3, because the serializer handles registration and lookup transparently:

// Order service: subject orders-value auto-registers on the first send
kafkaTemplate.send("orders", String.valueOf(event.getOrderId()), event);

// Inventory service
@KafkaListener(topics = "orders", groupId = "inventory-service")
public void onOrderCreated(OrderCreated event) {
    // event.getItem() and event.getAmount() are strongly typed
}

5. Protobuf and JSON Schema

Avro is not the only option. Schema Registry supports three formats, and the choice is mostly about your ecosystem. You swap the serializer pair and keep everything else the same.

FormatSerializer / deserializerPick it when
AvroKafkaAvroSerializer / KafkaAvroDeserializerDefault for Kafka; compact, strong evolution rules
ProtobufKafkaProtobufSerializer / KafkaProtobufDeserializerYou already use gRPC / Protobuf across services
JSON SchemaKafkaJsonSchemaSerializer / KafkaJsonSchemaDeserializerYou want human-readable payloads with validation

All three use the same subjects, versions, and wire format, so the registry concepts you just learned carry over unchanged. The rest of this module uses Avro, but the compatibility rules below apply to every format.


6. Compatibility modes

The registry enforces a compatibility mode per subject. Before it accepts a new version, it checks the change against existing versions and rejects anything that would break a reader or writer.

ModeGuaranteesAllows
BACKWARD (default)New consumers can read old dataAdd optional fields, remove fields
FORWARDOld consumers can read new dataAdd fields, remove optional fields
FULLBoth directionsAdd or remove optional fields only
*_TRANSITIVESame, against all prior versionsSame, checked across the whole history

Backward compatibility is the common default because it lets you upgrade consumers first, then producers. The non-transitive modes only check against the latest version, while the transitive variants check against every version in the subject’s history.


7. Safe evolution

The rules follow directly from the compatibility mode. Under the default BACKWARD mode, a new consumer using the new schema must be able to read data written with the old schema.

flowchart TD
    change["Proposed schema change"]
    add["Add field with a default<br/>(e.g. currency = USD)"]
    remove["Remove a required field<br/>(e.g. item)"]
    ok["Registry accepts:<br/>old data still readable"]
    reject["Registry rejects at registration:<br/>old data would lack the field"]
    change --> add --> ok
    change --> remove --> reject

Evolving OrderCreated safely means adding an optional field with a default value:

{ "name": "currency", "type": "string", "default": "USD" }

A consumer on the new schema reading an old record without currency gets the default. A consumer on the old schema reading a new record simply ignores the extra field. Both directions work, so the change is safe.

The rules that keep you safe:

  • Add fields with a default, so old records deserialize under the new schema.
  • Never remove or rename a required field, which strands existing data. Use an Avro alias to rename compatibly.
  • Do not change a field’s type in an incompatible way (for example string to long).

8. Guided practical

Run this against the lab with Schema Registry added, using the Order and Inventory services from Section 3.

  1. Add the schema-registry service to compose and confirm curl http://localhost:8081/subjects returns [].
  2. Switch both services from the JSON serializers to the Avro ones and restart them.
  3. Produce one OrderCreated, then confirm the subject exists: curl http://localhost:8081/subjects shows orders-value.
  4. Read the registered schema: curl http://localhost:8081/subjects/orders-value/versions/1.
  5. Add the optional currency field with a default, produce again, and confirm a second version registered.
  6. Attempt a breaking change (remove item), produce, and confirm the registry rejects the registration.
  7. Check the compatibility mode: curl http://localhost:8081/config/orders-value (falls back to the global default if unset).

Next: Section 5, Delivery Guarantees, where you make delivery precise: at-most-once, at-least-once, and exactly-once, and the configuration behind each.