Schema Registry and Data Contracts
Why long-lived topics need contracts, how Schema Registry stores subjects and versions, the wire format, Avro with Spring Kafka, and compatibility modes for safe evolution.
In Section 3 you serialized events as JSON with typed DTOs. That is enough for one team shipping producer and consumer together, but it does not scale to long-lived topics read by many services over months. A field rename by one team silently breaks another team’s consumer, and nothing stops the bad change from shipping.
Schema Registry fixes this by making the event a real contract. It stores each topic’s schema centrally, stamps every record with a schema id, and refuses to register a change that would break existing readers. This module upgrades the Order and Inventory services from JSON to Avro backed by the registry.
What you’ll be able to do after this module
- Explain why long-lived, multi-team topics need a schema contract.
- Describe subjects, versions, and the record wire format.
- Serialize and deserialize Avro events with Spring for Apache Kafka.
- Name where Protobuf and JSON Schema fit alongside Avro.
- Choose a compatibility mode and evolve a schema without breaking consumers.
1. Why schemas matter
A topic often outlives the code that first wrote to it. New consumers appear, teams change, and the event shape needs to grow. Without a contract, three problems bite:
- Decoupled deploys: the producer and each consumer ship on their own schedule, so a change must not require a coordinated release.
- Multi-team ownership: several services read
orders, and none of them should break because another team edited a DTO. - Long retention: a consumer replaying last month’s records must still be able to read them.
A schema registry solves all three by putting the contract in one place and enforcing rules on how it can change.
2. Add Schema Registry to the lab
Add a Schema Registry service to the docker-compose.yml from Local Lab, alongside the kafka and kafka-ui services.
schema-registry:
image: confluentinc/cp-schema-registry:7.7.0
container_name: schema-registry
depends_on:
- kafka
ports:
- "8081:8081"
environment:
SCHEMA_REGISTRY_HOST_NAME: schema-registry
SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:29092
SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
The registry reaches the broker over the DOCKER listener at kafka:29092, the same internal address the Kafka UI uses. Apply it and confirm the registry is up:
docker compose up -d
curl http://localhost:8081/subjects
A fresh registry returns an empty list []. Your Spring apps, running on the host, will use http://localhost:8081.
3. Subjects, versions, and the wire format
The registry organizes schemas under subjects. With the default naming strategy, a topic named orders has a subject orders-value for its value schema (and orders-key if the key is schema-managed). Each subject holds an ordered list of versions, and every version has a globally unique schema id.
When a producer sends a record, the serializer does not write the whole schema into the record. It writes a small header plus the payload:
flowchart LR
magic["magic byte 0x0"] --> id["4-byte schema id"] --> payload["Avro-encoded payload"]
- Magic byte: a single
0x0marking the Confluent wire format. - Schema id: four bytes identifying which registered schema was used.
- Payload: the record serialized against that schema, with no field names repeated.
The consumer reads the schema id, fetches that schema from the registry (caching it), and uses it to decode the payload. This keeps records small while guaranteeing both sides agree on the shape.
sequenceDiagram
participant P as Order Service
participant R as Schema Registry
participant K as Kafka (orders)
participant C as Inventory Service
P->>R: register orders-value schema (first send)
R-->>P: schema id
P->>K: record = magic + id + Avro payload
K->>C: deliver record
C->>R: fetch schema by id (cached after first)
R-->>C: schema
C->>C: decode payload to OrderCreated
4. Avro with Spring for Apache Kafka
Avro is the most common format with Schema Registry. You define the schema in an .avsc file, and the Avro tooling generates a Java class from it. Define OrderCreated to match the fields from First Producer and Consumer.
{
"type": "record",
"name": "OrderCreated",
"namespace": "com.example.orders.events",
"fields": [
{ "name": "orderId", "type": "long" },
{ "name": "item", "type": "string" },
{ "name": "amount", "type": "string" }
]
}
Add the Confluent serializer dependency and the Confluent Maven repository, since these artifacts are not on Maven Central.
<dependency>
<groupId>io.confluent</groupId>
<artifactId>kafka-avro-serializer</artifactId>
<version>7.7.0</version>
</dependency>
<repositories>
<repository>
<id>confluent</id>
<url>https://packages.confluent.io/maven/</url>
</repository>
</repositories>
Swap the JSON serializers from Section 3 for the Avro ones and point both apps at the registry.
spring:
kafka:
bootstrap-servers: localhost:9092
properties:
schema.registry.url: http://localhost:8081
producer:
key-serializer: org.apache.kafka.common.serialization.StringSerializer
value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
consumer:
key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
properties:
specific.avro.reader: true
With specific.avro.reader: true, the deserializer builds the generated OrderCreated class rather than a generic record. The producing and consuming code is unchanged from Section 3, because the serializer handles registration and lookup transparently:
// Order service: subject orders-value auto-registers on the first send
kafkaTemplate.send("orders", String.valueOf(event.getOrderId()), event);
// Inventory service
@KafkaListener(topics = "orders", groupId = "inventory-service")
public void onOrderCreated(OrderCreated event) {
// event.getItem() and event.getAmount() are strongly typed
}
5. Protobuf and JSON Schema
Avro is not the only option. Schema Registry supports three formats, and the choice is mostly about your ecosystem. You swap the serializer pair and keep everything else the same.
| Format | Serializer / deserializer | Pick it when |
|---|---|---|
| Avro | KafkaAvroSerializer / KafkaAvroDeserializer | Default for Kafka; compact, strong evolution rules |
| Protobuf | KafkaProtobufSerializer / KafkaProtobufDeserializer | You already use gRPC / Protobuf across services |
| JSON Schema | KafkaJsonSchemaSerializer / KafkaJsonSchemaDeserializer | You want human-readable payloads with validation |
All three use the same subjects, versions, and wire format, so the registry concepts you just learned carry over unchanged. The rest of this module uses Avro, but the compatibility rules below apply to every format.
6. Compatibility modes
The registry enforces a compatibility mode per subject. Before it accepts a new version, it checks the change against existing versions and rejects anything that would break a reader or writer.
| Mode | Guarantees | Allows |
|---|---|---|
BACKWARD (default) | New consumers can read old data | Add optional fields, remove fields |
FORWARD | Old consumers can read new data | Add fields, remove optional fields |
FULL | Both directions | Add or remove optional fields only |
*_TRANSITIVE | Same, against all prior versions | Same, checked across the whole history |
Backward compatibility is the common default because it lets you upgrade consumers first, then producers. The non-transitive modes only check against the latest version, while the transitive variants check against every version in the subject’s history.
7. Safe evolution
The rules follow directly from the compatibility mode. Under the default BACKWARD mode, a new consumer using the new schema must be able to read data written with the old schema.
flowchart TD
change["Proposed schema change"]
add["Add field with a default<br/>(e.g. currency = USD)"]
remove["Remove a required field<br/>(e.g. item)"]
ok["Registry accepts:<br/>old data still readable"]
reject["Registry rejects at registration:<br/>old data would lack the field"]
change --> add --> ok
change --> remove --> reject
Evolving OrderCreated safely means adding an optional field with a default value:
{ "name": "currency", "type": "string", "default": "USD" }
A consumer on the new schema reading an old record without currency gets the default. A consumer on the old schema reading a new record simply ignores the extra field. Both directions work, so the change is safe.
The rules that keep you safe:
- Add fields with a default, so old records deserialize under the new schema.
- Never remove or rename a required field, which strands existing data. Use an Avro
aliasto rename compatibly. - Do not change a field’s type in an incompatible way (for example
stringtolong).
8. Guided practical
Run this against the lab with Schema Registry added, using the Order and Inventory services from Section 3.
- Add the
schema-registryservice to compose and confirmcurl http://localhost:8081/subjectsreturns[]. - Switch both services from the JSON serializers to the Avro ones and restart them.
- Produce one
OrderCreated, then confirm the subject exists:curl http://localhost:8081/subjectsshowsorders-value. - Read the registered schema:
curl http://localhost:8081/subjects/orders-value/versions/1. - Add the optional
currencyfield with a default, produce again, and confirm a second version registered. - Attempt a breaking change (remove
item), produce, and confirm the registry rejects the registration. - Check the compatibility mode:
curl http://localhost:8081/config/orders-value(falls back to the global default if unset).
Next: Section 5, Delivery Guarantees, where you make delivery precise: at-most-once, at-least-once, and exactly-once, and the configuration behind each.