Phase 1 (blueprint only). No article content is generated until you approve this blueprint. This document defines the full course shape: audience, design principles, the section-by-section table of contents with permalinks and per-module scope, the running scenario, the file layout, and the scaffolding changes. It is comprehensive but non-redundant: each concept lives on exactly one page and is cross-linked from everywhere else.
The companion documents are:
- `release/kafka/implementation-plan.md`: the phased, checkbox-tracked build plan.
- `release/kafka/generation-prompt.md`: the reusable prompt used to generate each module to a consistent quality bar.

… `course-callout` includes for prerequisites and notes, Mermaid diagrams for flows, and prose that reads correctly even without the diagram. Lint with `./scripts/lint-course-content.sh _courses/kafka/_topics`. Pages use `layout: web/course-reading` and front matter with `heading`, `title`, `subtitles`, `permalink`, and `seo`. Topic files live under `_courses/kafka/_topics/NN-slug.md`; nav ordering and prev/next come from the nav YAML, not filenames.

```mermaid
flowchart TD
s0[0. Kafka and Messaging Foundations]
s1[1. Architecture and Internals]
s2[2. Local Lab and Setup]
s3[3. Building with Spring for Apache Kafka]
s4[4. Schema Management]
s5[5. Reliability and Delivery Semantics]
s6[6. Event-Driven Architecture and Advanced Patterns]
s7[7. Production Readiness]
s8[8. Operations and Troubleshooting]
s9[9. Capstone and Assessment]
s0 --> s1 --> s2 --> s3 --> s4 --> s5 --> s6 --> s7 --> s8 --> s9
```
Sections 0 to 6 build the developer skill set from beginner to advanced. Section 7 covers production hardening, and Section 8 is the standalone operations and troubleshooting reference. Section 9 ties everything together.
- `/kafka/why-kafka` [NEW]: sequenceDiagrams (REST call chain vs publish-and-forget); flowchart LR of producer, topic, and consumer group.
- `/kafka/mental-model` [NEW]: flowchart LR of a partitioned log with offsets; flowchart TD of two consumer groups reading the same topic independently.
- `/kafka/core-concepts` [NEW]: flowchart of the cluster-topic-partition hierarchy; glossary table.
- `/kafka/cluster-anatomy` [NEW]: … `min.insync.replicas`, and unclean leader election. How partitions are distributed and rebalanced across brokers. Diagram: flowchart TD of a 3-broker cluster with leader/follower replicas; sequenceDiagram of a leader failover.
- `/kafka/control-plane` [NEW]: flowchart of ZooKeeper-based metadata vs the KRaft quorum; sequenceDiagram of controller election in each model.
- `/kafka/storage-internals` [NEW]: `.log`/`.index`/`.timeindex` files, the active segment, retention (time and size), log compaction (tombstones, `compact` vs `delete` cleanup policy), the OS page cache, and zero-copy transfer. Why Kafka is fast. When to use compaction (changelog/KTable topics). Diagram: flowchart LR of a segmented partition; flowchart TD of compaction before/after.
- `/kafka/local-lab` [NEW]: `advertised.listeners` explained (a top beginner footgun). Create/describe/alter/delete topics, and produce/consume from the console. Diagram: flowchart LR of the local dev topology (brokers, UI, your Spring app). Uses the `course-checkpoint` include for a hands-on checklist.
- `/kafka/spring-getting-started` [NEW, serialization merged in]: `spring-kafka`, `KafkaTemplate`, `@KafkaListener`, `ProducerFactory`/`ConsumerFactory`, auto-configuration, and topic creation with `NewTopic`/`KafkaAdmin`. Serialization end to end: `StringSerializer`, `JsonSerializer`/`JsonDeserializer` with typed DTOs, trusted packages, and the `ErrorHandlingDeserializer` (advanced schema formats deferred to Section 4). Diagram: sequenceDiagram of an end-to-end produce-consume round trip with the Order service.
- `/kafka/producer-internals` [NEW]: `acks` (0/1/all), batching (`linger.ms`, `batch.size`), compression, `ProducerRecord` headers, and send callbacks vs blocking `get()`. Diagram: flowchart TD of the producer send path (accumulator, batches, partitioner).
- `/kafka/consumer-groups` [NEW]: … `AckMode` in Spring), `enable.auto.commit` trade-offs, `auto.offset.reset` (`earliest`/`latest`), concurrency and the container `concurrency` setting mapped to partitions, and seeking. Canonical home for offset mechanics. Diagram: sequenceDiagram of poll-process-commit; flowchart of partitions assigned across a 3-consumer group.
- `/kafka/schema-registry` [NEW]: sequenceDiagram of producer/consumer resolving schemas via the registry; flowchart TD of a safe vs breaking schema change.
- `/kafka/delivery-guarantees` [NEW]: flowchart TD decision guide mapping requirements to a configuration.
- `/kafka/reliable-producer` [NEW]: `acks=all` with `min.insync.replicas`, the idempotent producer (`enable.idempotence`, producer id, sequence numbers, why it prevents duplicates on retry), `retries`, `delivery.timeout.ms`, and `max.in.flight.requests.per.connection` ordering caveats. Diagram: sequenceDiagram of a retried send with idempotence preventing a duplicate.
- `/kafka/transactions-eos` [NEW, advanced]: `transactional.id`, `KafkaTransactionManager` and `@Transactional` in Spring, `sendOffsetsToTransaction`, `isolation.level=read_committed`, transaction markers, and fencing of zombie producers. Where EOS genuinely helps and where it does not (external side effects). Diagram: sequenceDiagram of a transactional consume-transform-produce with the offset commit inside the transaction.
- `/kafka/retry-error-handling` [NEW]: `DefaultErrorHandler`, blocking vs non-blocking retries, `@RetryableTopic` with retry topics and a dead letter topic (DLT), backoff, classifying transient vs deterministic (poison) failures, and the `ErrorHandlingDeserializer` for un-deserializable records. Diagrams: flowchart LR of the main topic to retry topics to the DLT; flowchart TD of the retry-then-DLT decision tree.
- `/kafka/idempotency-ordering` [NEW]: sequenceDiagram of a redelivery and an idempotent handler no-op.
- `/kafka/rebalancing` [NEW]: … `group.instance.id`), `session.timeout.ms`/`heartbeat.interval.ms`/`max.poll.interval.ms`, and how a slow handler causes a rebalance storm. Diagram: sequenceDiagram of a rebalance; flowchart of cooperative vs eager reassignment.
- `/kafka/event-driven-architecture` [NEW]: flowchart TD of the full event topology across the four services.
- `/kafka/outbox-cdc` [NEW]: flowchart of the outbox + CDC pipeline; sequenceDiagram of a saga with a compensating action.
- `/kafka/kafka-streams` [NEW, detailed, advanced]: `KStream` and `KTable`, `map`/`filter`/`flatMap`, `groupByKey`/`aggregate`/`count`, the stream-table duality, state stores and changelog topics, windowing (tumbling, hopping, session), stream-stream and stream-table joins, exactly-once in Streams, and interactive queries. Spring integration through `spring-kafka` (with the `kafka-streams` dependency) and `StreamsBuilderFactoryBean`. A worked example: a running revenue aggregation over the Payment stream. Diagrams: flowchart LR of a topology; flowchart TD of windowed aggregation into a state store and changelog.
- `/kafka/kafka-connect` [NEW]: flowchart LR of a source connector to Kafka to a sink connector.
- `/kafka/security` [NEW]: flowchart of the client authentication and authorization flow.
- `/kafka/observability` [NEW]: … `kafka-consumer-groups` lag, and distributed tracing with OpenTelemetry across producer and consumer. Diagram: flowchart LR of the metrics and trace pipeline.
- `/kafka/performance` [NEW]: `fetch.min.bytes`/`fetch.max.wait.ms`, consumer `max.poll.records`, replication and durability trade-offs, and how to reason about throughput vs latency. References (does not repeat) the producer/consumer internals modules. Diagram: flowchart TD tuning decision guide.
- `/kafka/testing` [NEW]: `EmbeddedKafka` vs Testcontainers Kafka, testing producers and listeners, asserting on async consumers with Awaitility (no `Thread.sleep`), testing serialization and schema compatibility, and testing Kafka Streams with `TopologyTestDriver`. Diagram: flowchart of the test setup.
- `/kafka/msk-architecture` [NEW]: `min.insync.replicas` for AZ resilience, networking (VPC, subnets, security groups), storage (EBS), and the MSK vs self-managed trade-off. A concise self-managed note (KRaft on EC2/K8s). Diagram: flowchart TD of a multi-AZ MSK cluster.
- `/kafka/tooling-walkthrough` [NEW]: `kafka-topics`, `kafka-consumer-groups` (describe lag, reset offsets), `kafka-configs`, `kafka-console-producer`/`kafka-console-consumer`, `kafka-reassign-partitions`, and the CloudWatch metrics reference for MSK. How to read each output. Diagram: table-driven reference plus a flowchart of “which tool for which question.”
- `/kafka/alert-playbooks/*` [NEW]:
  - `/kafka/alert-playbooks/consumer-lag`: lag climbing, zero consumers, slow handler, `max.poll.interval.ms` breaches.
  - `/kafka/alert-playbooks/under-replicated-partitions`: ISR shrink, `UnderReplicatedPartitions`, `OfflinePartitionsCount`.
  - `/kafka/alert-playbooks/broker-controller-failover`: node loss, controller re-election, KRaft metadata quorum below majority.
  - `/kafka/alert-playbooks/rebalance-storms`: constant rebalances, `max.poll.interval.ms`, cooperative rebalancing, static membership.
  - `/kafka/alert-playbooks/producer-failures`: `NotEnoughReplicasException`, `TimeoutException`, buffer exhaustion, `RecordTooLargeException`.
  - `/kafka/alert-playbooks/poison-messages-dlt`: `SerializationException`, un-deserializable records blocking a partition, DLT triage.
  - `/kafka/alert-playbooks/disk-retention`: broker disk full, retention misconfiguration, log dir failure.
  - `/kafka/alert-playbooks/auth-failures-rotation`: SASL/SCRAM password rotation, MSK IAM policy changes, expired TLS certs, stale cached credentials on reconnect.
  - `/kafka/alert-playbooks/aws-layer-connectivity`: security groups, NACLs, EBS throughput/IOPS, and EC2 status checks beneath broker symptoms (exhibit-based, mirroring the RabbitMQ course playbook style).
  - `/kafka/alert-playbooks/latency-ordering-duplicates`: GC pauses, broker-wide slowness, out-of-order symptoms, duplicate processing after a rebalance.
  - `/kafka/alert-playbooks/offset-problems`: `OffsetOutOfRangeException`, unwanted `auto.offset.reset` replays, `__consumer_offsets` issues, safe offset resets.
  - `/kafka/alert-playbooks/schema-incompatibility`: a breaking schema change that is rejected or crashes consumers, compatibility-mode mismatches, rollout order.
- `/kafka/hands-on-lab` [NEW]: flowchart TD of the diagnostic decision tree.
- `/kafka/escalation-communication` [NEW]
- `/kafka/cheat-sheet` [NEW]
- `/kafka/capstone` [NEW]: flowchart TD of the final reference architecture.
- `/kafka/assessment` [NEW]

A four-service event-driven system is used consistently in examples, diagrams, and the capstone:

- Order service: … `OrderCreated` and `OrderConfirmed` events.
- Payment service: consumes `OrderCreated`, charges the customer, and publishes `PaymentSucceeded` / `PaymentFailed`. Motivates exactly-once and transactions.
- Inventory service: consumes `PaymentSucceeded`, reserves stock, and publishes `StockReserved` / `StockRejected`. Motivates idempotency.

Topic keys are the order id or the customer id, depending on the ordering requirement. This choice is used to teach partitioning and ordering concretely.
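The keying rule can be sketched in plain Java, with no Kafka dependency. This is a minimal sketch under stated assumptions. `OrderKeyPartitioning`, `OrderEvent`, and the event values are hypothetical. `partitionFor` uses `String.hashCode()` as a stand-in for the Kafka client's default partitioner, which hashes the serialized key bytes with murmur2, so the partition numbers will not match a real cluster. The property that carries over is that equal keys always map to the same partition, which keeps one order's events in sequence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical illustration of key-based routing for the running scenario.
 * partitionFor() is a stand-in: it uses String.hashCode(), not the murmur2
 * hash the Kafka client's default partitioner applies to serialized key bytes.
 */
public class OrderKeyPartitioning {

    /** A hypothetical event type keyed by order id. */
    public record OrderEvent(String orderId, String type) {}

    /** Maps a key to a partition index in [0, numPartitions). */
    public static int partitionFor(String key, int numPartitions) {
        // Clear the sign bit so a negative hash code still yields a valid index.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    /** Appends each event to its partition's log, preserving send order within a partition. */
    public static Map<Integer, List<OrderEvent>> route(List<OrderEvent> events, int numPartitions) {
        Map<Integer, List<OrderEvent>> logs = new LinkedHashMap<>();
        for (OrderEvent event : events) {
            int partition = partitionFor(event.orderId(), numPartitions);
            logs.computeIfAbsent(partition, p -> new ArrayList<>()).add(event);
        }
        return logs;
    }

    public static void main(String[] args) {
        List<OrderEvent> events = List.of(
                new OrderEvent("order-42", "OrderCreated"),
                new OrderEvent("order-7", "OrderCreated"),
                new OrderEvent("order-42", "OrderConfirmed"));
        // Both order-42 events share a partition and keep their relative order.
        route(events, 3).forEach((partition, log) ->
                System.out.println("partition " + partition + " -> " + log));
    }
}
```

Switching the key to a customer id widens the ordering scope to all of one customer's events. The cost is that a busy customer's traffic is routed to a single partition.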
All content is new; nothing is migrated. Files live under `_courses/kafka/`:

- `_courses/kafka/index.md`: course landing page (overview, promise callout, 10-section table, who-this-is-for, start CTA).
- `_courses/kafka/_topics/NN-slug.md`: one file per module, numbered 01 to 33 in section order (see `implementation-plan.md` for the full number map).
- `_courses/kafka/_topics/28-alert-playbooks/NN-slug.md`: the twelve playbook subpages plus an overview, mirroring the RabbitMQ `23-alert-playbooks/` nested layout.

Routing uses the front-matter `permalink`, and cross-links use permalinks, so file prefixes are for ordering only and are safe to adjust.
- Add a `kafka` entry to `_data/courses/course_list.yml` (key: `kafka`, id: `kafka`, title: Apache Kafka, paths: `[kafka]`, submenu_file: `kafka`, category: `messaging`, with an icon under `assets/icons/kafka/`).
- … `title`, `release: prod`, `url: /kafka/`). Registering in `course_list.yml` alone does not add the course to the main nav.
- … `release: draft` and flip to `release: prod` as each ships (the pattern the RabbitMQ nav uses).
- … `course-callout`, `course-start-cta`).
- … `assets/icons/kafka/` for the registry and cards.
- Run `./scripts/lint-course-content.sh _courses/kafka/_topics` after each phase.
- Run `bundle exec jekyll build` and verify nav, active highlighting, and Mermaid rendering.

On approval, generate files sequentially, one module at a time. Each module includes its Mermaid diagram(s), realistic Order/Payment/Inventory/Notification Spring Boot code, prerequisites, a guided practical, cross-links, and a checkpoint, and the nav YAML and index are updated as we go. The build order is defined in `release/kafka/implementation-plan.md`, and every module is generated with `release/kafka/generation-prompt.md` to hold the quality bar constant.