Introduction to Kafka

How do we build a system where data can be written once, stored durably, and consumed many times, independently, and at massive scale?

Kafka is a distributed event streaming platform.

Chapter 1. The Fundamental Problem Kafka Solves

Modern systems are not a single program — they are many independent services that must continuously exchange data.

At a first-principles level, distributed systems struggle with three core challenges:

| Problem | Why It's Hard |
|---|---|
| Data sharing | Services need the same data at different times |
| Decoupling | Producers and consumers shouldn't depend on each other's availability |
| Scale | Data volume and traffic grow unpredictably |

Traditional solutions (REST calls, shared databases, cron jobs) break down because they:

  • Create tight coupling
  • Lose events during failures
  • Can’t scale write/read throughput independently

Kafka exists to provide a durable, scalable, decoupled way to move data between systems.

Chapter 2. Design Philosophy

2.1 An Append-Only Log

Imagine the most primitive possible system: a file where you only append new records, and never update or delete.

This simple structure gives powerful guarantees:

  • Writing is fast (sequential disk I/O)
  • Ordering is preserved
  • History is retained
  • Multiple readers can read independently

Kafka is fundamentally: a distributed, replicated, fault-tolerant commit log.
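To make the idea concrete, here is a toy append-only log in Java. It is an illustration of the data structure, not Kafka's actual storage engine: writes only ever go to the end, each record gets a sequential position (its offset), and any number of readers can scan from any offset without coordinating with the writer.

```java
import java.util.ArrayList;
import java.util.List;

// A toy append-only log. A record's index in the list is its "offset".
public class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    // Appending is the only write operation; returns the new record's offset.
    public synchronized long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Readers read from any offset, independently of the writer and of each other.
    public synchronized List<String> readFrom(int offset) {
        return new ArrayList<>(records.subList(offset, records.size()));
    }

    public static void main(String[] args) {
        AppendOnlyLog log = new AppendOnlyLog();
        log.append("user-1 signed up");
        log.append("user-2 signed up");
        System.out.println(log.readFrom(0)); // one reader replays all history
        System.out.println(log.readFrom(1)); // another reader starts further along
    }
}
```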

2.2 From Log to Distributed System

A single log file is not enough. We need:

  • Fault tolerance
  • Parallel reads/writes
  • Massive scale

So Kafka evolves the simple log into a distributed log system. Instead of one giant log, Kafka splits a topic into many smaller logs called partitions, and replicates each partition across multiple machines.

Chapter 3. Core Building Blocks

3.1 Topics — Logical Data Streams

A topic is just a named stream of events; you can also think of topics as channels.

Examples:

  • user-signups
  • payments
  • click-events

Producers write to topics. Consumers read from topics.

3.2 Partitions — Parallelism + Scale

A topic is divided into partitions, each an independent append-only log.

Why?

| Need | Solution |
|---|---|
| Higher throughput | Multiple partitions allow parallel writes |
| Scalability | Spread partitions across machines |
| Ordering | Preserved within a partition |

Tradeoff: Kafka guarantees ordering per partition, not across the entire topic.
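To see why keys preserve ordering, here is a simplified sketch of key-based partition selection. Kafka's default partitioner actually hashes keys with murmur2 rather than Java's hashCode, but the principle is the same: the same key always maps to the same partition, so all events for that key stay in order.

```java
// Simplified key-based partitioning (Kafka's default partitioner uses
// murmur2 rather than hashCode, but the idea is identical).
public class PartitionDemo {
    static int partitionFor(String key, int numPartitions) {
        // Mask the sign bit so the result is non-negative.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6;
        // Every event for user-42 lands in the same partition,
        // so its events are totally ordered relative to each other.
        System.out.println(partitionFor("user-42", partitions));
        System.out.println(partitionFor("user-42", partitions)); // same as above
        System.out.println(partitionFor("user-7", partitions));  // may differ
    }
}
```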

3.3 Brokers — The Kafka Servers

A broker is a Kafka server that stores partitions.

A Kafka cluster = many brokers.

Each partition lives on one broker as leader and is replicated to others as followers.

3.4 Replication — Surviving Failures

Kafka copies each partition to multiple brokers.

| Concept | Purpose |
|---|---|
| Leader replica | Handles all reads and writes |
| Follower replicas | Stay in sync, take over if the leader dies |

If a broker crashes → an in-sync replica becomes leader → the system continues.

Chapter 4. Producers — Writing to the Log

A producer sends messages to Kafka.

Key first-principle idea:

  • Producers don’t talk to consumers directly. They just append to a shared log.

How partitioning works

A producer decides which partition a message goes to by one of:

  • Hashing a key (e.g., user_id)
  • Round robin (when no key is provided)
  • Custom logic (a pluggable partitioner)

This enables load balancing across partitions, as the producer sketch below shows.
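Here is a minimal producer using the official Java client. The broker address, topic name, key, and value are placeholder examples; send() appends to the log asynchronously, and the callback fires once the write is acknowledged.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SignupProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("user-42") determines the partition, so all events
            // for this user stay ordered within one partition.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("user-signups", "user-42", "signed up");
            // send() is asynchronous; the callback fires on acknowledgment.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    exception.printStackTrace();
                } else {
                    System.out.printf("Appended to %s-%d at offset %d%n",
                        metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        } // close() flushes any outstanding records
    }
}
```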

Chapter 5. Consumers — Reading Independently

Consumers don't receive pushes. They pull from Kafka at their own pace.

Each message has an offset — its position in the partition log.

Consumers track their own offsets:

| Benefit | Explanation |
|---|---|
| Replay data | Reset the offset to reprocess history |
| Independent speeds | A slow consumer doesn't block a fast one |
| Fault recovery | Resume from the last committed offset |

Kafka stores data; it does not track who has read it (beyond the offsets committed by consumer groups). To coordinate multiple readers, Kafka introduces the concept of a consumer group: a set of consumers cooperating to read a topic.

Key rule: Each partition is assigned to exactly one consumer within a group. A consumer does not “choose” a partition manually. Kafka assigns partitions to consumers in a group automatically.
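Here is a minimal poll loop with the official Java client, again with placeholder broker address, topic, and group id. Auto-commit is disabled so the consumer commits its own offsets after processing, which is what makes replay and recovery possible.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SignupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder address
        props.put("group.id", "signup-processors");         // placeholder group name
        props.put("enable.auto.commit", "false");           // commit offsets manually
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("user-signups"));
            while (true) {
                // The consumer pulls; Kafka never pushes.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
                }
                // Commit only after processing, giving at-least-once delivery.
                consumer.commitSync();
            }
        }
    }
}
```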

Chapter 6. Consumer Groups — Parallel Processing

If multiple instances of a service read a topic:

  • Kafka assigns partitions across them
  • Each partition is consumed by only one consumer in the group

This gives horizontal scalability: more partitions → more consumers → more throughput, up to one consumer per partition (extra consumers beyond that sit idle). You can watch the assignment happen with a rebalance listener, as sketched below.
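The sketch below reuses the placeholder names from earlier and registers a ConsumerRebalanceListener when subscribing. Run a second instance with the same group.id and you will see partitions move between the two.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "signup-processors");       // same group = shared work
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        // The listener reports which partitions Kafka hands this instance
        // as members join or leave the group.
        consumer.subscribe(List.of("user-signups"), new ConsumerRebalanceListener() {
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                System.out.println("Assigned: " + partitions);
            }
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                System.out.println("Revoked: " + partitions);
            }
        });
        while (true) {
            consumer.poll(Duration.ofSeconds(1)); // rebalances happen inside poll()
        }
    }
}
```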

Chapter 7. Durability and Delivery Guarantees

Kafka allows tuning reliability vs performance.

Acknowledgment levels

| Setting | Meaning |
|---|---|
| acks=0 | Fire and forget |
| acks=1 | The leader confirms the write |
| acks=all | All in-sync replicas confirm the write |

Delivery semantics

| Type | What It Means |
|---|---|
| At most once | No retries → possible data loss |
| At least once | Retries → possible duplicates |
| Exactly once | Idempotent producers + transactional writes |

Exactly-once requires coordination between producer, Kafka, and consumer logic.
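As a sketch of the producer side, here is a configuration aimed at durability; the broker address, topic, key, and value are placeholders. Exactly-once additionally requires a transactional.id on the producer and read_committed consumers, which is omitted here.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReliableProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // Durability knobs:
        props.put("acks", "all");                // wait for all in-sync replicas
        props.put("enable.idempotence", "true"); // retries cannot create duplicates
        props.put("retries", Integer.MAX_VALUE); // retry transient failures

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("payments", "order-1", "charged"));
        }
    }
}
```

On the topic side, min.insync.replicas controls how many replicas must be in sync for an acks=all write to succeed.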

Chapter 8. Retention — Kafka Is Not Just a Queue

Traditional queues delete messages after consumption. Kafka retains data for a configured time or size, regardless of consumption.

This enables:

  • Reprocessing history
  • Auditing
  • Bootstrapping new services from old data

Kafka becomes a source of truth for event history, not just a transport layer.
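Retention is configured per topic. Here is a sketch using the Java AdminClient; the topic name, partition count, replication factor, and seven-day retention are illustrative choices, not recommendations.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateRetainedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (Admin admin = Admin.create(props)) {
            // 6 partitions, replication factor 3 (illustrative values).
            NewTopic topic = new NewTopic("click-events", 6, (short) 3)
                // Keep data for 7 days, whether or not anyone has consumed it.
                .configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```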

Chapter 9. When Kafka Is the Right Tool

Kafka excels when you need:

  • High-throughput event ingestion
  • Decoupled microservices
  • Event sourcing
  • Real-time analytics pipelines
  • Durable data streaming

Chapter 10. When Kafka Is NOT Ideal

Kafka is not a replacement for:

| Use Case | Better Tool |
|---|---|
| Low-latency RPC | REST/gRPC |
| Simple task queues | RabbitMQ/SQS |
| Strong relational queries | SQL database |

Kafka is optimized for streaming logs, not transactional queries.

Written on February 8, 2026