Introduction to Kafka
How do we build a system where data can be written once, stored durably, and consumed many times, independently, and at massive scale?
Kafka, a distributed event streaming platform, is built to answer exactly that question.
Chapter 1. The Fundamental Problem Kafka Solves
Modern systems are not a single program — they are many independent services that must continuously exchange data.
At a first-principles level, distributed systems struggle with three core challenges:
| Problem | Why It’s Hard |
|---|---|
| Data sharing | Services need the same data at different times |
| Decoupling | Producers and consumers shouldn’t depend on each other’s availability |
| Scale | Data volume and traffic grow unpredictably |
Traditional solutions (REST calls, shared databases, cron jobs) break down because they:
- Create tight coupling
- Lose events during failures
- Can’t scale write/read throughput independently
Kafka exists to provide a durable, scalable, decoupled way to move data between systems.
Chapter 2. Design Philosophy
2.1 An Append-Only Log
Imagine the most primitive possible system: A file where you only append new records, never update or delete.
This simple structure gives powerful guarantees:
- Writing is fast (sequential disk I/O)
- Ordering is preserved
- History is retained
- Multiple readers can read independently
Kafka is fundamentally: A distributed, replicated, fault-tolerant commit log
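To make the commit-log idea concrete, here is a toy sketch in Java. It is illustrative only, not Kafka's code; the AppendOnlyLog class and its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration, not Kafka's implementation: records are only ever
// appended, and each reader keeps track of its own position (offset).
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    long append(String record) {
        records.add(record);          // sequential write, never updated
        return records.size() - 1;    // the new record's offset
    }

    String read(long offset) {
        return records.get((int) offset);  // any reader, any offset
    }

    long endOffset() {
        return records.size();        // where the log currently ends
    }
}
```

Two readers holding offsets 0 and 500 read the same log independently; neither blocks the other, and the full history stays available to both.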
2.2 From Log to Distributed System
A single log file is not enough. We need:
- Fault tolerance
- Parallel reads/writes
- Massive scale
So Kafka evolves the simple log into a distributed log system. Instead of one giant log, Kafka splits a topic into many smaller logs called partitions, and replicates each partition across multiple machines.
Chapter 3. Core Building Blocks
3.1 Topics — Logical Data Streams
A topic is just a named stream of events. Topics can also be thought of as channels.
Examples:
- user-signups
- payments
- click-events
Producers write to topics. Consumers read from topics.
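As a sketch of what creating a topic looks like with Kafka's Java AdminClient, assuming a broker reachable at localhost:9092 (the topic name and the partition/replication counts, explained in the next sections, are example values):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "user-signups" with 3 partitions, each replicated to 2 brokers.
            NewTopic topic = new NewTopic("user-signups", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get(); // block until created
        }
    }
}
```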
3.2 Partitions — Parallelism + Scale
A topic is divided into partitions, each an independent append-only log.
Why?
| Need | Solution |
|---|---|
| Higher throughput | Multiple partitions allow parallel writes |
| Scalability | Spread partitions across machines |
| Ordering | Preserved within a partition |
Tradeoff: Kafka guarantees ordering per partition, not across the entire topic.
3.3 Brokers — The Kafka Servers
A broker is a Kafka server that stores partitions.
A Kafka cluster = many brokers.
Each partition lives on one broker as leader and is replicated to others as followers.
3.4 Replication — Surviving Failures
Kafka copies each partition to multiple brokers.
| Concept | Purpose |
|---|---|
| Leader replica | Handles all reads/writes |
| Follower replicas | Stay in sync, take over if leader dies |
If a broker crashes → another replica becomes leader → system continues.
Chapter 4. Producers — Writing to the Log
A producer sends messages to Kafka.
Key first-principle idea:
- Producers don’t talk to consumers directly. They just append to a shared log.
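A minimal producer sketch using Kafka's official Java client, assuming a broker at localhost:9092 and the user-signups topic from above; the key and JSON payload are invented for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SignupProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Append an event to the log; the producer never contacts a consumer.
            producer.send(new ProducerRecord<>("user-signups",
                                               "user-123",              // key
                                               "{\"plan\":\"free\"}")); // value
            producer.flush(); // make sure the record leaves the client buffer
        }
    }
}
```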
How partitioning works

A producer decides which partition a message goes to by:
- Hashing a key (e.g., user_id)
- Round-robin assignment
- Custom partitioning logic

This enables load balancing across partitions, as the sketch below shows.
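The key-based route reduces to hashing the key modulo the partition count. Kafka's built-in default partitioner uses a murmur2 hash; the simplified version below is just the idea, not the real implementation:

```java
// Simplified sketch of key-based partition selection. Kafka's real default
// partitioner uses a murmur2 hash; only the modulo idea is shown here.
static int partitionFor(String key, int numPartitions) {
    // Mask off the sign bit so the result is always a valid partition index.
    return (key.hashCode() & 0x7fffffff) % numPartitions;
}
```

Because equal keys always hash to the same partition, all events for one user_id land in one partition and therefore stay ordered.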
Chapter 5. Consumers — Reading Independently
Consumers don’t receive pushes; they pull from Kafka at their own pace.
Each message has an offset — its position in the partition log.
Consumers track their own offsets:

| Benefit | Explanation |
|---|---|
| Replay data | Reset offset to reprocess history |
| Independent speeds | Slow consumer doesn’t block fast one |
| Fault recovery | Resume from last committed offset |
Kafka stores data; it does not track who has read it (beyond the offsets committed by consumer groups). Kafka introduces the concept of a consumer group: a set of consumers cooperating to read a topic.
Key rule: Each partition is assigned to exactly one consumer within a group. A consumer does not “choose” a partition manually. Kafka assigns partitions to consumers in a group automatically.
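A minimal consumer sketch with the official Java client (the group id, topic name, and broker address are example values); note the explicit poll and offset commit:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PaymentsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "payments-service");          // example group
        props.put("enable.auto.commit", "false");           // commit offsets ourselves
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            while (true) {
                // Pull model: the consumer asks for records at its own pace.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      r.partition(), r.offset(), r.value());
                }
                consumer.commitSync(); // record our position for fault recovery
            }
        }
    }
}
```

Running a second copy of this program with the same group.id makes Kafka rebalance the topic's partitions between the two instances, which is exactly the consumer-group behavior the next chapter describes.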
Chapter 6. Consumer Groups — Parallel Processing
If multiple instances of a service read a topic:
- Kafka assigns partitions across them
- Each partition is consumed by only one consumer in the group
This gives horizontal scalability: More partitions → more consumers → more throughput
Chapter 7. Durability and Delivery Guarantees
Kafka allows tuning reliability vs performance.
Acknowledgment levels

| Setting | Meaning |
|---|---|
| acks=0 | Fire and forget |
| acks=1 | Leader confirms write |
| acks=all | All in-sync replicas confirm write |
Delivery semantics

| Type | What It Means |
|---|---|
| At most once | No retries → possible data loss |
| At least once | Retries → possible duplicates |
| Exactly once | Idempotent producers + transactional writes |
Exactly-once requires coordination between producer, Kafka, and consumer logic.
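As a sketch, a producer tuned for durability combines acks=all with idempotence. The acks and enable.idempotence keys are real Kafka producer configs; the broker address and class name are examples:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class DurableProducerConfig {
    static KafkaProducer<String, String> build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                // wait for all in-sync replicas
        props.put("enable.idempotence", "true"); // broker deduplicates retries
        return new KafkaProducer<>(props);
    }
}
```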
Chapter 8. Retention — Kafka Is Not Just a Queue
Traditional queues delete messages after consumption. Kafka retains data for a configured time or size, regardless of consumption.
This enables:
- Reprocessing history
- Auditing
- Bootstrapping new services from old data
Kafka becomes a source of truth for event history, not just a transport layer.
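Replaying history is just an offset reset. A sketch with the Java client (the topic, partition, and broker address are example values) that rewinds one partition to the beginning:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromStart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually assign partition 0 of "payments" (this bypasses group
            // assignment, which suits a one-off replay job) and rewind it.
            TopicPartition tp = new TopicPartition("payments", 0);
            consumer.assign(List.of(tp));
            consumer.seekToBeginning(List.of(tp));
            // ...normal poll loop from here reprocesses every retained record...
        }
    }
}
```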
Chapter 9. When Kafka Is the Right Tool
Kafka excels when you need:
- High-throughput event ingestion
- Decoupled microservices
- Event sourcing
- Real-time analytics pipelines
- Durable data streaming
Chapter 10. When Kafka Is NOT Ideal
Kafka is not a replacement for:
| Use Case | Better Tool |
|---|---|
| Low-latency RPC | REST/gRPC |
| Simple task queues | RabbitMQ/SQS |
| Strong relational queries | SQL database |
Kafka is optimized for streaming logs, not transactional queries.