Introduction to Kafka
How do we build a system where data can be written once, stored durably, and consumed many times, independently, and at massive scale?
Kafka, a distributed event streaming platform, is built to answer exactly that question.
Chapter 1. The Fundamental Problem Kafka Solves
Modern systems are not a single program — they are many independent services that must continuously exchange data.
At a first-principles level, distributed systems struggle with three core challenges:
| Problem | Why It’s Hard |
|---|---|
| Data sharing | Services need the same data at different times |
| Decoupling | Producers and consumers shouldn’t depend on each other’s availability |
| Scale | Data volume and traffic grow unpredictably |
Traditional solutions (REST calls, shared databases, cron jobs) break down because they:
- Create tight coupling
- Lose events during failures
- Can’t scale write/read throughput independently
Kafka exists to provide a durable, scalable, decoupled way to move data between systems.
Chapter 2. Design Philosophy
2.1 An Append-Only Log
Imagine the most primitive possible system: A file where you only append new records, never update or delete.
This simple structure gives powerful guarantees:
- Writing is fast (sequential disk I/O)
- Ordering is preserved
- History is retained
- Multiple readers can read independently
Kafka is fundamentally: A distributed, replicated, fault-tolerant commit log
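To make the commit-log idea concrete, here is a toy sketch in Java. It is illustrative only, not Kafka's code; the AppendOnlyLog class and its methods are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration, not Kafka's implementation: records are only ever
// appended, and each reader keeps track of its own position (offset).
class AppendOnlyLog {
    private final List<String> records = new ArrayList<>();

    long append(String record) {
        records.add(record);          // sequential write, never updated
        return records.size() - 1;    // the new record's offset
    }

    String read(long offset) {
        return records.get((int) offset);  // any reader, any offset
    }

    long endOffset() {
        return records.size();        // where the log currently ends
    }
}
```

Two readers holding offsets 0 and 500 read the same log independently; neither blocks the other, and the full history stays available to both.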
2.2 From Log to Distributed System
A single log file is not enough. We need:
- Fault tolerance
- Parallel reads/writes
- Massive scale
So Kafka evolves the simple log into a distributed log system. Instead of one giant log, Kafka splits a topic into many smaller logs called partitions, and replicates each partition across multiple machines.
Chapter 3. Core Building Blocks
3.1 Topics — Logical Data Streams
A topic is just a named stream of events. Topics can also be thought of as channels.
Examples:
- user-signups
- payments
- click-events
Producers write to topics. Consumers read from topics.
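As a sketch of what creating a topic looks like with Kafka's Java AdminClient, assuming a broker reachable at localhost:9092 (the topic name and the partition/replication counts, explained in the next sections, are example values):

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic "user-signups" with 3 partitions, each replicated to 2 brokers.
            NewTopic topic = new NewTopic("user-signups", 3, (short) 2);
            admin.createTopics(List.of(topic)).all().get(); // block until created
        }
    }
}
```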
3.2 Partitions — Parallelism + Scale
A topic is divided into partitions, each an independent append-only log.
Why?
| Need | Solution |
|---|---|
| Higher throughput | Multiple partitions allow parallel writes |
| Scalability | Spread partitions across machines |
| Ordering | Preserved within a partition |
Tradeoff: Kafka guarantees ordering per partition, not across the entire topic.
3.3 Brokers — The Kafka Servers
A broker is a Kafka server that stores partitions.
A Kafka cluster = many brokers.
Each partition lives on one broker as leader and is replicated to others as followers.
3.4 Replication — Surviving Failures
Kafka copies each partition to multiple brokers.
| Concept | Purpose |
|---|---|
| Leader replica | Handles all reads/writes |
| Follower replicas | Stay in sync, take over if leader dies |
If a broker crashes → another replica becomes leader → system continues.
Chapter 4. Producers — Writing to the Log
A producer sends messages to Kafka.
Key first-principle idea:
- Producers don’t talk to consumers directly. They just append to a shared log.
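A minimal producer sketch using Kafka's official Java client, assuming a broker at localhost:9092 and the user-signups topic from above; the key and JSON payload are invented for illustration:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SignupProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Append an event to the log; the producer never contacts a consumer.
            producer.send(new ProducerRecord<>("user-signups",
                                               "user-123",              // key
                                               "{\"plan\":\"free\"}")); // value
            producer.flush(); // make sure the record leaves the client buffer
        }
    }
}
```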
How partitioning works

A producer decides which partition a message goes to by:
- Hashing a key (e.g., user_id)
- Round-robin assignment
- Custom partitioning logic

This enables load balancing across partitions, as the sketch below shows.
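The key-based route reduces to hashing the key modulo the partition count. Kafka's built-in default partitioner uses a murmur2 hash; the simplified version below is just the idea, not the real implementation:

```java
// Simplified sketch of key-based partition selection. Kafka's real default
// partitioner uses a murmur2 hash; only the modulo idea is shown here.
static int partitionFor(String key, int numPartitions) {
    // Mask off the sign bit so the result is always a valid partition index.
    return (key.hashCode() & 0x7fffffff) % numPartitions;
}
```

Because equal keys always hash to the same partition, all events for one user_id land in one partition and therefore stay ordered.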
Chapter 5. Consumers — Reading Independently
Consumers don’t receive pushes; they pull from Kafka at their own pace.
Each message has an offset — its position in the partition log.
Consumers track their own offsets:

| Benefit | Explanation |
|---|---|
| Replay data | Reset offset to reprocess history |
| Independent speeds | Slow consumer doesn’t block fast one |
| Fault recovery | Resume from last committed offset |
Kafka stores data; it does not track who has read it (beyond the offsets committed by consumer groups). Kafka introduces the concept of a consumer group: a set of consumers cooperating to read a topic.
Key rule: Each partition is assigned to exactly one consumer within a group. A consumer does not “choose” a partition manually. Kafka assigns partitions to consumers in a group automatically.
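A minimal consumer sketch with the official Java client (the group id, topic name, and broker address are example values); note the explicit poll and offset commit:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PaymentsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker
        props.put("group.id", "payments-service");          // example group
        props.put("enable.auto.commit", "false");           // commit offsets ourselves
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("payments"));
            while (true) {
                // Pull model: the consumer asks for records at its own pace.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                                      r.partition(), r.offset(), r.value());
                }
                consumer.commitSync(); // record our position for fault recovery
            }
        }
    }
}
```

Running a second copy of this program with the same group.id makes Kafka rebalance the topic's partitions between the two instances, which is exactly the consumer-group behavior the next chapter describes.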
Chapter 6. Consumer Groups — Parallel Processing
If multiple instances of a service read a topic:
- Kafka assigns partitions across them
- Each partition is consumed by only one consumer in the group
This gives horizontal scalability: More partitions → more consumers → more throughput
Chapter 7. Durability and Delivery Guarantees
Kafka allows tuning reliability vs performance.
Acknowledgment levels

| Setting | Meaning |
|---|---|
| acks=0 | Fire and forget |
| acks=1 | Leader confirms write |
| acks=all | All in-sync replicas confirm write |
Delivery semantics

| Type | What It Means |
|---|---|
| At most once | No retries → possible data loss |
| At least once | Retries → possible duplicates |
| Exactly once | Idempotent producers + transactional writes |
Exactly-once requires coordination between producer, Kafka, and consumer logic.
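As a sketch, a producer tuned for durability combines acks=all with idempotence. The acks and enable.idempotence keys are real Kafka producer configs; the broker address and class name are examples:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class DurableProducerConfig {
    static KafkaProducer<String, String> build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                // wait for all in-sync replicas
        props.put("enable.idempotence", "true"); // broker deduplicates retries
        return new KafkaProducer<>(props);
    }
}
```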
Chapter 8. Retention — Kafka Is Not Just a Queue
Traditional queues delete messages after consumption. Kafka retains data for a configured time or size, regardless of consumption.
This enables:
- Reprocessing history
- Auditing
- Bootstrapping new services from old data
Kafka becomes a source of truth for event history, not just a transport layer.
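Replaying history is just an offset reset. A sketch with the Java client (the topic, partition, and broker address are example values) that rewinds one partition to the beginning:

```java
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayFromStart {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Manually assign partition 0 of "payments" (this bypasses group
            // assignment, which suits a one-off replay job) and rewind it.
            TopicPartition tp = new TopicPartition("payments", 0);
            consumer.assign(List.of(tp));
            consumer.seekToBeginning(List.of(tp));
            // ...normal poll loop from here reprocesses every retained record...
        }
    }
}
```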
Chapter 9. When Kafka Is the Right Tool
Kafka excels when you need:
- High-throughput event ingestion
- Decoupled microservices
- Event sourcing
- Real-time analytics pipelines
- Durable data streaming
Chapter 10. When Kafka Is NOT Ideal
Kafka is not a replacement for:
| Use Case | Better Tool |
|---|---|
| Low-latency RPC | REST/gRPC |
| Simple task queues | RabbitMQ/SQS |
| Strong relational queries | SQL database |
Kafka is optimized for streaming logs, not transactional queries.