What is Apache Kafka and Why Does Everyone Use It?
If you've spent any time reading engineering blogs from Uber, Netflix, LinkedIn, or Shopify, you've probably noticed a common thread: they all use Apache Kafka. But what is it, and why has it become the backbone of modern data infrastructure?
Let's start with a simple analogy. Imagine a newspaper printing press:
- The reporters write stories (producers)
- The printing press stores and distributes the newspaper (the broker)
- The readers pick up copies at their own pace (consumers)
The key insight is that the reporters don't hand-deliver stories to each reader. They publish to the press, and readers consume independently. If a reader goes on vacation, they can catch up on old newspapers when they return. No stories are lost.
Apache Kafka works exactly the same way, but instead of newspaper stories, it handles billions of data events — user clicks, financial transactions, IoT sensor readings, application logs, and more.
A brief history
Kafka was originally built at LinkedIn in 2010 to solve a specific problem: LinkedIn had dozens of systems (search, analytics, monitoring) that all needed real-time access to the same data streams. Connecting every system directly to every other system created an unmanageable web of point-to-point integrations. Kafka provided a single, central "highway" for all data to flow through.
It was open-sourced in 2011, became an Apache top-level project in 2012, and today it is used by over 80% of Fortune 100 companies.
Kafka's Core Concepts: Topics, Partitions, Offsets, and Consumer Groups
Topics — Organizing Your Data Streams
A topic is simply a named category or feed. Think of it as a folder in your
email inbox. All "order" events go to the orders topic. All "user activity"
events go to the user-activity topic.
# Creating a Kafka topic via CLI
kafka-topics.sh --create \
  --topic order-events \
  --partitions 6 \
  --replication-factor 3 \
  --bootstrap-server kafka:9092
Partitions — The Secret to Kafka's Speed
This is where Kafka gets interesting. A single topic like order-events might
receive 100,000 messages per second. One server can't handle that alone. So Kafka splits
each topic into multiple partitions.
Think of partitions as lanes on a highway. Instead of forcing all traffic through one lane, Kafka distributes messages across multiple lanes (partitions), each running on a different server.
Topic: order-events
├── Partition 0: [msg-0, msg-3, msg-6, msg-9, ...] → Broker 1
├── Partition 1: [msg-1, msg-4, msg-7, msg-10, ...] → Broker 2
└── Partition 2: [msg-2, msg-5, msg-8, msg-11, ...] → Broker 3
Critical rule: Messages within a partition are strictly ordered (msg-0 always comes before msg-3). But there's no ordering guarantee across partitions. If you need strict ordering for a specific entity (like all events for Order #123), you use a message key — Kafka guarantees all messages with the same key land in the same partition.
// All events for the same order go to the same partition
producer.send({
  topic: 'order-events',
  messages: [{
    key: 'ORD-98712',            // Partition = hash(key) % numPartitions
    value: JSON.stringify(event)
  }]
});
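The key-to-partition mapping can be sketched in a few lines. Note that Kafka's default Java partitioner actually uses murmur2; the toy hash below is a simplified stand-in whose only job is to show that the mapping is deterministic:

```javascript
// Simplified sketch of key-based partitioning. Kafka's default
// partitioner uses murmur2; this toy hash only demonstrates that
// equal keys always map to the same partition.
function toyHash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) | 0; // 32-bit rolling hash
  }
  return h;
}

function partitionFor(key, numPartitions) {
  // Mask off the sign bit so the modulo is never negative,
  // mirroring what the Java client does with the murmur2 result.
  return (toyHash(key) & 0x7fffffff) % numPartitions;
}

// Every event for Order #98712 lands in the same partition:
const p1 = partitionFor('ORD-98712', 6);
const p2 = partitionFor('ORD-98712', 6);
console.log(p1 === p2); // true
```

Because the mapping depends only on the key, ordering for a single order is preserved no matter how many producers are writing.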
Offsets — Your Bookmark in the Stream
Every message in a partition gets a unique, auto-incrementing number called an offset. It's like a page number in a book.
Partition 0:
Offset:   0    1    2    3    4    5    6    7
Data:    [A]  [B]  [C]  [D]  [E]  [F]  [G]  [H]
                         ↑
               Consumer is here (offset 3)
The consumer tracks its own offset. This is fundamentally different from traditional message queues (like RabbitMQ) where the broker tracks what each consumer has read. By shifting responsibility to the consumer, Kafka massively reduces broker overhead and unlocks a superpower: time travel. A consumer can reset its offset to 0 and replay the entire history of a topic. This is incredibly useful for rebuilding search indices, fixing bugs, or reprocessing data with updated logic.
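This consumer-side bookkeeping can be sketched as a plain array plus a cursor. This is a toy model of the idea, not the real client API (real consumers commit offsets back to Kafka so they survive restarts):

```javascript
// Toy model of one partition's log and a consumer that owns its offset.
const partitionLog = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'];

function makeConsumer() {
  let offset = 0; // the consumer's own bookmark; the broker never tracks it
  return {
    // Read the next message and advance the bookmark.
    poll() { return offset < partitionLog.length ? partitionLog[offset++] : null; },
    // "Time travel": jump to any offset and replay from there.
    seek(newOffset) { offset = newOffset; },
    position() { return offset; }
  };
}

const consumer = makeConsumer();
consumer.poll();              // 'A'
consumer.poll();              // 'B'
consumer.seek(0);             // rewind to replay the whole history
console.log(consumer.poll()); // 'A' again
```

Because the log itself is never mutated by reads, any number of consumers can hold independent bookmarks into the same partition.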
Consumer Groups — Scaling Consumers Horizontally
What if one consumer can't keep up with the message rate? Kafka uses consumer groups. Within a group, each partition is assigned to exactly one consumer, allowing parallel processing:
Consumer Group: "inventory-service"
├── Consumer A → reads Partition 0, Partition 1
├── Consumer B → reads Partition 2, Partition 3
└── Consumer C → reads Partition 4, Partition 5
If Consumer B crashes, Kafka automatically rebalances and reassigns its partitions to A and C. No messages are lost. When Consumer B comes back, it resumes from its last committed offset.
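The assignment and rebalance logic can be sketched as a round-robin mapping. Kafka actually supports several assignment strategies (range, round-robin, sticky); this toy version only illustrates the invariant that each partition belongs to exactly one consumer in the group:

```javascript
// Toy round-robin assignment of partitions to the consumers in a group.
// The invariant it demonstrates: each partition goes to exactly one consumer.
function assignPartitions(consumers, numPartitions) {
  const assignment = Object.fromEntries(consumers.map(c => [c, []]));
  for (let p = 0; p < numPartitions; p++) {
    assignment[consumers[p % consumers.length]].push(p);
  }
  return assignment;
}

console.log(assignPartitions(['A', 'B', 'C'], 6));
// → { A: [ 0, 3 ], B: [ 1, 4 ], C: [ 2, 5 ] }

// If consumer B crashes, the group "rebalances": rerunning the same
// function without B spreads its partitions over the survivors.
console.log(assignPartitions(['A', 'C'], 6));
// → { A: [ 0, 2, 4 ], C: [ 1, 3, 5 ] }
```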
Why is Kafka So Fast? (The Engineering Behind the Magic)
Here's the part that surprises most engineers: Kafka stores everything on disk, yet it routinely outperforms in-memory message queues. How?
Secret #1: Sequential I/O
Hard drives (and SSDs) are painfully slow at random reads/writes but blazingly fast at sequential reads/writes. Kafka exploits this by treating every partition as an append-only log file. New messages are always written to the end of the file — never in the middle. This turns disk I/O from Kafka's weakness into its greatest strength.
Random I/O:     ~100 seeks/sec at ~1 KB each → ~100 KB/sec
Sequential I/O: just keep reading forward    → ~600 MB/sec
That's roughly a 6,000x difference in throughput.
Secret #2: OS Page Cache
When Kafka writes data to disk, the operating system doesn't actually write to the physical disk immediately. It stores the data in a memory buffer called the Page Cache. When a consumer reads the same data milliseconds later, it comes directly from RAM — not from disk. Kafka essentially gets in-memory speed while maintaining disk durability, without managing its own cache.
Secret #3: Zero-Copy Data Transfer
In a traditional application, sending a file over the network takes 4 steps:
- Read from disk → Kernel buffer
- Kernel buffer → Application memory (user space)
- Application memory → Socket buffer (back to kernel)
- Socket buffer → Network Interface Card (NIC)
That's 4 data copies and 4 context switches between kernel and user space (two per system call). Kafka eliminates steps 2 and 3 using the Linux sendfile() system call:
- Read from disk → Kernel buffer
- Kernel buffer → NIC (directly!)
This zero-copy optimization reduces CPU usage by up to 65% and is a major reason Kafka can saturate a 10 Gbps network link.
Secret #4: Batching and Compression
Kafka doesn't send messages one-by-one. Producers accumulate messages into batches (configurable by size or time), compress the entire batch using algorithms like LZ4, Snappy, or Zstandard, and send one compressed batch as a single network request. This dramatically reduces individual network calls and network bandwidth usage.
// Producer tuning for throughput (librdkafka-style property names,
// as used by clients such as node-rdkafka)
const producer = new Kafka.Producer({
  'batch.size': 65536,        // accumulate up to 64 KB per batch
  'linger.ms': 5,             // wait up to 5 ms to fill a batch
  'compression.type': 'lz4',  // compress each batch with LZ4
  'acks': 1                   // wait for leader acknowledgment only
});
Kafka vs. RabbitMQ: Which One Should You Pick?
This is the most common question engineers ask. The answer is simple: they solve different problems.
| Feature | Apache Kafka | RabbitMQ |
|---|---|---|
| Model | Distributed log (pub/sub) | Message queue (point-to-point) |
| Message retention | Configurable (days, weeks, forever) | Deleted after consumption |
| Replay capability | ✅ Yes (reset consumer offset) | ❌ No |
| Throughput | Millions of msgs/sec | Tens of thousands/sec |
| Ordering | Per-partition ordering | Per-queue ordering |
| Best for | Event streaming, data pipelines, log aggregation | Task queues, RPC, simple pub/sub |
Use Kafka when:
- You need to retain messages for replay (e.g., rebuilding a search index)
- Multiple independent consumers need to read the same stream
- You're building real-time analytics, event sourcing, or stream processing pipelines
- Throughput matters more than latency (Kafka optimizes for throughput)
Use RabbitMQ when:
- You need complex routing (fanout, topic-based, header-based)
- You want simple task distribution among workers (e.g., send emails, resize images)
- Messages should be deleted after successful processing
- Latency matters more than throughput (RabbitMQ delivers faster per-message)
Real-World Kafka Use Cases
1. Netflix — Billions of Events for Personalization
Netflix uses Kafka to capture every user interaction — what you watched, when you paused, what you searched — and feeds it into their recommendation engine in real-time. This data pipeline processes over 1 trillion events per day.
2. Uber — Real-Time Ride Matching
When you request an Uber, your location is published as an event to Kafka. Driver locations are continuously streamed as events. The matching service consumes both streams and pairs the closest driver to you — all within seconds.
3. Shopify — Black Friday at Scale
During Black Friday 2024, Shopify processed $9.3 billion in sales. Kafka handled the fire hose of order events, inventory updates, and payment confirmations — ensuring no order was lost even at peak traffic of millions of events per second.
4. LinkedIn — The Birthplace of Kafka
LinkedIn processes over 7 trillion messages per day through Kafka. Every profile view, connection request, and job application flows through the Kafka pipeline before reaching the end user.
Getting Started with Kafka: Your First Steps
Kafka has a steep learning curve, but you don't need to boil the ocean. Here's a pragmatic roadmap:
- Run Kafka locally using Docker Compose. The Confluent Platform Docker setup gets you running in 5 minutes.
- Build a simple producer/consumer using KafkaJS (Node.js), confluent-kafka-python (Python), or Spring Kafka (Java).
- Understand partitioning. Experiment with message keys and observe how Kafka distributes data across partitions.
- Try Kafka Connect to stream data from your existing PostgreSQL/MySQL database into Kafka topics automatically (Change Data Capture).
- Explore Kafka Streams or ksqlDB for real-time stream processing without needing external tools like Apache Flink.
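For step 1, a single-broker setup for local experiments can be as small as this sketch (the image tag and port mapping are assumptions — check the Kafka docs for the currently recommended local setup):

```yaml
# Minimal local Kafka for experimentation — a sketch, not production config.
services:
  kafka:
    image: apache/kafka:latest   # official image; runs a single KRaft broker by default
    ports:
      - "9092:9092"              # expose the broker to clients on localhost
```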
Kafka is not just a message queue — it's a distributed commit log that fundamentally changes how organizations think about data flow. Once you understand its primitives, you'll see opportunities to use it everywhere.
Published: February 24, 2026
Updated: February 24, 2026