← Back to Learn
Advanced11 April 20263 min read

Consistent Hashing — The Musical Chairs of Distributed Systems

How do systems like DynamoDB and Cassandra distribute data across servers without reshuffling everything when a server is added? Consistent Hashing.

consistent-hashingdistributed-systems
Share:

The Problem

Imagine you have 4 servers and use simple hashing to distribute data:

server = hash(key) % 4

"user_1" → hash → 7 % 4 = 3 → Server 3
"user_2" → hash → 12 % 4 = 0 → Server 0
"user_3" → hash → 5 % 4 = 1 → Server 1

Works great. But what happens when you add a 5th server?

server = hash(key) % 5  ← changed from 4 to 5

"user_1" → hash → 7 % 5 = 2 → Server 2 (was Server 3!)
"user_2" → hash → 12 % 5 = 2 → Server 2 (was Server 0!)
"user_3" → hash → 5 % 5 = 0 → Server 0 (was Server 1!)

Almost EVERY key maps to a different server. You'd need to move nearly all your data. With billions of keys, this is catastrophic.

The Musical Chairs Analogy

Think of musical chairs arranged in a circle. Each chair is a server. Players (data) sit in the nearest chair clockwise from where they're standing.

If you remove one chair, only the player sitting in that chair needs to find a new seat (the next chair clockwise). Everyone else stays put.

If you add a chair, only a few players near that spot need to move. Most players stay in their current chairs.

That's Consistent Hashing — when servers change, only a small fraction of data needs to move.

How It Works

Step 1: The Hash Ring

Imagine a circle (ring) with positions 0 to 360 (like a clock).

        0°
        │
   330° ─┼─ 30°
        │
  300° ──┼── 60°
        │
   270° ─┼─ 90°
        │
  240° ──┼── 120°
        │
   210° ─┼─ 150°
        │
       180°

Step 2: Place Servers on the Ring

Hash each server's name to get its position:

hash("Server A") → 45°
hash("Server B") → 135°
hash("Server C") → 250°

Step 3: Place Data on the Ring

Hash each key and walk clockwise to find the nearest server:

hash("user_1") → 80° → walks clockwise → hits Server B (135°)
hash("user_2") → 200° → walks clockwise → hits Server C (250°)
hash("user_3") → 300° → walks clockwise → hits Server A (45°)

Step 4: Adding a Server

Add Server D at position 180°. Only keys between 135° and 180° need to move from Server C to Server D. Everything else stays.

Step 5: Removing a Server

Remove Server B. Only keys that were on Server B move to the next server clockwise (Server C). Everything else stays.

Virtual Nodes

One problem: with few servers, data distribution can be uneven. One server might get 60% of the data.

Solution: Each physical server gets multiple positions on the ring (virtual nodes).

Server A → positions 45°, 120°, 280° (3 virtual nodes)
Server B → positions 90°, 200°, 330° (3 virtual nodes)

More virtual nodes = more even distribution. In practice, systems use 100-200 virtual nodes per server.

Who Uses Consistent Hashing?

Amazon DynamoDB — distributes data across storage nodes
Apache Cassandra — partitions data across the cluster
Akamai CDN — routes requests to edge servers
Discord — distributes chat messages across servers
Memcached — distributes cached data across nodes

Key Takeaway

Consistent Hashing solves the "reshuffling problem" in distributed systems. When servers are added or removed, only K/N keys need to move (K = total keys, N = total servers), instead of almost all keys. It's a must-know for any system design interview involving distributed data.

👨‍💻
Sahil Sudan

Software Engineer at Spense. I write about system design, web development, and fintech — explained simply for students and developers.

📬 Stay Updated

Get a new System Design or fintech insight every week. No spam, unsubscribe anytime.

Share: