← Back to Learn
Intermediate14 April 20263 min read

Database Sharding — Splitting Your Database Like a Pizza

When your database has 100 million rows and queries are slow, you need sharding. Let's break it down with a pizza analogy that actually makes sense.

databaseshardingscalability

What is Database Sharding?

Imagine you have a massive phone book with 10 crore (100 million) entries. Finding someone's number takes forever because you're searching through one giant book.

Now imagine splitting that phone book into 26 smaller books — one for each letter of the alphabet. Looking for "Sahil"? Go straight to the "S" book. Much faster.

That's sharding — splitting a large database into smaller, faster, more manageable pieces called shards.

Why Do We Need Sharding?

As your app grows, a single database becomes a bottleneck:

  • Slow queries: Scanning millions of rows takes time
  • Storage limits: One machine can only hold so much data
  • Write bottleneck: All writes go to one server
  • Sharding solves all three by distributing data across multiple database servers.

    Real-World Example: PhonePe Transactions

    PhonePe processes crores of transactions daily. They can't store all transactions in one database — it would be impossibly slow.

    Instead, they might shard by user ID:

  • • Users 1-10M → Database Shard 1
  • • Users 10M-20M → Database Shard 2
  • • Users 20M-30M → Database Shard 3
  • When you check your transaction history, the system knows exactly which shard to query based on your user ID.

    Sharding Strategies

    1. Range-Based Sharding

    Split data by ranges of a key.

    Example: Users with IDs 1-1M go to Shard 1, 1M-2M to Shard 2, etc.

    Pros: Simple to implement

    Cons: Uneven distribution (what if most active users are in Shard 1?)

    2. Hash-Based Sharding

    Apply a hash function to the key and use the result to determine the shard.

    
    

    shard_number = hash(user_id) % number_of_shards

    
    
    

    Example: hash(12345) % 4 = 1 → goes to Shard 1

    Pros: Even distribution

    Cons: Hard to add new shards (re-hashing needed)

    3. Geographic Sharding

    Split data by location.

    Example: Indian users → Mumbai database, US users → Virginia database.

    Pros: Low latency for users, data residency compliance

    Cons: Cross-region queries are complex

    The Challenges

    1. Cross-Shard Queries

    If you need data from multiple shards (e.g., "show all transactions above ₹10,000"), you have to query ALL shards and merge results. This is slow and complex.

    2. Rebalancing

    When one shard gets too big, you need to split it. Moving data between shards without downtime is hard.

    3. Joins Across Shards

    SQL JOINs across different database servers? Very painful. This is why sharded systems often denormalize data.

    4. Consistent Hashing

    When you add/remove shards, hash-based sharding requires rehashing everything. Consistent hashing minimizes this — only a fraction of data needs to move.

    Sharding vs. Replication

    Don't confuse them:

  • Sharding: Different data on different servers (horizontal partitioning)
  • Replication: Same data copied to multiple servers (for redundancy)
  • Most production systems use both — shard for scale, replicate each shard for reliability.

    When to Shard

    Don't shard prematurely. Try these first:

  • Optimize queries and add indexes
  • Vertical scaling (bigger machine)
  • Read replicas
  • Caching (Redis)
  • If you've done all that and still hitting limits → time to shard.

    Key Takeaway

    Sharding is about divide and conquer. Split your data smartly, and each piece becomes manageable. But it adds complexity — so only do it when you truly need it.

    In interviews, always mention the trade-offs. Interviewers love candidates who understand that sharding isn't free — it comes with cross-shard query complexity, rebalancing challenges, and operational overhead.

    👨‍💻
    Sahil Sudan

    Software Engineer at Spense. I write about system design, web development, and fintech — explained simply for students and developers.