What is Rate Limiting?
Imagine a popular nightclub. If 5,000 people rush in at once, it's chaos — people get crushed, the bar runs out of drinks, and the DJ can't even hear the music.
So what does the club do? They hire a bouncer who says: *"Only 100 people per hour. Once you're in, enjoy. But if you try to sneak in 50 times, you're banned."*
That's Rate Limiting — controlling how many requests a user or client can make to your API within a given time window.
Why Do We Need It?
🛡️ Protection from Abuse
Without rate limiting, a single user (or bot) could flood your API with requests and slow it down or take it offline for everyone else.
⚖️ Fair Usage
If one user makes 1 million requests, other users get slow responses. Rate limiting ensures everyone gets a fair share.
💰 Cost Control
Every API call costs compute, bandwidth, and database queries. Unlimited calls = unlimited bills.
Real-World Example: UPI Transactions
When you use Google Pay or PhonePe, there's a limit on how many transactions you can do per day. This isn't just a business rule — it's rate limiting at the system level.
📱 User tries 6th UPI transaction today
↓
🚫 Rate Limiter: "Daily limit reached (5/day)"
↓
❌ Transaction blocked with friendly message
If there were no limits, a compromised account could drain money through thousands of micro-transactions before anyone notices.
Common Rate Limiting Algorithms
1. Fixed Window Counter
The simplest approach. Count requests in fixed time windows.
Window: 1 minute
Limit: 100 requests
12:00:00 - 12:00:59 → Count requests (max 100)
12:01:00 - 12:01:59 → Reset counter, start fresh
Analogy: A parking lot that allows 100 cars per hour. At the start of each hour, the counter resets.
Problem: A user could make 100 requests at 12:00:59 and 100 more at 12:01:00. Each window sees only 100 requests, yet the server absorbs 200 within about a second, twice the intended limit.
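To make the counter concrete, here is a minimal in-memory sketch in Python. The class name `FixedWindowLimiter`, its parameters, and the injectable `clock` are illustrative assumptions, not part of any particular library, and a real deployment would also need to expire old windows and share state across servers.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        # (client_id, window_index) -> number of requests seen
        self.counts = defaultdict(int)

    def allow(self, client_id):
        # Integer division maps every timestamp in a window to one index,
        # so the counter "resets" simply because the key changes.
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False
        self.counts[key] += 1
        return True
```

Because the key changes at the window boundary, a client that exhausts its quota at the end of one window gets a fresh quota a moment later, which is exactly the burst problem described above.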
2. Sliding Window Log
Instead of fixed windows, track the timestamp of every request and count how many fall within the last N seconds.
Analogy: Instead of resetting the parking counter every hour, you check: "How many cars entered in the LAST 60 minutes from right now?"
Pros: No boundary burst, because the window moves with every request
Cons: Memory-heavy, because it stores a timestamp for every request still inside the window
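A small sketch of the log-based approach, assuming a single process and an in-memory `deque` per client. The names `SlidingWindowLogLimiter` and `clock` are hypothetical and chosen only for this example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLogLimiter:
    """Allow at most `limit` requests per client in the last `window_seconds`."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.logs = defaultdict(deque)  # client_id -> request timestamps

    def allow(self, client_id):
        now = self.clock()
        log = self.logs[client_id]
        # Drop timestamps that have aged out of the window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```

The memory cost is visible here: each client holds up to `limit` timestamps at a time, whereas the fixed window counter keeps a single integer.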
3. Token Bucket (Most Popular)
Imagine a bucket that holds tokens. Every request costs one token. Tokens are added at a fixed rate until the bucket is full; once it is full, extra tokens are discarded.
Bucket capacity: 10 tokens
Refill rate: 1 token per second
→ User makes 5 requests → 5 tokens used, 5 remaining ✅
→ User immediately makes 6 more → Only 5 tokens left → 5 succeed, 1 request rejected ❌
→ Wait 5 seconds → 5 new tokens added → Can make requests again ✅
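Analogy: A prepaid mobile plan. You get 1GB data per day. Use it up, and you're throttled until it refills. The analogy is loose: a real token bucket refills continuously, a token at a time, rather than all at once the next day.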
Why it's popular: Allows short bursts (good for real usage) while enforcing long-term limits.
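The walkthrough above (capacity 10, refill 1 token per second) can be sketched as follows. This is a minimal single-process version. `TokenBucket`, `allow`, and the injectable `clock` are names assumed for illustration, and the refill is computed lazily on each call instead of by a background timer.

```python
import time


class TokenBucket:
    """Token bucket: bursts up to `capacity`, sustained `refill_rate` per second."""

    def __init__(self, capacity=10, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.clock = clock  # injectable so the behaviour can be tested
        self.tokens = float(capacity)  # start with a full bucket
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Credit the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily keeps the state down to two numbers per client (token count and last refill time), which is one reason this algorithm is straightforward to store in a shared cache.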
4. Leaky Bucket
Requests enter a queue (bucket) and are processed at a fixed rate, like water leaking from a bucket.
Incoming: 💧💧💧💧💧💧💧 (7 requests at once)
Processing: 💧...💧...💧... (1 per second)
Overflow: 💧💧 dropped (bucket full)
Analogy: A funnel. Pour water fast, it still drips out slowly. Pour too much, it overflows.
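Here is one way the queue version could look, as a sketch under stated assumptions: a bucket capacity of 5 (so that 2 of the 7 requests above overflow), a leak rate of 1 request per second, and a hypothetical `drain` method that a worker would call periodically to pick up the requests that are due.

```python
import time
from collections import deque


class LeakyBucket:
    """Queue requests up to `capacity`; release them at `leak_rate` per second."""

    def __init__(self, capacity=5, leak_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.leak_rate = leak_rate  # requests released per second
        self.clock = clock  # injectable so the behaviour can be tested
        self.queue = deque()
        self.last_leak = clock()

    def submit(self, request):
        """Queue a request, or return False if the bucket is full (overflow)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(request)
        return True

    def drain(self):
        """Return the requests that are due for processing since the last drain."""
        now = self.clock()
        due = int((now - self.last_leak) * self.leak_rate)
        if due <= 0:
            return []
        # Advance by whole leak intervals so leftover fractional time carries over.
        self.last_leak += due / self.leak_rate
        return [self.queue.popleft() for _ in range(min(due, len(self.queue)))]
```

Unlike the token bucket, accepted requests are delayed rather than served immediately, so the downstream service sees a smooth, constant rate even when clients send bursts.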
Where to Implement Rate Limiting?
Client → API Gateway (Rate Limiter) → Your Servers
↓
"429 Too Many Requests"
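A common place is the API gateway in front of your servers, so excess requests are rejected before they reach your application code.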
HTTP Response When Rate Limited
HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714060800
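A response like the one above could be assembled on the server roughly as follows. This is a framework-agnostic sketch: `rate_limit_response` and its parameters are hypothetical names, and the `X-RateLimit-*` headers are a widely used convention, not a formal standard, so exact names and semantics vary between APIs.

```python
import math


def rate_limit_response(allowed, limit, remaining, reset_epoch, now_epoch):
    """Return (status_code, headers) describing the client's rate-limit state."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        # Unix timestamp (seconds) at which the quota resets.
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if allowed:
        return 200, headers
    # Retry-After tells the client how many seconds to wait before retrying.
    headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return 429, headers
```

Sending the limit headers on successful responses too lets well-behaved clients slow down before they ever hit a 429.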
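Good APIs tell you the limit (X-RateLimit-Limit), how many requests you have left (X-RateLimit-Remaining), when the quota resets (X-RateLimit-Reset, here a Unix timestamp), and how many seconds to wait before retrying (Retry-After).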
Key Takeaway
Rate Limiting = "You can come in, but not too fast." It protects your system from abuse, ensures fair usage, and keeps costs under control.
In interviews, mention the Token Bucket algorithm. It is widely used in production; AWS API Gateway and Stripe, for example, both describe using it for throttling. Bonus points if you discuss distributed rate limiting using Redis.