Beginner · 19 April 2026 · 4 min read

Rate Limiting — The Bouncer at Your API's Door

Why does Instagram stop you from liking 500 posts in a minute? Rate Limiting. Let's understand how it protects systems from abuse — with a nightclub bouncer analogy.

Tags: rate-limiting, security, api-design, basics

What is Rate Limiting?

Imagine a popular nightclub. If 5,000 people rush in at once, it's chaos — people get crushed, the bar runs out of drinks, and the DJ can't even hear the music.

So what does the club do? They hire a bouncer who says: *"Only 100 people per hour. Once you're in, enjoy. But if you try to sneak in 50 times, you're banned."*

That's Rate Limiting — controlling how many requests a user or client can make to your API within a given time window.

Why Do We Need It?

🛡️ Protection from Abuse

Without rate limiting, a single user (or bot) could:

• Send 10,000 login attempts per second (a brute-force attack)
• Scrape your entire database through the API
• Crash your servers by flooding them with requests

⚖️ Fair Usage

If one user makes 1 million requests, other users get slow responses. Rate limiting ensures everyone gets a fair share.

💰 Cost Control

Every API call costs compute, bandwidth, and database queries. Unlimited calls = unlimited bills.

Real-World Example: UPI Transactions

When you use Google Pay or PhonePe, there's a limit on how many transactions you can do per day. This isn't just a business rule; it's rate limiting at the system level. The exact cap depends on the app and the bank, so treat the figure below as illustrative.

📱 User tries 6th UPI transaction today
   ↓
🚫 Rate Limiter: "Daily limit reached (5/day)"
   ↓
❌ Transaction blocked with friendly message

If there were no limits, a compromised account could drain money through thousands of micro-transactions before anyone notices.

Common Rate Limiting Algorithms

1. Fixed Window Counter

The simplest approach. Count requests in fixed time windows.

Window: 1 minute
Limit: 100 requests

12:00:00 - 12:00:59 → Count requests (max 100)
12:01:00 - 12:01:59 → Reset counter, start fresh

Analogy: A parking lot that allows 100 cars per hour. At the start of each hour, the counter resets.

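Problem: A user could make 100 requests at 12:00:59 and 100 more at 12:01:00. That's 200 requests in about two seconds, twice the intended limit, and every one of them is allowed because the two bursts fall in different windows.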
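The fixed window counter above can be sketched in a few lines. This is a minimal, single-process illustration, not a production implementation: the class name `FixedWindowLimiter`, the in-memory dictionary, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each fixed window."""

    def __init__(self, limit=100, window_seconds=60, clock=time.time):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # assumption: injectable so the sketch is testable
        self.counters = {}  # client_id -> (window_index, count), in memory only

    def allow(self, client_id):
        # Every timestamp in the same window maps to the same index.
        window_index = int(self.clock() // self.window_seconds)
        stored_index, count = self.counters.get(client_id, (window_index, 0))
        if stored_index != window_index:
            count = 0  # a new window has started: reset the counter
        if count >= self.limit:
            return False  # over the limit for this window
        self.counters[client_id] = (window_index, count + 1)
        return True
```

Calling `allow("user-1")` 100 times at second 59 and another 100 times at second 60 returns `True` every time, which is exactly the boundary burst described above.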

2. Sliding Window Log

Instead of fixed windows, track the timestamp of every request and count how many fall within the last N seconds.

Analogy: Instead of resetting the parking counter every hour, you check: "How many cars entered in the LAST 60 minutes from right now?"

Pros: No boundary burst problem. The limit holds for any window of N seconds, not just clock-aligned ones.

Cons: Memory-heavy (storing every timestamp)
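A rough sketch of the sliding window log, under the same assumptions as before (single process, in-memory storage, hypothetical class name `SlidingWindowLog`, injectable `clock`). Note how one timestamp is stored per accepted request, which is where the memory cost comes from.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLog:
    """Allow at most `limit` requests per client in the last `window_seconds`."""

    def __init__(self, limit=100, window_seconds=60, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # assumption: injectable so the sketch is testable
        self.logs = defaultdict(deque)  # client_id -> timestamps, oldest first

    def allow(self, client_id):
        now = self.clock()
        log = self.logs[client_id]
        # Drop timestamps that have slid out of the window.
        while log and log[0] <= now - self.window_seconds:
            log.popleft()
        if len(log) >= self.limit:
            return False  # rejected requests are not logged
        log.append(now)
        return True
```

Because the window is measured back from "right now", a burst at the end of one minute still counts against requests made at the start of the next.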

3. Token Bucket (Most Popular)

Imagine a bucket that holds tokens. Every request costs one token. Tokens are added at a fixed rate.

Bucket capacity: 10 tokens
Refill rate: 1 token per second

→ User makes 5 requests → 5 tokens used, 5 remaining ✅
→ User immediately makes 6 more → Only 5 tokens left → 5 succeed, 1 request rejected ❌
→ Wait 5 seconds → 5 new tokens added → Can make requests again ✅

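Analogy: A prepaid mobile plan. You get 1GB data per day. Use it up, and you're throttled until it refills. The one difference: a token bucket refills gradually (here, one token per second) rather than all at once the next day.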

Why it's popular: Allows short bursts (good for real usage) while enforcing long-term limits.
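The token bucket above can be sketched as follows. It is a minimal illustration that assumes one bucket per client held in memory; the class name `TokenBucket` and the `clock` parameter are made up for the example. Tokens are topped up lazily on each call based on elapsed time, instead of by a background timer.

```python
import time


class TokenBucket:
    """Bucket of `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity=10, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # assumption: injectable so the sketch is testable
        self.tokens = float(capacity)  # the bucket starts full
        self.last_refill = clock()

    def allow(self, cost=1):
        now = self.clock()
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # not enough tokens: reject the request
```

With the default capacity of 10 and a refill rate of 1 token per second, this reproduces the walkthrough above: 5 requests pass, then 5 of the next 6 pass, and after a 5-second wait 5 more pass.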

4. Leaky Bucket

Requests enter a queue (bucket) and are processed at a fixed rate, like water leaking from a bucket.

Bucket capacity: 5 requests
Incoming: 💧💧💧💧💧💧💧 (7 requests at once)
Processing: 💧...💧...💧... (1 per second)
Overflow: 💧💧 dropped (bucket full)

Analogy: A funnel. Pour water fast, it still drips out slowly. Pour too much, it overflows.
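Here is one way to sketch the leaky bucket as a bounded queue. The names `LeakyBucket`, `offer`, and `drain` are hypothetical, and the design assumes a worker calls `drain()` periodically to pick up the requests that are due. A real system would more likely run a scheduler or background thread.

```python
import time
from collections import deque


class LeakyBucket:
    """Queue up to `capacity` requests and release them at `leak_rate` per second."""

    def __init__(self, capacity=5, leak_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.leak_interval = 1.0 / leak_rate  # seconds between releases
        self.clock = clock  # assumption: injectable so the sketch is testable
        self.queue = deque()
        self.last_leak = clock()

    def offer(self, request):
        """Queue a request; return False if the bucket overflows."""
        if len(self.queue) >= self.capacity:
            return False  # bucket full: the request is dropped
        if not self.queue:
            # Bucket was empty: restart the leak timer so idle time
            # does not build up credit for a later burst.
            self.last_leak = self.clock()
        self.queue.append(request)
        return True

    def drain(self):
        """Return the queued requests that are due for processing by now."""
        now = self.clock()
        due = int((now - self.last_leak) / self.leak_interval)
        released = [self.queue.popleft() for _ in range(min(due, len(self.queue)))]
        self.last_leak += due * self.leak_interval
        return released
```

Offering 7 requests at once to a bucket of capacity 5 accepts the first 5 and drops the last 2, and `drain()` then hands them back one per second, matching the diagram above.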

Where to Implement Rate Limiting?

Client → API Gateway (Rate Limiter) → Your Servers
              ↓
        "429 Too Many Requests"

Common places:

• API Gateway (AWS API Gateway, Kong, Nginx): most common
• Application level (middleware in Express/Spring Boot)
• CDN level (Cloudflare, Akamai)
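To make the application-level option concrete, here is a small sketch of rate limiting as middleware. The article mentions Express and Spring Boot; this example uses Python's WSGI interface instead so it stays dependency-free. The class `RateLimitMiddleware`, the demo app `hello_app`, and the choice to identify clients by `REMOTE_ADDR` are assumptions for illustration. It uses a fixed window counter internally.

```python
import time


class RateLimitMiddleware:
    """WSGI middleware that rejects over-limit requests before the app runs."""

    def __init__(self, app, limit=100, window_seconds=60, clock=time.time):
        self.app = app
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # assumption: injectable so the sketch is testable
        self.counters = {}  # client -> (window_index, count), in memory only

    def __call__(self, environ, start_response):
        # Assumption: the client is identified by IP address. Real services
        # often use an API key or user ID instead.
        client = environ.get("REMOTE_ADDR", "unknown")
        window = int(self.clock() // self.window_seconds)
        stored_window, count = self.counters.get(client, (window, 0))
        if stored_window != window:
            count = 0  # new window: reset the counter
        if count >= self.limit:
            start_response("429 Too Many Requests", [("Content-Type", "text/plain")])
            return [b"Too many requests\n"]
        self.counters[client] = (window, count + 1)
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Hypothetical application being protected."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]
```

Wrapping the app as `RateLimitMiddleware(hello_app, limit=100)` means the application code never sees rejected requests. Because the counters live in one process's memory, this only works for a single server; multiple servers need shared storage.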

HTTP Response When Rate Limited

HTTP/1.1 429 Too Many Requests
Retry-After: 30
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1714060800

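Retry-After is a standard HTTP header giving the number of seconds to wait (or a date). The X-RateLimit-* headers are a widely used convention rather than a formal standard, so exact names vary between APIs.

Good APIs tell you: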

• How many requests you're allowed
• How many you have left
• When the limit resets
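As a sketch of how a server might build these headers, the helper below returns them as a dictionary. The function name `rate_limit_headers` and its parameters are hypothetical. It assumes the reset time is a Unix timestamp, as in the response shown above.

```python
import math
import time


def rate_limit_headers(limit, remaining, reset_epoch, now=None):
    """Build rate limit headers; add Retry-After once the quota is used up."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if remaining <= 0:
        # Seconds until the window resets, rounded up and never negative.
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now)))
    return headers
```

For a client that is out of quota 30 seconds before the reset time, this yields the same four headers as the 429 response above. A well-behaved client reads Retry-After and waits that long before trying again.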

Key Takeaway

Rate Limiting = "You can come in, but not too fast." It protects your system from abuse, ensures fair usage, and keeps costs under control.

In interviews, mention the Token Bucket algorithm: it's one of the most widely used in production (AWS API Gateway and Stripe both describe using it). Bonus points if you discuss distributed rate limiting using Redis, where the counters live in a shared store so that every server enforces the same limit.

Sahil Sudan

Software Engineer at Spense. I write about system design, web development, and fintech — explained simply for students and developers.
