Intermediate · 10 April 2026 · 5 min read

Message Queues — The Post Office of Your Backend

How does Zomato process 10,000 orders per minute without losing a single one? Message Queues. Let's understand them with a post office analogy.

message-queues · async · kafka · scalability

What is a Message Queue?

Imagine a post office. You drop a letter in the mailbox and walk away. You don't wait at the post office until the letter is delivered. The post office handles delivery in the background, at its own pace.

A Message Queue works the same way. Instead of Service A directly calling Service B and waiting for a response, Service A drops a "message" in a queue and moves on. Service B picks it up whenever it's ready.

Without Queue (Synchronous):
User → Order Service → [WAIT] → Payment Service → [WAIT] → Email Service
Total time: 3 seconds 😴

With Queue (Asynchronous):
User → Order Service → ✅ Done! (200ms)
                ↓
         [Message Queue]
                ↓
    Payment Service picks up → processes
    Email Service picks up → sends email
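To make the asynchronous flow concrete, here is a minimal Python sketch. It assumes a single process, where the standard library's `queue.Queue` stands in for a real broker such as Kafka or RabbitMQ. `place_order` and `payment_worker` are hypothetical names: the producer only enqueues and returns, while a background thread consumes messages at its own pace.

```python
import queue
import threading
import time

# In-process stand-in for a broker (assumption: production systems use
# Kafka, RabbitMQ, SQS, etc.; queue.Queue only works inside one process).
order_queue: queue.Queue = queue.Queue()
processed = []


def place_order(order_id):
    """Hypothetical Order Service handler: enqueue the work and return at once."""
    order_queue.put({"order_id": order_id, "task": "process_payment"})
    return f"Order #{order_id} placed"


def payment_worker():
    """Hypothetical Payment Service consumer: drains the queue at its own pace."""
    while True:
        message = order_queue.get()
        if message is None:            # sentinel value: stop the worker
            order_queue.task_done()
            break
        time.sleep(0.01)               # simulate slow payment processing
        processed.append(f"paid #{message['order_id']}")
        order_queue.task_done()


worker = threading.Thread(target=payment_worker, daemon=True)
worker.start()

start = time.perf_counter()
responses = [place_order(i) for i in range(1, 6)]   # returns without waiting
print(f"{len(responses)} orders accepted in {(time.perf_counter() - start) * 1000:.1f} ms")

order_queue.join()      # block until every queued message has been handled
order_queue.put(None)   # then tell the worker to shut down
worker.join()
print(processed)        # → ['paid #1', 'paid #2', 'paid #3', 'paid #4', 'paid #5']
```

`place_order` returns as soon as the message is enqueued; the simulated payment work happens later on the worker thread. A real broker adds what this sketch lacks: persistence, delivery across machines, and acknowledgements.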

Why Do We Need Message Queues?

1. Decoupling

Services don't need to know about each other. The order service doesn't care HOW the email is sent — it just says "send an email" and moves on.

Analogy: You don't need to know which postman delivers your letter. You just drop it in the mailbox.

2. Handling Traffic Spikes

During a flash sale, Flipkart might get 100,000 orders per minute. The payment system can only handle 10,000 per minute. Without a queue, the other 90,000 requests would time out or be rejected.

With a queue, all 100,000 orders are accepted and queued. The payment system works through them at its own pace. No orders are lost; the later ones simply wait longer.

Analogy: A restaurant kitchen during rush hour. Orders pile up on the ticket rail, and the kitchen works through them one by one. No order is lost, even if the kitchen is slower than the waiters.
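The flash-sale numbers can be checked with a small back-of-the-envelope simulation. This sketch assumes, for simplicity, that all 100,000 orders arrive at the start of minute 1 and that the payment system handles exactly 10,000 per minute. `simulate_backlog` is a hypothetical helper, not part of any library.

```python
def simulate_backlog(arrivals_per_minute: list, capacity_per_minute: int) -> list:
    """Return the queue depth at the end of each minute."""
    backlog = 0
    depths = []
    for arrived in arrivals_per_minute:
        backlog += arrived                              # every order is accepted into the queue
        backlog -= min(backlog, capacity_per_minute)    # consumer drains at most its capacity
        depths.append(backlog)
    return depths


# 100,000 orders in minute 1, nothing afterwards, 10,000 processed per minute.
depths = simulate_backlog([100_000] + [0] * 9, capacity_per_minute=10_000)
print(depths)
# → [90000, 80000, 70000, 60000, 50000, 40000, 30000, 20000, 10000, 0]
```

Nothing is rejected, but the last orders wait about 10 minutes (100,000 ÷ 10,000 per minute): the queue turns a capacity problem into a delay instead of a failure.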

3. Reliability

If the email service crashes, the messages stay in the queue. When the service comes back up, it processes all pending messages. Nothing is lost.

Analogy: If the postman is sick today, your letters don't disappear. They wait at the post office until tomorrow.
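Messages survive a consumer crash because brokers typically delete a message only after the consumer acknowledges it. RabbitMQ, for example, redelivers unacknowledged messages when a consumer's channel closes, and SQS makes a received message visible again if it is not deleted before its visibility timeout. The toy `AckQueue` below is a hypothetical in-memory sketch of that idea, not any broker's real API.

```python
from __future__ import annotations

from collections import deque


class AckQueue:
    """Toy queue (hypothetical): a message is deleted only after it is acknowledged."""

    def __init__(self) -> None:
        self._ready: deque[str] = deque()        # waiting to be delivered
        self._in_flight: dict[int, str] = {}     # delivered but not yet acknowledged
        self._next_tag = 0

    def publish(self, body: str) -> None:
        self._ready.append(body)

    def receive(self) -> tuple[int, str] | None:
        if not self._ready:
            return None
        body = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._in_flight[tag] = body              # held back, not deleted
        return tag, body

    def ack(self, tag: int) -> None:
        del self._in_flight[tag]                 # processed successfully: now delete

    def requeue_unacked(self) -> None:
        """Simulate the broker noticing a dead consumer: redeliver its messages."""
        self._ready.extend(self._in_flight.values())
        self._in_flight.clear()


q = AckQueue()
q.publish("email for order #12345")

tag, body = q.receive()
# ...the email service crashes here, before calling q.ack(tag)...
q.requeue_unacked()

tag, body = q.receive()   # the restarted service receives the same message
q.ack(tag)
print(body)               # → email for order #12345
```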

4. Load Leveling

Instead of processing everything immediately (causing CPU spikes), the queue smooths out the workload over time.

Without Queue:
Traffic: ████████████░░░░░░░░ (spike then idle)
CPU:     ████████████░░░░░░░░ (matches traffic)

With Queue:
Traffic: ████████████░░░░░░░░ (spike then idle)  
Queue:   ████████████████████ (absorbs spike)
CPU:     ██████████████████── (smooth processing)

Real-World Example: Zomato Order Flow

When you place an order on Zomato:

1. 📱 You tap "Place Order"
2. ✅ Order Service saves order → returns "Order Placed!" instantly
3. 📨 Drops messages into queue:
   → "Process payment for Order #12345"
   → "Notify restaurant about Order #12345"  
   → "Send confirmation email to user"
   → "Update analytics dashboard"
4. Each service picks up its message independently

You see "Order Placed!" in 200ms. Behind the scenes, 4 different services are working asynchronously. If the email service is slow, your order still goes through.
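A minimal sketch of this fan-out, assuming one in-memory queue per downstream service. The service names and the `place_order`/`drain` helpers are hypothetical; in production the same shape is often built with a Kafka topic read by several consumer groups, or a RabbitMQ fanout exchange bound to one queue per service.

```python
import queue

# Hypothetical: one queue per downstream service.
SERVICES = ("payment", "restaurant", "email", "analytics")
queues = {name: queue.Queue() for name in SERVICES}


def place_order(order_id: int) -> str:
    # Saving the order to the database is omitted in this sketch.
    for name in SERVICES:
        queues[name].put({"order_id": order_id, "for": name})  # one message per service
    return "Order Placed!"


def drain(name: str) -> list:
    """Let one service handle everything currently waiting in its own queue."""
    handled = []
    q = queues[name]
    while not q.empty():
        message = q.get_nowait()
        handled.append(f"{name} handled order #{message['order_id']}")
    return handled


print(place_order(12345))          # → Order Placed!
print(drain("payment"))            # → ['payment handled order #12345']
print(queues["email"].qsize())     # → 1  (email is slow, but its message is waiting)
```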

Key Concepts

Producer & Consumer

Producer: The service that sends messages (Order Service)
Consumer: The service that reads and processes messages (Payment Service)
Queue/Topic: The mailbox where messages wait

At-Least-Once vs Exactly-Once Delivery

At-Least-Once: The message might be delivered more than once (if the consumer crashes before acknowledging). Your consumer must handle duplicates.

Analogy: The postman delivers your letter, but isn't sure you received it, so delivers it again just in case. You might get 2 copies.

Exactly-Once: Each message takes effect exactly once. Much harder to achieve, but critical for payments, where charging a customer twice is unacceptable.

Analogy: Registered post with acknowledgment. The postman knows for sure you got it.
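In practice, exactly-once *effects* are commonly built from at-least-once delivery plus an idempotent consumer: one that records which message IDs it has already applied and skips repeats. The sketch below assumes each message carries a unique `message_id` (a hypothetical field) and keeps the seen IDs in memory.

```python
processed_ids = set()        # assumption: a durable store (e.g. a DB table) in production
balance = {"user_42": 0}


def handle_payment(message: dict) -> bool:
    """Apply a payment at most once per message_id; return True if it was applied."""
    if message["message_id"] in processed_ids:
        return False                              # duplicate delivery: ignore it
    balance[message["user"]] += message["amount"]
    processed_ids.add(message["message_id"])
    return True


msg = {"message_id": "pay-12345", "user": "user_42", "amount": 500}
handle_payment(msg)   # first delivery
handle_payment(msg)   # same message redelivered (e.g. the ack was lost)
print(balance)        # → {'user_42': 500}
```

In a real payment service, recording the ID and updating the balance must be committed together (for example, in one database transaction); otherwise a crash between the two steps could still apply a payment twice or not at all.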

Dead Letter Queue (DLQ)

Messages that fail processing multiple times go to a special "dead letter queue" for manual inspection.

Analogy: Undeliverable letters go to the "return to sender" pile at the post office.
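A minimal retry-then-DLQ loop, assuming an in-memory deque as the main queue and a hypothetical `MAX_ATTEMPTS` of 3. Managed brokers expose the same idea as configuration, for example SQS's redrive policy (`maxReceiveCount`) or RabbitMQ's dead-letter exchanges.

```python
from collections import deque

MAX_ATTEMPTS = 3   # hypothetical retry limit

main_queue = deque([{"id": 1, "body": "ok"}, {"id": 2, "body": "malformed"}])
dead_letter_queue = []
attempts = {}


def process(message: dict) -> None:
    """Hypothetical handler that always fails on a malformed message."""
    if message["body"] == "malformed":
        raise ValueError("cannot parse message")


while main_queue:
    message = main_queue.popleft()
    try:
        process(message)
    except ValueError:
        attempts[message["id"]] = attempts.get(message["id"], 0) + 1
        if attempts[message["id"]] >= MAX_ATTEMPTS:
            dead_letter_queue.append(message)   # park it for manual inspection
        else:
            main_queue.append(message)          # put it back for another try

print(dead_letter_queue)   # → [{'id': 2, 'body': 'malformed'}]
print(attempts)            # → {2: 3}
```

The good message is processed once; the bad one is retried three times and then moved aside, so it can no longer block or slow down the rest of the queue.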

Popular Message Queue Systems

Apache Kafka

• Built for high throughput (millions of messages/sec)
• Messages are persisted to disk and kept after they are read, so consumers can replay them
• Used by: LinkedIn, Uber, Netflix
• Best for: Event streaming, real-time analytics
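To illustrate the replay point, here is a toy append-only log in the spirit of a Kafka partition. It is a hypothetical sketch, not Kafka's client API: reading does not remove records, each consumer just remembers an offset, so any consumer can re-read from an earlier offset.

```python
class Log:
    """Toy append-only log (hypothetical): records stay after being read."""

    def __init__(self) -> None:
        self.records = []

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1          # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        return self.records[offset:offset + max_records]


log = Log()
for event in ["order_placed", "payment_done", "order_delivered"]:
    log.append(event)

# Each consumer tracks its own offset; reading does not delete anything.
analytics_offset = 0
batch = log.read(analytics_offset)
analytics_offset += len(batch)
print(batch)                         # → ['order_placed', 'payment_done', 'order_delivered']

# Replay: a new (or fixed) consumer can start again from offset 0.
print(log.read(0, max_records=1))    # → ['order_placed']
```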

RabbitMQ

• Traditional message broker
• Supports complex routing patterns
• Used by: Reddit, 9GAG
• Best for: Task queues, RPC patterns

Amazon SQS

• Fully managed by AWS
• No infrastructure to manage
• Used by: Many startups
• Best for: Simple queue needs, AWS-native apps

Redis (as a Queue)

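• Super fast (in-memory)
• Simple queues (Lists, Streams) and pub/sub
• Pub/sub is fire-and-forget: subscribers that are offline miss messages
• Best for: Lightweight, real-time messaging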

When NOT to Use a Queue

• When you need an immediate response (user login → must verify NOW)
• When the operation is simple and fast (no need for async overhead)
• When strict ordering is required and you can't take on the complexity of preserving it across multiple consumers

Interview Pattern

When designing any system in an interview, look for opportunities to say:

*"This operation doesn't need to happen synchronously. We can push it to a message queue and process it asynchronously. This improves response time for the user and makes the system more resilient."*

Common async candidates:

• Sending emails/notifications
• Processing payments (after initial validation)
• Generating reports
• Updating search indexes
• Analytics and logging

Key Takeaway

Message Queues = "I'll get to it, but not right now." They decouple services, handle traffic spikes, and make your system resilient. If one service goes down, messages wait patiently in the queue.

Nearly every large-scale system uses message queues. Understanding them is essential for system design interviews and real-world engineering.

Sahil Sudan

Software Engineer at Spense. I write about system design, web development, and fintech — explained simply for students and developers.
