System Design: Scenario-Based Questions

62. How do you design a scalable notification system for millions of users?

Notification systems must deliver timely messages across channels (email, push, SMS) at scale while respecting rate limits and user preferences and retrying failed deliveries. Reliability and observability are critical at this volume.

📬 Core Requirements

  • High-throughput, low-latency delivery.
  • Multi-channel support (email, SMS, push, webhooks).
  • Deduplication, retries, and rate-limiting.
  • User preferences and opt-outs.
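
As one illustration of the rate-limiting requirement, here is a minimal token-bucket sketch in Python. The class name `TokenBucket` and its parameters are hypothetical, and the clock is injectable so the behaviour can be tested deterministically. It keeps state in process memory; a system serving millions of users would more likely hold this state in a shared store.

```python
import time


class TokenBucket:
    """Hypothetical per-channel rate limiter using the token-bucket technique."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec        # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.clock = clock              # injectable clock (assumption, for testing)
        self.tokens = float(capacity)   # start with a full bucket
        self.last = clock()

    def try_acquire(self, cost=1):
        """Return True if a send is allowed now, False if the caller should wait."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A worker would call `try_acquire()` before each provider request and requeue or delay the message when it returns `False`.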

🏗️ Architectural Components

  • Producer Queue: Services publish events (Kafka, SNS, RabbitMQ).
  • Orchestrator: Fans each event out to the appropriate channel handlers.
  • Worker Pools: Send notifications asynchronously (email, SMS, push).
  • Preference Engine: Filters by opt-in/opt-out settings, quiet hours, etc.
  • Rate Controller: Enforces per-channel provider quotas (Twilio, SES, etc.).
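
The interaction between the orchestrator and the preference engine can be sketched as follows. This is a simplified illustration, not a reference design: the `Preferences` fields, the event dictionary shape, the `urgent` flag, and the use of plain lists as per-channel queues are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import time


@dataclass
class Preferences:
    # Hypothetical per-user settings.
    opted_in: set = field(default_factory=set)   # e.g. {"email", "push"}
    quiet_start: time = time(22, 0)
    quiet_end: time = time(7, 0)


def in_quiet_hours(prefs, now):
    """True if `now` (a datetime.time) is inside the quiet window.

    The window may wrap past midnight (e.g. 22:00 to 07:00).
    """
    if prefs.quiet_start <= prefs.quiet_end:
        return prefs.quiet_start <= now < prefs.quiet_end
    return now >= prefs.quiet_start or now < prefs.quiet_end


def fan_out(event, prefs, now, channel_queues):
    """Enqueue the event on every requested channel the user allows.

    Returns the list of channels the event was queued on.
    """
    # Assumption: events flagged "urgent" bypass quiet hours.
    if in_quiet_hours(prefs, now) and not event.get("urgent", False):
        return []
    queued = []
    for channel in event["channels"]:
        if channel in prefs.opted_in and channel in channel_queues:
            channel_queues[channel].append(event)
            queued.append(channel)
    return queued
```

In a real deployment each entry in `channel_queues` would be a topic or queue consumed by that channel's worker pool rather than an in-memory list.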

🧰 Tools & Infrastructure

  • Queueing: Kafka, SQS, Redis Streams for decoupling and backpressure.
  • Delivery APIs: Twilio, SES, Firebase, APNs.
  • Storage: DynamoDB/Postgres for templates, user settings, delivery status.
  • Retries: Exponential backoff with DLQ (dead-letter queues).
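
A minimal sketch of exponential backoff with a dead-letter queue is shown below. The `TransientError` class, the `send` callable, and the list used as a DLQ are hypothetical stand-ins, and the jitter strategy (a uniform random delay up to the exponential bound) is one common choice rather than the only one.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def backoff_delay(attempt, base=0.5, cap=60.0):
    """Delay in seconds: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def deliver_with_retries(message, send, dead_letter_queue,
                         max_attempts=5, sleep=time.sleep):
    """Call send(message) up to max_attempts times.

    Returns True on success. After the final failure the message is
    appended to dead_letter_queue and False is returned.
    """
    for attempt in range(max_attempts):
        try:
            send(message)
            return True
        except TransientError:
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dead_letter_queue.append(message)
    return False
```

Messages that land in the DLQ can then be inspected and replayed once the underlying provider issue is resolved.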

📊 Observability

  • Track delivery success, latency, error codes.
  • Build dashboards per channel and alert on drops in delivery rate.
  • Enable end-to-end tracing (from event to user device).
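
To make the per-channel tracking concrete, here is a small in-memory sketch. The `ChannelMetrics` class, its method names, and the 95% alert threshold are assumptions for illustration; in practice these counters would be exported to a metrics backend instead of being held in process.

```python
from collections import defaultdict


class ChannelMetrics:
    """Hypothetical in-memory delivery counters, keyed by channel."""

    def __init__(self):
        self.sent = defaultdict(int)
        self.failed = defaultdict(int)
        self.latencies_ms = defaultdict(list)
        self.error_codes = defaultdict(lambda: defaultdict(int))

    def record(self, channel, ok, latency_ms, error_code=None):
        """Record one delivery attempt and its outcome."""
        if ok:
            self.sent[channel] += 1
        else:
            self.failed[channel] += 1
            self.error_codes[channel][error_code] += 1
        self.latencies_ms[channel].append(latency_ms)

    def success_rate(self, channel):
        """Fraction of attempts that succeeded (1.0 if there were none)."""
        total = self.sent[channel] + self.failed[channel]
        return self.sent[channel] / total if total else 1.0

    def channels_to_alert(self, threshold=0.95):
        """Channels whose success rate has fallen below the threshold."""
        channels = set(self.sent) | set(self.failed)
        return sorted(c for c in channels if self.success_rate(c) < threshold)
```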

✅ Best Practices

  • Design templates and payloads to be dynamic and localized.
  • Use idempotent message IDs for retry safety.
  • Test fallbacks (e.g., push fails → send SMS).
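
The idempotency and fallback practices above can be combined in a short sketch. Everything named here is hypothetical: the key format, the `senders` mapping of channel name to callable, and the `already_sent` set, which stands in for a durable store shared by all workers.

```python
import hashlib


def idempotency_key(event_id, user_id, channel):
    """Derive a stable key so a retried send maps to the same ID."""
    raw = f"{event_id}:{user_id}:{channel}".encode()
    return hashlib.sha256(raw).hexdigest()


def send_with_fallback(event_id, user_id, channels, senders, already_sent):
    """Try channels in priority order and return the one that delivered.

    Returns None if every channel failed. If any channel already
    delivered this event to this user, nothing is sent again.
    """
    keys = {ch: idempotency_key(event_id, user_id, ch) for ch in channels}

    # Retry safety: a previous attempt may already have succeeded.
    for ch in channels:
        if keys[ch] in already_sent:
            return ch

    for ch in channels:
        try:
            senders[ch](user_id)
        except Exception:
            # Sketch only: any failure falls through to the next channel.
            continue
        already_sent.add(keys[ch])
        return ch
    return None
```

Calling the function twice with the same `event_id` and `user_id` sends at most one notification, which is the property a retrying worker needs.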

🚫 Common Pitfalls

  • Sending duplicates during retries without idempotency.
  • Ignoring opt-out preferences or spamming users.
  • Failing to track bounces, unsubscribes, or blocklist events.

📌 Final Insight

Notification systems are high-volume, user-facing, and unforgiving. Design for scale, respect, and resilience: your users will notice both failure and finesse.