1️⃣ Scalable File Processing System (Azure-Focused)
🎯 Requirements
- Users upload files (possibly large)
- Files must be processed asynchronously
- System must scale automatically
- Handle retries and failures safely
- No duplicate side effects (delivery is at-least-once, so processing must be idempotent)
- Observability required
- External API call may be involved
📌 Assumptions
- High upload variability (spikes possible)
- At-least-once processing acceptable
- Strong reliability required
- Processing may take minutes
🏗️ Architecture Overview
Client → Blob Storage (SAS upload)
Blob Created Event → Event Grid
Event Grid → Service Bus Queue
Service Bus → Azure Function (Processor)
Processor → Redis / SQL / External API
🔹 Key Components
Azure Blob Storage
- Stores raw files
- SAS tokens for secure direct upload
- Avoid API bottleneck
Event Grid
- Reacts to blob creation
- Lightweight routing
Service Bus
- Durable buffering
- Queue-based load leveling
- DLQ support
- Automatic message-lock renewal (performed by the client/host) for long tasks
Azure Functions
- Competing consumers
- Scales based on queue length
- Must be idempotent
🔁 Idempotency Strategy
- Maintain ProcessingStatus table
- Status: Pending → InProgress → Completed
- Unique constraint on FileId
- Conditional update ensures single owner
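
The conditional update above can be sketched as follows. This is a minimal illustration, not a production implementation: SQLite stands in for the real database, and the Owner column, function names, and worker IDs are assumptions added for the example.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # PRIMARY KEY on FileId provides the unique constraint from the notes.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS ProcessingStatus (
            FileId TEXT PRIMARY KEY,
            Status TEXT NOT NULL,
            Owner  TEXT              -- hypothetical column: which worker holds the claim
        )
        """
    )
    conn.commit()


def register_file(conn: sqlite3.Connection, file_id: str) -> None:
    # INSERT OR IGNORE makes a duplicate "blob created" event harmless.
    conn.execute(
        "INSERT OR IGNORE INTO ProcessingStatus (FileId, Status) VALUES (?, 'Pending')",
        (file_id,),
    )
    conn.commit()


def try_claim(conn: sqlite3.Connection, file_id: str, worker_id: str) -> bool:
    # The WHERE clause is the condition: only a Pending row can be claimed,
    # so at most one worker sees rowcount == 1.
    cur = conn.execute(
        "UPDATE ProcessingStatus SET Status = 'InProgress', Owner = ? "
        "WHERE FileId = ? AND Status = 'Pending'",
        (worker_id, file_id),
    )
    conn.commit()
    return cur.rowcount == 1


def mark_completed(conn: sqlite3.Connection, file_id: str, worker_id: str) -> bool:
    # Only the current owner may move the row to Completed.
    cur = conn.execute(
        "UPDATE ProcessingStatus SET Status = 'Completed' "
        "WHERE FileId = ? AND Status = 'InProgress' AND Owner = ?",
        (file_id, worker_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

A worker that gets False from try_claim should complete the message without doing any work. A real system would also need a way to reclaim rows left in InProgress by a crashed worker (for example a lease timestamp), which this sketch omits.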
⚠️ Failure & Edge Cases
Lock Expiry
Mitigation:
- Auto lock renewal
- Or redesign: complete the message early and track progress in the ProcessingStatus table
Function Crash Mid-Processing
- Lock expires
- Message reprocessed
- Idempotency prevents duplication
External API Non-Idempotent
- Cannot guarantee exactly-once
- Use Saga pattern (compensating actions)
- Store external call state
- Reconciliation/manual review
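
One way to read "store external call state" is to persist the intent before the call and the outcome after it, so that a retry can tell "never tried" apart from "outcome unknown". The sketch below assumes this interpretation; the state names, the dict used as a store, and call_once are all hypothetical.

```python
from enum import Enum
from typing import Callable, Dict


class CallState(Enum):
    NOT_STARTED = "NotStarted"
    ATTEMPTED = "Attempted"        # the call may or may not have reached the API
    CONFIRMED = "Confirmed"
    NEEDS_REVIEW = "NeedsReview"   # outcome unknown: reconcile or review manually


def call_once(
    store: Dict[str, CallState],
    file_id: str,
    call_api: Callable[[str], None],
) -> CallState:
    state = store.get(file_id, CallState.NOT_STARTED)
    if state is CallState.CONFIRMED:
        return state  # already done, nothing to repeat
    if state in (CallState.ATTEMPTED, CallState.NEEDS_REVIEW):
        # A previous attempt stopped between "Attempted" and "Confirmed".
        # Calling a non-idempotent API again could duplicate the effect,
        # so the item is routed to reconciliation instead.
        store[file_id] = CallState.NEEDS_REVIEW
        return CallState.NEEDS_REVIEW
    store[file_id] = CallState.ATTEMPTED  # persist intent before the side effect
    call_api(file_id)                     # an exception here leaves state = ATTEMPTED
    store[file_id] = CallState.CONFIRMED
    return CallState.CONFIRMED
```

In a real system the store would be a durable table rather than a dict, and the reconciliation step would query the external system (if it offers a lookup) before falling back to manual review.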
Redis Down
- Circuit breaker
- Fallback to DB
- Rate limit to protect DB
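
The circuit breaker with a DB fallback might look roughly like this. It is a simplified sketch: the thresholds, class name, and injected clock are assumptions, and the rate limit that protects the DB during fallback is not shown.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Skips the primary call (e.g. Redis) after repeated failures."""

    def __init__(
        self,
        failure_threshold: int = 3,
        reset_after_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self.reset_after_s:
            # Half-open: allow one trial call; a single failure re-opens the circuit.
            self._opened_at = None
            self._failures = self.failure_threshold - 1
            return False
        return True

    def call(self, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.is_open():
            return fallback()  # do not touch the failing dependency at all
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        return result
```

Usage would be along the lines of breaker.call(read_from_redis, read_from_db), where both callables are supplied by the application.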
📈 Scalability
- Horizontal scaling of functions
- Service Bus absorbs spikes
- Blob auto-scales
- Partition DB if needed
2️⃣ URL Shortener
🎯 Requirements
- Create short URL
- Redirect quickly (<50ms)
- High read volume
- Moderate write volume
- Expiration (90 days)
- Analytics optional
🏗️ Architecture
Client → Front Door → APIM → App Service
App → Redis Cache
App → SQL DB (on cache miss)
🔢 Short Code Generation
- Auto-increment ID
- Base62 encoding (URL safe)
- Avoid standard Base64 (+, /, and = are not URL-safe)
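
A small sketch of Base62 encoding over an auto-increment ID. The alphabet order (digits, lowercase, uppercase) is an arbitrary choice made for this example; any fixed ordering of the 62 characters works as long as encode and decode agree.

```python
import string

# Assumed ordering for illustration: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Turn a non-negative integer ID into a short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

For example, encode_base62(62) gives "10" and decode_base62("10") gives 62. Seven characters cover 62^7 (about 3.5 trillion) IDs.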
📦 Data Storage
SQL Schema:
- id (PK)
- shortCode (indexed)
- longUrl
- createdAt
- expiresAt
⚡ Caching Strategy
- Cache-aside
- Long TTL, since a mapping does not change once created (cap it at expiresAt)
- Edge caching (Front Door) for hot URLs
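
Cache-aside on the redirect path can be sketched as below. An in-process dict stands in for Redis, and load_from_db is a hypothetical callable that returns the long URL or None; both are assumptions for the sake of a self-contained example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    def __init__(
        self,
        load_from_db: Callable[[str], Optional[str]],
        ttl_s: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_from_db = load_from_db
        self._ttl_s = ttl_s
        self._clock = clock
        # short code -> (expiry time, long URL)
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, short_code: str) -> Optional[str]:
        now = self._clock()
        entry = self._cache.get(short_code)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit
        # Cache miss (or stale entry): the app reads the DB and fills the cache.
        long_url = self._load_from_db(short_code)
        if long_url is not None:
            self._cache[short_code] = (now + self._ttl_s, long_url)
        return long_url
```

Unknown codes are not cached in this sketch, so repeated lookups of a missing code still reach the database.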
🔥 Hot Key Scenario
Single URL → 50k RPS
Mitigation:
- CDN/edge caching of the redirect response
- Result: reduced Redis & DB pressure
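
For an edge cache to absorb a hot key, the redirect response has to be marked cacheable. The sketch below builds such a response under several assumptions: a 302 status, a 5-minute edge TTL cap, and 410 for expired links are illustrative choices, and whether a given CDN caches redirect responses should be checked against its documentation.

```python
from typing import Dict, Tuple


def build_redirect(
    long_url: str,
    expires_at: int,            # epoch seconds, from the expiresAt column
    now: int,                   # epoch seconds
    max_edge_ttl_s: int = 300,  # assumed cap on how long the edge may serve a copy
) -> Tuple[int, Dict[str, str]]:
    """Return (status code, headers) for a short-URL lookup."""
    remaining = expires_at - now
    if remaining <= 0:
        # Expired link: do not let the edge keep serving it.
        return 410, {"Cache-Control": "no-store"}
    # Never let the edge cache outlive the mapping itself.
    ttl = min(max_edge_ttl_s, remaining)
    return 302, {
        "Location": long_url,
        "Cache-Control": f"public, max-age={ttl}",
    }
```

A short edge TTL trades a small amount of origin traffic for faster propagation of deletions or expiry.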
🗑 Expiration Handling
- Use expiresAt column
- Indexed
- Incremental batch deletion
- Avoid one large monthly delete (long-held locks, transaction log growth)
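
Incremental batch deletion can be sketched like this. SQLite stands in for the real database, and the table name urls, the index name, and the batch size are assumptions; in T-SQL the same idea is usually written with DELETE TOP (n) in a loop.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS urls (
            id        INTEGER PRIMARY KEY,
            shortCode TEXT UNIQUE,
            longUrl   TEXT NOT NULL,
            createdAt INTEGER NOT NULL,  -- epoch seconds
            expiresAt INTEGER NOT NULL   -- epoch seconds
        )
        """
    )
    # The index lets each batch find expired rows without a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS ix_urls_expiresAt ON urls (expiresAt)")
    conn.commit()


def purge_expired(conn: sqlite3.Connection, now_epoch: int, batch_size: int = 1000) -> int:
    """Delete expired rows in small batches; returns the total number deleted."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM urls WHERE id IN ("
            "  SELECT id FROM urls WHERE expiresAt < ? LIMIT ?"
            ")",
            (now_epoch, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    return total
```

Run on a schedule (for example every few minutes), each batch holds locks only briefly. A pause between batches is a common addition that this sketch leaves out.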
📈 Scaling
- Stateless app servers
- Autoscale rules (CPU + RPS)
- Read replicas for SQL
3️⃣ Notification System
🎯 Requirements
- Send Email, SMS, Push
- Triggered by events
- Retry on failure
- No duplicate spam
- Independent channel scaling
🏗️ Architecture
Producer → Service Bus Topic
Subscriptions:
- Email Processor
- SMS Processor
- Push Processor
Each subscription is isolated.
🔁 Delivery State
Notification table:
- NotificationId
- Channel
- Status
- RetryCount
- LastAttempt
⚠️ SMS Provider Down Scenario
- Complete SB message early
- Persist delivery intent
- Circuit breaker on provider
- Exponential backoff
- Scheduled retry
Avoid retry storm.
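
The backoff schedule can be computed from the RetryCount column. The sketch below uses exponential backoff with full jitter, which is one common variant; the base delay, the cap, and the function name are assumptions.

```python
import random
from typing import Callable


def next_retry_delay(
    retry_count: int,
    base_s: float = 2.0,
    cap_s: float = 900.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before attempt number retry_count + 1."""
    if retry_count < 0:
        raise ValueError("retry_count must be non-negative")
    # The ceiling doubles with every retry, up to cap_s.
    ceiling = min(cap_s, base_s * (2 ** retry_count))
    # Full jitter: pick a random point below the ceiling so that many
    # failed notifications do not all retry at the same instant.
    return rng() * ceiling
```

The scheduled retry time would then be LastAttempt plus this delay. The randomization is the part that addresses the retry storm.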
🔄 "Both or None" Channel Requirement
- True atomicity across channels is impossible (a delivered message cannot be unsent)
- Use Saga pattern
- Compensating message
- Manual reconciliation if needed
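
A minimal saga runner for the steps above, assuming each channel supplies a send action and a compensating action (for example a follow-up correction message). The step names, log strings, and run_saga function are hypothetical.

```python
from typing import Callable, List, Tuple

# (channel name, send action, compensating action)
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Tuple[str, Callable[[], None]]] = []
    log: List[str] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}:failed")
            for prev_name, prev_compensate in reversed(done):
                try:
                    prev_compensate()
                    log.append(f"{prev_name}:compensated")
                except Exception:
                    # Compensation itself failed: flag for a human.
                    log.append(f"{prev_name}:needs_manual_reconciliation")
            return False, log
        done.append((name, compensate))
        log.append(f"{name}:sent")
    return True, log
```

Compensation here does not undo the email. It sends a second message that corrects it, which is why the result is eventual consistency and not atomicity.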
📈 Scalability
- Competing consumers per channel
- Independent scaling
- DLQ per subscription
4️⃣ Scalable API Service
🎯 Requirements
- 5k–50k RPS
- Global users
- Secure
- Low latency
- High availability
🏗️ Architecture
Client → Front Door (WAF)
→ APIM
→ App Service
→ Redis
→ SQL / Cosmos DB
🔐 Security
- Microsoft Entra ID (formerly Azure AD)
- Managed Identity
- No secrets in code
📊 Rate Limiting
- Static rate-limit policy
- Dynamic throttle: limit held in an APIM Named Value, adjusted by Azure Monitor alert automation
- Return 429 when overloaded
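
To illustrate the 429 behavior, here is a token bucket, one common rate-limiting algorithm. It is a conceptual sketch only: in this architecture the limit would be enforced by an APIM policy, and this code does not claim to mirror how APIM counts requests internally. The class and parameter names are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(
        self,
        rate_per_s: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self._clock = clock
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above the burst size.
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate_per_s)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def handle_request(bucket: TokenBucket) -> int:
    """Return the HTTP status the gateway would send."""
    return 200 if bucket.allow() else 429
```

A dynamic throttle then amounts to changing rate_per_s at runtime when monitoring signals overload.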
📦 Database Choice
SQL
- Strong relational model
- ACID transactions
- Read replicas
Cosmos DB
- Global distribution
- Multi-region writes
- Configurable consistency
- Partition key critical
🌍 Multi-Region
- Front Door for routing
- Geo-replication for DB
- Cache per region
⚠️ Traffic Spike Handling
- Apply rate limiting
- Increase cache TTL
- Scale app tier
- Scale DB
- Long term: revisit the architecture (e.g., partitioning, more aggressive caching)
🧠 Final Takeaways
- Idempotency is mandatory
- Lock renewal reduces but does not eliminate duplication
- Push load outward (edge/CDN first, then cache, then DB)
- Separate ingestion from processing
- Query pattern drives data modeling
- Exactly-once effects require cooperation from the downstream system (e.g., idempotency keys)
- Design for failure first