Building systems that handle millions of concurrent transactions requires more than just good code. It requires understanding how to partition data, cache intelligently, and communicate asynchronously. This guide covers 20 essential system design patterns, each explained simply with Python examples.
The 20 Patterns You Need
- Database Sharding
- Caching Strategies
- Event-Driven Architecture
- Rate Limiting
- Load Balancing
- Circuit Breakers
- Connection Pooling
- Idempotency
- Async Processing
- Monitoring & Observability
- Horizontal vs Vertical Scaling
- Microservices Architecture
- Saga Pattern (Distributed Transactions)
- Eventual Consistency
- CAP Theorem
- Data Partitioning Strategies
- WebSockets for Real-Time
- Server-Sent Events (SSE)
- Distributed Transactions (2PC)
- Throttling
1. Database Sharding
What it is: Splitting your data across multiple database servers so no single server becomes a bottleneck.
The analogy: Imagine a library with 10 million books. Instead of one librarian searching through all books, hire 4 librarians. The first handles A-C, second D-F, third G-I, fourth J-Z. Each searches only their section.
When to use: Your database exceeds 10-50M records and queries are slowing down, or you need to distribute writes across servers.
Python example:
```python
# Simple hash-based sharding by user_id (modulo)
def get_shard_id(user_id, num_shards=4):
    """Determine which shard this user belongs to"""
    return user_id % num_shards

# In production, connect to a different DB based on shard_id
def get_user_connection(user_id, shard_servers):
    shard_id = get_shard_id(user_id)
    return shard_servers[shard_id]  # e.g., 'db-shard-1.example.com'

# (By contrast, Instagram used range-based sharding:
#  Shard 1: users 0-1M, Shard 2: users 1M-2M, etc.)
shards = {
    0: 'db1.instagram.com',  # 500M users
    1: 'db2.instagram.com',  # 500M users
    2: 'db3.instagram.com',
    3: 'db4.instagram.com'
}

user_id = 12345
connection = get_user_connection(user_id, shards)
```
Common mistakes: Choosing a poor shard key (like timestamp, which creates hot shards), not planning for shard rebalancing, or querying across all shards unnecessarily.
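A common fix for the rebalancing problem is consistent hashing: with `user_id % num_shards`, adding a shard remaps almost every key, but placing shards on a hash ring means only a fraction of keys move. A minimal in-memory sketch (the `ConsistentHashRing` class, its virtual-node count, and the node names are illustrative, not a library API):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring; adding a shard remaps only ~1/N of keys."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Each physical node gets many virtual points for an even spread
        for i in range(self.vnodes):
            self.ring.append((self._hash(f"{node}:{i}"), node))
        self.ring.sort()

    def get_node(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash
        h = self._hash(str(key))
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)
        return self.ring[idx][1]
```

With four shards, adding a fifth moves roughly a fifth of the keys instead of nearly all of them.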
2. Caching Strategies
What it is: Storing frequently accessed data in fast memory (Redis, Memcached) to avoid repeated database queries.
The analogy: Instead of walking to the restaurant every time you want to check the menu, keep a copy of the popular menus taped to your fridge.
When to use: Queries take 100ms+ or same data is read 10+ times per minute.
Python example:
```python
import redis
import json

cache = redis.Redis(host='localhost', port=6379)

def get_user_profile(user_id, ttl_seconds=3600):
    """Get user profile with Redis caching"""
    cache_key = f"user:{user_id}"
    # Try cache first (cache hit ~1ms)
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    # Cache miss: query the database (~100ms); always parameterize queries
    user = database.query("SELECT * FROM users WHERE id = %s", (user_id,))
    # Store in cache with a 1-hour TTL
    cache.setex(cache_key, ttl_seconds, json.dumps(user))
    return user

# Write-through caching: write to database AND cache
def update_user(user_id, data):
    database.update('users', data)
    cache.setex(f"user:{user_id}", 3600, json.dumps(data))
```
Common mistakes: Forgetting to invalidate the cache after updates, setting the TTL too high (stale data), or caching data that changes every minute.
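On the invalidation point: the alternative to write-through is cache-aside invalidation, where an update deletes the cache entry and the next read repopulates it, so readers can't keep seeing a stale value. A minimal sketch with plain dicts standing in for Redis and the database (all names here are illustrative):

```python
# In-memory stand-ins for Redis and the database, just to show the flow
cache = {}
db = {}

def get_user(user_id):
    """Cache-aside read: repopulate the cache on a miss."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    value = db.get(user_id)
    cache[key] = value
    return value

def update_user(user_id, data):
    """Write to the DB, then DELETE the cache entry instead of updating it.
    The next read repopulates the cache from the fresh DB row."""
    db[user_id] = data
    cache.pop(f"user:{user_id}", None)
```

Delete-on-write avoids the subtle bug where the cache update succeeds but the DB write fails (or vice versa), leaving the two permanently out of sync.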
3. Event-Driven Architecture
What it is: Instead of synchronous function calls, publish events to a message queue. Subscribers listen and react.
The analogy: Restaurant kitchen. Waiter doesn't hand ticket to cook and wait. Waiter puts ticket on a spike (message queue). Multiple cooks grab tickets as they're free. Some cooks prepare food, others plate it, others calculate bill.
When to use: You have 100+ operations triggered by a single action, or you need to decouple services.
Python example:
```python
import json
from kafka import KafkaProducer, KafkaConsumer

# Event producer: user signs up
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

def create_user(email, name):
    user = database.insert('users', {'email': email, 'name': name})
    # Publish an event instead of calling functions directly
    event = {'type': 'user.created', 'user_id': user['id'], 'email': email}
    producer.send('events', json.dumps(event).encode())

# Event consumers: independent services
consumer = KafkaConsumer('events', bootstrap_servers=['localhost:9092'])
for message in consumer:
    event = json.loads(message.value)
    if event['type'] == 'user.created':
        send_welcome_email(event['email'])   # Email service
        create_free_trial(event['user_id'])  # Billing service
        log_analytics(event)                 # Analytics service
# All run independently, in parallel!
```
Common mistakes: Relying on global event ordering (Kafka only guarantees order within a partition), processing events twice (fix with idempotency), or losing messages when acknowledgements aren't configured correctly.
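The duplicate-processing mistake is usually fixed with an idempotent consumer: track which event ids have already been handled and skip repeats. A minimal in-memory sketch (in production the seen-set would live in Redis or a unique-keyed database table, and the event `id` field is an assumption about your event schema):

```python
processed_ids = set()  # in production: a Redis set or unique-keyed DB table

def handle_event(event):
    """At-least-once delivery means duplicates will arrive; remember
    processed event ids so side effects run exactly once."""
    if event["id"] in processed_ids:
        return "skipped"
    processed_ids.add(event["id"])
    # ... real side effects (send email, create trial, ...) go here ...
    return "handled"
```

Redelivering the same event is now harmless, which is exactly what at-least-once queues require of their consumers.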
4. Rate Limiting
What it is: Restrict how many requests a client can make per time period.
The analogy: Vending machine that only dispenses one drink per 10 seconds, preventing people from draining it.
When to use: Protecting APIs from abuse, preventing DDoS, or ensuring fair resource usage. Typical: 100 req/min for users, 1000 req/min for premium.
Python example:
```python
import redis
from functools import wraps

redis_client = redis.Redis()

def rate_limit(requests_per_minute=60):
    """Fixed-window counter rate limiting"""
    def decorator(func):
        @wraps(func)
        def wrapper(user_id):
            key = f"rate_limit:{user_id}"
            current = redis_client.incr(key)
            # Set a 1-minute expiry on the first request in the window
            if current == 1:
                redis_client.expire(key, 60)
            # Reject if over the limit
            if current > requests_per_minute:
                return {"error": "Rate limit exceeded", "status": 429}
            return func(user_id)
        return wrapper
    return decorator

@rate_limit(requests_per_minute=100)
def api_get_user(user_id):
    return database.get_user(user_id)
```
Common mistakes: Resetting limit incorrectly, not returning HTTP 429, or not distinguishing between user tiers.
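The decorator above is a fixed-window counter, which allows bursts of up to twice the limit at window boundaries. An actual token bucket smooths this out: tokens refill continuously and each request spends one. A minimal single-process sketch (not distributed; the class and its parameters are illustrative):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`;
    each request spends one token or is rejected."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill based on the time elapsed since the last check
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For multiple servers sharing one limit, the same bookkeeping has to move into Redis (typically as a Lua script so the read-update-write stays atomic).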
5. Load Balancing
What it is: Distribute incoming requests across multiple servers so no server gets overloaded.
The analogy: Pizza delivery dispatcher assigns orders to drivers based on current workload. Driver with 2 pending deliveries gets the next call, not the driver with 6.
When to use: You have multiple web servers and 1000+ requests/second.
Python example:
```python
# Round-robin load balancer (basic)
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current_index = 0

    def get_next_server(self):
        """Cycle through servers sequentially"""
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server

# Least-connections balancer
class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.server_load = {server: 0 for server in servers}

    def get_next_server(self):
        """Pick the server with the fewest active connections"""
        return min(self.server_load, key=self.server_load.get)

    def request_sent(self, server):
        self.server_load[server] += 1

    def request_completed(self, server):
        self.server_load[server] -= 1

balancer = LeastConnectionsBalancer(['web1.example.com', 'web2.example.com'])
server1 = balancer.get_next_server()  # Pick the best server
balancer.request_sent(server1)
```
Common mistakes: Sticky sessions not implemented (user bounces between servers), health checks not running, or not handling server failures.
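On the health-check mistake: a balancer should only route to servers that recently passed a check. A minimal sketch extending the round-robin idea (the `check` callable stands in for a real HTTP health probe, which would run on a timer):

```python
class HealthCheckedBalancer:
    """Round-robin over only the servers that passed the last health check."""
    def __init__(self, servers, check):
        self.servers = servers
        self.check = check  # callable(server) -> bool, e.g. an HTTP probe
        self.healthy = list(servers)
        self.index = 0

    def run_health_checks(self):
        # In production this runs every few seconds on a background timer
        self.healthy = [s for s in self.servers if self.check(s)]

    def get_next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers")
        server = self.healthy[self.index % len(self.healthy)]
        self.index += 1
        return server
```

A failed server simply drops out of rotation and rejoins automatically once its probe passes again.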
6. Circuit Breaker Pattern
What it is: When a service fails repeatedly, stop calling it temporarily. It's like a circuit breaker in your home: it cuts electricity to prevent a fire.
The analogy: Calling your pizza place. If it goes to voicemail 3 times, don't call for the next 30 seconds. Try again later.
When to use: Calling external APIs or databases that might fail. Prevents cascading failures.
Python example:
```python
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"        # Working normally
    OPEN = "open"            # Failing, reject requests
    HALF_OPEN = "half_open"  # Testing if recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=3, timeout=30):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.last_failure_time = None

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        try:
            result = func(*args, **kwargs)
            self.on_success()
            return result
        except Exception:
            self.on_failure()
            raise

    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED

    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=3, timeout=30)
try:
    result = breaker.call(external_api_call, user_id=123)
except Exception:
    print("Service temporarily unavailable")
```
Common mistakes: Not monitoring HALF_OPEN state, timeout too short (keeps retrying too fast), or forgetting to reset failure count.
7. Connection Pooling
What it is: Reuse database connections instead of creating new ones for every query.
The analogy: Taxi company keeps 10 taxis idle. When you call for a taxi, they give you an existing one (fast). Better than finding a taxi driver, hiring them, and training them every time.
When to use: Database connections are slow (50-100ms to establish), and you make 100+ queries/second.
Python example:
```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Create a connection pool: 5 idle connections, up to 20 total
engine = create_engine(
    'postgresql://user:password@localhost/mydb',
    poolclass=QueuePool,
    pool_size=5,         # Keep 5 idle connections
    max_overflow=15,     # Allow up to 15 more under load
    pool_pre_ping=True   # Test connection health before use
)

def query_user(user_id):
    # Get a connection from the pool (fast if one is idle)
    with engine.connect() as conn:
        result = conn.execute(
            text("SELECT * FROM users WHERE id = :id"), {"id": user_id}
        )
        return result.fetchone()
    # The connection returns to the pool for reuse

# Without pooling: each query pays ~50ms connection overhead
# With pooling: reuse costs ~1ms = 50x faster!
```
Common mistakes: Pool size too small (connections exhaust under load), not handling pool exhaustion when max connections are exceeded, or not configuring a checkout timeout.
8. Idempotency
What it is: An operation produces the same result whether called once or 100 times.
The analogy: Pressing an elevator button. Pressing once = elevator comes. Pressing 10 times = same result, elevator doesn't come 10 times.
When to use: Payment processing, order placement, or any critical operation that might be retried.
Python example:
```python
import uuid
import redis
import json

cache = redis.Redis()

def process_payment(user_id, amount, idempotency_key):
    """Process payment with idempotency protection"""
    cache_key = f"payment:{idempotency_key}"
    # Check if we've already processed this request
    existing_result = cache.get(cache_key)
    if existing_result:
        return json.loads(existing_result)  # Return the cached result
    # Process payment (charge the credit card)
    result = {
        "transaction_id": str(uuid.uuid4()),
        "status": "success",
        "amount": amount
    }
    # Store the result for future retries
    cache.setex(cache_key, 86400, json.dumps(result))  # 24-hour cache
    return result

# The client generates the idempotency_key once and reuses it on retries
user_id = 1
idempotency_key = str(uuid.uuid4())
result1 = process_payment(user_id, 99.99, idempotency_key)
result2 = process_payment(user_id, 99.99, idempotency_key)  # Network failed, retry
# result1 == result2, no double charge!
```
Common mistakes: Not generating unique keys, storing results without expiry (disk fills up), or forgetting to pass idempotency key in retries.
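One subtlety the example glosses over: between "check cache" and "store result", two concurrent retries can both miss and both charge the card. The fix is an atomic claim on the key (in Redis, `SET key value NX` does the check-and-set in one step). A single-process sketch using a lock to stand in for that atomicity (names are illustrative):

```python
import threading

_lock = threading.Lock()
_claims = {}

def claim(idempotency_key):
    """Atomically claim a key: only the first caller gets True.
    Later callers get False and should wait for / return the stored result
    instead of charging again. (Redis `SET key val NX EX ttl` plays this
    role across processes in production.)"""
    with _lock:
        if idempotency_key in _claims:
            return False
        _claims[idempotency_key] = "in_progress"
        return True
```

Only the winner of the claim performs the charge; everyone else polls for the winner's stored result.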
9. Async Processing
What it is: Long-running tasks (email, reports, exports) run in background instead of blocking the user.
The analogy: A drop-off laundry service. You leave your clothes, they text you when they're ready. You don't stand there waiting 4 hours.
When to use: Operations take 5+ seconds. Tasks like sending emails (5-10s), generating PDF reports (30s), or video transcoding (minutes).
Python example:
```python
from celery import Celery
import smtplib
from email.message import EmailMessage

app = Celery('myapp', broker='redis://localhost:6379')

@app.task
def send_welcome_email(user_id, email):
    """Background task - the user doesn't wait for it"""
    user = database.get_user(user_id)
    msg = EmailMessage()
    msg['To'] = email
    msg['Subject'] = f"Welcome {user.name}!"
    msg.set_content(f"Welcome aboard, {user.name}!")
    # Takes ~5 seconds (SMTP servers are slow)
    with smtplib.SMTP('smtp.gmail.com') as smtp:
        smtp.send_message(msg)
    return f"Email sent to {email}"

# In your web request handler
def create_user_endpoint(request):
    user = database.create_user(request.data)
    # Queue the task and return immediately (~5ms)
    send_welcome_email.delay(user.id, user.email)
    return {"status": "Account created", "user_id": user.id}
# The email sends in the background; the user gets a response instantly
```
Common mistakes: Not tracking async job status, tasks failing silently, or no retry mechanism for failed tasks.
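For the silent-failure and retry mistakes, the standard pattern is exponential backoff: wait a little longer after each failure before trying again, then give up loudly. Celery tasks can retry themselves, but the idea can be sketched framework-free (the decorator below is illustrative, not a Celery API):

```python
import time

def retry(max_attempts=3, base_delay=0.01):
    """Re-run a flaky task with exponential backoff;
    re-raise the error after the final attempt so failures aren't silent."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise
                    # Back off: base_delay, 2x, 4x, ...
                    time.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator
```

The growing delay gives a struggling downstream service (SMTP, an API) room to recover instead of hammering it.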
10. Monitoring & Observability
What it is: Measure latency, error rates, and resource usage so you know when systems fail.
The analogy: Dashboard in your car shows speed, fuel, temperature. Without it, you don't know if engine is overheating until it breaks.
When to use: Always. Golden signals: latency (p95 <200ms), traffic (req/sec), errors (% failing), saturation (CPU/RAM/disk).
Python example:
```python
import time
from prometheus_client import Counter, Histogram

# Define metrics
request_count = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
request_latency = Histogram('http_request_duration_seconds', 'Request latency', buckets=[0.1, 0.5, 1, 5])
errors = Counter('http_errors_total', 'Total errors', ['status'])

def api_handler(request):
    start_time = time.time()
    # Count every request, not just the successful ones
    request_count.labels(method=request.method, endpoint=request.path).inc()
    try:
        return process_request(request)
    except Exception:
        errors.labels(status="500").inc()
        raise
    finally:
        duration = time.time() - start_time
        request_latency.observe(duration)

# Metrics exposed at /metrics
# Prometheus scrapes every 15 seconds
# Alert: if p95 latency > 500ms, check database sharding
```
Common mistakes: Not alerting on metrics, collecting wrong metrics, or alert thresholds too generic (no context).
11. Horizontal vs Vertical Scaling
What it is: Horizontal = add more servers. Vertical = make one server more powerful.
The analogy: Restaurant capacity. Vertical: add more tables to one building (limited). Horizontal: open another restaurant location.
When to use: Horizontal at 10k+ requests/sec (cheaper, no limit). Vertical works until 5-10k req/sec (simpler).
Python example:
```python
# Vertical scaling: single-server bottleneck
# 1 CPU core ≈ 1,000 req/sec max
# Vertical = upgrade to a 16-core server = 16,000 req/sec
# But cost jumps from $50/month to $500/month

# Horizontal scaling:
# Use 10 small servers (1 core each) = 10,000 req/sec
# Cost: 10 × $5/month = $50/month (same or cheaper!)
# If traffic grows to 100k req/sec, add 100 servers

class HorizontalBalancer:
    def __init__(self):
        self.servers = ["web1", "web2", "web3"]
        self.index = 0

    def route_request(self):
        server = self.servers[self.index % len(self.servers)]
        self.index += 1
        return server

# Horizontal wins at scale: Netflix runs 10,000+ servers worldwide
```
Common mistakes: Trying to vertical scale past CPU limit, not using load balancer, or database not sharded (becomes bottleneck).
12. Microservices Architecture
What it is: Split one large application into 5-10 small services, each handling one business function.
The analogy: Restaurant. Monolith = one chef does everything (cooking, plating, cashier). Microservices = separate chef (cooking), plater, cashier. Each works independently.
When to use: You have 50+ engineers and different services scale at different rates (video service needs more servers than auth service).
Python example:
```python
from flask import Flask, jsonify
import requests

# Service 1: User service (port 5001)
user_app = Flask('user_service')

@user_app.route('/users/<int:user_id>')
def get_user(user_id):
    return jsonify({"id": user_id, "name": "John"})

# Service 2: Order service (port 5002)
order_app = Flask('order_service')

@order_app.route('/orders/<int:user_id>')
def get_orders(user_id):
    # Call the user service over the network
    user = requests.get(f'http://user-service:5001/users/{user_id}').json()
    orders = database.get_orders(user_id)
    return jsonify({"user": user, "orders": orders})

# Benefit: scale the user service and order service independently
# Downside: network calls between services are slower than function calls
```
Common mistakes: Too many microservices (100+ become unmanageable), chatty service-to-service calls adding latency, or a still-monolithic database that remains the scalability bottleneck.
13. Saga Pattern (Distributed Transactions)
What it is: Multi-step transactions across multiple services with compensating transactions on failure.
The analogy: Multi-step booking: book flight → book hotel → book rental car. If hotel is full, cancel flight AND release car seat.
When to use: Complex workflows needing ACID guarantees across services (e.g., payment + inventory + shipping).
Python example:
```python
# Orchestrator-based saga for booking a trip
class BookingOrchestrator:
    def book_trip(self, user_id, flight_id, hotel_id):
        flight_booking = hotel_booking = payment = None
        try:
            # Step 1: Book flight
            flight_booking = self.flight_service.book(flight_id)
            # Step 2: Book hotel
            hotel_booking = self.hotel_service.book(hotel_id)
            # Step 3: Charge payment
            payment = self.payment_service.charge(user_id, 1500)
            return {"status": "success", "bookings": [flight_booking, hotel_booking]}
        except Exception:
            # Compensating transactions: undo only the steps that completed
            if payment:
                self.payment_service.refund(payment['id'])
            if hotel_booking:
                self.hotel_service.cancel(hotel_booking['id'])
            if flight_booking:
                self.flight_service.cancel(flight_booking['id'])
            raise

# A saga keeps services consistent, but it's more complex than a monolith's transactions
```
Common mistakes: Forgetting compensating transactions, timeout too short (booking still processing), idempotency not implemented (double-cancel causes errors).
14. Eventual Consistency
What it is: Accept that replicas might lag. Data is eventually consistent instead of immediately consistent.
The analogy: Bank transfers between branches. You transfer $100 from Branch A to B. Branch B doesn't show $100 instantly; it shows after 5 minutes. But eventually, it's consistent.
When to use: At scale, immediate consistency requires expensive locking. Accept seconds/minutes delay for 1000x better performance.
Python example:
```python
# Strong consistency (SLOW): write to primary, wait for replicas
def transfer_money_strong(account_from, account_to, amount):
    # Acquire locks on both accounts (10ms)
    account_from.balance -= amount
    account_to.balance += amount
    # Wait for all 3 replicas to confirm (50ms × 3)
    replicate_to_all_replicas()
    return "Transfer complete"  # ~160ms total

# Eventual consistency (FAST): write to primary, replicate asynchronously
def transfer_money_eventual(account_from, account_to, amount):
    # Update primary (5ms)
    account_from.balance -= amount
    account_to.balance += amount
    # Queue a replication event (1ms)
    queue_replication_event()
    return "Transfer queued"  # ~6ms total

# Replicas catch up within 1-5 seconds
# ~27x faster! Accept that a user might briefly see an old balance
```
Common mistakes: Showing stale data to users (confusing), not handling conflicts when replicas have different values, rebuilding replicas takes too long.
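On conflict handling: when two replicas accepted different writes for the same key, they eventually have to merge. The simplest policy is last-write-wins by timestamp; a sketch (the `(value, timestamp)` representation is an assumption for illustration, and note that LWW silently drops the losing concurrent write, which is why systems needing stronger guarantees use version vectors or CRDTs):

```python
def merge_lww(replica_a, replica_b):
    """Last-write-wins merge: for each key, keep the (value, timestamp)
    pair with the newer timestamp."""
    merged = dict(replica_a)
    for key, (value, ts) in replica_b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)
    return merged
```

Running the merge in both directions converges both replicas to the same state, which is the "eventually" in eventual consistency.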
15. CAP Theorem
What it is: You can only guarantee 2 of 3: Consistency (all replicas same), Availability (always responsive), Partition tolerance (network splits don't break system).
The analogy: A restaurant chain with 2 locations sharing inventory. Staying both fast and consistent requires the network link between them to always be up. When the link goes down, you must pick: stay fast (each location decides on its own, risking conflicts) or stay consistent (both pause sales until the link returns).
When to use: Understanding tradeoffs in system design. CP systems (PostgreSQL): wait for consistency. AP systems (DynamoDB): accept stale data.
Python example:
```python
from pymongo import WriteConcern

# CAP Theorem illustration

# PostgreSQL (CP): Consistent + Partition tolerant
# - On a network split, one partition stops accepting writes (sacrifices availability)
# - All data stays consistent
# Use for: banking (must be correct)

# DynamoDB (AP): Available + Partition tolerant
# - On a network split, both partitions accept writes
# - Data might be temporarily inconsistent (conflicts resolved later)
# Use for: social media (stale data OK)

# MongoDB with write concern "majority" (CP-leaning):
# wait for the write to reach a majority of replicas before returning
collection = get_mongo_collection()
majority = collection.with_options(write_concern=WriteConcern(w="majority"))
result = majority.insert_one({"user_id": 1})  # Waits for confirmation

# DynamoDB (AP)
table = get_dynamodb_table()
table.put_item(Item={"id": 1})  # Returns immediately; replicas catch up
```
Common mistakes: Thinking you can have all 3 (impossible), not understanding your system's choice, or picking wrong for your use case.
16. Data Partitioning Strategies
What it is: Organize data into ranges so queries touch fewer partitions.
The analogy: File cabinet with 1000 folders. Instead of searching all, put folders A-D in drawer 1, E-H in drawer 2, etc.
When to use: You have 50M+ rows and queries are slow. Partitioning lets each partition use an index.
Python example:
```python
# Range partitioning by date
def query_logs_range(start_date, end_date):
    # Query only the partitions between the dates:
    # partition "2026_01" has Jan 2026 data,
    # partition "2026_02" has Feb 2026 data;
    # partition "2025_12" (old data) is skipped
    partitions = get_partitions_in_range(start_date, end_date)
    return query_partitions(partitions)

# Hash partitioning by user_id
def query_user_data(user_id):
    hash_value = hash(user_id) % 16
    partition = f"user_partition_{hash_value}"
    return query_partition(partition)

# List partitioning (explicit)
def query_by_region(region):
    partitions = {
        "US": ["us_partition_1", "us_partition_2"],
        "EU": ["eu_partition_1"],
        "ASIA": ["asia_partition_1", "asia_partition_2"]
    }
    return query_partitions(partitions[region])
```
Common mistakes: Partition key causes data skew (50% data in one partition), not pruning dead partitions (disk fills), query spans all partitions (no benefit).
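The pruning step in the first example can be made concrete: given a date range, compute exactly which monthly partitions a query must scan, so everything outside the range is never touched. A sketch assuming partitions named `logs_YYYY_MM` (the naming scheme is illustrative):

```python
from datetime import date

def partitions_in_range(start, end):
    """List the monthly partitions a date-range query must scan;
    all other partitions are pruned."""
    parts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        parts.append(f"logs_{year}_{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return parts
```

A three-month query over years of logs then touches exactly three partitions, each small enough for its index to stay effective.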
17. WebSockets for Real-Time
What it is: Keep persistent connection open. Server pushes data to client without client asking.
The analogy: Walkie-talkie vs sending letters. Letters (HTTP): you ask for mail, postman delivers. Walkie-talkie (WebSocket): person talks whenever they want, you hear instantly.
When to use: Real-time features: chat, live notifications, collaborative editing, stock price updates.
Python example:
```python
from websockets import serve
import asyncio
import json

connected = set()

# WebSocket server
async def handler(websocket, path):
    connected.add(websocket)
    try:
        # Connection stays open
        async for message in websocket:
            data = json.loads(message)
            if data['type'] == 'chat':
                # Broadcast to all connected clients
                await broadcast(data['message'])
            elif data['type'] == 'notification':
                # Push to a specific user
                await send_to_user(data['user_id'], data['notification'])
    finally:
        connected.discard(websocket)

async def broadcast(message):
    """Send to all connected WebSocket clients"""
    if not connected:
        return
    payload = json.dumps({"message": message})
    for ws in connected:
        await ws.send(payload)

async def main():
    async with serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

# Client connects once, receives updates forever
# vs HTTP: poll every 1 second (inefficient, high latency)
```
Common mistakes: Connections not cleaned up on disconnect (connection leak), broadcasting to all clients when messages should be filtered, or no client reconnection logic.
18. Server-Sent Events (SSE)
What it is: HTTP connection where server streams data to client. Simpler than WebSocket.
The analogy: News ticker on TV screen. One-directional: ticker sends, you watch. Unlike walkie-talkie (two-way).
When to use: One-directional updates only (notifications, live scores, monitoring dashboard). If you need client to send data, use WebSocket instead.
Python example:
```python
from flask import Flask, Response
import json
import time

app = Flask(__name__)

@app.route('/stream')
def stream():
    def event_stream():
        # Keep the connection open and stream data
        while True:
            data = {
                "timestamp": time.time(),
                "cpu": get_cpu_usage(),
                "memory": get_memory_usage(),
                "requests_per_sec": get_rps()
            }
            # SSE wire format: "data: <payload>" followed by a blank line
            yield f"data: {json.dumps(data)}\n\n"
            time.sleep(1)  # Update every second
    return Response(event_stream(), mimetype="text/event-stream")

# Browser client:
# const source = new EventSource('/stream');
# source.onmessage = (e) => console.log(JSON.parse(e.data));

# Simpler than WebSocket, but one-way only
```
Common mistakes: SSE connection not closed (resource leak), HTTP timeout after 1 hour (reconnect needed), or using for two-way communication (should be WebSocket).
19. Distributed Transactions (2PC)
What it is: Two-phase commit: prepare transactions on all servers, then confirm all together.
The analogy: Wedding ceremony. Phase 1: "Do you take...?" (prepare). Phase 2: "I do" (commit). If anyone says no in phase 1, marriage doesn't happen.
When to use: ACID guarantees across databases. Rare; most use Saga pattern instead (easier).
Python example:
```python
# 2PC coordinator
class TwoPhaseCommit:
    def execute(self, transaction):
        participants = [db1, db2, db3]
        # Phase 1: Prepare
        votes = []
        for participant in participants:
            vote = participant.prepare(transaction)  # Lock resources
            votes.append(vote)
        # All must vote YES
        if all(votes):
            # Phase 2: Commit
            for participant in participants:
                participant.commit(transaction)  # Unlock, apply
            return "Success"
        else:
            # Phase 2: Abort
            for participant in participants:
                participant.abort(transaction)  # Rollback
            return "Failed"

# Problem: If the coordinator crashes between prepare & commit,
# resources stay locked (blocked transactions)
# 2PC is slow and complex. Use Saga instead.
```
Common mistakes: Coordinator failure leaves resources locked, network delays amplify latency, or using for microservices (huge overhead).
20. Throttling
What it is: Limit CPU/memory/disk usage by slowing down operations when system overloaded.
The analogy: Speed limit on highway. When traffic heavy, reduce speed to 40mph. When clear, go 70mph. Prevents gridlock.
When to use: CPU at 90%, memory at 90%, or disk queue deep. Slow down requests to prevent crash.
Python example:
import psutil
import time
def adaptive_throttle():
"""Slow down if system overloaded"""
cpu_percent = psutil.cpu_percent()
memory_percent = psutil.virtual_memory().percent
if cpu_percent > 80 or memory_percent > 85:
# System overloaded: sleep to reduce load
sleep_ms = (cpu_percent - 60) * 10 # Scale based on utilization
time.sleep(sleep_ms / 1000)
def process_request(request):
adaptive_throttle() # Check system before each request
return handle_request(request)
# Result: When CPU at 90%, add 300ms delay per request
# This reduces CPU to 70%, preventing crash
# Better than rejecting requests (throttling absorbs spike)
Common mistakes: Throttle threshold too high (system crashes before throttling), not differentiating between read vs write operations, or not monitoring effectiveness.
Comparison Table: Which Pattern When?
| Pattern | Best For | Complexity | Implement When |
|---|---|---|---|
| Database Sharding | Large datasets (50M+ rows) | Hard | Queries slow on single DB |
| Caching | Repeated queries | Easy | Queries slow (>100ms) |
| Event-Driven | Decoupling services | Medium | 100+ operations per action |
| Rate Limiting | API protection | Easy | Public APIs, abuse risk |
| Load Balancing | Multiple servers | Medium | 1000+ req/sec |
| Circuit Breaker | External API calls | Easy | APIs fail occasionally |
| Connection Pooling | Database efficiency | Easy | High query volume |
| Idempotency | Retryable operations | Medium | Payment processing |
| Async Processing | Long-running tasks | Medium | Tasks > 5 seconds |
| Monitoring | System health | Easy | Always |
| Horizontal Scaling | Growing traffic | Medium | 10k+ req/sec |
| Microservices | Large teams | Hard | 50+ engineers |
| Saga Pattern | Multi-step workflows | Hard | Distributed transactions needed |
| Eventual Consistency | Performance at scale | Medium | 1000x performance gain worth stale data |
| CAP Theorem | Design decisions | Medium | Choosing database technology |
| Data Partitioning | Large tables | Medium | 50M+ rows |
| WebSockets | Real-time features | Medium | Chat, live updates |
| SSE | One-way streaming | Easy | Live notifications, dashboards |
| Distributed Transactions (2PC) | ACID across DBs | Very Hard | Rare; use Saga instead |
| Throttling | Overload protection | Easy | Prevent crashes during spikes |
Your Learning Path
Week 1-2: Master the Basics
- Caching Strategies (biggest impact, easiest to implement)
- Rate Limiting (protects your API)
- Async Processing (speeds up responses)
- Monitoring (know what's happening)
Week 3-4: Intermediate Patterns
- Load Balancing (handle more traffic)
- Connection Pooling (database efficiency)
- Circuit Breaker (reliability)
- Event-Driven Architecture (decouple services)
Month 2: Advanced Patterns
- Database Sharding (handle millions of records)
- Microservices Architecture (scale teams)
- Eventual Consistency (performance tradeoffs)
- WebSockets / SSE (real-time features)
Month 3+: Production Complexity
- Idempotency (payment safety)
- Saga Pattern (distributed workflow)
- Distributed Transactions (rare, but important to understand)
- CAP Theorem (database choice)
Python Libraries to Explore
- Caching: redis-py, pymemcache
- Async: Celery, asyncio, APScheduler
- Queues: Kafka (confluent-kafka), RabbitMQ (pika)
- Real-time: websockets, Flask-SocketIO
- Monitoring: prometheus_client, statsd
- Database: SQLAlchemy (pooling), psycopg2 (PostgreSQL)
Next Steps
Pick one pattern from Week 1 (caching or rate limiting) and implement it in your project this week. You'll see immediate performance gains.
Need guidance on implementing these patterns in your specific stack? Our fractional CTO service can review your architecture and recommend the right patterns for your scale. Book a consultation.