
20 System Design Patterns for Millions of Concurrent Transactions

Master the essential system design patterns for handling millions of concurrent transactions. Learn database sharding, caching, microservices, and 17 more patterns with Python examples.

February 12, 2026
15 min read
By Primeworks Hub Team

Building systems that handle millions of concurrent transactions requires more than just good code. It requires understanding how to partition data, cache intelligently, and communicate asynchronously. This guide covers 20 essential system design patterns, each explained simply with Python examples.

The 20 Patterns You Need

  1. Database Sharding
  2. Caching Strategies
  3. Event-Driven Architecture
  4. Rate Limiting
  5. Load Balancing
  6. Circuit Breakers
  7. Connection Pooling
  8. Idempotency
  9. Async Processing
  10. Monitoring & Observability
  11. Horizontal vs Vertical Scaling
  12. Microservices Architecture
  13. Saga Pattern (Distributed Transactions)
  14. Eventual Consistency
  15. CAP Theorem
  16. Data Partitioning Strategies
  17. WebSockets for Real-Time
  18. Server-Sent Events (SSE)
  19. Distributed Transactions (2PC)
  20. Throttling

1. Database Sharding

What it is: Splitting your data across multiple database servers so no single server becomes a bottleneck.

The analogy: Imagine a library with 10 million books. Instead of one librarian searching through all books, hire 4 librarians. The first handles A-C, second D-F, third G-I, fourth J-Z. Each searches only their section.

When to use: Your database exceeds 10-50M records and queries are slowing down, or you need to distribute writes across servers.

Python example:

# Simple sharding by user_id
def get_shard_id(user_id, num_shards=4):
    """Determine which shard this user belongs to"""
    return user_id % num_shards

# In production, connect to different DB based on shard_id
def get_user_connection(user_id, shard_servers):
    shard_id = get_shard_id(user_id)
    return shard_servers[shard_id]  # e.g., 'db-shard-1.example.com'

# Note: get_shard_id above is hash (modulo) sharding. Instagram famously
# used range-style ID sharding instead:
# Shard 0: user IDs 0-1M, Shard 1: user IDs 1M-2M, etc.
shards = {
    0: 'db-shard-0.example.com',
    1: 'db-shard-1.example.com',
    2: 'db-shard-2.example.com',
    3: 'db-shard-3.example.com'
}
user_id = 12345
connection = get_user_connection(user_id, shards)

Common mistakes: Choosing a poor shard key (like timestamp, which creates hot shards), not planning for shard rebalancing, or querying across all shards unnecessarily.
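Rebalancing is the pain point modulo sharding creates: changing `num_shards` remaps almost every key. Consistent hashing limits the damage. Here is a minimal sketch (the `ConsistentHashRing` class and shard names are illustrative, not from any particular library) showing that removing a shard only moves that shard's keys:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring: adding/removing a shard remaps only ~1/N of keys."""
    def __init__(self, shards, replicas=100):
        self.replicas = replicas      # Virtual nodes per shard, for even spread
        self._ring = []               # Sorted list of (hash, shard)
        for shard in shards:
            self.add_shard(shard)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_shard(self, shard):
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{shard}:{i}"), shard))

    def remove_shard(self, shard):
        self._ring = [(h, s) for h, s in self._ring if s != shard]

    def get_shard(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash
        h = self._hash(str(key))
        idx = bisect.bisect(self._ring, (h, ""))
        if idx == len(self._ring):
            idx = 0                   # Wrap around the ring
        return self._ring[idx][1]

ring = ConsistentHashRing(["db1", "db2", "db3", "db4"])
shard = ring.get_shard(12345)
```

With plain modulo, going from 4 to 5 shards remaps roughly 80% of keys; with the ring above, removing `db3` leaves every key that wasn't on `db3` exactly where it was.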


2. Caching Strategies

What it is: Storing frequently accessed data in fast memory (Redis, Memcached) to avoid repeated database queries.

The analogy: Instead of calling the restaurant every time to ask what's on the menu, keep a printed copy of the popular menus taped to your fridge. Glancing at the fridge is instant; you just have to replace the copy when the menu changes.

When to use: Queries take 100ms+ or same data is read 10+ times per minute.

Python example:

import redis
import json

cache = redis.Redis(host='localhost', port=6379)

def get_user_profile(user_id, ttl_seconds=3600):
    """Get user profile with Redis caching"""
    cache_key = f"user:{user_id}"
    
    # Try cache first (cache hit = 1ms)
    cached = cache.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Cache miss = query database (100ms)
    user = database.query("SELECT * FROM users WHERE id = %s", (user_id,))  # parameterized, no SQL injection
    
    # Store in cache with 1-hour TTL
    cache.setex(cache_key, ttl_seconds, json.dumps(user))
    return user

# Write-through caching: write to database AND cache
def update_user(user_id, data):
    database.update('users', data)
    cache.setex(f"user:{user_id}", 3600, json.dumps(data))

Common mistakes: Forgetting to invalidate cache after updates, setting TTL too high (stale data), or caching data that changes every minute.
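On the invalidation mistake: a common alternative to write-through is delete-on-write (cache-aside invalidation): instead of rewriting the cache entry on update, delete it and let the next read repopulate. A sketch using a plain dict as a stand-in for Redis (`fetch_user_from_db` is a hypothetical DB helper):

```python
# Plain dict stands in for Redis here; fetch_user_from_db is a hypothetical helper.
cache = {}

def fetch_user_from_db(user_id):
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]              # Cache hit
    user = fetch_user_from_db(user_id)
    cache[key] = user                  # Populate on miss
    return user

def update_user(user_id, data):
    # 1) Write the database (omitted here)
    # 2) DELETE the cache entry instead of rewriting it
    cache.pop(f"user:{user_id}", None)  # Next read repopulates fresh data
```

Deleting instead of rewriting avoids the race where a concurrent request writes a stale value into the cache just after your update.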


3. Event-Driven Architecture

What it is: Instead of synchronous function calls, publish events to a message queue. Subscribers listen and react.

The analogy: Restaurant kitchen. Waiter doesn't hand ticket to cook and wait. Waiter puts ticket on a spike (message queue). Multiple cooks grab tickets as they're free. Some cooks prepare food, others plate it, others calculate bill.

When to use: You have 100+ operations triggered by a single action, or you need to decouple services.

Python example:

import json
from kafka import KafkaProducer, KafkaConsumer

# Event producer: user signs up
producer = KafkaProducer(bootstrap_servers=['localhost:9092'])

def create_user(email, name):
    user = database.insert('users', {'email': email, 'name': name})
    
    # Publish event instead of calling functions directly
    event = {'type': 'user.created', 'user_id': user['id'], 'email': email}
    producer.send('events', json.dumps(event).encode())

# Event consumers: independent services
consumer = KafkaConsumer('events', bootstrap_servers=['localhost:9092'])

for message in consumer:
    event = json.loads(message.value)
    
    if event['type'] == 'user.created':
        send_welcome_email(event['email'])  # Email service
        create_free_trial(event['user_id'])  # Billing service
        log_analytics(event)  # Analytics service
        # In production, each service runs its own consumer group,
        # so these handlers run independently and in parallel

Common mistakes: relying on global event ordering (Kafka only guarantees order within a partition), assuming exactly-once delivery (events can arrive twice, so make consumers idempotent), and losing messages when acks and replication aren't configured correctly.


4. Rate Limiting

What it is: Restrict how many requests a client can make per time period.

The analogy: Vending machine that only dispenses one drink per 10 seconds, preventing people from draining it.

When to use: Protecting APIs from abuse, preventing DDoS, or ensuring fair resource usage. Typical: 100 req/min for users, 1000 req/min for premium.

Python example:

import redis
import time
from functools import wraps

redis_client = redis.Redis()

def rate_limit(requests_per_minute=60):
    """Token bucket algorithm for rate limiting"""
    def decorator(func):
        @wraps(func)
        def wrapper(user_id):
            key = f"rate_limit:{user_id}"
            current = redis_client.incr(key)
            
            # Set 1-minute expiry on first request
            if current == 1:
                redis_client.expire(key, 60)
            
            # Reject if over limit
            if current > requests_per_minute:
                return {"error": "Rate limit exceeded", "status": 429}
            
            return func(user_id)
        return wrapper
    return decorator

@rate_limit(requests_per_minute=100)
def api_get_user(user_id):
    return database.get_user(user_id)

Common mistakes: Resetting limit incorrectly, not returning HTTP 429, or not distinguishing between user tiers.
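The Redis example above is a fixed-window counter: simple, but it allows a burst of 2x the limit straddling a window boundary. A true token bucket refills continuously and permits controlled bursts. A minimal in-process sketch (single-server only; a distributed version would keep the bucket state in Redis):

```python
import time

class TokenBucket:
    """Token bucket: sustained `rate` tokens/sec, bursts up to `capacity`."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity            # Start full
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # Spend one token for this request
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # 5 req/sec sustained, bursts of 10
```

A client that has been idle accumulates tokens (up to `capacity`) and can burst, then is smoothed back to the sustained rate.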


5. Load Balancing

What it is: Distribute incoming requests across multiple servers so no server gets overloaded.

The analogy: Pizza delivery dispatcher assigns orders to drivers based on current workload. Driver with 2 pending deliveries gets the next call, not the driver with 6.

When to use: You have multiple web servers and 1000+ requests/second.

Python example:

# Round-robin load balancer (basic)
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.current_index = 0
    
    def get_next_server(self):
        """Cycle through servers sequentially"""
        server = self.servers[self.current_index]
        self.current_index = (self.current_index + 1) % len(self.servers)
        return server

# Least connections balancer
class LeastConnectionsBalancer:
    def __init__(self, servers):
        self.server_load = {server: 0 for server in servers}
    
    def get_next_server(self):
        """Pick server with fewest active connections"""
        return min(self.server_load, key=self.server_load.get)
    
    def request_sent(self, server):
        self.server_load[server] += 1
    
    def request_completed(self, server):
        self.server_load[server] -= 1

balancer = LeastConnectionsBalancer(['web1.example.com', 'web2.example.com'])
server1 = balancer.get_next_server()  # Pick best server
balancer.request_sent(server1)

Common mistakes: Sticky sessions not implemented (user bounces between servers), health checks not running, or not handling server failures.


6. Circuit Breaker Pattern

What it is: When a service fails repeatedly, stop calling it temporarily. It's like a circuit breaker in your home—it cuts electricity to prevent fire.

The analogy: Calling your pizza place. If it goes to voicemail 3 times, don't call for the next 30 seconds. Try again later.

When to use: Calling external APIs or databases that might fail. Prevents cascading failures.

Python example:

import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"        # Working normally
    OPEN = "open"           # Failing, reject requests
    HALF_OPEN = "half_open" # Testing if recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=3, timeout=30):
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.last_failure_time = None
    
    def call(self, func, *args):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN")
        
        try:
            result = func(*args)
            self.on_success()
            return result
        except Exception as e:
            self.on_failure()
            raise
    
    def on_success(self):
        self.failure_count = 0
        self.state = CircuitState.CLOSED
    
    def on_failure(self):
        self.failure_count += 1
        self.last_failure_time = time.time()
        if self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN

# Usage
breaker = CircuitBreaker(failure_threshold=3, timeout=30)
try:
    result = breaker.call(external_api_call, user_id=123)
except Exception as e:
    print("Service temporarily unavailable")

Common mistakes: Not monitoring HALF_OPEN state, timeout too short (keeps retrying too fast), or forgetting to reset failure count.


7. Connection Pooling

What it is: Reuse database connections instead of creating new ones for every query.

The analogy: Taxi company keeps 10 taxis idle. When you call for a taxi, they give you an existing one (fast). Better than finding a taxi driver, hiring them, and training them every time.

When to use: Database connections are slow (50-100ms to establish), and you make 100+ queries/second.

Python example:

from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool

# Create connection pool: min 5 connections, max 20
engine = create_engine(
    'postgresql://user:password@localhost/mydb',
    poolclass=QueuePool,
    pool_size=5,           # Keep 5 idle connections
    max_overflow=15,       # Allow 15 more if needed
    pool_pre_ping=True     # Test connection health before use
)

def query_user(user_id):
    # Get connection from pool (fast if available)
    with engine.connect() as conn:
        result = conn.execute(
            text("SELECT * FROM users WHERE id = :id"),  # bound parameter, no SQL injection
            {"id": user_id}
        )
        return result.fetchone()
    # Connection returns to pool for reuse

# Without pooling: each query = 50ms overhead
# With pooling: reuse = 1ms overhead = 50x faster!

Common mistakes: pool size too small (connections exhaust under load), not handling pool exhaustion when max_overflow is reached, or not configuring a checkout timeout.


8. Idempotency

What it is: An operation produces the same result whether called once or 100 times.

The analogy: Pressing an elevator button. Pressing once = elevator comes. Pressing 10 times = same result, elevator doesn't come 10 times.

When to use: Payment processing, order placement, or any critical operation that might be retried.

Python example:

import uuid
import redis
import json

cache = redis.Redis()

def process_payment(user_id, amount, idempotency_key):
    """Process payment with idempotency protection"""
    cache_key = f"payment:{idempotency_key}"
    
    # Check if we've already processed this
    existing_result = cache.get(cache_key)
    if existing_result:
        return json.loads(existing_result)  # Return cached result
    
    # Process payment (charge credit card)
    result = {
        "transaction_id": str(uuid.uuid4()),
        "status": "success",
        "amount": amount
    }
    
    # Store result for future retries
    cache.setex(cache_key, 86400, json.dumps(result))  # 24-hour cache
    return result

# Client generates idempotency_key once and reuses it
user_id = 1
idempotency_key = str(uuid.uuid4())
result1 = process_payment(user_id, 99.99, idempotency_key)
result2 = process_payment(user_id, 99.99, idempotency_key)  # Network fails, retry
# result1 == result2, no double charge!

Common mistakes: Not generating unique keys, storing results without expiry (disk fills up), or forgetting to pass idempotency key in retries.


9. Async Processing

What it is: Long-running tasks (email, reports, exports) run in background instead of blocking the user.

The analogy: Dropoff laundry service. You leave clothes, they text when ready. You don't wait 4 hours.

When to use: Operations take 5+ seconds. Tasks like sending emails (5-10s), generating PDF reports (30s), or video transcoding (minutes).

Python example:

from celery import Celery
from email.message import EmailMessage
import smtplib

app = Celery('myapp', broker='redis://localhost:6379')

@app.task
def send_welcome_email(user_id, email):
    """Background task - user doesn't wait"""
    user = database.get_user(user_id)

    # Takes ~5 seconds (SMTP round trips are slow)
    msg = EmailMessage()
    msg['Subject'] = f"Welcome {user.name}!"
    msg['To'] = email
    with smtplib.SMTP('smtp.gmail.com') as smtp:
        smtp.send_message(msg)
    return f"Email sent to {email}"

# In your web request handler
def create_user_endpoint(request):
    user = database.create_user(request.data)
    
    # Queue task, return immediately (5ms)
    send_welcome_email.delay(user.id, user.email)
    
    return {"status": "Account created", "user_id": user.id}
    # Email sends in background, user gets response instantly

Common mistakes: Not tracking async job status, tasks failing silently, or no retry mechanism for failed tasks.
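The missing retry mechanism is mostly just exponential backoff (Celery has this built in via `self.retry` on bound tasks). A broker-free sketch of the idea, as a reusable decorator:

```python
import time
from functools import wraps

def retry(max_attempts=3, base_delay=1.0):
    """Retry a flaky task with exponential backoff: base, 2x, 4x, ..."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # Out of attempts: surface the error, don't fail silently
                    time.sleep(base_delay * (2 ** attempt))  # Back off before retrying
        return wrapper
    return decorator

@retry(max_attempts=3, base_delay=0.5)
def send_report(report_id):
    # Imagine a flaky SMTP or HTTP call here
    ...
```

Re-raising on the final attempt matters: it is what keeps failures visible to your monitoring instead of disappearing into the queue.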


10. Monitoring & Observability

What it is: Measure latency, error rates, and resource usage so you know when systems fail.

The analogy: Dashboard in your car shows speed, fuel, temperature. Without it, you don't know if engine is overheating until it breaks.

When to use: Always. Golden signals: latency (p95 <200ms), traffic (req/sec), errors (% failing), saturation (CPU/RAM/disk).

Python example:

import time
from prometheus_client import Counter, Histogram

# Define metrics
request_count = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint'])
request_latency = Histogram('http_request_duration_seconds', 'Request latency', buckets=[0.1, 0.5, 1, 5])
errors = Counter('http_errors_total', 'Total errors', ['status'])

def api_handler(request):
    start_time = time.time()
    
    try:
        result = process_request(request)
        request_count.labels(method=request.method, endpoint=request.path).inc()
        return result
    except Exception as e:
        errors.labels(status="500").inc()
        raise
    finally:
        duration = time.time() - start_time
        request_latency.observe(duration)

# Metrics exposed at /metrics
# Prometheus scrapes every 15 seconds
# Alert: if p95 latency > 500ms, check database sharding

Common mistakes: Not alerting on metrics, collecting wrong metrics, or alert thresholds too generic (no context).


11. Horizontal vs Vertical Scaling

What it is: Horizontal = add more servers. Vertical = make one server more powerful.

The analogy: Restaurant capacity. Vertical: add more tables to one building (limited). Horizontal: open another restaurant location.

When to use: Horizontal at 10k+ requests/sec (cheaper, no limit). Vertical works until 5-10k req/sec (simpler).

Python example:

# Vertical scaling issue: single server bottleneck

# 1 CPU core ≈ 1,000 req/sec max
# Vertical = upgrade to a 16-core server = 16,000 req/sec
# But big boxes are priced superlinearly, and you eventually hit a hardware ceiling

# Horizontal scaling:
# 10 small servers (1 core each) = 10,000 req/sec on cheap commodity hardware
# No ceiling: if traffic grows to 100k req/sec, add 100 servers

class HorizontalBalancer:
    def __init__(self):
        self.servers = ["web1", "web2", "web3"]
        self.index = 0
    
    def route_request(self):
        server = self.servers[self.index % len(self.servers)]
        self.index += 1
        return server

# Horizontal wins: Netflix has 10,000+ servers worldwide

Common mistakes: Trying to vertical scale past CPU limit, not using load balancer, or database not sharded (becomes bottleneck).


12. Microservices Architecture

What it is: Split one large application into 5-10 small services, each handling one business function.

The analogy: Restaurant. Monolith = one chef does everything (cooking, plating, cashier). Microservices = separate chef (cooking), plater, cashier. Each works independently.

When to use: You have 50+ engineers and different services scale at different rates (video service needs more servers than auth service).

Python example:

from flask import Flask, jsonify
import requests

# Service 1: User service (port 5001)
user_app = Flask('user_service')

@user_app.route('/users/<int:user_id>')
def get_user(user_id):
    return jsonify({"id": user_id, "name": "John"})

# Service 2: Order service (port 5002)
order_app = Flask('order_service')

@order_app.route('/orders/<int:user_id>')
def get_orders(user_id):
    # Call user service
    user = requests.get(f'http://user-service:5001/users/{user_id}').json()
    orders = database.get_orders(user_id)
    return jsonify({"user": user, "orders": orders})

# Benefits: scale user service and order service independently
# Downside: network calls between services are slower than function calls

Common mistakes: too many microservices (100+ becomes hard to manage), chatty inter-service calls piling up network latency, or keeping a single shared database (which defeats independent scaling).


13. Saga Pattern (Distributed Transactions)

What it is: Multi-step transactions across multiple services with compensating transactions on failure.

The analogy: Multi-step booking: book flight → book hotel → book rental car. If hotel is full, cancel flight AND release car seat.

When to use: Complex workflows needing ACID guarantees across services (e.g., payment + inventory + shipping).

Python example:

# Orchestrator-based saga for booking trip
class BookingOrchestrator:
    def book_trip(self, user_id, flight_id, hotel_id):
        completed = []  # Track finished steps so we only compensate those
        try:
            # Step 1: Book flight
            flight_booking = self.flight_service.book(flight_id)
            completed.append(('flight', flight_booking['id']))

            # Step 2: Book hotel
            hotel_booking = self.hotel_service.book(hotel_id)
            completed.append(('hotel', hotel_booking['id']))

            # Step 3: Charge payment
            payment = self.payment_service.charge(user_id, 1500)
            completed.append(('payment', payment['id']))

            return {"status": "success", "bookings": [flight_booking, hotel_booking]}
        except Exception:
            # Compensating transactions: undo only the steps that completed, in reverse
            for step, ref_id in reversed(completed):
                if step == 'flight':
                    self.flight_service.cancel(ref_id)
                elif step == 'hotel':
                    self.hotel_service.cancel(ref_id)
                elif step == 'payment':
                    self.payment_service.refund(ref_id)
            raise

# Saga ensures consistency but more complex than monolith transactions

Common mistakes: Forgetting compensating transactions, timeout too short (booking still processing), idempotency not implemented (double-cancel causes errors).


14. Eventual Consistency

What it is: Accept that replicas might lag. Data is eventually consistent instead of immediately consistent.

The analogy: Bank transfers between branches. You transfer $100 from Branch A to B. Branch B doesn't show $100 instantly; it shows after 5 minutes. But eventually, it's consistent.

When to use: At scale, immediate consistency requires expensive locking. Accept seconds/minutes delay for 1000x better performance.

Python example:

# Strong consistency (SLOW): write to primary, wait for replicas
def transfer_money_strong(account_from, account_to, amount):
    # Acquire lock on both accounts (10ms)
    account_from.balance -= amount
    account_to.balance += amount
    
    # Wait for all 3 replicas to confirm (50ms × 3)
    replicate_to_all_replicas()
    return "Transfer complete"  # 160ms total

# Eventual consistency (FAST): write to primary, async replicate
def transfer_money_eventual(account_from, account_to, amount):
    # Update primary (5ms)
    account_from.balance -= amount
    account_to.balance += amount
    
    # Queue replication event (1ms)
    queue_replication_event()
    return "Transfer queued"  # 6ms total
    # Replicas catch up within 1-5 seconds

# 27x faster! Accept that user might see old balance briefly

Common mistakes: Showing stale data to users (confusing), not handling conflicts when replicas have different values, rebuilding replicas takes too long.
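Conflict handling, the second mistake above, is often done with last-write-wins: each write carries a timestamp, and merging replicas keeps the newest value per key. A toy sketch (real systems lean on vector clocks or CRDTs, since last-write-wins is vulnerable to clock skew):

```python
# Each replica snapshot maps key -> (value, timestamp). Newest timestamp wins.
def merge_replicas(replica_a, replica_b):
    """Last-write-wins merge of two replica snapshots."""
    merged = dict(replica_a)
    for key, (value, ts) in replica_b.items():
        if key not in merged or ts > merged[key][1]:
            merged[key] = (value, ts)  # b's write is newer: take it
    return merged

a = {"balance": (100, 1700000001)}
b = {"balance": (150, 1700000005), "name": ("Ada", 1700000002)}
merged = merge_replicas(a, b)
```

Here `balance` resolves to 150 because replica b's write is newer, while `name` survives unchanged since only one replica has it.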


15. CAP Theorem

What it is: You can only guarantee 2 of 3: Consistency (all replicas same), Availability (always responsive), Partition tolerance (network splits don't break system).

The analogy: Restaurant opening 2 locations with shared inventory. Fast + Consistent = need network connection always. No network = you pick Fast (each location decides) or Consistent (both close until network returns).

When to use: Understanding tradeoffs in system design. CP systems (PostgreSQL): wait for consistency. AP systems (DynamoDB): accept stale data.

Python example:

from pymongo import WriteConcern

# CAP Theorem illustration
# PostgreSQL (CP): Consistent + Partition tolerant
# - If network split, one partition stops accepting writes (sacrifices availability)
# - All data stays consistent
# Use for: Banking (must be correct)

# DynamoDB (AP): Available + Partition tolerant
# - If network split, both partitions accept writes
# - Data might be inconsistent temporarily (you fix conflicts later)
# Use for: Social media (stale data OK)

# MongoDB with write concern "majority" (CP-leaning)
# Wait for the write to reach a majority of replicas before returning
collection = get_mongo_collection().with_options(
    write_concern=WriteConcern(w="majority")
)
result = collection.insert_one({"user_id": 1})

# DynamoDB (AP)
table = get_dynamodb_table()
table.put_item(Item={"id": 1})  # Returns immediately, replicas catch up

Common mistakes: Thinking you can have all 3 (impossible), not understanding your system's choice, or picking wrong for your use case.


16. Data Partitioning Strategies

What it is: Organize data into ranges so queries touch fewer partitions.

The analogy: File cabinet with 1000 folders. Instead of searching all, put folders A-D in drawer 1, E-H in drawer 2, etc.

When to use: You have 50M+ rows and queries are slow. Partitioning lets each partition use an index.

Python example:

# Range partitioning by date
def query_logs_range(start_date, end_date):
    # Query only partitions between dates
    # Partition "2026_01" has Jan 2026 data
    # Partition "2026_02" has Feb 2026 data
    # Skip partition "2025_12" (old data)
    partitions = get_partitions_in_range(start_date, end_date)
    return query_partitions(partitions)

# Hash partitioning by user_id
def query_user_data(user_id):
    hash_value = hash(user_id) % 16
    partition = f"user_partition_{hash_value}"
    return query_partition(partition)

# List partitioning (explicit)
def query_by_region(region):
    partitions = {
        "US": ["us_partition_1", "us_partition_2"],
        "EU": ["eu_partition_1"],
        "ASIA": ["asia_partition_1", "asia_partition_2"]
    }
    return query_partitions(partitions[region])

Common mistakes: Partition key causes data skew (50% data in one partition), not pruning dead partitions (disk fills), query spans all partitions (no benefit).


17. WebSockets for Real-Time

What it is: Keep persistent connection open. Server pushes data to client without client asking.

The analogy: Walkie-talkie vs sending letters. Letters (HTTP): you ask for mail, postman delivers. Walkie-talkie (WebSocket): person talks whenever they want, you hear instantly.

When to use: Real-time features: chat, live notifications, collaborative editing, stock price updates.

Python example:

from websockets import serve
import asyncio
import json

connected = set()

# WebSocket server
async def handler(websocket):
    connected.add(websocket)
    try:
        # Connection stays open
        async for message in websocket:
            data = json.loads(message)
            
            if data['type'] == 'chat':
                # Broadcast to all connected clients
                await broadcast(data['message'])
            
            elif data['type'] == 'notification':
                # Push to specific user
                await send_to_user(data['user_id'], data['notification'])
    finally:
        connected.discard(websocket)

async def broadcast(message):
    """Send to all connected WebSocket clients"""
    if not connected:
        return
    payload = json.dumps({"message": message})
    for ws in connected:
        await ws.send(payload)

async def main():
    async with serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

# Client connects once, receives updates forever
# vs HTTP: poll every 1 second (inefficient, high latency)

Common mistakes: WebSocket connection not closed on disconnect (connection leak), broadcasting to all clients when should be filtered, or no reconnection logic on disconnect.


18. Server-Sent Events (SSE)

What it is: HTTP connection where server streams data to client. Simpler than WebSocket.

The analogy: News ticker on TV screen. One-directional: ticker sends, you watch. Unlike walkie-talkie (two-way).

When to use: One-directional updates only (notifications, live scores, monitoring dashboard). If you need client to send data, use WebSocket instead.

Python example:

from flask import Flask, Response
import json
import time

app = Flask(__name__)

@app.route('/stream')
def stream():
    def event_stream():
        # Keep connection open, send data
        while True:
            data = {
                "timestamp": time.time(),
                "cpu": get_cpu_usage(),
                "memory": get_memory_usage(),
                "requests_per_sec": get_rps()
            }
            # SSE format: "data: <payload>" followed by a blank line
            yield f"data: {json.dumps(data)}\n\n"
            time.sleep(1)  # Update every second
    
    return Response(event_stream(), mimetype="text/event-stream")

# Client connects with the browser's EventSource API:
# const source = new EventSource('/stream');
# source.onmessage = (e) => console.log(JSON.parse(e.data));

# Simpler than WebSocket but one-way only

Common mistakes: SSE connection not closed (resource leak), HTTP timeout after 1 hour (reconnect needed), or using for two-way communication (should be WebSocket).


19. Distributed Transactions (2PC)

What it is: Two-phase commit: prepare transactions on all servers, then confirm all together.

The analogy: Wedding ceremony. Phase 1: "Do you take...?" (prepare). Phase 2: "I do" (commit). If anyone says no in phase 1, marriage doesn't happen.

When to use: ACID guarantees across databases. Rare; most use Saga pattern instead (easier).

Python example:

# 2PC coordinator
class TwoPhaseCommit:
    def execute(self, transaction):
        participants = [db1, db2, db3]
        
        # Phase 1: Prepare
        votes = []
        for participant in participants:
            vote = participant.prepare(transaction)  # Lock resources
            votes.append(vote)
        
        # All must vote YES
        if all(votes):
            # Phase 2: Commit
            for participant in participants:
                participant.commit(transaction)  # Unlock, apply
            return "Success"
        else:
            # Phase 2: Abort
            for participant in participants:
                participant.abort(transaction)  # Rollback
            return "Failed"

# Problem: If coordinator crashes between prepare & commit,
# resources locked forever (blocked transactions)
# 2PC is slow and complex. Use Saga instead.

Common mistakes: Coordinator failure leaves resources locked, network delays amplify latency, or using for microservices (huge overhead).


20. Throttling

What it is: Limit CPU/memory/disk usage by slowing down operations when system overloaded.

The analogy: Speed limit on highway. When traffic heavy, reduce speed to 40mph. When clear, go 70mph. Prevents gridlock.

When to use: CPU at 90%, memory at 90%, or disk queue deep. Slow down requests to prevent crash.

Python example:

import psutil
import time

def adaptive_throttle():
    """Slow down if system overloaded"""
    cpu_percent = psutil.cpu_percent()
    memory_percent = psutil.virtual_memory().percent
    
    if cpu_percent > 80 or memory_percent > 85:
        # System overloaded: sleep to shed load (clamped so it can't go negative)
        sleep_ms = max(0, (cpu_percent - 60) * 10)  # Scale delay with utilization
        time.sleep(sleep_ms / 1000)

def process_request(request):
    adaptive_throttle()  # Check system before each request
    return handle_request(request)

# Result: When CPU at 90%, add 300ms delay per request
# This reduces CPU to 70%, preventing crash
# Better than rejecting requests (throttling absorbs spike)

Common mistakes: Throttle threshold too high (system crashes before throttling), not differentiating between read vs write operations, or not monitoring effectiveness.


Comparison Table: Which Pattern When?

| Pattern | Best For | Complexity | Implement When |
| --- | --- | --- | --- |
| Database Sharding | Large datasets (50M+ rows) | Hard | Queries slow on single DB |
| Caching | Repeated queries | Easy | Queries slow (>100ms) |
| Event-Driven | Decoupling services | Medium | 100+ operations per action |
| Rate Limiting | API protection | Easy | Public APIs, abuse risk |
| Load Balancing | Multiple servers | Medium | 1000+ req/sec |
| Circuit Breaker | External API calls | Easy | APIs fail occasionally |
| Connection Pooling | Database efficiency | Easy | High query volume |
| Idempotency | Retryable operations | Medium | Payment processing |
| Async Processing | Long-running tasks | Medium | Tasks > 5 seconds |
| Monitoring | System health | Easy | Always |
| Horizontal Scaling | Growing traffic | Medium | 10k+ req/sec |
| Microservices | Large teams | Hard | 50+ engineers |
| Saga Pattern | Multi-step workflows | Hard | Distributed transactions needed |
| Eventual Consistency | Performance at scale | Medium | Stale data acceptable for the performance gain |
| CAP Theorem | Design decisions | Medium | Choosing database technology |
| Data Partitioning | Large tables | Medium | 50M+ rows |
| WebSockets | Real-time features | Medium | Chat, live updates |
| SSE | One-way streaming | Easy | Live notifications, dashboards |
| Distributed Transactions (2PC) | ACID across DBs | Very Hard | Rare; use Saga instead |
| Throttling | Overload protection | Easy | Prevent crashes during spikes |

Your Learning Path

Week 1-2: Master the Basics

  1. Caching Strategies (biggest impact, easiest to implement)
  2. Rate Limiting (protects your API)
  3. Async Processing (speeds up responses)
  4. Monitoring (know what's happening)

Week 3-4: Intermediate Patterns

  1. Load Balancing (handle more traffic)
  2. Connection Pooling (database efficiency)
  3. Circuit Breaker (reliability)
  4. Event-Driven Architecture (decouple services)

Month 2: Advanced Patterns

  1. Database Sharding (handle millions of records)
  2. Microservices Architecture (scale teams)
  3. Eventual Consistency (performance tradeoffs)
  4. WebSockets / SSE (real-time features)

Month 3+: Production Complexity

  1. Idempotency (payment safety)
  2. Saga Pattern (distributed workflow)
  3. Distributed Transactions (rare, but important to understand)
  4. CAP Theorem (database choice)

Python Libraries to Explore

  • Caching: redis-py, pymemcache
  • Async: Celery, asyncio, APScheduler
  • Queues: Kafka (confluent-kafka), RabbitMQ (pika)
  • Real-time: websockets, Flask-SocketIO
  • Monitoring: prometheus_client, statsd
  • Database: SQLAlchemy (pooling), psycopg2 (PostgreSQL)

Next Steps

Pick one pattern from Week 1 (caching or rate limiting) and implement it in your project this week. You'll see immediate performance gains.

Need guidance on implementing these patterns in your specific stack? Our fractional CTO service can review your architecture and recommend the right patterns for your scale. Book a consultation.

Related Topics

System Design, Scalability, Python, Architecture, Performance, High Traffic, Backend
