Scalability
Chapter 3: Scalability Fundamentals
Building Systems That Grow
3.1 Understanding Scalability
Scalability is the capability of a system to handle a growing amount of work by adding resources. It is not just about handling more users; it is about handling growth efficiently, so that each added resource yields a roughly proportional increase in capacity.
    Scalability vs Performance
    ==========================

    Performance: How FAST the system processes requests
    Scalability: How MUCH the system can handle

    +------------------+     +------------------+
    |   Performance    |     |   Scalability    |
    |                  |     |                  |
    | - Low latency    |     | - Handle more    |
    | - High throughput|     |   users          |
    | - Fast response  |     | - Grow with      |
    |                  |     |   demand         |
    +------------------+     +------------------+

3.2 Types of Scaling
3.2.1 Vertical Scaling (Scale Up)
Vertical scaling adds more power (CPU, RAM, storage) to an existing machine:

    Vertical Scaling
    ================

        Before                  After
    +-------------+        +-------------+
    |   Server    |        |   Server    |
    |   2 CPU     |  ===>  |   32 CPU    |
    |   4GB RAM   |        |  128GB RAM  |
    |  100GB SSD  |        |   2TB SSD   |
    +-------------+        +-------------+

    Pros:
    + Simple to implement
    + Usually no code changes needed
    + Single point of control

    Cons:
    - Hardware limits
    - Upgrades often require downtime
    - Single point of failure
    - Expensive at high end

3.2.2 Horizontal Scaling (Scale Out)
Horizontal scaling adds more machines to handle the load:

    Horizontal Scaling
    ==================

                        Load Balancer
                              |
         +-------------+-------------+-------------+
         |             |             |             |
         v             v             v             v
    +----------+  +----------+  +----------+  +----------+
    | Server 1 |  | Server 2 |  | Server 3 |  | Server N |
    | (2 CPU)  |  | (2 CPU)  |  | (2 CPU)  |  | (2 CPU)  |
    +----------+  +----------+  +----------+  +----------+
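To make the diagram concrete, here is a minimal round-robin sketch, one simple way a load balancer could spread requests over identical servers. The `RoundRobinBalancer` class and the server names are hypothetical and chosen only for illustration; production load balancers also deal with health checks, weighting, and connection draining.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation (hypothetical illustration)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


balancer = RoundRobinBalancer(["server-1", "server-2", "server-3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
```

Because every server is interchangeable, adding capacity in this model only means adding another name to the list.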
    Pros:
    + Near-linear scalability for stateless workloads
    + No single-machine hardware ceiling
    + Fault tolerance
    + Cost-effective

    Cons:
    - Complex architecture
    - Data consistency challenges
    - Network overhead
    - More complex debugging

3.3 Scaling Dimensions
The Three Dimensions

    Scaling Cube
    ============

    X-Axis: Cloning (multiple identical copies)
    Y-Axis: Functional decomposition (split by function)
    Z-Axis: Sharding (data partitioning)

    X-Axis: Cloning
    ===============
    +----+  +----+  +----+  +----+
    | S1 |  | S2 |  | S3 |  | S4 |   All handle same function
    +----+  +----+  +----+  +----+

    Y-Axis: Functional Decomposition
    ================================
    +---------+  +---------+  +---------+
    |  User   |  |  Order  |  | Payment |   Different functions
    | Service |  | Service |  | Service |
    +---------+  +---------+  +---------+

    Z-Axis: Sharding
    ================
    +-------+  +-------+
    | Users |  | Users |
    |  A-M  |  |  N-Z  |   Same function, different data
    +-------+  +-------+

3.3.1 Database Scaling
    Database Scaling Strategies
    ===========================

    1. Vertical Scaling
    +-------------+
    |   Primary   |
    |  Database   |
    |  (Powerful  |
    |   server)   |
    +-------------+

    2. Read Replicas
    +-------------+      +-------------+      +-------------+
    |   Primary   | ---> |   Replica   | ---> |   Replica   |
    |  Database   |      |   (Read)    |      |   (Read)    |
    +-------------+      +-------------+      +-------------+

    3. Sharding (Horizontal Partitioning)
    +-------------+      +-------------+
    |   Shard 1   |      |   Shard 2   |
    |  Users A-M  |      |  Users N-Z  |
    +-------------+      +-------------+

3.4 Load Types and Scaling
3.4.1 Compute-Bound Loads

    Compute-Bound Scaling
    =====================

    Characteristic: CPU is the bottleneck

    Solution: Add more compute instances

    +------------+  +------------+  +------------+
    |  Instance  |  |  Instance  |  |  Instance  |
    | (CPU 100%) |  | (CPU 100%) |  | (CPU 100%) |
    +------------+  +------------+  +------------+

    Use Cases:
    - Image/video processing
    - Machine learning inference
    - Complex calculations
    - Video encoding

3.4.2 Memory-Bound Loads
    Memory-Bound Scaling
    ====================

    Characteristic: RAM is the bottleneck

    Solution: Add more instances with more RAM, or use caching

    +------------+     +-----------+
    |  Instance  |     |   Redis   |
    |            |     |   Cache   |
    |   Memory   |     |           |
    | (95% used) |     | Hot data  |
    +------------+     +-----------+

    Use Cases:
    - In-memory databases
    - Real-time analytics
    - Session storage
    - Caching layers

3.4.3 I/O-Bound Loads
    I/O-Bound Scaling
    =================

    Characteristic: Disk or network is the bottleneck

    Solution: Use caching, CDNs, async processing

    +---------+   +---------+   +---------+
    | Request |   | Request |   | Request |
    +---------+   +---------+   +---------+
         |             |             |
         v             v             v
    +-------------------------------------+
    |            Caching Layer            |
    |    (Serve from memory, not disk)    |
    +-------------------------------------+
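As a rough illustration of the caching layer in the diagram, the sketch below puts a small in-memory read-through cache in front of a slow read. `ReadThroughCache`, `read_from_disk`, and the 60-second TTL are assumptions made for this example, not part of any particular library; a shared cache such as Redis would play this role across multiple servers.

```python
import time


class ReadThroughCache:
    """Checks memory first and falls back to the slow store on a miss."""

    def __init__(self, load_from_store, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_store      # slow path (disk, database, network)
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}                # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]               # cache hit: no I/O performed
        value = self._load(key)           # cache miss: pay the I/O cost once
        self._entries[key] = (value, self._clock() + self._ttl)
        return value


slow_reads = []


def read_from_disk(key):
    # Stand-in for a slow disk or database read (hypothetical).
    slow_reads.append(key)
    return f"contents of {key}"


cache = ReadThroughCache(read_from_disk, ttl_seconds=60.0)
first = cache.get("report.pdf")
second = cache.get("report.pdf")
print(first, len(slow_reads))
```

The second `get` is served from memory, so the slow read happens only once until the entry expires.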
    Use Cases:
    - File storage systems
    - Database operations
    - Video streaming
    - Large data transfers

3.5 Scalability Patterns
3.5.1 Stateless Services

    Stateless Architecture
    ======================

     Request 1      Request 2      Request 3
         |              |              |
         v              v              v
    +----------------------------------------+
    |             Load Balancer              |
    +----------------------------------------+
         |              |              |
         v              v              v
    +----------+   +----------+   +----------+
    | Server A |   | Server B |   | Server C |
    | State:   |   | State:   |   | State:   |
    | None     |   | None     |   | None     |
    +----------+   +----------+   +----------+
         |              |              |
         v              v              v
    +----------------------------------------+
    |          External State Store          |
    |      (Redis, Database, S3, etc.)       |
    +----------------------------------------+
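The sketch below illustrates the idea in the diagram under simplifying assumptions: two server objects keep no per-user state and read and write a shared store instead. `InMemoryStateStore` is a hypothetical stand-in for an external store such as Redis, and the cart example is invented for illustration.

```python
class InMemoryStateStore:
    """Stand-in for an external store such as Redis (hypothetical)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


class StatelessServer:
    """Keeps no per-user state; everything lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self._store = store

    def handle_add_to_cart(self, session_id, item):
        # Load the state, change it, and write it back on every request.
        cart = list(self._store.get(session_id, []))
        cart.append(item)
        self._store.set(session_id, cart)
        return cart


store = InMemoryStateStore()
server_a = StatelessServer("A", store)
server_b = StatelessServer("B", store)

server_a.handle_add_to_cart("session-42", "book")
cart = server_b.handle_add_to_cart("session-42", "pen")
print(cart)
```

Server B sees the item added through server A because neither holds the cart itself, which is what lets a load balancer send each request to any server.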
    Why Stateless?
    - Any server can handle any request
    - Easy to add/remove servers
    - Simple load balancing
    - Better fault tolerance

3.5.2 State Management
    Session Management Options
    ==========================

    1. Sticky Sessions (Not recommended)
    +----------+      +----------+
    |  User A  | -->  | Server 1 |
    | Session  |      | Session  |
    +----------+      +----------+

    Problem: Server failure = session loss

    2. Session Store (Recommended)
    +----------+      +----------+      +----------+
    |  User A  | -->  |   Any    | -->  |  Redis   |
    | Session  |      |  Server  |      | Session  |
    +----------+      +----------+      +----------+

    Benefit: Any server can handle the request

3.5.3 Data Partitioning
    Partitioning Strategies
    =======================

    1. Horizontal Partitioning (Sharding)
    +------------------------+
    |       All Users        |
    +------------------------+
                |
                v
    +----------+  +----------+
    |   A-M    |  |   N-Z    |
    |  Shard   |  |  Shard   |
    +----------+  +----------+
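A minimal sketch of how requests might be routed to the A-M and N-Z shards shown above. The shard names and the `shard_for` function are hypothetical; the routing rule (first letter of the username) is an assumption taken from the diagram's ranges.

```python
# Shard name -> inclusive range of first letters it owns (hypothetical names).
SHARDS = {
    "shard-a-m": ("A", "M"),
    "shard-n-z": ("N", "Z"),
}


def shard_for(username):
    """Return the shard whose letter range covers the username's first letter."""
    if not username:
        raise ValueError("username must not be empty")
    first = username[0].upper()
    for shard, (low, high) in SHARDS.items():
        if low <= first <= high:
            return shard
    raise ValueError(f"no shard covers names starting with {first!r}")


print(shard_for("alice"), shard_for("Zoe"))
```

Range-based routing like this is easy to reason about, but shards can become uneven if names cluster in one range; hashing the key is a common alternative.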
    2. Vertical Partitioning
    +------------------------+
    |        All Data        |
    +------------------------+
                |
                v
    +----------+----------+-----------+
    | Profile  |  Orders  | Analytics |
    | (Table1) | (Table2) | (Table3)  |
    +----------+----------+-----------+

    3. Functional Partitioning
    +------------------------+
    |      All Services      |
    +------------------------+
                |
                v
    +---------+---------+---------+
    |  User   |  Order  | Payment |
    | Service | Service | Service |
    +---------+---------+---------+

3.6 Autoscaling
How Autoscaling Works

    Autoscaling Architecture
    ========================

    +----------------------------------------------------------+
    |                      Cloud Provider                      |
    +----------------------------------------------------------+
    |                                                          |
    |  +----------------+      +---------------------------+   |
    |  |  Auto Scaling  |      |       Load Balancer       |   |
    |  |     Group      |      |                           |   |
    |  +----------------+      +---------------------------+   |
    |          |                             |                 |
    |          v                             v                 |
    |  +-------------------------------------------------+     |
    |  |                Scaling Policies                 |     |
    |  |  - Scale out when CPU > 70%                     |     |
    |  |  - Scale in when CPU < 30%                      |     |
    |  |  - Min instances: 2                             |     |
    |  |  - Max instances: 20                            |     |
    |  +-------------------------------------------------+     |
    |                                                          |
    |  +--------+  +--------+  +--------+  +--------+          |
    |  |Instance|  |Instance|  |Instance|  |Instance|          |
    |  |   1    |  |   2    |  |   3    |  |   4    |          |
    |  +--------+  +--------+  +--------+  +--------+          |
    +----------------------------------------------------------+

Metrics for Autoscaling
| Metric | Description | Use Case |
|---|---|---|
| CPU Utilization | Percentage of CPU used | General compute |
| Memory Utilization | RAM usage | Memory-intensive |
| Request Count | Requests per instance | Web applications |
| Network I/O | Bytes in/out | Network-bound |
| Custom Metrics | Business-specific | Unique workloads |
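The scaling policies listed under "How Autoscaling Works" (scale out above 70% CPU, scale in below 30%, between 2 and 20 instances) can be sketched as a single decision function. This is a simplified illustration, not how any specific cloud provider implements it: the function name `desired_instances` and the one-instance step size are assumptions, and real policies typically add cooldown periods and averaging windows.

```python
def desired_instances(current, cpu_percent, scale_out_above=70.0,
                      scale_in_below=30.0, min_instances=2, max_instances=20):
    """Return the instance count after one evaluation of a threshold policy."""
    if cpu_percent > scale_out_above:
        target = current + 1          # assumed step: add one instance
    elif cpu_percent < scale_in_below:
        target = current - 1          # assumed step: remove one instance
    else:
        target = current              # inside the band: do nothing
    # Never go below the minimum or above the maximum.
    return max(min_instances, min(max_instances, target))


print(desired_instances(current=4, cpu_percent=85.0))
```

The gap between the two thresholds keeps the group from flapping between scale-out and scale-in when CPU hovers near a single cutoff.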
3.7 Capacity Planning
Estimating Capacity

    Capacity Planning Process
    =========================

    Step 1: Determine Target Load
    +---------------------------+
    | - Expected users          |
    | - Requests per user       |
    | - Peak traffic times      |
    +---------------------------+

    Step 2: Calculate Requirements
    +---------------------------+
    | - Requests per second     |
    | - Bandwidth needed        |
    | - Storage requirements    |
    +---------------------------+

    Step 3: Apply Safety Factor
    +---------------------------+
    | - 2-3x for growth         |
    | - Buffer for spikes       |
    | - Consider peak seasons   |
    +---------------------------+
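The three steps above can be sketched as a small calculation. This covers only the request-rate part of Step 2 (not bandwidth or storage); the function name `estimate_capacity` and all input values are illustrative assumptions. The concurrency figure uses Little's law (requests in flight = arrival rate x time per request).

```python
SECONDS_PER_DAY = 86_400


def estimate_capacity(daily_users, requests_per_user_per_day,
                      peak_factor, safety_factor, avg_latency_seconds):
    """Turn a target load into a requests-per-second capacity estimate."""
    # Step 1: target load.
    daily_requests = daily_users * requests_per_user_per_day
    # Step 2: requirements, as average and peak request rates.
    average_rps = daily_requests / SECONDS_PER_DAY
    peak_rps = average_rps * peak_factor
    # Step 3: safety factor on top of the expected peak.
    provisioned_rps = peak_rps * safety_factor
    # Little's law: requests in flight = arrival rate x time per request.
    concurrent_requests = provisioned_rps * avg_latency_seconds
    return {
        "daily_requests": daily_requests,
        "average_rps": average_rps,
        "peak_rps": peak_rps,
        "provisioned_rps": provisioned_rps,
        "concurrent_requests": concurrent_requests,
    }


# Hypothetical inputs: 500,000 daily users, 12 requests each, 200 ms latency.
estimate = estimate_capacity(500_000, 12, peak_factor=3, safety_factor=2,
                             avg_latency_seconds=0.2)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```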
    Example Calculation
    ===================

    Given:
    - 1,000,000 monthly active users (treated here as all active every day)
    - Each user makes 10 requests per day
    - Average request takes 100ms

    Calculate:
    - Daily requests: 1,000,000 x 10 = 10,000,000
    - Average requests per second: 10,000,000 / 86,400 ≈ 116 RPS
    - Peak (3x average): ~350 RPS
    - With safety factor (3x peak): ~1,050 RPS capacity needed
    - Requests in flight at that rate: 1,050 RPS x 0.1 s ≈ 105

3.8 Real-World Examples
Scaling Example: Amazon

    Amazon's Scaling Journey
    ========================

    1995: Single Server
    +----------+
    | Website  |   Everything on one server
    +----------+

    1998: Three-Tier
    +----------+      +----------+      +----------+
    |   Web    | -->  |   App    | -->  |    DB    |
    |  Server  |      |  Server  |      |  Server  |
    +----------+      +----------+      +----------+

    2000s: Distributed
    +--------------------------------------------------+
    |  CDN -> Load Balancer -> Services -> Databases   |
    +--------------------------------------------------+

    Today: Global Scale
    +--------------------------------------------------+
    |  Multiple Regions -> Edge Locations -> Services  |
    |  -> Microservices -> Polyglot Persistence        |
    +--------------------------------------------------+

Summary
Key scalability concepts:
- Know your bottleneck: identify what’s limiting you (CPU, memory, I/O)
- Choose the right scaling strategy: vertical for simplicity, horizontal for scale
- Design stateless: makes horizontal scaling easy
- Use caching: reduces database load
- Plan for peaks: don’t design for the average case only
- Monitor everything: know when to scale