Scalability
===========


Scalability is the capability of a system to handle a growing amount of work by adding resources. It's not just about handling more users; it's about handling growth efficiently.

Scalability vs Performance
==========================
Performance: How FAST the system processes requests
Scalability: How MUCH the system can handle
+------------------+ +------------------+
| Performance | | Scalability |
| | | |
| - Low latency | | - Handle more |
| - High throughput| | users |
| - Fast response | | - Grow with |
| | | demand |
+------------------+ +------------------+

Vertical Scaling
================

Adding more power to your existing machine:
Before After
+-------------+ +-------------+
| Server | | Server |
| 2 CPU | ===> | 32 CPU |
| 4GB RAM | | 128GB RAM |
| 100GB SSD | | 2TB SSD |
+-------------+ +-------------+
Pros:
+ Simple to implement
+ No code changes needed
+ Single point of control

Cons:
- Hardware ceiling
- Downtime during upgrades
- Single point of failure
- Expensive at the high end

Horizontal Scaling
==================

Adding more machines to handle the load:
Load Balancer
|
+-------------+-------------+-------------+
| | | |
v v v v
+----------+ +----------+ +----------+ +----------+
| Server 1 | | Server 2 | | Server 3 | | Server N |
| (2 CPU) | | (2 CPU) | | (2 CPU) | | (2 CPU) |
+----------+ +----------+ +----------+ +----------+
Pros:
+ Near-linear scalability
+ No single-machine hardware limit
+ Fault tolerance
+ Cost-effective commodity hardware

Cons:
- Complex architecture
- Data consistency challenges
- Network overhead
- Harder debugging
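The load balancer in the diagram above can distribute requests in many ways; the simplest is round-robin. A minimal Python sketch (server names are illustrative only):

```python
from itertools import cycle

# Hypothetical pool of identical app servers behind the load balancer.
servers = ["server-1", "server-2", "server-3"]
pool = cycle(servers)  # yields servers in round-robin order forever

def next_server() -> str:
    """Pick the server that should handle the next request."""
    return next(pool)

assignments = [next_server() for _ in range(4)]
print(assignments)  # ['server-1', 'server-2', 'server-3', 'server-1']
```

Real load balancers also track server health and remove failed instances from the pool, but the core distribution idea is this simple.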

Scaling Cube
============
        Y-Axis: Functional
        decomposition
        (Split by function)
             ^
             |      Z-Axis: Sharding
             |     /  (Data partitioning)
             |    /
             |   /
             +------------------------------>
                     X-Axis: Cloning
                     (Multiple identical copies)
X-Axis: Cloning
===============
+----+ +----+ +----+ +----+
| S1 | | S2 | | S3 | | S4 | All handle same function
+----+ +----+ +----+ +----+
Y-Axis: Functional Decomposition
================================
+---------+ +---------+ +---------+
|  User   | |  Order  | | Payment |
| Service | | Service | | Service |
+---------+ +---------+ +---------+  Different functions
Z-Axis: Sharding
================
+-------+ +-------+
| Users | | Users |
| A-M | | N-Z | Same function, different data
+-------+ +-------+
Database Scaling Strategies
===========================
1. Vertical Scaling
+-------------+
| Primary |
| Database |
| (Powerful |
| server) |
+-------------+
2. Read Replicas
+-------------+      +-------------+      +-------------+
| Primary     | ---> | Replica     | ---> | Replica     |
| Database    |      | (Read)      |      | (Read)      |
+-------------+      +-------------+      +-------------+
3. Sharding (Horizontal Partitioning)
+-------------+ +-------------+
| Shard 1 | | Shard 2 |
| Users A-M | | Users N-Z |
+-------------+ +-------------+
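The A-M / N-Z split above can be expressed as a tiny routing function. A minimal sketch, with hypothetical shard names:

```python
def shard_for_user(name: str) -> str:
    """Range sharding by first letter, matching the A-M / N-Z split above."""
    first = name[:1].upper()
    return "shard-1" if first <= "M" else "shard-2"

print(shard_for_user("alice"))  # shard-1
print(shard_for_user("nadia"))  # shard-2
```

Note that range sharding is easy to reason about but can create hot spots if names cluster in one range.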

Compute-Bound Scaling
=====================
Characteristic: CPU is the bottleneck
Solution: Add more compute instances
+------------+ +------------+ +------------+
| Instance   | | Instance   | | Instance   |
| (CPU 100%) | | (CPU 100%) | | (CPU 100%) |
+------------+ +------------+ +------------+
Use Cases:
- Image/video processing
- Machine learning inference
- Complex calculations
- Video encoding
Memory-Bound Scaling
====================
Characteristic: RAM is the bottleneck
Solution: Add more instances with more RAM or use caching
+------------+ +-----------+
| Instance | | Redis |
| | | Cache |
| Memory | | |
| (95% used) | | Hot data |
+------------+ +-----------+
Use Cases:
- In-memory databases
- Real-time analytics
- Session storage
- Caching layers
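Keeping hot data in memory is often as simple as memoizing an expensive lookup. A minimal sketch using Python's built-in LRU cache; the function and its return value are illustrative stand-ins for a database read:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_user_profile(user_id: int) -> dict:
    # Stand-in for an expensive database read.
    return {"id": user_id, "name": f"user-{user_id}"}

get_user_profile(42)  # miss: performs the expensive read
get_user_profile(42)  # hit: served from memory
info = get_user_profile.cache_info()
print(info.hits, info.misses)  # 1 1
```

The `maxsize` bound matters: an unbounded cache is exactly how a memory-bound service runs out of RAM.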
I/O-Bound Scaling
=================
Characteristic: Disk or network I/O is the bottleneck
Solution: Use caching, CDNs, and asynchronous processing
+--------+ +--------+ +--------+
| Request| | Request| | Request|
+--------+ +--------+ +--------+
| | |
v v v
+----------------------------------------+
| Caching Layer |
| (Serve from memory, not disk) |
+----------------------------------------+
Use Cases:
- File storage systems
- Database operations
- Video streaming
- Large data transfers
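The caching layer in the diagram is a read-through cache: misses go to disk once, and every later request is served from memory. A minimal sketch, with the "disk" simulated by a dict and a counter showing how many reads actually reach it:

```python
# Simulated disk contents plus a counter of physical reads.
disk = {"index.html": b"<html>...</html>"}
disk_reads = 0
cache: dict = {}

def read_cached(path: str) -> bytes:
    """Serve from memory when possible; only misses touch the disk."""
    global disk_reads
    if path in cache:
        return cache[path]      # memory hit, no disk I/O
    disk_reads += 1             # cache miss: one disk read
    cache[path] = disk[path]
    return cache[path]

for _ in range(1000):
    read_cached("index.html")
print(disk_reads)  # 1
```

A thousand requests cost a single disk read; this is why caching is the first lever for I/O-bound systems.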

Stateless Architecture
======================
Request 1 Request 2 Request 3
| | |
v v v
+----------------------------------------------+
| Load Balancer |
+----------------------------------------------+
| | |
v v v
+----------+ +----------+ +----------+
| Server A | | Server B | | Server C |
| State: | | State: | | State: |
| None | | None | | None |
+----------+ +----------+ +----------+
| | |
v v v
+----------------------------------------------+
| External State Store |
| (Redis, Database, S3, etc.) |
+----------------------------------------------+
Why Stateless?
- Any server can handle any request
- Easy to add/remove servers
- Simple load balancing
- Better fault tolerance
Session Management Options
==========================
1. Sticky Sessions (Not recommended)
+----------+ +----------+
|User A | --> | Server 1 |
|Session | | Session |
+----------+ +----------+
Problem: Server failure = session loss
2. Session Store (Recommended)
+----------+ +----------+ +----------+
| User A | --> | Any | --> | Redis |
| Session | | Server | | Session |
+----------+ +----------+ +----------+
Benefit: Any server can handle the request
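The session-store pattern can be sketched in a few lines. Here an in-memory dict stands in for an external store such as Redis, and the server names are illustrative:

```python
import uuid

# Stand-in for an external session store (e.g. Redis).
session_store: dict = {}

def create_session(user: str) -> str:
    """Store session state externally and return its ID."""
    sid = str(uuid.uuid4())
    session_store[sid] = {"user": user}
    return sid

def handle_request(server: str, sid: str) -> str:
    # Any server can look up the session because state lives outside it.
    user = session_store[sid]["user"]
    return f"{server} served {user}"

sid = create_session("alice")
print(handle_request("server-a", sid))  # server-a served alice
print(handle_request("server-b", sid))  # server-b served alice
```

Because no server holds the session, any instance can fail or be added without losing user state.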
Partitioning Strategies
=======================
1. Horizontal Partitioning (Sharding)
+------------------------+
| All Users |
+------------------------+
|
v
+----------+ +----------+
| A-M | | N-Z |
| Shard | | Shard |
+----------+ +----------+
2. Vertical Partitioning
+------------------------+
| All Data |
+------------------------+
|
v
+----------+----------+----------+
| Profile | Orders | Analytics|
| (Table1) | (Table2)| (Table3) |
+----------+----------+----------+
3. Functional Partitioning
+------------------------+
| All Services |
+------------------------+
|
v
+-------+-------+-------+
|User |Order |Payment|
|Service|Service|Service|
+-------+-------+-------+
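Range-based sharding like the A-M / N-Z split can develop hot spots; hash partitioning is the usual alternative. A minimal sketch (the partition count is an arbitrary example):

```python
import hashlib

def partition_for_key(key: str, partitions: int = 4) -> int:
    """Hash partitioning: spreads keys evenly across partitions,
    at the cost of making range scans harder."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

# The same key always maps to the same partition.
print(partition_for_key("alice") == partition_for_key("alice"))  # True
print(0 <= partition_for_key("zoe") < 4)                         # True
```

Determinism is the key property: every server computes the same partition for a given key without any coordination.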

Autoscaling Architecture
=========================
+----------------------------------------------------------+
| Cloud Provider |
+----------------------------------------------------------+
| |
| +----------------+ +---------------------------+ |
| | Auto Scaling | | Load Balancer | |
| | Group | | | |
| +----------------+ +---------------------------+ |
| | | |
| v v |
| +-------------------------------------------------+ |
| | Scaling Policies | |
| | - Scale out when CPU > 70% | |
| | - Scale in when CPU < 30% | |
| | - Min instances: 2 | |
| | - Max instances: 20 | |
| +-------------------------------------------------+ |
| |
| +--------+ +--------+ +--------+ +--------+ |
| |Instance| |Instance| |Instance| |Instance| |
| | 1 | | 2 | | 3 | | 4 | |
| +--------+ +--------+ +--------+ +--------+ |
+----------------------------------------------------------+
+--------------------+------------------------+------------------+
| Metric             | Description            | Use Case         |
+====================+========================+==================+
| CPU Utilization    | Percentage of CPU used | General compute  |
+--------------------+------------------------+------------------+
| Memory Utilization | RAM usage              | Memory-intensive |
+--------------------+------------------------+------------------+
| Request Count      | Requests per instance  | Web applications |
+--------------------+------------------------+------------------+
| Network I/O        | Bytes in/out           | Network-bound    |
+--------------------+------------------------+------------------+
| Custom Metrics     | Business-specific      | Unique workloads |
+--------------------+------------------------+------------------+
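The scaling policy in the diagram (scale out above 70% CPU, scale in below 30%, bounded by min/max instances) is just a clamped step function. A minimal sketch; thresholds mirror the diagram and are otherwise arbitrary:

```python
def desired_instances(current: int, cpu_pct: float,
                      scale_out_at: float = 70.0, scale_in_at: float = 30.0,
                      min_n: int = 2, max_n: int = 20) -> int:
    """Return the instance count after applying the scaling policy."""
    if cpu_pct > scale_out_at:
        current += 1          # scale out: add an instance
    elif cpu_pct < scale_in_at:
        current -= 1          # scale in: remove an instance
    return max(min_n, min(max_n, current))

print(desired_instances(4, cpu_pct=85))  # 5  (scale out)
print(desired_instances(4, cpu_pct=20))  # 3  (scale in)
print(desired_instances(2, cpu_pct=10))  # 2  (floor at min)
```

Production autoscalers add cooldown periods and evaluate metrics over a window rather than a single sample, to avoid flapping between sizes.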

Capacity Planning Process
=========================
Step 1: Determine Target Load
+---------------------------+
| - Expected users |
| - Requests per user |
| - Peak traffic times |
+---------------------------+
Step 2: Calculate Requirements
+---------------------------+
| - Requests per second |
| - Bandwidth needed |
| - Storage requirements |
+---------------------------+
Step 3: Apply Safety Factor
+---------------------------+
| - 2-3x for growth |
| - Buffer for spikes |
| - Consider peak seasons |
+---------------------------+
Example Calculation
===================
Given:
- 1,000,000 monthly active users
- Each user makes 10 requests per day
- Average request takes 100ms
Calculate:
- Daily requests: 10,000,000
- Requests per second: 10,000,000 / 86,400 ≈ 116 RPS
- Peak (3x): ~350 RPS
- With safety factor (3x): ~1000 RPS capacity needed
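The arithmetic above is easy to check directly:

```python
users = 1_000_000
requests_per_user_per_day = 10
seconds_per_day = 86_400

daily_requests = users * requests_per_user_per_day  # 10,000,000
avg_rps = daily_requests / seconds_per_day          # ~116 RPS
peak_rps = avg_rps * 3                              # ~350 RPS
capacity_rps = peak_rps * 3                         # ~1000 RPS

print(round(avg_rps), round(peak_rps), round(capacity_rps))
```

Note the 3x peak multiplier is an assumption about traffic shape; measure your own peak-to-average ratio when you have real data.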

Amazon's Scaling Journey
========================
1995: Single Server
+----------+
| Website | Everything on one server
+----------+
1998: Three-Tier
+----------+ +----------+ +----------+
| Web |--> | App |--> | DB |
| Server | | Server | | Server |
+----------+ +----------+ +----------+
2000s: Distributed
+--------------------------------------------------+
| CDN -> Load Balancer -> Services -> Databases |
+--------------------------------------------------+
Today: Global Scale
+--------------------------------------------------+
| Multiple Regions -> Edge Locations -> Services |
| -> Microservices -> Polyglot Persistence |
+--------------------------------------------------+

Key scalability concepts:

  1. Know your bottleneck - Identify what’s limiting you (CPU, memory, I/O)
  2. Choose the right scaling strategy - Vertical for simplicity, horizontal for scale
  3. Design stateless - Makes horizontal scaling easy
  4. Use caching - Reduces database load
  5. Plan for peaks - Don’t design for average case only
  6. Monitor everything - Know when to scale

Next: Chapter 4: Load Balancing