Scalability
===========


Scalability is the capability of a system to handle a growing amount of work by adding resources. It's not just about handling more users; it's about handling growth efficiently.

Scalability vs Performance
==========================
Performance: How FAST the system processes requests
Scalability: How MUCH the system can handle
+------------------+ +------------------+
| Performance | | Scalability |
| | | |
| - Low latency | | - Handle more |
| - High throughput| | users |
| - Fast response | | - Grow with |
| | | demand |
+------------------+ +------------------+

Vertical Scaling
================

Adding more power to your existing machine:
Before After
+-------------+ +-------------+
| Server | | Server |
| 2 CPU | ===> | 32 CPU |
| 4GB RAM | | 128GB RAM |
| 100GB SSD | | 2TB SSD |
+-------------+ +-------------+
Pros:
+ Simple to implement
+ No code changes needed
+ Single point of control

Cons:
- Hardware ceiling
- Downtime during upgrades
- Single point of failure
- Expensive at the high end

Horizontal Scaling
==================

Adding more machines to handle the load:
Load Balancer
|
+-------------+-------------+-------------+
| | | |
v v v v
+----------+ +----------+ +----------+ +----------+
| Server 1 | | Server 2 | | Server 3 | | Server N |
| (2 CPU) | | (2 CPU) | | (2 CPU) | | (2 CPU) |
+----------+ +----------+ +----------+ +----------+
Pros:
+ Near-linear scalability
+ No single-machine hardware limit
+ Fault tolerance
+ Cost-effective commodity hardware

Cons:
- Complex architecture
- Data consistency challenges
- Network overhead
- Harder debugging
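The load balancer in the diagram above can distribute requests in many ways; the simplest is round-robin. A minimal Python sketch (server names are illustrative only):

```python
from itertools import cycle

# Hypothetical pool of identical app servers behind the load balancer.
servers = ["server-1", "server-2", "server-3"]
pool = cycle(servers)  # yields servers in round-robin order forever

def next_server() -> str:
    """Pick the server that should handle the next request."""
    return next(pool)

assignments = [next_server() for _ in range(4)]
print(assignments)  # ['server-1', 'server-2', 'server-3', 'server-1']
```

Real load balancers also track server health and remove failed instances from the pool, but the core distribution idea is this simple.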

Scaling Cube
============
        Y-Axis: Functional
        decomposition
        (Split by function)
             ^
             |      Z-Axis: Sharding
             |     /  (Data partitioning)
             |    /
             |   /
             +------------------------------>
                     X-Axis: Cloning
                     (Multiple identical copies)
X-Axis: Cloning
===============
+----+ +----+ +----+ +----+
| S1 | | S2 | | S3 | | S4 | All handle same function
+----+ +----+ +----+ +----+
Y-Axis: Functional Decomposition
================================
+---------+ +---------+ +---------+
|  User   | |  Order  | | Payment |
| Service | | Service | | Service |
+---------+ +---------+ +---------+  Different functions
Z-Axis: Sharding
================
+-------+ +-------+
| Users | | Users |
| A-M | | N-Z | Same function, different data
+-------+ +-------+
Database Scaling Strategies
===========================
1. Vertical Scaling
+-------------+
| Primary |
| Database |
| (Powerful |
| server) |
+-------------+
2. Read Replicas
+-------------+      +-------------+      +-------------+
| Primary     | ---> | Replica     | ---> | Replica     |
| Database    |      | (Read)      |      | (Read)      |
+-------------+      +-------------+      +-------------+
3. Sharding (Horizontal Partitioning)
+-------------+ +-------------+
| Shard 1 | | Shard 2 |
| Users A-M | | Users N-Z |
+-------------+ +-------------+
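The A-M / N-Z split above can be expressed as a tiny routing function. A minimal sketch, with hypothetical shard names:

```python
def shard_for_user(name: str) -> str:
    """Range sharding by first letter, matching the A-M / N-Z split above."""
    first = name[:1].upper()
    return "shard-1" if first <= "M" else "shard-2"

print(shard_for_user("alice"))  # shard-1
print(shard_for_user("nadia"))  # shard-2
```

Note that range sharding is easy to reason about but can create hot spots if names cluster in one range.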

Compute-Bound Scaling
=====================
Characteristic: CPU is the bottleneck
Solution: Add more compute instances
+------------+ +------------+ +------------+
| Instance   | | Instance   | | Instance   |
| (CPU 100%) | | (CPU 100%) | | (CPU 100%) |
+------------+ +------------+ +------------+
Use Cases:
- Image/video processing
- Machine learning inference
- Complex calculations
- Video encoding
Memory-Bound Scaling
====================
Characteristic: RAM is the bottleneck
Solution: Add more instances with more RAM or use caching
+------------+ +-----------+
| Instance | | Redis |
| | | Cache |
| Memory | | |
| (95% used) | | Hot data |
+------------+ +-----------+
Use Cases:
- In-memory databases
- Real-time analytics
- Session storage
- Caching layers
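Keeping hot data in memory is often as simple as memoizing an expensive lookup. A minimal sketch using Python's built-in LRU cache; the function and its return value are illustrative stand-ins for a database read:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_user_profile(user_id: int) -> dict:
    # Stand-in for an expensive database read.
    return {"id": user_id, "name": f"user-{user_id}"}

get_user_profile(42)  # miss: performs the expensive read
get_user_profile(42)  # hit: served from memory
info = get_user_profile.cache_info()
print(info.hits, info.misses)  # 1 1
```

The `maxsize` bound matters: an unbounded cache is exactly how a memory-bound service runs out of RAM.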
I/O-Bound Scaling
=================
Characteristic: Disk or network I/O is the bottleneck
Solution: Use caching, CDNs, and asynchronous processing
+--------+ +--------+ +--------+
| Request| | Request| | Request|
+--------+ +--------+ +--------+
| | |
v v v
+----------------------------------------+
| Caching Layer |
| (Serve from memory, not disk) |
+----------------------------------------+
Use Cases:
- File storage systems
- Database operations
- Video streaming
- Large data transfers
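The caching layer in the diagram is a read-through cache: misses go to disk once, and every later request is served from memory. A minimal sketch, with the "disk" simulated by a dict and a counter showing how many reads actually reach it:

```python
# Simulated disk contents plus a counter of physical reads.
disk = {"index.html": b"<html>...</html>"}
disk_reads = 0
cache: dict = {}

def read_cached(path: str) -> bytes:
    """Serve from memory when possible; only misses touch the disk."""
    global disk_reads
    if path in cache:
        return cache[path]      # memory hit, no disk I/O
    disk_reads += 1             # cache miss: one disk read
    cache[path] = disk[path]
    return cache[path]

for _ in range(1000):
    read_cached("index.html")
print(disk_reads)  # 1
```

A thousand requests cost a single disk read; this is why caching is the first lever for I/O-bound systems.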

Stateless Architecture
======================
Request 1 Request 2 Request 3
| | |
v v v
+----------------------------------------------+
| Load Balancer |
+----------------------------------------------+
| | |
v v v
+----------+ +----------+ +----------+
| Server A | | Server B | | Server C |
| State: | | State: | | State: |
| None | | None | | None |
+----------+ +----------+ +----------+
| | |
v v v
+----------------------------------------------+
| External State Store |
| (Redis, Database, S3, etc.) |
+----------------------------------------------+
Why Stateless?
- Any server can handle any request
- Easy to add/remove servers
- Simple load balancing
- Better fault tolerance
Session Management Options
==========================
1. Sticky Sessions (Not recommended)
+----------+ +----------+
|User A | --> | Server 1 |
|Session | | Session |
+----------+ +----------+
Problem: Server failure = session loss
2. Session Store (Recommended)
+----------+ +----------+ +----------+
| User A | --> | Any | --> | Redis |
| Session | | Server | | Session |
+----------+ +----------+ +----------+
Benefit: Any server can handle the request
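The session-store pattern can be sketched in a few lines. Here an in-memory dict stands in for an external store such as Redis, and the server names are illustrative:

```python
import uuid

# Stand-in for an external session store (e.g. Redis).
session_store: dict = {}

def create_session(user: str) -> str:
    """Store session state externally and return its ID."""
    sid = str(uuid.uuid4())
    session_store[sid] = {"user": user}
    return sid

def handle_request(server: str, sid: str) -> str:
    # Any server can look up the session because state lives outside it.
    user = session_store[sid]["user"]
    return f"{server} served {user}"

sid = create_session("alice")
print(handle_request("server-a", sid))  # server-a served alice
print(handle_request("server-b", sid))  # server-b served alice
```

Because no server holds the session, any instance can fail or be added without losing user state.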
Partitioning Strategies
=======================
1. Horizontal Partitioning (Sharding)
+------------------------+
| All Users |
+------------------------+
|
v
+----------+ +----------+
| A-M | | N-Z |
| Shard | | Shard |
+----------+ +----------+
2. Vertical Partitioning
+------------------------+
| All Data |
+------------------------+
|
v
+----------+----------+----------+
| Profile | Orders | Analytics|
| (Table1) | (Table2)| (Table3) |
+----------+----------+----------+
3. Functional Partitioning
+------------------------+
| All Services |
+------------------------+
|
v
+-------+-------+-------+
|User |Order |Payment|
|Service|Service|Service|
+-------+-------+-------+
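Range-based sharding like the A-M / N-Z split can develop hot spots; hash partitioning is the usual alternative. A minimal sketch (the partition count is an arbitrary example):

```python
import hashlib

def partition_for_key(key: str, partitions: int = 4) -> int:
    """Hash partitioning: spreads keys evenly across partitions,
    at the cost of making range scans harder."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

# The same key always maps to the same partition.
print(partition_for_key("alice") == partition_for_key("alice"))  # True
print(0 <= partition_for_key("zoe") < 4)                         # True
```

Determinism is the key property: every server computes the same partition for a given key without any coordination.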

Autoscaling Architecture
=========================
+----------------------------------------------------------+
| Cloud Provider |
+----------------------------------------------------------+
| |
| +----------------+ +---------------------------+ |
| | Auto Scaling | | Load Balancer | |
| | Group | | | |
| +----------------+ +---------------------------+ |
| | | |
| v v |
| +-------------------------------------------------+ |
| | Scaling Policies | |
| | - Scale out when CPU > 70% | |
| | - Scale in when CPU < 30% | |
| | - Min instances: 2 | |
| | - Max instances: 20 | |
| +-------------------------------------------------+ |
| |
| +--------+ +--------+ +--------+ +--------+ |
| |Instance| |Instance| |Instance| |Instance| |
| | 1 | | 2 | | 3 | | 4 | |
| +--------+ +--------+ +--------+ +--------+ |
+----------------------------------------------------------+
+--------------------+------------------------+------------------+
| Metric             | Description            | Use Case         |
+====================+========================+==================+
| CPU Utilization    | Percentage of CPU used | General compute  |
+--------------------+------------------------+------------------+
| Memory Utilization | RAM usage              | Memory-intensive |
+--------------------+------------------------+------------------+
| Request Count      | Requests per instance  | Web applications |
+--------------------+------------------------+------------------+
| Network I/O        | Bytes in/out           | Network-bound    |
+--------------------+------------------------+------------------+
| Custom Metrics     | Business-specific      | Unique workloads |
+--------------------+------------------------+------------------+
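The scaling policy in the diagram (scale out above 70% CPU, scale in below 30%, bounded by min/max instances) is just a clamped step function. A minimal sketch; thresholds mirror the diagram and are otherwise arbitrary:

```python
def desired_instances(current: int, cpu_pct: float,
                      scale_out_at: float = 70.0, scale_in_at: float = 30.0,
                      min_n: int = 2, max_n: int = 20) -> int:
    """Return the instance count after applying the scaling policy."""
    if cpu_pct > scale_out_at:
        current += 1          # scale out: add an instance
    elif cpu_pct < scale_in_at:
        current -= 1          # scale in: remove an instance
    return max(min_n, min(max_n, current))

print(desired_instances(4, cpu_pct=85))  # 5  (scale out)
print(desired_instances(4, cpu_pct=20))  # 3  (scale in)
print(desired_instances(2, cpu_pct=10))  # 2  (floor at min)
```

Production autoscalers add cooldown periods and evaluate metrics over a window rather than a single sample, to avoid flapping between sizes.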

Capacity Planning Process
=========================
Step 1: Determine Target Load
+---------------------------+
| - Expected users |
| - Requests per user |
| - Peak traffic times |
+---------------------------+
Step 2: Calculate Requirements
+---------------------------+
| - Requests per second |
| - Bandwidth needed |
| - Storage requirements |
+---------------------------+
Step 3: Apply Safety Factor
+---------------------------+
| - 2-3x for growth |
| - Buffer for spikes |
| - Consider peak seasons |
+---------------------------+
Example Calculation
===================
Given:
- 1,000,000 monthly active users
- Each user makes 10 requests per day
- Average request takes 100ms
Calculate:
- Daily requests: 10,000,000
- Requests per second: 10,000,000 / 86,400 ≈ 116 RPS
- Peak (3x): ~350 RPS
- With safety factor (3x): ~1000 RPS capacity needed
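The arithmetic above is easy to check directly:

```python
users = 1_000_000
requests_per_user_per_day = 10
seconds_per_day = 86_400

daily_requests = users * requests_per_user_per_day  # 10,000,000
avg_rps = daily_requests / seconds_per_day          # ~116 RPS
peak_rps = avg_rps * 3                              # ~350 RPS
capacity_rps = peak_rps * 3                         # ~1000 RPS

print(round(avg_rps), round(peak_rps), round(capacity_rps))
```

Note the 3x peak multiplier is an assumption about traffic shape; measure your own peak-to-average ratio when you have real data.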

Amazon's Scaling Journey
========================
1995: Single Server
+----------+
| Website | Everything on one server
+----------+
1998: Three-Tier
+----------+ +----------+ +----------+
| Web |--> | App |--> | DB |
| Server | | Server | | Server |
+----------+ +----------+ +----------+
2000s: Distributed
+--------------------------------------------------+
| CDN -> Load Balancer -> Services -> Databases |
+--------------------------------------------------+
Today: Global Scale
+--------------------------------------------------+
| Multiple Regions -> Edge Locations -> Services |
| -> Microservices -> Polyglot Persistence |
+--------------------------------------------------+

Key scalability concepts:

  1. Know your bottleneck - Identify what’s limiting you (CPU, memory, I/O)
  2. Choose the right scaling strategy - Vertical for simplicity, horizontal for scale
  3. Design stateless - Makes horizontal scaling easy
  4. Use caching - Reduces database load
  5. Plan for peaks - Don’t design for average case only
  6. Monitor everything - Know when to scale

Next: Chapter 4: Load Balancing