
=============
Rate Limiting
=============


Rate Limiting restricts the number of requests a user/client can make in a given time period.

Rate Limiting Concept
=====================
Without rate limiting:        With rate limiting:
  Requests: 10000/s             Requests: 100/s
  Server: crashes!              Server: OK

=================  ==========================
Reason             Description
=================  ==========================
Prevent abuse      Stop bots and scrapers
Protect resources  Don't overwhelm services
Cost control       Prevent bill spikes
Fair usage         All users get a fair share
Security           DDoS protection
=================  ==========================

Token Bucket
============
- The bucket refills with tokens at rate R
- Each request consumes 1 token
- If the bucket is empty, the request is rejected

+----------+
| Tokens:  | -> Request -> 1 token used
|  10/s    |    (if available)
+----------+

Burst: accumulated tokens can be spent at once, allowing short bursts.
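The refill-and-consume logic above can be sketched in a few lines. This is a minimal single-threaded illustration, not a production implementation; the class name ``TokenBucket`` and the choice of ``time.monotonic`` are my own, not from the text:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens accumulated since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # consume one token for this request
            return True
        return False                    # bucket empty -> reject
```

Because the bucket starts full, a client can burst up to ``capacity`` requests at once, then is held to the steady rate R.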
Leaky Bucket
============
- Requests enter the bucket (a queue)
- Requests are processed at a fixed rate
- If the bucket is full, new requests overflow and are rejected

+----------+    +----------+
| Incoming | -> | Process  | -> Out
| Requests |    |   1/s    |
+----------+    +----------+
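The fixed-rate drain can be modeled as a "water level" that leaks over time. A minimal sketch under that assumption (the class name ``LeakyBucket`` and its parameters are illustrative, not from the text):

```python
import time

class LeakyBucket:
    """Leaky bucket: requests queue up and drain out at a fixed rate."""

    def __init__(self, leak_rate: float, capacity: int):
        self.leak_rate = leak_rate   # requests drained per second
        self.capacity = capacity     # maximum queued requests
        self.water = 0.0             # current queue depth
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain whatever leaked out since the last check.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1          # the request fits in the bucket
            return True
        return False                 # bucket full -> overflow, reject
```

Unlike the token bucket, output never exceeds the leak rate, which smooths bursts instead of permitting them.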
Fixed Window
============
- Count requests per fixed time window
- Reset the count at each window boundary

Example, 100 requests/minute:

- 0-60s: 100 requests OK
- 60-120s: another 100 requests OK
- A burst straddling the boundary can let through nearly double the limit!
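The counter-per-window idea is simple to sketch. This illustration (names are my own) takes ``now`` as a parameter so the window math is easy to see; all timestamps that fall in the same window map to one counter key:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed window: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)   # window index -> request count

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = int(now // self.window)  # same index for the whole window
        if self.counts[bucket] < self.limit:
            self.counts[bucket] += 1
            return True
        return False
```

Note the boundary problem: a client can spend its full limit at the end of one window and again at the start of the next, so a short spike can see up to 2x the nominal rate.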
Sliding Window
==============
- Counts requests over a rolling time window
- More accurate; avoids the fixed-window boundary spike

Now = 10:30:15
Count requests from 10:29:15 to 10:30:15
(rolling 1-minute window)
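One way to implement the rolling count is a sliding window *log*: keep the timestamps of accepted requests and evict any that have aged out. A minimal sketch (the class name and use of ``deque`` are my own choices):

```python
from collections import deque

class SlidingWindowLog:
    """Sliding window log: count requests in the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()   # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that fell out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The log is exact but costs memory per request; sliding-window *counter* variants approximate it with two fixed-window counts instead.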

Rate Limiting Layers
====================
API Gateway (most common)
      |
Load Balancer
      |
Application
      |
Database (last resort)

Rate Limited Response
=====================
HTTP 429 Too Many Requests

Headers:

- Retry-After: 60 (seconds the client should wait before retrying)
- X-RateLimit-Limit: 100 (requests allowed per window)
- X-RateLimit-Remaining: 0 (requests left in the current window)
- X-RateLimit-Reset: 1700000000 (Unix timestamp when the window resets)
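On the client side, a well-behaved caller honors ``Retry-After`` before retrying. A sketch using only the standard library (the function name and back-off scheme are my own; a real client would add jitter and a cap):

```python
import time
import urllib.request
import urllib.error

def get_with_retry(url: str, max_retries: int = 3) -> bytes:
    """Fetch `url`, backing off whenever the server answers 429."""
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise                     # only rate-limit errors are retried
            # Honor Retry-After if present, else back off exponentially.
            retry_after = err.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else 2 ** attempt
            time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```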

=================  ==============================
Practice           Description
=================  ==============================
Graceful handling  Return 429, not 500
User feedback      Show limit info to the client
Multiple limits    Per IP, per user, per endpoint
Whitelist          Don't limit internal services
Log and monitor    Track who hits limits
=================  ==============================

Key Takeaways
=============
1. Prevent abuse - stop excessive requests
2. Algorithms - token bucket, leaky bucket, fixed and sliding windows
3. 429 status - the standard rate-limit response
4. Multiple layers - from API gateway down to the database
5. Monitor - track and analyze who hits the limits