
=============
Rate Limiting
=============


Rate Limiting restricts the number of requests a user/client can make in a given time period.

Rate Limiting Concept
=====================
Without rate limiting:        With rate limiting:
  Requests: 10000/s             Requests: 100/s
  Server: crashes!              Server: OK

=================  ==========================
Reason             Description
=================  ==========================
Prevent abuse      Stop bots and scrapers
Protect resources  Don't overwhelm services
Cost control       Prevent bill spikes
Fair usage         All users get a fair share
Security           DDoS protection
=================  ==========================

Token Bucket
============
- The bucket refills with tokens at rate R
- Each request consumes 1 token
- If the bucket is empty, the request is rejected

+----------+
| Tokens:  | -> Request -> 1 token used
|  10/s    |    (if available)
+----------+

Burst: accumulated tokens can be spent at once, allowing short bursts.
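The refill-and-consume logic above can be sketched in a few lines. This is a minimal single-threaded illustration, not a production implementation; the class name ``TokenBucket`` and the choice of ``time.monotonic`` are my own, not from the text:

```python
import time

class TokenBucket:
    """Token bucket: refills at `rate` tokens/sec, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)   # start full, so an initial burst is allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens accumulated since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1            # consume one token for this request
            return True
        return False                    # bucket empty -> reject
```

Because the bucket starts full, a client can burst up to ``capacity`` requests at once, then is held to the steady rate R.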
Leaky Bucket
============
- Requests enter the bucket (a queue)
- Requests are processed at a fixed rate
- If the bucket is full, new requests overflow and are rejected

+----------+    +----------+
| Incoming | -> | Process  | -> Out
| Requests |    |   1/s    |
+----------+    +----------+
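The fixed-rate drain can be modeled as a "water level" that leaks over time. A minimal sketch under that assumption (the class name ``LeakyBucket`` and its parameters are illustrative, not from the text):

```python
import time

class LeakyBucket:
    """Leaky bucket: requests queue up and drain out at a fixed rate."""

    def __init__(self, leak_rate: float, capacity: int):
        self.leak_rate = leak_rate   # requests drained per second
        self.capacity = capacity     # maximum queued requests
        self.water = 0.0             # current queue depth
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Drain whatever leaked out since the last check.
        self.water = max(0.0, self.water - (now - self.last) * self.leak_rate)
        self.last = now
        if self.water + 1 <= self.capacity:
            self.water += 1          # the request fits in the bucket
            return True
        return False                 # bucket full -> overflow, reject
```

Unlike the token bucket, output never exceeds the leak rate, which smooths bursts instead of permitting them.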
Fixed Window
============
- Count requests per fixed time window
- Reset the count at each window boundary

Example, 100 requests/minute:

- 0-60s: 100 requests OK
- 60-120s: another 100 requests OK
- A burst straddling the boundary can let through nearly double the limit!
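The counter-per-window idea is simple to sketch. This illustration (names are my own) takes ``now`` as a parameter so the window math is easy to see; all timestamps that fall in the same window map to one counter key:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """Fixed window: at most `limit` requests per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.counts = defaultdict(int)   # window index -> request count

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = int(now // self.window)  # same index for the whole window
        if self.counts[bucket] < self.limit:
            self.counts[bucket] += 1
            return True
        return False
```

Note the boundary problem: a client can spend its full limit at the end of one window and again at the start of the next, so a short spike can see up to 2x the nominal rate.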
Sliding Window
==============
- Counts requests over a rolling time window
- More accurate; avoids the fixed-window boundary spike

Now = 10:30:15
Count requests from 10:29:15 to 10:30:15
(rolling 1-minute window)
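One way to implement the rolling count is a sliding window *log*: keep the timestamps of accepted requests and evict any that have aged out. A minimal sketch (the class name and use of ``deque`` are my own choices):

```python
from collections import deque

class SlidingWindowLog:
    """Sliding window log: count requests in the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.log = deque()   # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that fell out of the rolling window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        if len(self.log) < self.limit:
            self.log.append(now)
            return True
        return False
```

The log is exact but costs memory per request; sliding-window *counter* variants approximate it with two fixed-window counts instead.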

Rate Limiting Layers
====================
API Gateway (most common)
      |
Load Balancer
      |
Application
      |
Database (last resort)

Rate Limited Response
=====================
HTTP 429 Too Many Requests

Headers:

- Retry-After: 60 (seconds the client should wait before retrying)
- X-RateLimit-Limit: 100 (requests allowed per window)
- X-RateLimit-Remaining: 0 (requests left in the current window)
- X-RateLimit-Reset: 1700000000 (Unix timestamp when the window resets)
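On the client side, a well-behaved caller honors ``Retry-After`` before retrying. A sketch using only the standard library (the function name and back-off scheme are my own; a real client would add jitter and a cap):

```python
import time
import urllib.request
import urllib.error

def get_with_retry(url: str, max_retries: int = 3) -> bytes:
    """Fetch `url`, backing off whenever the server answers 429."""
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise                     # only rate-limit errors are retried
            # Honor Retry-After if present, else back off exponentially.
            retry_after = err.headers.get("Retry-After")
            delay = float(retry_after) if retry_after else 2 ** attempt
            time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```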

=================  ==============================
Practice           Description
=================  ==============================
Graceful handling  Return 429, not 500
User feedback      Show limit info to the client
Multiple limits    Per IP, per user, per endpoint
Whitelist          Don't limit internal services
Log and monitor    Track who hits limits
=================  ==============================

Key Takeaways
=============
1. Prevent abuse - stop excessive requests
2. Algorithms - token bucket, leaky bucket, fixed and sliding windows
3. 429 status - the standard rate-limit response
4. Multiple layers - from API gateway down to the database
5. Monitor - track and analyze who hits the limits