Chapter 46: Designing Twitter/X
System Design Case Study
46.1 Requirements Clarification
Functional Requirements

    Twitter Core Features
    =====================

    1. Tweet
       - Post tweet (280 chars)
       - View timeline
       - Media support (images, videos)

    2. Follow System
       - Follow/unfollow users
       - View followers/following

    3. Timeline
       - Home timeline
       - User timeline

    4. Social
       - Likes, retweets, replies
       - Mentions
       - Hashtags

Non-Functional Requirements

| Requirement | Target |
|---|---|
| Availability | 99.99% |
| Latency | < 200ms for timeline |
| Scalability | 500M+ users |
| Consistency | Eventual for timeline |
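
To make the availability target concrete, the short calculation below converts a percentage into a yearly downtime budget. It is plain arithmetic under one stated assumption (a 365-day year); the function name is illustrative and not part of any library.

```python
# Downtime budget implied by an availability target (plain arithmetic).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime, in minutes per year, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


budget = downtime_minutes_per_year(99.99)
print(f"99.99% availability allows about {budget:.1f} minutes of downtime per year")
```

At 99.99%, the budget works out to roughly 52.6 minutes per year, which is why this target usually implies redundancy at every tier rather than a single primary.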
46.2 High-Level Architecture

    Twitter Architecture
    ====================

                          Internet
                             |
                             v
    +--------------------------------------------------+
    |               CDN (Edge Locations)               |
    +--------------------------------------------------+
                             |
                             v
    +--------------------------------------------------+
    |                  Load Balancers                  |
    +--------------------------------------------------+
                             |
         +-------------------+-------------------+
         |                   |                   |
         v                   v                   v
    +---------+         +---------+         +---------+
    | Web API |         | Mobile  |         |Internal |
    | Servers |         |   API   |         |Services |
    +---------+         +---------+         +---------+
         |                   |
         +-------------------+
                   |
                   v
    +--------------------------------------------------+
    |            Service Mesh / API Gateway            |
    +--------------------------------------------------+
                             |
    +----------+ +----------+ +----------+ +----------+
    |  Tweet   | |   User   | | Timeline | |  Social  |
    | Service  | | Service  | | Service  | | Service  |
    +----------+ +----------+ +----------+ +----------+
                             |
                             v
    +--------------------------------------------------+
    |              Message Queue (Kafka)               |
    +--------------------------------------------------+
                             |
                +------------+------------+
                |            |            |
                v            v            v
          +----------+ +----------+ +----------+
          |   User   | |  Tweet   | |  Social  |
          |    DB    | |    DB    | |    DB    |
          +----------+ +----------+ +----------+

46.3 Data Models
Core Entities

    User Entity
    ===========

    {
      "userId": "uuid",
      "username": "john",
      "displayName": "John Doe",
      "email": "john@example.com",
      "createdAt": "2024-01-01T00:00:00Z",
      "followersCount": 1000,
      "followingCount": 500
    }

    Tweet Entity
    ============

    {
      "tweetId": "uuid",
      "userId": "uuid",
      "content": "Hello world!",
      "mediaUrls": ["url1", "url2"],
      "replyToTweetId": "uuid (optional)",
      "retweetOfTweetId": "uuid (optional)",
      "likesCount": 100,
      "retweetsCount": 50,
      "repliesCount": 10,
      "createdAt": "2024-01-01T12:00:00Z"
    }
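
As a rough illustration, the Tweet entity above could be mirrored in application code as a typed record that enforces the 280-character limit from the requirements. This is a sketch, not a reference schema: the class name, the snake_case field names, and the use of `len()` (which counts Unicode code points rather than applying any weighted character rules) are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional
from uuid import uuid4

MAX_TWEET_LENGTH = 280  # limit taken from the functional requirements


@dataclass
class Tweet:
    """Hypothetical in-memory mirror of the Tweet entity."""

    user_id: str
    content: str
    media_urls: List[str] = field(default_factory=list)
    reply_to_tweet_id: Optional[str] = None    # set only for replies
    retweet_of_tweet_id: Optional[str] = None  # set only for retweets
    likes_count: int = 0
    retweets_count: int = 0
    replies_count: int = 0
    tweet_id: str = field(default_factory=lambda: str(uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        # Assumption: length is measured in code points via len().
        if len(self.content) > MAX_TWEET_LENGTH:
            raise ValueError(f"tweet exceeds {MAX_TWEET_LENGTH} characters")
```

Keeping the counters on the record matches the denormalized shape of the entity; in a real deployment they would more likely be updated asynchronously than on every like.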

    Follow Entity
    =============

    {
      "followerId": "uuid",
      "followingId": "uuid",
      "createdAt": "2024-01-01T12:00:00Z"
    }

46.4 Database Selection
Storage Strategy

| Data Type | Database | Reason |
|---|---|---|
| Users | MySQL (Sharded) | ACID for relationships |
| Tweets | Cassandra | High write throughput |
| Timelines | Redis | Fast read |
| Follows | Redis + MySQL | Fast lookups |
| Media | S3 + CloudFront | Blob storage |
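
The "MySQL (Sharded)" row implies some routing function from a user ID to a shard. A minimal sketch of hash-based routing follows; the shard count and function name are hypothetical, and simple modulo placement is an assumption rather than the scheme this design prescribes.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count, for illustration only


def shard_for_user(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    cryptographic digest is used to keep the mapping stable across restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Modulo placement remaps most keys when the shard count changes; consistent hashing or a lookup directory are common alternatives when resharding is expected.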
46.5 Timeline Generation
Fan-out Approach

    Timeline Generation (Fan-out)
    =============================

    When a user posts a tweet:

    1. Write to tweet DB (Cassandra)
    2. Get the user's followers (Redis)
    3. Fan out to each follower's timeline cache

    Timeline read:

    1. User requests timeline
    2. Read from timeline cache (Redis)
    3. On a cache miss, rebuild the timeline by merging recent tweets from followed users in the tweet DB

    Pros: Fast reads
    Cons: Slow writes for popular users, because each tweet costs one cache write per follower
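
The write and read paths above can be sketched with in-memory stand-ins for the stores. Everything here is illustrative: the dictionaries stand in for Cassandra and Redis, the function names and the per-user cap of 800 entries are assumptions, and the cache-miss rebuild path is omitted.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 800  # hypothetical cap on cached entries per user

tweets = {}                   # tweet_id -> (author_id, text); stands in for the tweet DB
followers = defaultdict(set)  # author_id -> follower ids; stands in for the follower store
# user_id -> tweet ids, newest first; stands in for the timeline cache
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def follow(follower_id, author_id):
    followers[author_id].add(follower_id)


def post_tweet(tweet_id, author_id, text):
    tweets[tweet_id] = (author_id, text)             # 1. write to the tweet store
    for follower_id in followers[author_id]:         # 2. look up the author's followers
        timelines[follower_id].appendleft(tweet_id)  # 3. push into each follower's timeline


def read_timeline(user_id, count=20):
    # Reads touch only the precomputed timeline; no merge happens at read time.
    tweet_ids = list(timelines[user_id])[:count]
    return [tweets[tweet_id][1] for tweet_id in tweet_ids]
```

The loop in `post_tweet` is where the write cost shows up: it runs once per follower, so an author with millions of followers turns a single tweet into millions of cache writes. The bounded `deque` drops the oldest entry once the cap is reached.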

    Hybrid Approach
    ===============

    Regular users (< 1M followers):
      - Fan out to timeline cache on write

    Popular users (>= 1M followers):
      - Don't fan out
      - Generate on read (pull model)

46.6 Scalability Challenges
High Read/Write Volume

    Twitter Scale
    =============

    - 200M+ daily active users
    - ~5000 tweets/second average
    - ~100K tweets/second peak
    - Timeline requests: millions/second
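
A back-of-the-envelope calculation shows what these rates imply. The tweet rates come from the list above; the average follower count and per-tweet size are assumptions chosen only to make the arithmetic concrete.

```python
AVG_TWEETS_PER_SECOND = 5_000     # from the scale figures above
PEAK_TWEETS_PER_SECOND = 100_000  # from the scale figures above
SECONDS_PER_DAY = 24 * 60 * 60

AVG_FOLLOWERS = 200               # assumption, for illustration only
BYTES_PER_TWEET = 300             # assumption: text plus metadata, no media

tweets_per_day = AVG_TWEETS_PER_SECOND * SECONDS_PER_DAY
storage_gb_per_day = tweets_per_day * BYTES_PER_TWEET / 1e9
fanout_writes_per_second = AVG_TWEETS_PER_SECOND * AVG_FOLLOWERS
peak_to_average = PEAK_TWEETS_PER_SECOND / AVG_TWEETS_PER_SECOND

print(f"tweets/day:            {tweets_per_day:,}")
print(f"tweet storage/day:     {storage_gb_per_day:.1f} GB")
print(f"fan-out writes/second: {fanout_writes_per_second:,}")
print(f"peak/average ratio:    {peak_to_average:.0f}x")
```

Under these assumptions the system ingests 432 million tweets (about 130 GB of text data) per day, and fan-out multiplies 5,000 tweet writes per second into about one million timeline writes per second. The 20x peak-to-average ratio is the reason a queue sits between the services and the stores.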

    Solutions
    =========

    1. Read Replicas
       +--------------------------------+
       | Primary DB -> Read Replicas    |
       | Distribute read load           |
       +--------------------------------+

    2. Caching (Redis)
       +--------------------------------+
       | Timeline cache                 |
       | User cache                     |
       | Tweet cache                    |
       +--------------------------------+

    3. Sharding
       +--------------------------------+
       | User-based sharding            |
       | Tweet ID-based sharding        |
       +--------------------------------+

46.7 Search & Discovery
Search Architecture

    Search Architecture
    ===================

    User Query -> API Gateway -> Search Service
                                       |
                  +--------------------+--------------------+
                  |                    |                    |
                  v                    v                    v
            +----------+         +----------+         +----------+
            | Elastic- |         |  Redis   |         |  Search  |
            |  search  |         |  Cache   |         | Ranking  |
            | Cluster  |         |          |         |          |
            +----------+         +----------+         +----------+
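
One way to read the diagram is as a cache-aside lookup: the search service checks the cache, falls back to the index on a miss, ranks the matches, and stores the result. The sketch below uses in-memory stand-ins for all three boxes; the tiny inverted index, the ranking by like count, and the clear-on-write invalidation are assumptions, not how Elasticsearch or Redis behave internally.

```python
from collections import defaultdict

index = defaultdict(set)  # term -> tweet ids; stands in for the Elasticsearch cluster
docs = {}                 # tweet_id -> (text, likes)
cache = {}                # normalized query -> ranked tweet ids; stands in for the Redis cache


def index_tweet(tweet_id, text, likes=0):
    docs[tweet_id] = (text, likes)
    for term in text.lower().split():
        index[term].add(tweet_id)
    cache.clear()  # crude invalidation; a real cache would more likely rely on TTLs


def search(query):
    key = query.lower().strip()
    if key in cache:                 # cache hit: skip the index entirely
        return cache[key]
    terms = key.split()
    if not terms:
        return []
    # Tweets must contain every term (AND semantics).
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    # Ranking stand-in: most-liked first, tweet id as a tie-breaker.
    ranked = sorted(matches, key=lambda tweet_id: (-docs[tweet_id][1], tweet_id))
    cache[key] = ranked
    return ranked
```

Hashtags work here only because whitespace tokenization keeps `#python` as a single term; a production analyzer would handle punctuation, stemming, and language detection.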

    Features:
    - Full-text search
    - Filters (hashtags, users)
    - Trending topics
    - Ranking algorithm

46.8 System Design Diagram

    Complete Twitter Architecture
    =============================

     +---------------------------------------------------------------+
     |                           Internet                            |
     +---------------------------------------------------------------+
                                     |
                                     v
     +---------------------------------------------------------------+
     |                       CDN (CloudFront)                        |
     |               (Static assets: JS, CSS, Images)                |
     +---------------------------------------------------------------+
                                     |
                                     v
     +---------------------------------------------------------------+
     |                     Load Balancers (ALB)                      |
     +---------------------------------------------------------------+
                                     |
                       +-------------+-------------+
                       |                           |
                       v                           v
               +---------------+           +---------------+
               |    Web App    |           |  Mobile App   |
               |    Servers    |           |    Gateway    |
               +---------------+           +---------------+
                       |                           |
                       +-------------+-------------+
                                     |
                                     v
     +---------------------------------------------------------------+
     |                      API Gateway (Kong)                       |
     |                (Auth, Rate limiting, Routing)                 |
     +---------------------------------------------------------------+
                                     |
         +-------------+-------------+-------------+-------------+
         |             |             |             |             |
         v             v             v             v             v
    +---------+   +---------+   +---------+   +---------+   +---------+
    |  Tweet  |   |  User   |   |Timeline |   | Search  |   |  Notif  |
    | Service |   | Service |   | Service |   | Service |   | Service |
    +---------+   +---------+   +---------+   +---------+   +---------+
         |             |             |             |
         +-------------+-------------+-------------+
                                     |
                                     v
     +---------------------------------------------------------------+
     |                 Apache Kafka (Message Queue)                  |
     +---------------------------------------------------------------+
                                     |
         +-------------+-------------+-------------+
         |             |             |             |
         v             v             v             v
    +---------+   +---------+   +---------+   +---------+
    |Cassandra|   |  MySQL  |   |  Redis  |   | Elastic |
    | (Tweets)|   | (Users) |   | (Cache) |   | search  |
    +---------+   +---------+   +---------+   +---------+

46.9 Key Takeaways
Design Decisions Summary

| Decision | Rationale |
|---|---|
| Cassandra for tweets | High write throughput |
| Redis for timeline | Fast reads |
| Fan-out on write | Fast timeline reads |
| Hybrid for popular users | Avoid overwhelming the write path with fan-out to millions of followers |
| Eventual consistency | Acceptable for timeline |
| S3 + CDN | Scalable media storage |
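
The "Fan-out on write" and "Hybrid for popular users" decisions can be combined into a single read path, sketched below with in-memory stand-ins. The names are hypothetical, the threshold is passed as a parameter so the behavior can be exercised with small data, and treating accounts at or above the threshold as popular is an assumption.

```python
from collections import defaultdict

POPULAR_THRESHOLD = 1_000_000  # follower count at which fan-out is skipped

followers = defaultdict(set)          # author -> followers
following = defaultdict(set)          # user -> authors they follow
tweets_by_author = defaultdict(list)  # author -> [(timestamp, text)]
timeline_cache = defaultdict(list)    # user -> [(timestamp, text)] pushed at write time


def follow(follower, author):
    followers[author].add(follower)
    following[follower].add(author)


def is_popular(author, threshold):
    return len(followers[author]) >= threshold


def post_tweet(author, timestamp, text, threshold=POPULAR_THRESHOLD):
    tweets_by_author[author].append((timestamp, text))
    if is_popular(author, threshold):
        return  # pull model: followers fetch these tweets at read time
    for follower in followers[author]:
        timeline_cache[follower].append((timestamp, text))  # push model


def read_timeline(user, count=20, threshold=POPULAR_THRESHOLD):
    # Start from pushed entries, then merge in tweets from popular authors.
    # A set removes duplicates if an author crossed the threshold over time.
    entries = set(timeline_cache[user])
    for author in following[user]:
        if is_popular(author, threshold):
            entries.update(tweets_by_author[author])
    return [text for _, text in sorted(entries, reverse=True)[:count]]
```

The read path pays a small merge cost only for the popular accounts a user follows, which is typically a handful, while the write path never loops over millions of followers.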
Summary
Key Twitter design concepts:
- Fan-out strategy: write-heavy push vs read-heavy pull
- Database per feature: different data, different DBs
- Caching everywhere: Redis for timelines
- Eventual consistency: acceptable for social timelines
- Media offload: S3 + CDN
- Search integration: Elasticsearch