# Chapter 50: Designing Spotify

## Music Streaming with Real-Time Personalization

## 50.1 Spotify Overview

Spotify is the world's most popular music streaming service, with over 500 million users and 100+ million tracks.
**Spotify by the Numbers**

- 500M+ monthly active users
- 200M+ paid subscribers
- 100M+ tracks
- 4B+ playlists
- 100K+ new tracks added daily
- 2B+ hours streamed monthly

### Requirements Analysis

| Requirement | Scale | Technical Challenge |
|---|---|---|
| Streaming | Sub-200ms latency | Global CDN |
| Music catalog | 100M+ tracks | Metadata management |
| Recommendations | Real-time personalization | ML at scale |
| Availability | 99.99% | Global infrastructure |
| Uploads | 100K/day | Ingestion pipeline |
## 50.2 High-Level Architecture

**Spotify Architecture**

Requests flow top to bottom through four layers:

1. **Client apps** (iOS, Android, Desktop)
2. **API Gateway** (edge, authentication)
3. **Backend services**
   - Playback Service
   - Metadata Service
   - Playlist Service
   - Search Service
   - User Profile Service
   - Library Service
   - Social Service
   - Upload Service
   - Recommendation Services (the "secret sauce")
4. **Data layer**
   - Cassandra (metadata)
   - PostgreSQL (users, payments)
   - Redis (sessions)
   - S3 (audio)
   - Kafka (events)
   - Google BigQuery

## 50.3 Music Storage & Delivery

### Audio File Storage

**Spotify's Audio Pipeline**
1. **Upload phase:** Labels/Artists → Upload to S3 → Trigger processing
2. **Processing pipeline** (several hours):
   1. Convert to Spotify's format (Ogg Vorbis)
   2. Generate multiple quality levels
   3. Create audio fingerprints
   4. Analyze audio (BPM, key, energy)
   5. Store in blob storage
3. **Storage & CDN:**
   - Stored in Google Cloud Storage
   - Distributed via CDN (Google Cloud CDN)
   - Multiple quality levels:
     - 24 kbps (mobile, low bandwidth)
     - 96 kbps (mobile, standard)
     - 160 kbps (desktop, high)
     - 320 kbps (premium, highest)
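The quality ladder above can be turned into a client-side selection rule. The sketch below is hypothetical: `choose_bitrate` and the 50% bandwidth headroom are illustrative assumptions, not Spotify's published logic; only the four bitrates and the premium-only 320 kbps tier come from the pipeline description.

```python
# Bitrate ladder from the pipeline above (kbps), lowest to highest.
QUALITY_LEVELS_KBPS = [24, 96, 160, 320]


def choose_bitrate(bandwidth_kbps: float, is_premium: bool, headroom: float = 0.5) -> int:
    """Pick the highest bitrate that fits in the usable bandwidth.

    Hypothetical policy: only `headroom` (default 50%) of the measured
    bandwidth is budgeted for audio, and 320 kbps is reserved for premium.
    """
    budget = bandwidth_kbps * headroom
    allowed = [q for q in QUALITY_LEVELS_KBPS if is_premium or q < 320]
    fitting = [q for q in allowed if q <= budget]
    # Fall back to the lowest level rather than refusing to play.
    return max(fitting) if fitting else QUALITY_LEVELS_KBPS[0]


print(choose_bitrate(1000, is_premium=True))   # 320
print(choose_bitrate(1000, is_premium=False))  # 160
print(choose_bitrate(100, is_premium=True))    # 24
```

Leaving headroom trades some audio quality for fewer stalls when bandwidth fluctuates, which matters most on mobile networks.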
### Streaming Protocol

**Spotify Streaming Protocol**

Historically, instead of plain HTTP streaming, Spotify's clients used a custom protocol.

**Why a custom protocol?**

- Lower latency than HTTP
- Better buffering control
- Optimized for frequent seeking
- Efficient for short playback sessions
- Harder to pirate (content is encrypted)

**Flow:**

1. The client requests an audio chunk.
2. The server streams the encrypted audio.
3. The client decrypts and plays it.
4. The client buffers the next chunks ahead of playback.
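The four-step flow can be sketched as a client that keeps a small read-ahead buffer. Everything here is an assumption for illustration: `AudioServer`, `StreamingClient`, and the tiny chunk size are hypothetical, and the one-byte XOR "cipher" only stands in for real encryption and provides no security.

```python
from collections import deque

CHUNK_SIZE = 4  # bytes per chunk; real clients use far larger chunks
KEY = 0x5A      # toy XOR key: a stand-in for real encryption, NOT secure


def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """Toy symmetric 'cipher' (XOR with one byte), used only to show the flow."""
    return bytes(b ^ key for b in data)


class AudioServer:
    """Hypothetical server that returns encrypted chunks of a track."""

    def __init__(self, track: bytes):
        self.track = track

    def num_chunks(self) -> int:
        return (len(self.track) + CHUNK_SIZE - 1) // CHUNK_SIZE

    def get_chunk(self, index: int) -> bytes:
        start = index * CHUNK_SIZE
        return xor_bytes(self.track[start:start + CHUNK_SIZE])  # step 2: encrypted


class StreamingClient:
    def __init__(self, server: AudioServer, read_ahead: int = 2):
        self.server = server
        self.read_ahead = read_ahead
        self.buffer: deque = deque()  # decrypted chunks waiting to be played
        self.next_to_fetch = 0
        self.requests = 0

    def _fill_buffer(self) -> None:
        # Step 4: keep up to `read_ahead` chunks buffered ahead of playback.
        while (len(self.buffer) < self.read_ahead
               and self.next_to_fetch < self.server.num_chunks()):
            encrypted = self.server.get_chunk(self.next_to_fetch)  # step 1: request
            self.buffer.append(xor_bytes(encrypted))               # step 3: decrypt
            self.next_to_fetch += 1
            self.requests += 1

    def play(self) -> bytes:
        played = bytearray()
        self._fill_buffer()
        while self.buffer:
            played += self.buffer.popleft()  # step 3: "play" the chunk
            self._fill_buffer()              # top the buffer back up
        return bytes(played)


track = b"example audio payload"
client = StreamingClient(AudioServer(track), read_ahead=2)
assert client.play() == track
print(client.requests)  # 6 (21 bytes at 4 bytes per chunk)
```

Refilling the buffer after every played chunk is what keeps playback continuous: the next chunk is usually already decrypted by the time it is needed.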
**Advantages:**

- ~200 ms startup time
- Seamless track transitions
- Efficient seeking

## 50.4 Event-Driven Architecture

Spotify processes billions of events daily using Kafka.

**Spotify Event Infrastructure**
**Event types:**

- Playback events (song played, paused, skipped)
- Search queries
- Playlist modifications
- Social interactions
- Library changes
- Errors and diagnostics
**Event pipeline:** Apps → Kafka → Consumers, which fan out to:

- Spark Streaming (real-time)
- Data warehouse (batch)
- Recommendation models
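The fan-out works because Kafka keeps events in an ordered log and each consumer group tracks its own read position. The in-memory `TopicLog` below mimics that idea; it is a teaching sketch, not the Kafka client API, and the consumer group names are hypothetical.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for a single-partition Kafka topic.

    Events are appended to an ordered log and are not removed when read;
    each consumer group keeps its own offset, so the real-time, batch, and
    recommendation consumers read the same events independently.
    """

    def __init__(self):
        self.log: list = []
        self.offsets: dict = defaultdict(int)  # consumer group -> next offset

    def produce(self, event: dict) -> int:
        self.log.append(event)
        return len(self.log) - 1  # offset of the new event

    def poll(self, group: str, max_events: int = 10) -> list:
        start = self.offsets[group]
        batch = self.log[start:start + max_events]
        self.offsets[group] = start + len(batch)  # "commit" the new offset
        return batch


topic = TopicLog()
topic.produce({"type": "song_played", "user": "u1", "track": "t42"})
topic.produce({"type": "skipped", "user": "u2", "track": "t7"})

realtime = topic.poll("spark-streaming")
batch = topic.poll("data-warehouse", max_events=1)
print(len(realtime), len(batch))         # 2 1
print(len(topic.poll("data-warehouse")))  # 1 (the event it had not read yet)
```

Because reading does not delete events, adding another consumer (say, a new recommendation model) does not disturb the existing ones.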
### Kafka at Scale

**Spotify's Kafka Cluster**
**Scale:**

- 100+ Kafka brokers
- Trillions of messages per day
- Petabytes of data
- Millions of events per second at peak

**Topics:**

- `user-identity-events`
- `playback-events`
- `track-played-events`
- `search-events`
- `recommendation-events`
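Topics like `playback-events` are split into partitions, and records with the same key go to the same partition, which preserves per-key ordering. Kafka's default partitioner hashes the key with murmur2; the sketch below uses `zlib.crc32` as a stand-in, and keying by user ID and the partition count of 12 are assumptions.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 12  # hypothetical partition count for a topic like playback-events


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition with a stable hash.

    Kafka's default partitioner uses murmur2; zlib.crc32 is a stand-in here.
    Same key -> same partition, so one user's events stay in order.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# All events for one user land in the same partition (per-key ordering).
events = [("user-1", "play"), ("user-2", "play"), ("user-1", "skip"), ("user-1", "pause")]
for user, action in events:
    print(user, action, "-> partition", partition_for(user))

# Many distinct keys spread across partitions, which is what lets
# throughput scale with the number of brokers.
load = Counter(partition_for(f"user-{i}") for i in range(12_000))
print(min(load.values()), max(load.values()))
```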
## 50.5 Recommendation System

Spotify's recommendation system is widely praised, especially Discover Weekly.

**Spotify Recommendation Pipeline**
1. **Data collection (real-time).** User actions:
   - What they listen to (complete vs. skip)
   - What they add to playlists
   - What they search for
   - What they like/heart
   - Time of day they listen
   - Social connections
2. **Batch processing (offline)**, using Apache Spark, Python, and TensorFlow:
   - Collaborative filtering
   - Audio analysis (the "audio" model)
   - Embeddings for all tracks
   - User clustering
3. **Real-time processing**, using Kafka Streams and Redis:
   - Update recommendations in real time
   - "Because you played X" suggestions
   - "Made For You" personalized playlists
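Most of these signals are implicit, so a pipeline first has to turn raw actions into a preference score. A minimal sketch follows; the action names and weights are hypothetical, chosen only to show that a skip counts against a track while a playlist add counts strongly for it.

```python
from collections import defaultdict

# Hypothetical weights for implicit feedback; real systems tune these.
ACTION_WEIGHTS = {
    "complete": 1.0,       # listened to the end
    "skip": -1.0,          # skipped early
    "playlist_add": 3.0,   # added to a playlist
    "like": 2.0,           # liked / hearted
}


def preference_scores(events):
    """Aggregate (user, track, action) events into per-(user, track) scores."""
    scores = defaultdict(float)
    for user, track, action in events:
        # Unknown actions (e.g. searches) contribute 0 in this sketch.
        scores[(user, track)] += ACTION_WEIGHTS.get(action, 0.0)
    return dict(scores)


events = [
    ("u1", "t1", "complete"),
    ("u1", "t1", "like"),
    ("u1", "t2", "skip"),
    ("u2", "t1", "playlist_add"),
    ("u2", "t3", "search"),
]
print(preference_scores(events))
# {('u1', 't1'): 3.0, ('u1', 't2'): -1.0, ('u2', 't1'): 3.0, ('u2', 't3'): 0.0}
```

Scores like these become the user-track interaction matrix that the batch models consume.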
### Recommendation Models

**Spotify's Recommendation Algorithms**
**1. COLLABORATIVE FILTERING**

"Users with similar taste liked these tracks"
Matrix factorization on user-track interactions
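Matrix factorization learns a short vector for every user and track such that their dot product approximates the observed interaction score; unobserved pairs are then predicted from the same vectors. The sketch below uses plain stochastic gradient descent with L2 regularization on a toy matrix; large-scale systems often use alternating least squares (ALS) instead, and the data and hyperparameters here are illustrative.

```python
import random


def factorize(ratings, num_users, num_items, k=2, lr=0.02, reg=0.02, epochs=2000, seed=0):
    """Learn user and track factor vectors so that dot(U[u], V[i]) ~ score.

    `ratings` holds the observed (user, item, score) triples; missing pairs
    are what the model predicts afterwards. Plain SGD with L2 regularization.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(num_users)]
    V = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(num_items)]
    data = list(ratings)  # copy so shuffling does not reorder the caller's list
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(U, V, u, i)
            for f in range(k):
                uf, vf = U[u][f], V[i][f]
                # Gradient step on (r - u.v)^2 + reg * (|u|^2 + |v|^2);
                # the constant factor 2 is folded into lr.
                U[u][f] += lr * (err * vf - reg * uf)
                V[i][f] += lr * (err * uf - reg * vf)
    return U, V


def predict(U, V, u, i):
    """Predicted score for user u and track i: the dot product of their factors."""
    return sum(a * b for a, b in zip(U[u], V[i]))


ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),  # user 0 likes tracks 0 and 1
    (1, 0, 4), (1, 1, 5), (1, 3, 1),  # user 1 has similar taste
    (2, 2, 5), (2, 3, 4), (2, 0, 1),  # user 2 prefers tracks 2 and 3
]
U, V = factorize(ratings, num_users=3, num_items=4)
print(round(predict(U, V, 1, 2), 2))  # user 1 never rated track 2
```

Ranking each user's predicted scores for tracks they have not played yields candidate recommendations.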
**2. AUDIO ANALYSIS**

"This track sounds similar to tracks you like"

- BPM, danceability, energy
- Key, tempo
- Instrumentalness
- Audio embeddings
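One simple way to express "sounds similar" is cosine similarity between feature vectors. The feature values below are made up, and dividing BPM by 200 is an assumption so that tempo stays on roughly the same 0-1 scale as the other features.

```python
import math

# Hypothetical per-track audio features.
TRACKS = {
    "track_a": {"bpm": 128, "danceability": 0.80, "energy": 0.90, "instrumentalness": 0.10},
    "track_b": {"bpm": 124, "danceability": 0.75, "energy": 0.85, "instrumentalness": 0.05},
    "track_c": {"bpm": 70, "danceability": 0.30, "energy": 0.20, "instrumentalness": 0.90},
}


def feature_vector(f):
    # Scale BPM down so it does not dominate the 0-1 features.
    return [f["bpm"] / 200.0, f["danceability"], f["energy"], f["instrumentalness"]]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(track_id):
    """Rank all other tracks by cosine similarity of their feature vectors."""
    target = feature_vector(TRACKS[track_id])
    others = [t for t in TRACKS if t != track_id]
    return sorted(others, key=lambda t: cosine(target, feature_vector(TRACKS[t])), reverse=True)


print(most_similar("track_a"))  # ['track_b', 'track_c']
```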
**3. NATURAL LANGUAGE PROCESSING**

"Tracks described with similar words"

Text scraped from music blogs and reviews.
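A classic way to compare text descriptions is TF-IDF weighting followed by cosine similarity. The snippets and whitespace tokenization below are hypothetical simplifications; a production pipeline would need proper tokenization and far more text per track.

```python
import math
from collections import Counter

# Hypothetical text snippets about each track (e.g. gathered from reviews).
DOCS = {
    "track_a": "dreamy lo-fi chill beats for studying",
    "track_b": "chill lo-fi beats with dreamy synths",
    "track_c": "aggressive fast thrash metal riffs",
}


def tfidf_vectors(docs):
    """Return {doc_id: {term: weight}} with weight = count * log(N / df)."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many docs each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    return {
        d: {t: c * math.log(n / df[t]) for t, c in Counter(tokens).items()}
        for d, tokens in tokenized.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


vecs = tfidf_vectors(DOCS)
print(round(cosine(vecs["track_a"], vecs["track_b"]), 2))  # 0.21
print(round(cosine(vecs["track_a"], vecs["track_c"]), 2))  # 0.0
```

The IDF factor down-weights words that appear in many descriptions, so shared distinctive words drive the similarity.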
**4. CONVOLUTIONAL NEURAL NETWORKS**

Direct audio analysis: raw audio → CNN → embeddings.
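At its core, a convolutional layer slides small filters over the signal, and pooling collapses the responses into a fixed-size vector. The toy forward pass below uses a single layer with two hand-picked kernels in place of learned ones; it illustrates the shape of the computation only, not a trained audio model.

```python
import math


def conv1d(signal, kernel):
    """'Valid' 1D convolution (cross-correlation, as in most DL libraries)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def embed(signal, kernels):
    """Conv -> ReLU -> global average pool: one output dimension per kernel."""
    out = []
    for kernel in kernels:
        activations = relu(conv1d(signal, kernel))
        out.append(sum(activations) / len(activations))
    return out


# Hand-picked kernels standing in for learned filters; a real model learns
# many filters and stacks several layers.
KERNELS = [
    [1.0, -1.0],  # responds to rapid sample-to-sample changes
    [0.5, 0.5],   # responds to sustained positive amplitude
]

smooth = [math.sin(2 * math.pi * t / 32) for t in range(128)]  # slow wave
noisy = [(-1.0) ** t for t in range(128)]                       # rapid alternation
print([round(v, 3) for v in embed(smooth, KERNELS)])
print([round(v, 3) for v in embed(noisy, KERNELS)])
```

The alternating signal scores high on the change-detecting kernel and zero on the smoothing one, while the slow sine wave does the opposite, so the two inputs land in different regions of the embedding space.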
**Discover Weekly:**

- 30 songs, updated every Monday
- A mix of:
  - Songs from similar users
  - Songs with similar audio
  - New releases from followed artists
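A weekly playlist like this can be assembled by blending ranked candidate lists from each source. The round-robin interleaving, de-duplication, and filtering of already-heard tracks below are assumptions for illustration, not Spotify's documented algorithm.

```python
from itertools import zip_longest

PLAYLIST_SIZE = 30


def blend(sources, already_heard, size=PLAYLIST_SIZE):
    """Interleave ranked candidate lists round-robin, skipping duplicates
    and tracks the user has already heard, until `size` tracks are picked."""
    playlist, seen = [], set(already_heard)
    for candidates_at_rank in zip_longest(*sources):  # rank 1 of each source, then rank 2, ...
        for track in candidates_at_rank:
            if track is None or track in seen:  # None pads shorter sources
                continue
            playlist.append(track)
            seen.add(track)
            if len(playlist) == size:
                return playlist
    return playlist


similar_users = ["t1", "t2", "t3"]
similar_audio = ["t2", "t4", "t5"]
new_releases = ["t6", "t1"]
print(blend([similar_users, similar_audio, new_releases], already_heard={"t3"}, size=5))
# ['t1', 't2', 't6', 't4', 't5']
```

Round-robin keeps any single source from dominating the top of the playlist.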
## 50.6 Microservices at Spotify

**Spotify's Microservices**
Roughly 1,000 microservices run in production. Each team owns:

- Its own service (end to end)
- Its own data
- Its own deployment
- Its own on-call rotation
**Key services:**

- `metadata-service` (track, artist info)
- `playback-service` (streaming control)
- `recommendation-service`
- `playlist-service`
- `search-service`
- `user-service`
- `social-service`
- `billing-service`

### Backend for Frontend (BFF)

**BFF Pattern at Spotify**
Mobile App → **Mobile BFF** (dedicated to mobile) → **Core microservices** (`user-service`, `playlist-service`, `recommendation-service`, ...)

The Mobile BFF aggregates:

- User profile
- Playlist data
- Recommendations
- Recently played

It returns a single optimized response.
**Benefits:**

- Mobile-optimized responses
- Reduced round trips
- Independent scaling
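The BFF's main job is to fan out to core services in parallel and reshape the results for one client. The sketch below simulates that with `asyncio.gather`; the four service functions are hypothetical stand-ins with artificial 50 ms delays, and the response shape is invented for illustration.

```python
import asyncio


# Hypothetical downstream services, simulated with short sleeps.
async def user_service(user_id):
    await asyncio.sleep(0.05)
    return {"id": user_id, "name": "Ada"}


async def playlist_service(user_id):
    await asyncio.sleep(0.05)
    return [{"id": "p1", "name": "Road Trip"}]


async def recommendation_service(user_id):
    await asyncio.sleep(0.05)
    return ["t42", "t7"]


async def recently_played_service(user_id):
    await asyncio.sleep(0.05)
    return ["t1"]


async def mobile_home_screen(user_id):
    """Mobile BFF endpoint: call core services concurrently and return
    one response shaped for the mobile home screen."""
    profile, playlists, recs, recent = await asyncio.gather(
        user_service(user_id),
        playlist_service(user_id),
        recommendation_service(user_id),
        recently_played_service(user_id),
    )
    return {
        "greeting": f"Hi, {profile['name']}",
        "playlists": [p["name"] for p in playlists],  # only what the UI shows
        "recommended": recs[:10],
        "recently_played": recent,
    }


print(asyncio.run(mobile_home_screen("u1")))
# {'greeting': 'Hi, Ada', 'playlists': ['Road Trip'], 'recommended': ['t42', 't7'], 'recently_played': ['t1']}
```

Because the four calls run concurrently, the endpoint takes roughly as long as the slowest one rather than the sum, and the mobile client makes a single request instead of four.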
## 50.7 Key Learnings from Spotify

**Spotify Engineering Principles**
**1. EVENT-DRIVEN**

- Kafka for everything
- Decoupled services
- Real-time + batch processing
- Complete audit trail

**2. MICROSERVICES**

- ~1,000 independent services
- Autonomous teams
- Own data, own deployment

**3. GREMLIN CHAOS ENGINEERING**

- Inspired by Netflix
- Regular chaos experiments
- Build confidence in resilience

**4. RECOMMENDATIONS FIRST**

- ML-driven experience
- Multiple algorithms combined
- Real-time personalization

**5. DEVELOPER EXPERIENCE**

- Internal tooling
- Self-service platforms
- Fast deploys

## Summary

- Music streaming - Custom protocol, low latency
- Event-driven - Kafka for billions of events
- Microservices - ~1,000 services
- Recommendations - Multi-model ML pipeline
- Metadata - Cassandra for catalog
- CDN - Google Cloud CDN for audio delivery
## Congratulations!

You've completed the System Design Guide!
This guide covered:
- Fundamentals: Scalability, load balancing, caching
- Database Design: SQL vs NoSQL, CAP theorem, replication, sharding
- Architecture Patterns: Monolith, microservices, event-driven, CQRS, serverless
- API Design: REST, GraphQL, authentication, message queues
- Reliability: Circuit breakers, rate limiting, retries, timeouts
- Observability: Logging, monitoring, alerting, distributed tracing
- Security: TLS, OAuth/JWT, secrets management, DDoS protection
- Real-world Case Studies: Twitter, Netflix, Uber, Amazon, Spotify
You're now equipped to design large-scale distributed systems!

Keep learning and building!