
Chapter 50: Designing Spotify

Music Streaming with Real-Time Personalization

50.1 Spotify Overview

Spotify is the world’s most popular music streaming service, with over 500 million monthly active users and a catalog of more than 100 million tracks.

    Spotify by the Numbers
    ====================

    ┌─────────────────────────────────────────────────────────────┐
    │  500M+ monthly active users                              │
    │  200M+ subscribers (paid)                                │
    │  100M+ tracks                                            │
    │  4B+ playlists                                          │
    │  100K+ new tracks added daily                            │
    │  2B+ hours streamed monthly                              │
    └─────────────────────────────────────────────────────────────┘

Requirements Analysis

Requirement	Target / Scale	Technical Challenge
Streaming	Sub-200ms latency	Global CDN
Music catalog	100M+ tracks	Metadata management
Recommendations	Real-time personalization	ML at scale
Availability	99.99%	Global infrastructure
Uploads	100K/day	Ingestion pipeline
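
To make the targets in the table concrete, the short sketch below converts an availability percentage into a yearly downtime budget and a daily volume into an average per-second rate. The function names are illustrative only, and the results are back-of-envelope averages rather than measured figures.

```python
# Back-of-envelope arithmetic for the requirements table.
MINUTES_PER_YEAR = 365 * 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for an availability such as 0.9999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def average_rate_per_second(events_per_day: float) -> float:
    """Average per-second rate for a daily volume (ignores peaks)."""
    return events_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 99.99% availability leaves roughly 52.6 minutes of downtime per year.
    print(f"{downtime_minutes_per_year(0.9999):.1f} min/year")
    # 100K uploads per day averages a little over one upload per second.
    print(f"{average_rate_per_second(100_000):.2f} uploads/s")
```

The upload rate is small on average, which suggests the ingestion challenge lies in per-track processing cost rather than request volume.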

50.2 High-Level Architecture

    Spotify Architecture
    =================

    ┌─────────────────────────────────────────────────────────────┐
    │                     Client Apps                             │
    │              (iOS, Android, Desktop)                      │
    └────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                     API Gateway                             │
    │                  (Edge, Authentication)                     │
    └────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                   Backend Services                          │
    │                                                              │
    │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
    │  │ Playback│  │Metadata │  │Playlist │  │ Search  │  │
    │  │ Service │  │ Service │  │ Service │  │ Service │  │
    │  └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
    │                                                              │
    │  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
    │  │  User   │  │Library  │  │ Social  │  │Upload   │  │
    │  │ Profile │  │ Service │  │ Service │  │ Service │  │
    │  └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
    │                                                              │
    │  ┌─────────────────────────────────────────────────────┐   │
    │  │      Recommendation Services (Secret Sauce!)      │   │
    │  └─────────────────────────────────────────────────────┘   │
    └────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                    Data Layer                               │
    │                                                              │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
    │  │Cassandra │  │ PostgreSQL│  │  Redis   │             │
    │  │(Metadata)│  │ (User/Pay)│  │(Sessions)│             │
    │  └──────────┘  └──────────┘  └──────────┘             │
    │                                                              │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐             │
    │  │   GCS    │  │  Kafka   │  │  Google  │             │
    │  │(Audio)   │  │ (Events) │  │ BigQuery │             │
    │  └──────────┘  └──────────┘  └──────────┘             │
    └─────────────────────────────────────────────────────────────┘

50.3 Music Storage & Delivery

Audio File Storage

    Spotify's Audio Pipeline
    =====================

    ┌─────────────────────────────────────────────────────────────┐
    │  Upload Phase                                              │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  Labels/Artists ──▶ Upload to GCS ──▶ Trigger processing │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  Processing Pipeline (Several hours)                       │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  1. Convert to Spotify format (OGG Vorbis)               │
    │  2. Generate multiple quality levels                       │
    │  3. Create audio fingerprints                             │
    │  4. Analyze audio (BPM, key, energy)                     │
    │  5. Store in blob storage                                │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  Storage & CDN                                             │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  Stored in Google Cloud Storage                            │
    │  Distributed via CDN (Google Cloud CDN)                  │
    │                                                             │
    │  Multiple quality levels:                                  │
    │  • 24kbps (mobile, low bandwidth)                       │
    │  • 96kbps (mobile, standard)                             │
    │  • 160kbps (desktop, high)                              │
    │  • 320kbps (premium, highest)                           │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
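
The quality levels above trade bandwidth for fidelity. The sketch below estimates the size of an encoded track at each bitrate and picks the highest level that fits a measured connection. The selection rule, the `headroom` factor, and the assumption that only premium users get 320kbps are illustrative choices, not Spotify's documented client logic.

```python
from typing import List

BITRATES_KBPS: List[int] = [24, 96, 160, 320]  # quality levels from the diagram


def track_size_mb(bitrate_kbps: int, duration_s: float) -> float:
    """Approximate encoded size in megabytes (1 MB = 10^6 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def pick_bitrate(bandwidth_kbps: float, premium: bool,
                 headroom: float = 0.8) -> int:
    """Highest bitrate that fits within a fraction of measured bandwidth.

    `headroom` (assumed value) leaves spare capacity for jitter.
    Falls back to the lowest level when nothing fits.
    """
    allowed = [b for b in BITRATES_KBPS if premium or b < 320]
    usable = bandwidth_kbps * headroom
    fitting = [b for b in allowed if b <= usable]
    return max(fitting) if fitting else min(allowed)


if __name__ == "__main__":
    # A 3.5-minute track (210 s) at 320 kbps is about 8.4 MB.
    print(track_size_mb(320, 210))
    # 250 kbps measured, 80% usable = 200 kbps, so 160 kbps is chosen.
    print(pick_bitrate(250, premium=True))
```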

Streaming Protocol

    Spotify Streaming Protocol
    ======================

    Instead of HTTP streaming, Spotify uses a custom protocol:

    ┌─────────────────────────────────────────────────────────────┐
    │  Why Custom Protocol?                                       │
    │  ─────────────────────────────────────────────────────────│
    │                                                             │
    │  • Lower latency than HTTP                                 │
    │  • Better buffering control                               │
    │  • Optimized for frequent seeking                        │
    │  • Efficient for short playback sessions                 │
    │  • Piracy-resistant (encrypted content)                  │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

    Flow:
    ─────────────────────────────────────────────────────────

    1. Client requests audio chunk
    2. Server streams encrypted audio
    3. Client decrypts and plays
    4. Buffer next chunks ahead

    Advantages:
    • ~200ms startup time
    • Seamless track transitions
    • Efficient seeking

50.4 Event-Driven Architecture

Spotify processes billions of events daily using Kafka.

    Spotify Event Infrastructure
    ==========================

    ┌─────────────────────────────────────────────────────────────┐
    │                    Event Types                              │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  • Playback events (song played, paused, skipped)        │
    │  • Search queries                                        │
    │  • Playlist modifications                                 │
    │  • Social interactions                                    │
    │  • Library changes                                        │
    │  • Errors and diagnostics                                │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘

    ┌─────────────────────────────────────────────────────────────┐
    │                    Event Pipeline                            │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  Apps ──▶ Kafka ──▶ Consumers                             │
    │              │                                            │
    │              ├──▶ Spark Streaming (real-time)              │
    │              │                                            │
    │              ├──▶ Data Warehouse (batch)                   │
    │              │                                            │
    │              └──▶ Recommendation Models                    │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
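
The fan-out in the pipeline above works because each consumer reads the same log at its own pace. The in-memory sketch below imitates that idea with an append-only list and one offset per consumer group. `EventLog` and the group names are hypothetical and this is not the Kafka client API, only a model of the log-plus-offsets concept.

```python
from collections import defaultdict
from typing import Any, Dict, List


class EventLog:
    """Append-only in-memory log; each consumer group tracks its own offset."""

    def __init__(self) -> None:
        self._events: List[Dict[str, Any]] = []
        self._offsets: Dict[str, int] = defaultdict(int)

    def publish(self, event: Dict[str, Any]) -> int:
        """Append an event and return its offset in the log."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, group: str, max_events: int = 100) -> List[Dict[str, Any]]:
        """Return the next unread events for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch


if __name__ == "__main__":
    log = EventLog()
    for action in ("played", "paused", "skipped"):
        log.publish({"user": "u1", "track": "t42", "action": action})

    # The real-time consumer and the batch consumer both see every event,
    # because reading does not remove anything from the log.
    print(len(log.poll("realtime")))
    print(len(log.poll("warehouse")))
```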

Kafka at Scale

    Spotify's Kafka Cluster
    ======================

    ┌─────────────────────────────────────────────────────────────┐
    │  Scale:                                                      │
    │  • 100+ Kafka brokers                                       │
    │  • Hundreds of billions of messages per day                 │
    │  • Petabytes of data                                       │
    │  • Millions of events per second at peak                   │
    │                                                             │
    │  Topics:                                                     │
    │  • user-identity-events                                    │
    │  • playback-events                                          │
    │  • track-played-events                                      │
    │  • search-events                                            │
    │  • recommendation-events                                    │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
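
A topic such as playback-events is split into partitions so that many brokers can share the load, and events are usually keyed so that one user's events stay in order on a single partition. The sketch below shows key-based partitioning in general terms. It uses CRC32 as a stable stand-in hash (Kafka's default partitioner uses a different hash), and the partition count is an assumed value.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a hash that is stable across processes.

    Python's built-in hash() is randomized per process for strings,
    so a checksum is used here instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Tuple[str, str]],
          num_partitions: int) -> Dict[int, List[Tuple[str, str]]]:
    """Group (user_id, action) events by partition, preserving arrival order."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for user_id, action in events:
        partitions[partition_for(user_id, num_partitions)].append((user_id, action))
    return dict(partitions)


if __name__ == "__main__":
    stream = [("alice", "play"), ("bob", "play"), ("alice", "skip"), ("alice", "play")]
    for partition, batch in sorted(route(stream, num_partitions=12).items()):
        print(partition, batch)
```

Because every event for a given user hashes to the same partition, a consumer sees that user's play and skip events in the order they were produced.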

50.5 Recommendation System

Spotify’s recommendation system is one of its best-known features, especially the Discover Weekly playlist.

    Spotify Recommendation Pipeline
    ============================

    ┌─────────────────────────────────────────────────────────────┐
    │  Data Collection (Real-time)                              │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  User Actions:                                             │
    │  • What they listen to (complete vs skip)                 │
    │  • What they add to playlists                             │
    │  • What they search for                                   │
    │  • What they like/heart                                   │
    │  • Time of day they listen                                │
    │  • Social connections                                     │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  Batch Processing (Offline)                               │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  • Collaborative filtering                                │
    │  • Audio analysis (content-based model)                  │
    │  • Embeddings for all tracks                             │
    │  • User clustering                                       │
    │                                                             │
    │  Using: Apache Spark, Python, TensorFlow                  │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
                              │
                              ▼
    ┌─────────────────────────────────────────────────────────────┐
    │  Real-time Processing                                      │
    │  ────────────────────────────────────────────────────────│
    │                                                             │
    │  • Update recommendations in real-time                    │
    │  • "Because you played X" suggestions                      │
    │  • "Made For You" personalized playlists                   │
    │                                                             │
    │  Using: Kafka Streams, Redis                               │
    │                                                             │
    └─────────────────────────────────────────────────────────────┘
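
One way to picture how the offline and real-time stages fit together is a ranker that starts from batch-computed scores and adjusts them as playback events arrive in a session. The sketch below is a simplified illustration under stated assumptions: `SessionRanker`, the signal weights, and the precomputed `similar_tracks` map are all hypothetical, and a production system would keep this state in a store such as Redis rather than in process memory.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Assumed weights: a like counts more than a full listen, a skip counts against.
SIGNAL_WEIGHTS: Dict[str, float] = {"complete": 1.0, "like": 2.0, "skip": -1.0}


class SessionRanker:
    """Combines offline scores with boosts derived from live session events."""

    def __init__(self, batch_scores: Dict[str, float],
                 similar_tracks: Dict[str, List[str]]) -> None:
        self._batch = dict(batch_scores)      # track -> offline score
        self._similar = similar_tracks        # track -> neighbours (from batch job)
        self._boost: Dict[str, float] = defaultdict(float)
        self._seen: Set[str] = set()

    def on_event(self, track_id: str, signal: str) -> None:
        """Shift the score of tracks similar to the one just played."""
        weight = SIGNAL_WEIGHTS[signal]
        self._seen.add(track_id)
        for neighbour in self._similar.get(track_id, []):
            self._boost[neighbour] += weight

    def top_k(self, k: int) -> List[str]:
        """Best unseen tracks by offline score plus session boost."""
        candidates = (set(self._batch) | set(self._boost)) - self._seen

        def score(track: str) -> float:
            return self._batch.get(track, 0.0) + self._boost.get(track, 0.0)

        # Sort by descending score, then by track id for a stable order.
        return sorted(candidates, key=lambda t: (-score(t), t))[:k]
```

A like on one track raises its neighbours immediately, which is the behaviour behind "Because you played X" style suggestions, while the expensive model training stays in the batch stage.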

Recommendation Models

    Spotify's Recommendation Algorithms
    =================================

    1. COLLABORATIVE FILTERING
       ───────────────────────
       "Users with similar taste liked these tracks"

       Matrix factorization on user-track interactions

    2. AUDIO ANALYSIS
       ────────────────
       "This track sounds similar to tracks you like"

       • Tempo (BPM), danceability, energy
       • Key and mode (major/minor)
       • Instrumentalness
       • Audio embeddings

    3. NATURAL LANGUAGE PROCESSING
       ──────────────────────────
       "Tracks described with similar words"

       Scraped from music blogs, reviews

    4. CONVOLUTIONAL NEURAL NETWORKS
       ──────────────────────────────
       Direct audio analysis

       Raw audio → CNN → Embeddings

    ─────────────────────────────────────────────────────────

    Discover Weekly:
    ────────────────
    • 30 songs updated every Monday
    • Mix of:
      - Songs from similar users
      - Songs with similar audio
      - Tracks the listener has not played before
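
The matrix factorization mentioned under collaborative filtering can be illustrated with a tiny pure-Python version trained by stochastic gradient descent. This is a teaching sketch, not Spotify's model: the hyperparameters, the 0/1 implicit ratings, and the function names are assumptions, and real systems use far larger matrices and specialized libraries.

```python
import random
from typing import List, Sequence, Tuple

Interaction = Tuple[int, int, float]  # (user index, track index, implicit rating)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_squared_error(data: Sequence[Interaction],
                       users: List[List[float]],
                       tracks: List[List[float]]) -> float:
    """Average squared error over the observed interactions only."""
    return sum((r - dot(users[u], tracks[t])) ** 2 for u, t, r in data) / len(data)


def factorize(data: Sequence[Interaction], n_users: int, n_tracks: int,
              n_factors: int = 2, lr: float = 0.05, reg: float = 0.01,
              epochs: int = 500, seed: int = 0):
    """Learn user and track vectors whose dot product approximates the rating."""
    rng = random.Random(seed)
    users = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    tracks = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_tracks)]
    for _ in range(epochs):
        for u, t, rating in data:
            err = rating - dot(users[u], tracks[t])
            for f in range(n_factors):
                uf, tf = users[u][f], tracks[t][f]
                # Gradient step on squared error with L2 regularization.
                users[u][f] += lr * (err * tf - reg * uf)
                tracks[t][f] += lr * (err * uf - reg * tf)
    return users, tracks


if __name__ == "__main__":
    # 1.0 = listened to the end, 0.0 = skipped (assumed encoding).
    plays = [(0, 0, 1.0), (0, 1, 1.0), (1, 0, 1.0),
             (2, 2, 1.0), (2, 3, 1.0), (0, 2, 0.0), (2, 0, 0.0)]
    users, tracks = factorize(plays, n_users=3, n_tracks=4)
    # Predicted affinity of user 1 for track 1, which user 1 has not played.
    print(round(dot(users[1], tracks[1]), 2))
```

Unplayed tracks are then ranked per user by the predicted dot product, and the same track vectors can double as embeddings for similarity lookups.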

50.6 Microservices at Spotify

    Spotify's Microservices
    =====================

    ┌─────────────────────────────────────────────────────────────┐
    │  ~1,000 microservices in production!                      │
    │                                                             │
    │  Each team owns:                                           │
    │  • Own service (end-to-end)                                │
    │  • Own data                                                │
    │  • Own deployment                                          │
    │  • On-call rotation                                        │
    └─────────────────────────────────────────────────────────────┘

    Key Services:
    ─────────────
    • metadata-service (track, artist info)
    • playback-service (streaming control)
    • recommendation-service
    • playlist-service
    • search-service
    • user-service
    • social-service
    • billing-service

Backend for Frontend (BFF)

    BFF Pattern at Spotify
    ====================

    ┌─────────────────────────────────────────────────────────────┐
    │                     Mobile App                                │
    └────────────────────────────┬────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │                  Mobile BFF                                  │
    │                  (Dedicated for mobile)                      │
    │  ──────────────────────────────────────────────────────────│
    │                                                             │
    │  Aggregates:                                                │
    │  • User profile                                           │
    │  • Playlist data                                          │
    │  • Recommendations                                        │
    │  • Recently played                                        │
    │                                                             │
    │  Returns: Single optimized response                        │
    └─────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
    ┌─────────────────────────────────────────────────────────────┐
    │              Core Microservices                             │
    │                                                              │
    │  • user-service                                            │
    │  • playlist-service                                        │
    │  • recommendation-service                                  │
    │  • ...                                                     │
    └─────────────────────────────────────────────────────────────┘

    Benefits:
    ─────────
    • Mobile-optimized responses
    • Reduced round trips
    • Independent scaling
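
The aggregation step in the BFF can be sketched with concurrent calls to the core services, merged into one trimmed response. The service functions below are hypothetical stubs that return canned data after a short delay, and the response shape is an assumed example rather than Spotify's actual API.

```python
import asyncio
from typing import Any, Dict, List


async def get_profile(user_id: str) -> Dict[str, Any]:
    await asyncio.sleep(0.01)  # stands in for a network call to user-service
    return {"id": user_id, "display_name": "Ada", "email": "ada@example.com"}


async def get_playlists(user_id: str) -> List[Dict[str, Any]]:
    await asyncio.sleep(0.01)  # playlist-service
    return [{"id": "p1", "name": "Focus", "tracks": 120}]


async def get_recommendations(user_id: str) -> List[str]:
    await asyncio.sleep(0.01)  # recommendation-service
    return ["t1", "t2", "t3"]


async def get_recently_played(user_id: str) -> List[str]:
    await asyncio.sleep(0.01)  # playback history
    return ["t9", "t8"]


async def mobile_home(user_id: str) -> Dict[str, Any]:
    """Call the backing services concurrently and return one compact payload."""
    profile, playlists, recs, recent = await asyncio.gather(
        get_profile(user_id),
        get_playlists(user_id),
        get_recommendations(user_id),
        get_recently_played(user_id),
    )
    # Keep only the fields the mobile home screen needs.
    return {
        "display_name": profile["display_name"],
        "playlists": [p["name"] for p in playlists],
        "recommended": recs[:10],
        "recently_played": recent[:5],
    }


if __name__ == "__main__":
    print(asyncio.run(mobile_home("u1")))
```

The client makes one request instead of four, and the total latency is close to the slowest backend call rather than the sum of all of them.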

50.7 Key Learnings from Spotify

    Spotify Engineering Principles
    ============================

    1. EVENT-DRIVEN
       ───────────────
       • Kafka as the shared event backbone
       • Decoupled services
       • Real-time + batch processing
       • Complete audit trail

    2. MICROSERVICES
       ───────────────
       • ~1,000 independent services
       • Autonomous teams
       • Own data, own deployment

    3. CHAOS ENGINEERING
       ─────────────────
       • Inspired by Netflix
       • Regular chaos experiments
       • Build confidence in resilience

    4. RECOMMENDATIONS FIRST
       ─────────────────────
       • ML-driven experience
       • Multiple algorithms combined
       • Real-time personalization

    5. DEVELOPER EXPERIENCE
       ────────────────────
       • Internal tooling
       • Self-service platforms
       • Fast deploys

Summary

Music streaming - Custom protocol, low latency
Event-driven - Kafka for billions of events
Microservices - ~1,000 services
Recommendations - Multi-model ML pipeline
Metadata - Cassandra for catalog
CDN - Google Cloud CDN for audio delivery

Congratulations!

You’ve completed the System Design Guide!

This guide covered:

Fundamentals: Scalability, load balancing, caching
Database Design: SQL vs NoSQL, CAP theorem, replication, sharding
Architecture Patterns: Monolith, microservices, event-driven, CQRS, serverless
API Design: REST, GraphQL, authentication, message queues
Reliability: Circuit breakers, rate limiting, retries, timeouts
Observability: Logging, monitoring, alerting, distributed tracing
Security: TLS, OAuth/JWT, secrets management, DDoS protection
Real-world Case Studies: Twitter, Netflix, Uber, Amazon, Spotify

You’re now equipped to design large-scale distributed systems!

Keep learning and building! 🚀