Chapter 13: Docker Monitoring - Logging, Metrics, and Observability

Introduction to Container Monitoring
Why Monitoring Matters
Logging Fundamentals
Docker Logging Drivers
Container Metrics
Monitoring Tools
Log Aggregation
Alerting Strategies
Distributed Tracing
Prometheus and Grafana
Hands-on Lab
Summary

Introduction to Container Monitoring

The Monitoring Challenge

Container environments present unique monitoring challenges compared to traditional infrastructure:

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTAINER MONITORING CHALLENGES                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Traditional VMs vs Containers                                             │
│   ────────────────────────────                                              │
│                                                                             │
│   VMs                                Containers                             │
│   ┌───────────────────┐              ┌────────────────────┐                 │
│   │ Stable IPs        │              │ Dynamic IPs        │                 │
│   │ Long-lived        │              │ Short-lived        │                 │
│   │ Known hostname    │              │ Random names       │                 │
│   │ Fixed resources   │              │ Variable resources │                 │
│   │ Single app per VM │              │ Many apps per host │                 │
│   └───────────────────┘              └────────────────────┘                 │
│                                                                             │
│   Container Specific Challenges:                                            │
│   ───────────────────────────────                                           │
│   • Containers spawn and die frequently                                     │
│   • IPs and hostnames change                                               │
│   • Resource usage fluctuates                                               │
│   • Need per-container metrics                                              │
│   • Distributed across hosts                                               │
│   • Ephemeral storage                                                       │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

The Three Pillars of Observability

┌─────────────────────────────────────────────────────────────────────────────┐
│                    THREE PILLARS OF OBSERVABILITY                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                                                                      │   │
│   │                          OBSERVABILITY                               │   │
│   │                                                                      │   │
│   │   ┌───────────────┐    ┌───────────────┐    ┌───────────────┐     │   │
│   │   │               │    │               │    │               │     │   │
│   │   │    LOGS       │    │   METRICS     │    │    TRACES     │     │   │
│   │   │               │    │               │    │               │     │   │
│   │   │ "What         │    │ "How much     │    │ "How things   │     │   │
│   │   │  happened?"   │    │  and how      │    │  flow?"       │     │   │
│   │   │               │    │  often?"      │    │               │     │   │
│   │   └───────┬───────┘    └───────┬───────┘    └───────┬───────┘     │   │
│   │           │                    │                    │              │   │
│   │           │                    │                    │              │   │
│   │   ┌───────┴────────────────────┴────────────────────┴───────┐     │   │
│   │   │                      UNIFIED PLATFORM                      │     │   │
│   │   │   (e.g., ELK, Datadog, Splunk, CloudWatch)              │     │   │
│   │   └───────────────────────────────────────────────────────────┘     │   │
│   │                                                                      │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Why Monitoring Matters

Benefits of Monitoring

┌─────────────────────────────────────────────────────────────────────────────┐
│                    WHY MONITORING MATTERS                                    │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                                                                     │   │
│   │   1. PROACTIVE ISSUE DETECTION                                     │   │
│   │      • Spot problems before users notice                           │   │
│   │      • Identify trends and patterns                               │   │
│   │      • Capacity planning                                           │   │
│   │                                                                     │   │
│   │   2. FASTER TROUBLESHOOTING                                        │   │
│   │      • Quick root cause analysis                                   │   │
│   │      • Reduced MTTR (Mean Time To Recovery)                        │   │
│   │      • Historical context                                           │   │
│   │                                                                     │   │
│   │   3. PERFORMANCE OPTIMIZATION                                      │   │
│   │      • Resource utilization insights                               │   │
│   │      • Cost optimization                                           │   │
│   │      • Scaling decisions                                            │   │
│   │                                                                     │   │
│   │   4. BUSINESS INTELLIGENCE                                         │   │
│   │      • User behavior analytics                                     │   │
│   │      • Feature usage metrics                                       │   │
│   │      • ROI measurement                                              │   │
│   │                                                                     │   │
│   │   5. COMPLIANCE & AUDITING                                         │   │
│   │      • Audit trails                                                │   │
│   │      • Security monitoring                                         │   │
│   │      • Evidence for compliance                                      │   │
│   │                                                                     │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Logging Fundamentals

Container Logging Basics

# View container logs
docker logs <container_id>

# Follow logs in real-time
docker logs -f <container_id>

# Show last N lines
docker logs --tail 100 <container_id>

# Show timestamps
docker logs -t <container_id>

# Limit output to a time window (absolute timestamps or relative durations such as 10m)
docker logs --since 2024-01-01T00:00:00 <container_id>
docker logs --until 2024-01-02T00:00:00 <container_id>

Application Logging Best Practices

# Best practice: Log to stdout/stderr
# Don't use file-based logging for containers

# Bad: Log to file
FROM node:18
RUN npm install -g log4js
CMD ["node", "app.js"]
# app.js writes to /var/log/app.log

# Good: Log to stdout
FROM node:18
CMD ["node", "app.js"]
# app.js uses console.log()

# Best: Structured logging (JSON)
# console.log(JSON.stringify({
#   timestamp: new Date().toISOString(),
#   level: 'info',
#   message: 'User logged in',
#   userId: user.id,
#   metadata: { ... }
# }))
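
The commented snippet above can be turned into a small reusable helper. The following is a minimal sketch, not a full logging library: `formatLogLine` and `log` are hypothetical names, and the example field values are made up. It writes one JSON object per line so that any Docker logging driver receives machine-parseable records.

```javascript
// Minimal structured logger: one JSON object per line on stdout/stderr.
// `formatLogLine` and `log` are illustrative names, not part of any library.
function formatLogLine(level, message, fields = {}, now = new Date()) {
  // Fixed keys are spread last so caller-supplied fields cannot overwrite them.
  return JSON.stringify({
    ...fields,
    timestamp: now.toISOString(),
    level,
    message,
  });
}

function log(level, message, fields) {
  const line = formatLogLine(level, message, fields);
  // Errors go to stderr, everything else to stdout; Docker captures both streams.
  if (level === 'error') {
    console.error(line);
  } else {
    console.log(line);
  }
  return line;
}

// Example usage with made-up values.
log('info', 'User logged in', { userId: 42 });
log('error', 'Payment failed', { orderId: 'A-1001', reason: 'card_declined' });
```

Keeping each record on a single line matters because `docker logs` and most log shippers treat every line as a separate event.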

Docker Logging Drivers

Available Logging Drivers

# Show the daemon's default logging driver (json-file unless reconfigured)
docker info | grep "Logging Driver"

# Run with specific logging driver
docker run --log-driver json-file nginx

# Common drivers:
# - json-file (default)
# - syslog
# - journald
# - gelf (Graylog)
# - fluentd
# - awslogs (CloudWatch)
# - gcplogs (GCP)
# - splunk

JSON File Driver

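# Configure in daemon.json (/etc/docker/daemon.json on Linux), then restart the Docker daemon; existing containers keep their old logging settings until recreated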
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
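
With rotation enabled, the worst-case disk usage per container is roughly `max-size` multiplied by `max-file`. The sketch below works through that arithmetic. It assumes decimal size suffixes (1m = 1,000,000 bytes); that interpretation is an assumption, so the multiplier is left as a parameter. `parseSize` and `maxLogBytes` are hypothetical helper names.

```javascript
// Worst-case disk usage of json-file logs under rotation.
// ASSUMPTION: size suffixes are decimal (k = 1000, m = 1000^2, g = 1000^3);
// pass base = 1024 if your Docker version interprets them as binary units.
function parseSize(text, base = 1000) {
  const match = /^(\d+)([kmg]?)$/i.exec(text.trim());
  if (!match) {
    throw new Error(`Unrecognized size: ${text}`);
  }
  const exponent = { '': 0, k: 1, m: 2, g: 3 }[match[2].toLowerCase()];
  return Number(match[1]) * Math.pow(base, exponent);
}

function maxLogBytes(logOpts, containerCount = 1, base = 1000) {
  const perFile = parseSize(logOpts['max-size'], base);
  const files = Number(logOpts['max-file']);
  return perFile * files * containerCount;
}

const opts = { 'max-size': '10m', 'max-file': '3' };
console.log(maxLogBytes(opts));     // upper bound in bytes for one container
console.log(maxLogBytes(opts, 20)); // upper bound in bytes for 20 containers on one host
```

Multiplying by the expected number of containers per host gives a quick sanity check on whether the log partition is large enough.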

Syslog Driver

# Send logs to syslog
docker run \
  --log-driver syslog \
  --log-opt syslog-address=tcp://localhost:514 \
  --log-opt syslog-facility=daemon \
  nginx

Configuring Logging in Docker Compose

version: '3.8'

services:
  web:
    image: nginx
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  api:
    image: myapi:latest
    logging:
      driver: "syslog"
      options:
        syslog-address: "tcp://localhost:514"
        syslog-facility: "daemon"

Container Metrics

Key Metrics to Monitor

┌─────────────────────────────────────────────────────────────────────────────┐
│                    CONTAINER METRICS CATEGORIES                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  CPU METRICS                                                         │   │
│   │  ─────────────────────────────────────────────────────────────────  │   │
│   │  • cpu.user - User space CPU usage                                 │   │
│   │  • cpu.system - Kernel space CPU usage                            │   │
│   │  • cpu.usage - Total CPU usage (percentage)                       │   │
│   │  • cpu.throttled - Time CPU was throttled                         │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  MEMORY METRICS                                                      │   │
│   │  ─────────────────────────────────────────────────────────────────  │   │
│   │  • memory.usage - Memory usage in bytes                            │   │
│   │  • memory.limit - Memory limit                                     │   │
│   │  • memory.percent - Usage percentage                               │   │
│   │  • memory.swap - Swap usage                                        │   │
│   │  • memory.ooms - Out of memory events                              │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  NETWORK METRICS                                                     │   │
│   │  ─────────────────────────────────────────────────────────────────  │   │
│   │  • network.rx_bytes - Received bytes                                │   │
│   │  • network.tx_bytes - Transmitted bytes                            │   │
│   │  • network.rx_packets - Received packets                           │   │
│   │  • network.tx_packets - Transmitted packets                        │   │
│   │  • network.rx_errors - Receive errors                              │   │
│   │  • network.tx_errors - Transmit errors                             │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  DISK METRICS                                                         │   │
│   │  ─────────────────────────────────────────────────────────────────  │   │
│   │  • disk.read_bytes - Read bytes                                    │   │
│   │  • disk.write_bytes - Written bytes                                │   │
│   │  • disk.read_ops - Read operations                                 │   │
│   │  • disk.write_ops - Write operations                               │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │  CONTAINER METRICS                                                   │   │
│   │  ─────────────────────────────────────────────────────────────────  │   │
│   │  • container.started_at - Container start time                     │   │
│   │  • container.finished_at - Container end time                     │   │
│   │  • container.restart_count - Number of restarts                   │   │
│   │  • container.pids - Number of processes                            │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Viewing Container Stats

# Real-time container stats
docker stats

# Stats for specific container
docker stats <container_id>

# Stats with no streaming (one time)
docker stats --no-stream <container_id>

# All containers, formatted output
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
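
The percentages that `docker stats` prints are derived from raw counters. As a rough sketch, the functions below apply the calculation described in the Docker Engine API documentation for the container stats endpoint (`GET /containers/{id}/stats`) to one stats sample. The `sample` object contains made-up numbers, and the memory figure is simplified: the Docker CLI also subtracts page cache before reporting usage, which is omitted here.

```javascript
// CPU and memory percentages from one Docker Engine API stats sample.
// Field names follow the stats JSON; the numbers in `sample` are invented.
function cpuPercent(stats) {
  const cpuDelta =
    stats.cpu_stats.cpu_usage.total_usage -
    stats.precpu_stats.cpu_usage.total_usage;
  const systemDelta =
    stats.cpu_stats.system_cpu_usage - stats.precpu_stats.system_cpu_usage;
  if (systemDelta <= 0 || cpuDelta < 0) {
    return 0; // first sample or counter reset: no meaningful delta yet
  }
  return (cpuDelta / systemDelta) * stats.cpu_stats.online_cpus * 100;
}

function memoryPercent(stats) {
  const { usage, limit } = stats.memory_stats;
  return limit > 0 ? (usage / limit) * 100 : 0;
}

const sample = {
  cpu_stats: {
    cpu_usage: { total_usage: 750000000 },
    system_cpu_usage: 21000000000,
    online_cpus: 2,
  },
  precpu_stats: {
    cpu_usage: { total_usage: 500000000 },
    system_cpu_usage: 20000000000,
  },
  memory_stats: { usage: 268435456, limit: 1073741824 },
};

console.log(cpuPercent(sample));    // container CPU time relative to host CPU time, scaled by CPU count
console.log(memoryPercent(sample)); // usage as a share of the memory limit
```

Because the CPU figure is scaled by the number of online CPUs, a container that saturates two cores reports about 200%.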

Monitoring Tools

Tool Categories

┌─────────────────────────────────────────────────────────────────────────────┐
│                    MONITORING TOOLS LANDSCAPE                                │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Open Source                                                              │
│   ───────────                                                              │
│   ┌────────────────┐  ┌────────────────┐  ┌────────────────┐               │
│   │   Prometheus   │  │    Grafana    │  │    cAdvisor   │               │
│   │   Metrics      │  │   Dashboard   │  │   Container    │               │
│   │   Collection   │  │   Visualizing │  │   Metrics     │               │
│   └────────────────┘  └────────────────┘  └────────────────┘               │
│                                                                             │
│   Commercial                                                               │
│   ───────────                                                              │
│   ┌────────────────┐  ┌────────────────┐  ┌────────────────┐               │
│   │    Datadog     │  │   New Relic   │  │     Splunk    │               │
│   │   Full Stack   │  │   APM + Infra │  │   Logs +      │               │
│   │   Monitoring   │  │   Monitoring   │  │   Metrics     │               │
│   └────────────────┘  └────────────────┘  └────────────────┘               │
│                                                                             │
│   Cloud Native                                                            │
│   ─────────────                                                           │
│   ┌────────────────┐  ┌────────────────┐  ┌────────────────┐               │
│   │  CloudWatch    │  │Cloud Monitoring│  │    Azure       │               │
│   │    (AWS)       │  │     (GCP)      │  │    Monitor     │               │
│   └────────────────┘  └────────────────┘  └────────────────┘               │
│                                                                             │
│   Logging                                                                 │
│   ────────                                                                │
│   ┌────────────────┐  ┌────────────────┐  ┌────────────────┐               │
│   │      ELK       │  │     EFK       │  │    Loki       │               │
│   │  (Elastic)    │  │(Fluentd + KB) │  │   (Grafana)   │               │
│   └────────────────┘  └────────────────┘  └────────────────┘               │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Log Aggregation

ELK Stack (Elasticsearch, Logstash, Kibana)

┌─────────────────────────────────────────────────────────────────────────────┐
│                    ELK STACK ARCHITECTURE                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                     LOG SOURCES                                      │   │
│   │                                                                      │   │
│   │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐          │   │
│   │   │Container │  │   Host   │  │  Apps    │  │  Network │          │   │
│   │   │  Logs    │  │  Logs    │  │  Logs    │  │  Devices │          │   │
│   │   └────┬─────┘  └────┬─────┘  └────┬─────┘  └────┬─────┘          │   │
│   │        │             │             │             │                  │   │
│   └────────┼─────────────┼─────────────┼─────────────┼──────────────────┘   │
│            │             │             │             │                      │
│            ▼             ▼             ▼             ▼                      │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      LOGSTASH / FLUENTD                             │   │
│   │  ┌────────────┐  ┌────────────┐  ┌────────────┐                    │   │
│   │  │   Input    │  │   Filter   │  │   Output   │                    │   │
│   │  │  Plugins   │──│  Plugins   │──│  Plugins   │                    │   │
│   │  └────────────┘  └────────────┘  └────────────┘                    │   │
│   │    (beats, tcp)   (grok, mutate) (elasticsearch)                   │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                         │
│                                    ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                      ELASTICSEARCH                                  │   │
│   │                                                                      │   │
│   │   ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐          │   │
│   │   │ Index 1  │  │ Index 2  │  │ Index 3  │  │ Index N   │          │   │
│   │   │ Shard 1  │  │ Shard 1  │  │ Shard 1  │  │ Shard 1   │          │   │
│   │   └──────────┘  └──────────┘  └──────────┘  └──────────┘          │   │
│   │                              Cluster                                  │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                    │                                         │
│                                    ▼                                         │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │                         KIBANA                                      │   │
│   │   ┌────────────────────────────────────────────────────────────┐   │   │
│   │   │                                                           │   │   │
│   │   │   Dashboards   │   Visualizations  │   Discover          │   │   │
│   │   │                                                           │   │   │
│   │   └────────────────────────────────────────────────────────────┘   │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Setting Up Filebeat for Docker

# docker-compose.yml with ELK stack
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false  # development only: disables authentication and TLS
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

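  filebeat:
    image: docker.elastic.co/beats/filebeat:8.11.0
    # root is needed to read the container log files and the Docker socket
    user: root
    # --strict.perms=false skips the ownership check on the bind-mounted config file
    command: ["filebeat", "-e", "--strict.perms=false"]
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
    depends_on:
      - elasticsearch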

volumes:
  elasticsearch-data:

# filebeat.yml (mounted into the filebeat service above)
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]

Alerting Strategies

Alert Types

┌─────────────────────────────────────────────────────────────────────────────┐
│                    ALERT STRATEGY                                           │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Alert Severity Levels                                                     │
│   ─────────────────────                                                     │
│                                                                             │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐           │
│   │   CRITICAL │  │   WARNING  │  │  INFO     │  │   DEBUG   │           │
│   │            │  │            │  │            │  │            │           │
│   │ Immediate  │  │ Needs      │  │ FYI only   │  │ Not for    │           │
│   │ action     │  │ attention  │  │            │  │ alerting   │           │
│   │ required   │  │ soon       │  │            │  │            │           │
│   └────────────┘  └────────────┘  └────────────┘  └────────────┘           │
│                                                                             │
│   Common Container Alerts                                                   │
│   ─────────────────────                                                     │
│                                                                             │
│   Metric                    Warning Threshold   Critical Threshold          │
│   ─────────────────────────────────────────────────────────────────────    │
│   CPU Usage                > 70%              > 90%                         │
│   Memory Usage             > 75%              > 90%                         │
│   Container Restarts      > 2/hour          > 5/hour                      │
│   Disk Usage               > 80%              > 95%                         │
│   Network Errors          > 10/min           > 50/min                      │
│   Response Time           > 500ms            > 2s                          │
│   Error Rate              > 1%               > 5%                          │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
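
The two-level thresholds in the table map directly onto a lookup plus a comparison. The sketch below copies the table's values into a plain object and classifies a reading as `ok`, `warning`, or `critical`. The metric key names and the `classify` function are hypothetical, and the thresholds are the chapter's examples rather than universal recommendations.

```javascript
// Thresholds copied from the alert table; key names are illustrative.
const thresholds = {
  cpuPercent: { warning: 70, critical: 90 },
  memoryPercent: { warning: 75, critical: 90 },
  restartsPerHour: { warning: 2, critical: 5 },
  diskPercent: { warning: 80, critical: 95 },
  networkErrorsPerMinute: { warning: 10, critical: 50 },
  responseTimeMs: { warning: 500, critical: 2000 },
  errorRatePercent: { warning: 1, critical: 5 },
};

function classify(metric, value) {
  const limits = thresholds[metric];
  if (!limits) {
    throw new Error(`Unknown metric: ${metric}`);
  }
  // Check the stricter level first so a critical value is never reported as a warning.
  if (value > limits.critical) return 'critical';
  if (value > limits.warning) return 'warning';
  return 'ok';
}

console.log(classify('cpuPercent', 65));      // below both thresholds
console.log(classify('memoryPercent', 80));   // between warning and critical
console.log(classify('responseTimeMs', 2500)); // above the critical threshold
```

In practice the comparison usually lives in the monitoring system's rule language, but keeping thresholds in one table-like structure makes them easy to review.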

Prometheus Alert Rules

groups:
  - name: container_alerts
    interval: 30s
    rules:
      # High CPU usage (rate() yields CPU cores in use, so 0.8 means 80% of one core)
      - alert: HighCPUUsage
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "Container {{ $labels.name }} is using more than 80% of one CPU core"

      # Critical CPU
      - alert: CriticalCPUUsage
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critical CPU usage"
          description: "Container {{ $labels.name }} is using more than 95% of one CPU core"

      # High Memory (containers without a memory limit report a limit of 0 and are skipped)
      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Container {{ $labels.name }} memory is above 85% of limit"

      # Container down (up{} only reports whether the cAdvisor scrape works,
      # so use cAdvisor's per-container last-seen timestamp instead)
      - alert: ContainerDown
        expr: time() - container_last_seen{name!=""} > 60
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Container down"
          description: "Container {{ $labels.name }} has not been seen by cAdvisor for more than 1 minute"

      # Too many restarts (cAdvisor exposes no restart counter;
      # each change of the start time is counted as one restart)
      - alert: ContainerRestarts
        expr: changes(container_start_time_seconds{name!=""}[1h]) > 3
        labels:
          severity: warning
        annotations:
          summary: "Frequent container restarts"
          description: "Container {{ $labels.name }} restarted {{ $value }} times in the last hour"
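
The CPU rules rely on `rate()`, which turns an ever-increasing counter such as `container_cpu_usage_seconds_total` into a per-second average over the lookback window. The sketch below is a simplified illustration of that idea: it handles counter resets but skips the boundary extrapolation that Prometheus applies, so its results will differ slightly from real query output. `simpleRate` is a hypothetical helper, and the sample values are made up.

```javascript
// Simplified per-second rate over counter samples.
// Each sample is [timestampSeconds, counterValue]; NOT Prometheus's exact algorithm.
function simpleRate(samples) {
  if (samples.length < 2) {
    return null; // a rate needs at least two points
  }
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const previous = samples[i - 1][1];
    const current = samples[i][1];
    // A drop means the counter restarted from zero (for example after a
    // container restart), so the whole current value counts as new increase.
    increase += current >= previous ? current - previous : current;
  }
  const elapsed = samples[samples.length - 1][0] - samples[0][0];
  return elapsed > 0 ? increase / elapsed : null;
}

// CPU seconds consumed, sampled every 60 s (invented data).
const cpuSeconds = [[0, 100], [60, 148], [120, 196]];
const coresInUse = simpleRate(cpuSeconds);
console.log(coresInUse);        // CPU seconds used per wall-clock second
console.log(coresInUse > 0.5);  // the kind of comparison an alert expression makes
```

A rate of CPU seconds per second is effectively "cores in use", which is why the alert thresholds are written as fractions such as 0.8 rather than as percentages.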

Distributed Tracing

What is Distributed Tracing?

┌─────────────────────────────────────────────────────────────────────────────┐
│                    DISTRIBUTED TRACING CONCEPT                               │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   Traditional (Monolithic)           Distributed (Microservices)            │
│   ┌─────────────────────┐            ┌─────────────────────────────────┐    │
│   │                     │            │                                 │    │
│   │  ┌─────────────┐   │            │   ┌───────┐   ┌───────┐        │    │
│   │  │   Request   │   │            │   │  API  │───│ Order │        │    │
│   │  └─────────────┘   │            │   │Gateway│   │Service│        │    │
│   │        │           │            │   └───┬───┘   └───┬───┘        │    │
│   │  ┌─────┴──────┐    │            │       │           │             │    │
│   │  │   Service  │    │            │   ┌───┴───┐   ┌───┴───┐        │    │
│   │  │             │    │            │   │ User  │   │Payment│        │    │
│   │  └─────────────┘    │            │   │Service│   │Service│        │    │
│   │        │           │            │   └───────┘   └───────┘        │    │
│   │  ┌─────┴──────┐    │            │       │           │             │    │
│   │  │  Database   │    │            │   ┌───┴───┐   ┌───┴───┐        │    │
│   │  │             │    │            │   │  DB   │   │  DB   │        │    │
│   │  └─────────────┘    │            │   └───────┘   └───────┘        │    │
│   └─────────────────────┘            └─────────────────────────────────┘    │
│                                                                             │
│   Easy to trace!                     Need distributed tracing!              │
│                                                                             │
│   Trace:                             Trace:                                  │
│   [Request → Service → DB]          [API → Order → User → DB]              │
│                                     [API → Order → Payment → DB]           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
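
What ties the hops in the right-hand diagram into one trace is an identifier that every service forwards with its outgoing requests. One widely used carrier is the W3C Trace Context `traceparent` HTTP header, which OpenTelemetry propagates by default. The sketch below parses an incoming header and builds the header for a downstream call. It is an illustration under assumptions: `parseTraceparent`, `randomHex`, and `childTraceparent` are hypothetical helpers, and real services should use a tracing SDK instead of hand-rolled IDs.

```javascript
// traceparent format: version-traceId-parentSpanId-flags, all lowercase hex.
const TRACEPARENT = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT.exec(header);
  if (!match) {
    return null; // malformed or missing header
  }
  return { version: match[1], traceId: match[2], parentId: match[3], flags: match[4] };
}

// Random lowercase hex string. Math.random is acceptable for a sketch only;
// production code should rely on a tracing SDK's ID generator.
function randomHex(length) {
  let out = '';
  while (out.length < length) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Build the header a service sends downstream: same trace ID, new span ID.
function childTraceparent(incoming) {
  const parent = parseTraceparent(incoming);
  const traceId = parent ? parent.traceId : randomHex(32); // start a new trace if none arrived
  const flags = parent ? parent.flags : '01';
  return `00-${traceId}-${randomHex(16)}-${flags}`;
}

const incoming = '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01';
console.log(parseTraceparent(incoming));
console.log(childTraceparent(incoming)); // same trace ID, fresh span ID
```

Because the trace ID stays constant while each hop mints a new span ID, a backend such as Jaeger or Zipkin can reassemble the API → Order → Payment → DB path from independently reported spans.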

Distributed Tracing Tools

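# Jaeger - Open Source
# Ports: 4317/4318 = OTLP (gRPC/HTTP), 6831/udp = legacy Jaeger agent, 16686 = web UI
docker run -d \
  --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 6831:6831/udp \
  -p 16686:16686 \
  jaegertracing/all-in-one:1.47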

# Zipkin - Open Source
docker run -d \
  --name zipkin \
  -p 9411:9411 \
  openzipkin/zipkin:latest

Prometheus and Grafana

Setting Up Prometheus

global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

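  # Requires a node-exporter service on the same network
  # (not included in the Compose files in this chapter)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Placeholder for your own application exposing /metrics on port 8080
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8080']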

Docker Compose with Monitoring Stack

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:10.0.0
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin  # change this for anything beyond local testing
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"

volumes:
  prometheus-data:
  grafana-data:

Sample Grafana Dashboard

┌─────────────────────────────────────────────────────────────────────────────┐
│                    GRAFANA DASHBOARD EXAMPLE                                 │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────────┐   │
│   │               Container Monitoring Dashboard                        │   │
│   ├─────────────────────────────────────────────────────────────────────┤   │
│   │                                                                     │   │
│   │   ┌─────────────────────┐  ┌─────────────────────┐                 │   │
│   │   │   CPU Usage %       │  │   Memory Usage %    │                 │   │
│   │   │                     │  │                     │                 │   │
│   │   │   ████████░░ 78%   │  │   ████████░░ 65%    │                 │   │
│   │   └─────────────────────┘  └─────────────────────┘                 │   │
│   │                                                                     │   │
│   │   ┌─────────────────────────────────────────────────────────────┐ │   │
│   │   │                   CPU Timeline                                │ │   │
│   │   │   100% ┤                        ╭──╮                          │ │   │
│   │   │    75% ┤              ╭─────────╯  ╰─────────                 │ │   │
│   │   │    50% ┤         ╭────╯                                       │ │   │
│   │   │    25% ┤    ╭────╯                                            │ │   │
│   │   │     0% ┼────╯                                                 │ │   │
│   │   │         00:00    04:00    08:00    12:00    16:00             │ │   │
│   │   └─────────────────────────────────────────────────────────────┘ │   │
│   │                                                                     │   │
│   │   ┌─────────────────────────────────────────────────────────────┐ │   │
│   │   │                   Network I/O                                │ │   │
│   │   │   RX (MB/s) ┤ TX (MB/s) ┤                                  │ │   │
│   │   │   ████████  │ ▓▓▓▓▓▓▓▓  │                                   │ │   │
│   │   │     45.2    │   23.1    │                                   │ │   │
│   │   └─────────────────────────────────────────────────────────────┘ │   │
│   │                                                                     │   │
│   │   Container: web-01  │  Image: nginx  │  Status: Running          │   │
│   └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Hands-on Lab

Lab: Set Up Complete Monitoring Stack

In this hands-on lab, we’ll set up a complete monitoring stack with Prometheus and Grafana.

Prerequisites

Docker and Docker Compose installed
At least 2GB RAM available

Lab Steps

# Step 1: Create monitoring directory
mkdir -p monitoring && cd monitoring

# Step 2: Create prometheus configuration
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
EOF

# Step 3: Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:10.0.0
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    ports:
      - "3000:3000"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring:
    driver: bridge
EOF

# Step 4: Start monitoring stack
# (Compose V2 syntax; use `docker-compose` on legacy V1 installations)
docker compose up -d

# Step 5: Verify services
docker compose ps

# Step 6: Access Grafana
# Open http://localhost:3000
# Username: admin
# Password: admin

# Step 7: Add Prometheus data source in Grafana
# Connections → Data sources → Add data source → Prometheus
# (Configuration → Data Sources in older Grafana versions)
# URL: http://prometheus:9090

# Step 8: Import dashboard
# Use Grafana dashboard ID: 193 - Docker Monitoring

# Step 9: Run a test container
docker run -d --name test-nginx nginx

# Step 10: View the cAdvisor metrics for the test container
curl -s http://localhost:8080/metrics | grep 'name="test-nginx"'

# Step 11: Clean up (add -v to `down` to also delete the Prometheus and Grafana volumes)
docker stop test-nginx
docker rm test-nginx
docker compose down
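
The output of Step 10 is the Prometheus text exposition format: a metric name, optional labels in braces, a value, and optionally a timestamp. To make that structure concrete, here is a rough parser sketch. It is an assumption-laden simplification: `parseMetricLine` is a hypothetical helper, escaped characters inside label values are left as-is, and special values such as `+Inf` are not handled. The sample lines are invented.

```javascript
// Parse one line of Prometheus text exposition format into an object.
// Simplified: no unescaping of label values, no +Inf/-Inf/NaN handling.
function parseMetricLine(line) {
  const trimmed = line.trim();
  if (trimmed === '' || trimmed.startsWith('#')) {
    return null; // blank lines and # HELP / # TYPE comments carry no sample
  }
  const match =
    /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$/.exec(trimmed);
  if (!match) {
    return null;
  }
  const labels = {};
  const labelPattern = /([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"/g;
  let pair;
  while ((pair = labelPattern.exec(match[2] || '')) !== null) {
    labels[pair[1]] = pair[2];
  }
  return { name: match[1], labels, value: Number(match[3]) };
}

// Invented sample resembling cAdvisor output.
const lines = [
  '# TYPE container_memory_usage_bytes gauge',
  'container_memory_usage_bytes{id="/docker/abc",image="nginx",name="test-nginx"} 7.303168e+06 1700000000000',
  'container_memory_usage_bytes{id="/docker/def",image="redis",name="cache"} 2.097152e+06 1700000000000',
];

const forTestNginx = lines
  .map(parseMetricLine)
  .filter((sample) => sample !== null && sample.labels.name === 'test-nginx');
console.log(forTestNginx);
```

Filtering on the `name` label is the programmatic equivalent of grepping the raw output for a container name.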

Summary

Key Takeaways

Three Pillars - Logs, metrics, and traces work together
Centralized Logging - Aggregate logs for analysis
Real-time Metrics - Monitor CPU, memory, network, disk
Alert Proactively - Set up alerts before issues become critical
Use Proper Tools - Prometheus, Grafana, ELK stack
Dashboards - Visualize metrics for quick understanding
Distributed Tracing - For microservices, use tracing tools

Quick Reference Commands

# View container logs
docker logs -f <container>

# Container stats
docker stats

# Inspect container
docker inspect <container>

# Docker daemon logging driver
docker info | grep "Logging Driver"

# Prometheus queries
rate(container_cpu_usage_seconds_total[5m])
container_memory_usage_bytes / container_spec_memory_limit_bytes

Next Steps

In the next chapter, we’ll explore Advanced Docker Networking (Chapter 14), covering:

Custom networks
DNS and service discovery
Load balancing
Network plugins