Chapter 13: Docker Monitoring - Logging, Metrics, and Observability
Table of Contents
- Introduction to Container Monitoring
- Why Monitoring Matters
- Logging Fundamentals
- Docker Logging Drivers
- Container Metrics
- Monitoring Tools
- Log Aggregation
- Alerting Strategies
- Distributed Tracing
- Prometheus and Grafana
- Hands-on Lab
- Summary
Introduction to Container Monitoring
The Monitoring Challenge
Container environments present unique monitoring challenges compared to traditional infrastructure:

```text
Traditional VMs          Containers
───────────────          ──────────────────
Stable IPs               Dynamic IPs
Long-lived               Short-lived
Known hostname           Random names
Fixed resources          Variable resources
Single app               Multiple apps
```

Container-specific challenges:
- Containers spawn and die frequently
- IPs and hostnames change
- Resource usage fluctuates
- Need per-container metrics
- Distributed across hosts
- Ephemeral storage

The Three Pillars of Observability
Logs, metrics, and traces each answer a different question. They are usually brought together in a unified platform (e.g., ELK, Datadog, Splunk, CloudWatch):
- Logs: "What happened?"
- Metrics: "How much and how often?"
- Traces: "How do things flow?"

Why Monitoring Matters
Benefits of Monitoring
- Proactive issue detection: spot problems before users notice, identify trends and patterns, and plan capacity.
- Faster troubleshooting: quick root cause analysis, reduced MTTR (Mean Time To Recovery), and historical context.
- Performance optimization: resource utilization insights, cost optimization, and scaling decisions.
- Business intelligence: user behavior analytics, feature usage metrics, and ROI measurement.
- Compliance and auditing: audit trails, security monitoring, and evidence for compliance.

Logging Fundamentals
Container Logging Basics
```bash
# View container logs
docker logs <container_id>

# Follow logs in real time
docker logs -f <container_id>

# Show the last N lines
docker logs --tail 100 <container_id>

# Show timestamps
docker logs -t <container_id>

# Limit output to a time range (relative values such as 10m also work)
docker logs --since 2024-01-01T00:00:00 <container_id>
docker logs --until 2024-01-02T00:00:00 <container_id>
```

Application Logging Best Practices
Best practice: log to stdout/stderr. Docker's logging drivers capture only a container's stdout and stderr streams, so anything written to a file inside the container never reaches docker logs.

Bad: log to a file
```dockerfile
FROM node:18
WORKDIR /app
COPY app.js .
RUN npm install log4js
CMD ["node", "app.js"]
# app.js writes to /var/log/app.log
```

Good: log to stdout
```dockerfile
FROM node:18
WORKDIR /app
COPY app.js .
CMD ["node", "app.js"]
# app.js uses console.log()
```

Best: structured logging (JSON), one object per line
```javascript
// In app.js
console.log(JSON.stringify({
  timestamp: new Date().toISOString(),
  level: 'info',
  message: 'User logged in',
  userId: user.id,
  metadata: { ... }
}))
```
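The same one-JSON-object-per-line pattern can be sketched in Python using only the standard `logging` module. This is a minimal illustration, not a production logger. The `JsonFormatter` class, the `make_logger` helper, and the `userId`/`metadata` fields are illustrative names chosen to mirror the JavaScript example above.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single line of JSON."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # Copy selected structured fields passed via `extra=` (illustrative keys)
        for key in ("userId", "metadata"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)


def make_logger(name: str = "app") -> logging.Logger:
    # Write to stdout so Docker's logging driver captures the output
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers[:] = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.info("User logged in", extra={"userId": 42, "metadata": {"ip": "10.0.0.5"}})
```

Because every line is a complete JSON document, log shippers such as Filebeat or Fluentd can parse fields without custom regular expressions.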
Docker Logging Drivers
Available Logging Drivers
```bash
# Show the daemon's default logging driver (json-file unless changed)
docker info | grep "Logging Driver"

# Run a container with a specific logging driver
docker run --log-driver json-file nginx

# Common drivers:
# - json-file (default)
# - syslog
# - journald
# - gelf (Graylog)
# - fluentd
# - awslogs (CloudWatch)
# - gcplogs (GCP)
# - splunk
```

JSON File Driver
Set the default driver and log rotation in daemon.json (/etc/docker/daemon.json on Linux), then restart the Docker daemon. The settings apply only to containers created after the restart.
```json
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
```
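With the json-file driver, Docker stores each container's output in `/var/lib/docker/containers/<id>/<id>-json.log`. The file holds one JSON object per line, with `log`, `stream`, and `time` fields. With the options above, the driver keeps at most 3 files of about 10 MB each, so roughly 30 MB of logs per container.

The sketch below is a minimal Python parser for such lines. It assumes UTC timestamps ending in `Z`, which is how the driver writes them, and the sample line is made up for illustration.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class LogEntry:
    time: datetime
    stream: str   # "stdout" or "stderr"
    message: str


def parse_docker_time(ts: str) -> datetime:
    """Parse a UTC RFC 3339 timestamp with up to nanosecond precision."""
    ts = ts.rstrip("Z")
    if "." in ts:
        base, frac = ts.split(".", 1)
        # Python's datetime keeps microseconds only: truncate or pad to 6 digits
        ts = f"{base}.{frac[:6].ljust(6, '0')}"
    else:
        ts = ts + ".000000"
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f").replace(tzinfo=timezone.utc)


def parse_json_file_line(line: str) -> LogEntry:
    record = json.loads(line)
    return LogEntry(
        time=parse_docker_time(record["time"]),
        stream=record["stream"],
        message=record["log"].rstrip("\n"),  # the driver keeps the trailing newline
    )


if __name__ == "__main__":
    sample = '{"log":"GET / 200\\n","stream":"stdout","time":"2024-01-01T12:00:00.123456789Z"}'
    print(parse_json_file_line(sample))
```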
Syslog Driver
```bash
# Send logs to a syslog server. The logging driver runs inside the Docker
# daemon, so localhost here is the Docker host, not the container.
docker run \
  --log-driver syslog \
  --log-opt syslog-address=tcp://localhost:514 \
  --log-opt syslog-facility=daemon \
  nginx
```
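Every syslog message starts with a priority value computed as PRI = facility × 8 + severity (RFC 5424). With `syslog-facility=daemon` (facility code 3), an informational line (severity 6) is tagged `<30>` and an error line (severity 3) is tagged `<27>`.

To our understanding, Docker's syslog driver sends stdout lines as informational and stderr lines as errors. Treat that mapping as an assumption to verify against your Docker version. A small Python sketch of the calculation, covering only a subset of the RFC's codes:

```python
# Subset of RFC 5424 facility and severity codes
FACILITIES = {"kern": 0, "user": 1, "mail": 2, "daemon": 3, "auth": 4,
              "syslog": 5, "local0": 16, "local7": 23}
SEVERITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}

# Assumed stream-to-severity mapping (verify for your Docker version)
STREAM_SEVERITY = {"stdout": "info", "stderr": "err"}


def syslog_pri(facility: str, severity: str) -> int:
    """PRI = facility * 8 + severity."""
    return FACILITIES[facility] * 8 + SEVERITIES[severity]


def pri_header(facility: str, stream: str) -> str:
    """The <PRI> prefix a message from the given stream would carry."""
    return f"<{syslog_pri(facility, STREAM_SEVERITY[stream])}>"


if __name__ == "__main__":
    for stream in ("stdout", "stderr"):
        print(stream, pri_header("daemon", stream))
```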
Configuring Logging in Docker Compose
```yaml
version: '3.8'

services:
  web:
    image: nginx
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

  api:
    image: myapi:latest
    logging:
      driver: "syslog"
      options:
        syslog-address: "tcp://localhost:514"
        syslog-facility: "daemon"
```

Container Metrics
Key Metrics to Monitor
The names below are generic labels. Each tool uses its own metric names; cAdvisor, for example, exports container_cpu_usage_seconds_total and container_memory_usage_bytes.

CPU metrics:
- cpu.user: user-space CPU usage
- cpu.system: kernel-space CPU usage
- cpu.usage: total CPU usage (percentage)
- cpu.throttled: time the CPU was throttled

Memory metrics:
- memory.usage: memory usage in bytes
- memory.limit: memory limit
- memory.percent: usage as a percentage of the limit
- memory.swap: swap usage
- memory.ooms: out-of-memory events

Network metrics:
- network.rx_bytes / network.tx_bytes: bytes received / transmitted
- network.rx_packets / network.tx_packets: packets received / transmitted
- network.rx_errors / network.tx_errors: receive / transmit errors

Disk metrics:
- disk.read_bytes / disk.write_bytes: bytes read / written
- disk.read_ops / disk.write_ops: read / write operations

Container metrics:
- container.started_at: container start time
- container.finished_at: container end time
- container.restart_count: number of restarts
- container.pids: number of processes

Viewing Container Stats
```bash
# Live stats for all running containers
docker stats

# Stats for a specific container
docker stats <container_id>

# Print a single snapshot instead of streaming
docker stats --no-stream <container_id>

# Running containers, custom table columns
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```
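`docker stats` derives its percentages from the Engine API stats endpoint (`GET /containers/{id}/stats`), using the formulas given in the Engine API documentation:

- **CPU %:** the container's CPU-time delta between two samples, divided by the host's system CPU delta, times the number of online CPUs, times 100.
- **Memory %:** raw usage minus page cache, divided by the limit.

The sketch below applies these formulas to a hand-written sample with invented numbers. The cache field is an assumption to check on your hosts. The API docs subtract `cache` (cgroup v1); to our understanding, on cgroup v2 the CLI uses `inactive_file` instead.

```python
def cpu_percent(stats: dict) -> float:
    cpu, pre = stats["cpu_stats"], stats["precpu_stats"]
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    system_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if cpu_delta <= 0 or system_delta <= 0:
        return 0.0
    # Prefer online_cpus; fall back to the per-CPU list length, then 1
    online = cpu.get("online_cpus") or len(cpu["cpu_usage"].get("percpu_usage") or []) or 1
    return cpu_delta / system_delta * online * 100.0


def memory_percent(stats: dict) -> float:
    mem = stats["memory_stats"]
    extra = mem.get("stats", {})
    # cgroup v2 reports inactive_file; cgroup v1 reports cache
    cache = extra.get("inactive_file", extra.get("cache", 0))
    return (mem["usage"] - cache) / mem["limit"] * 100.0


# Invented sample in the shape of the stats endpoint's JSON
SAMPLE_STATS = {
    "cpu_stats": {"cpu_usage": {"total_usage": 2_000_000_000},
                  "system_cpu_usage": 20_000_000_000, "online_cpus": 4},
    "precpu_stats": {"cpu_usage": {"total_usage": 1_000_000_000},
                     "system_cpu_usage": 10_000_000_000},
    "memory_stats": {"usage": 300 * 2**20, "limit": 1024 * 2**20,
                     "stats": {"inactive_file": 44 * 2**20}},
}

if __name__ == "__main__":
    print(f"CPU {cpu_percent(SAMPLE_STATS):.1f}%  MEM {memory_percent(SAMPLE_STATS):.1f}%")
```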
Monitoring Tools
Tool Categories
- Open source: Prometheus (metrics collection), Grafana (dashboards and visualization), cAdvisor (container metrics)
- Commercial: Datadog (full-stack monitoring), New Relic (APM and infrastructure monitoring), Splunk (logs and metrics)
- Cloud native: CloudWatch (AWS), Cloud Monitoring (GCP, formerly Stackdriver), Azure Monitor
- Logging: ELK (Elasticsearch, Logstash, Kibana), EFK (Elasticsearch, Fluentd, Kibana), Loki (Grafana)

Log Aggregation
ELK Stack (Elasticsearch, Logstash, Kibana)
```text
Log sources
  Container logs │ Host logs │ App logs │ Network devices
        │
        ▼
Logstash / Fluentd
  Input plugins ──► Filter plugins ──► Output plugins
  (beats, tcp)      (grok, mutate)     (elasticsearch)
        │
        ▼
Elasticsearch cluster
  Index 1, Index 2, ..., Index N (each stored as one or more shards)
        │
        ▼
Kibana
  Dashboards │ Visualizations │ Discover
```
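The Logstash stage is a three-step pipeline: inputs receive events, filters parse and enrich them, and outputs ship them onward. Below is a toy Python model of that flow for an nginx-style access log line. It makes a few simplifications:

- The regular expression stands in for a grok pattern and is not Logstash's actual pattern definition.
- The output stage only serializes the documents that would be indexed.
- The `_grokparsefailure` tag imitates Logstash's convention for unparsed events.

```python
import json
import re
from typing import Iterable, Iterator, List

# Simplified stand-in for a grok pattern: client, timestamp, method, path, status, bytes
ACCESS_LOG = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def input_stage(lines: Iterable[str]) -> Iterator[dict]:
    # Input plugin: wrap each raw line in an event
    for line in lines:
        yield {"message": line.rstrip("\n")}


def filter_stage(events: Iterable[dict]) -> Iterator[dict]:
    for event in events:
        match = ACCESS_LOG.match(event["message"])
        if match is None:
            event["tags"] = ["_grokparsefailure"]  # keep the event, mark it unparsed
            yield event
            continue
        event.update(match.groupdict())
        # "mutate"-style type conversions
        event["status"] = int(event["status"])
        event["bytes"] = 0 if event["bytes"] == "-" else int(event["bytes"])
        yield event


def output_stage(events: Iterable[dict]) -> List[str]:
    # Output plugin: serialize documents as they might be sent to Elasticsearch
    return [json.dumps(e, sort_keys=True) for e in events]
```

Generators keep each stage streaming, which loosely mirrors how events flow one at a time through a real pipeline.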
Setting Up Filebeat for Docker
docker-compose.yml with the ELK stack:
```yaml
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.11.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data

  kibana:
    image: docker.elastic.co/kibana/kibana:8.11.0
    ports:
      - "5601:5601"
    depends_on:
      - elasticsearch

  filebeat:
    image: docker.elastic.co/beats/filebeat:8.11.0
    # Run as root so Filebeat can read other containers' log files
    user: root
    # --strict.perms=false skips Filebeat's ownership check on the bind-mounted config
    command: ["filebeat", "-e", "--strict.perms=false"]
    volumes:
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - ./filebeat.yml:/usr/share/filebeat/filebeat.yml:ro
    depends_on:
      - elasticsearch

volumes:
  elasticsearch-data:
```

filebeat.yml:
```yaml
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_docker_metadata:
          host: "unix:///var/run/docker.sock"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
```
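The `container` input reads the json-file logs under `/var/lib/docker/containers/<id>/`. The `add_docker_metadata` processor then enriches each event with container details (such as name, image, and labels) fetched through the Docker socket. Conceptually, the lookup key is the container ID embedded in the log path.

The hypothetical helper below shows that extraction step. It is an illustration, not Filebeat's own code.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

CONTAINERS_DIR = PurePosixPath("/var/lib/docker/containers")
FULL_ID = re.compile(r"^[0-9a-f]{64}$")  # full container IDs are 64 hex characters


def container_id_from_path(path: str) -> Optional[str]:
    """Return the container ID directory component of a Docker log path, if any."""
    try:
        rel = PurePosixPath(path).relative_to(CONTAINERS_DIR)
    except ValueError:
        return None  # not under the Docker containers directory
    candidate = rel.parts[0] if rel.parts else ""
    return candidate if FULL_ID.match(candidate) else None
```

The short ID that `docker ps` prints is the first 12 characters of this value.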
Alerting Strategies
Alert Types
Alert severity levels:
- Critical: immediate action required
- Warning: needs attention soon
- Info: FYI only
- Debug: not for alerting

Common container alerts:
```text
Metric               Warning      Critical
──────────────────   ──────────   ────────
CPU usage            > 70%        > 90%
Memory usage         > 75%        > 90%
Container restarts   > 2/hour     > 5/hour
Disk usage           > 80%        > 95%
Network errors       > 10/min     > 50/min
Response time        > 500 ms     > 2 s
Error rate           > 1%         > 5%
```
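The threshold table translates directly into a classification rule: compare a reading against the critical threshold first, then against the warning threshold. Below is a minimal Python sketch of that rule. The metric keys in `THRESHOLDS` are illustrative names. The values copy the table, with percentages as plain numbers and response time in milliseconds.

```python
from typing import Dict, Tuple

# (warning, critical) thresholds from the table; alert when strictly greater
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "cpu_percent": (70, 90),
    "memory_percent": (75, 90),
    "restarts_per_hour": (2, 5),
    "disk_percent": (80, 95),
    "network_errors_per_min": (10, 50),
    "response_time_ms": (500, 2000),
    "error_rate_percent": (1, 5),
}


def classify(metric: str, value: float) -> str:
    """Return "critical", "warning", or "ok" for a single reading."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

In practice, Prometheus evaluates rules like this with a `for:` duration, so a brief spike does not page anyone.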
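Prometheus Alert Rules
Save these rules in a rules file (for example alert_rules.yml), make it available to Prometheus, and list it under rule_files in prometheus.yml. The expressions assume cAdvisor metrics, which carry the container name in the name label.
```yaml
groups:
  - name: container_alerts
    interval: 30s
    rules:
      # High CPU usage. rate() of a CPU-seconds counter gives cores in use,
      # so 0.8 means 80% of one core, not 80% of the container's limit.
      - alert: HighCPUUsage
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "Container {{ $labels.name }} is using more than 0.8 CPU cores"

      # Critical CPU
      - alert: CriticalCPUUsage
        expr: sum by (name) (rate(container_cpu_usage_seconds_total{name!=""}[5m])) > 0.95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critical CPU usage"
          description: "Container {{ $labels.name }} is using more than 0.95 CPU cores"

      # High memory. The "> 0" filter skips containers that have no memory limit.
      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{name!=""} / (container_spec_memory_limit_bytes{name!=""} > 0) > 0.85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Container {{ $labels.name }} memory is above 85% of its limit"

      # Scrape target down. Job "docker" scrapes cAdvisor, so this fires when
      # Prometheus cannot reach cAdvisor, not when an individual container stops.
      - alert: MetricsTargetDown
        expr: up{job="docker"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Metrics target down"
          description: "Prometheus has been unable to scrape {{ $labels.instance }} for 1 minute"

      # Too many restarts. cAdvisor does not export a restart counter;
      # this rule assumes another exporter provides container_restart_count.
      - alert: ContainerRestarts
        expr: increase(container_restart_count[1h]) > 3
        labels:
          severity: warning
        annotations:
          summary: "Frequent container restarts"
          description: "Container {{ $labels.name }} restarted {{ $value }} times in the last hour"
```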
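`rate()` turns a monotonically increasing counter into a per-second average over the range window, treating any drop in value as a counter reset. For `container_cpu_usage_seconds_total`, the result is CPU-seconds consumed per second, which equals the number of cores in use.

Below is a simplified Python sketch of the calculation. Unlike real Prometheus, it does not extrapolate to the edges of the window, so its numbers can differ slightly from PromQL's.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, counter_value)


def simple_rate(samples: List[Sample]) -> float:
    """Per-second increase of a counter, correcting for resets (no extrapolation)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A decrease means the counter restarted from zero (e.g. container restart)
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

For example, a CPU counter that grows by 13.5 seconds every 15-second scrape has a rate of 0.9, meaning 0.9 cores busy.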
Distributed Tracing
What is Distributed Tracing?
```text
Traditional (monolithic)      Distributed (microservices)

Request                       API Gateway
   │                               │
   ▼                               ▼
Service                       Order Service
   │                          ┌────┴─────────┐
   ▼                          ▼              ▼
Database                      User Service   Payment Service
                              │              │
                              ▼              ▼
                              DB             DB

Easy to trace:                Needs distributed tracing:
[Request → Service → DB]      [API → Order → User → DB]
                              [API → Order → Payment → DB]
```
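Tracing systems stitch these hops together by propagating a trace context with every request. The W3C Trace Context standard, used by OpenTelemetry (whose SDKs can export to Jaeger or Zipkin), carries it in a `traceparent` header of the form `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`.

Below is a minimal Python sketch of how a service might parse an incoming header and build the header for its downstream call. The function names are hypothetical; real services should rely on an instrumentation library rather than hand-rolled parsing.

```python
import re
import secrets
from typing import NamedTuple

TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceContext(NamedTuple):
    trace_id: str   # shared by every span in the request
    span_id: str    # the calling span (the header's "parent-id" field)
    flags: str      # "01" means sampled


def parse_traceparent(header: str) -> TraceContext:
    match = TRACEPARENT.match(header.strip())
    if match is None:
        raise ValueError(f"invalid traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero ids are invalid")
    return TraceContext(trace_id, span_id, flags)


def child_traceparent(parent: TraceContext) -> str:
    """Keep the trace id and mint a new span id for the outgoing call."""
    new_span_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex characters
    return f"00-{parent.trace_id}-{new_span_id}-{parent.flags}"
```

Every span that shares the same trace ID is displayed as one trace, such as the API → Order → Payment → DB path in the diagram.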
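Distributed Tracing Tools
```bash
# Jaeger - open source (all-in-one image for local testing)
# 16686 = web UI, 4317/4318 = OTLP gRPC/HTTP, 6831/udp = legacy Jaeger agent
docker run -d \
  --name jaeger \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 6831:6831/udp \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 16686:16686 \
  jaegertracing/all-in-one:1.47

# Zipkin - open source (UI and API on port 9411)
docker run -d \
  --name zipkin \
  -p 9411:9411 \
  openzipkin/zipkin:latest
```

Prometheus and Grafana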
Setting Up Prometheus
prometheus.yml:
```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

  # The next two jobs assume node-exporter and myapp services on the same
  # Docker network; they are not part of the Compose stack below.
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8080']
```
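The `myapp` job expects the application itself to serve metrics at `http://myapp:8080/metrics`, since `/metrics` is Prometheus' default `metrics_path`. Below is a minimal standard-library Python sketch of such an endpoint in the Prometheus text exposition format. The metric name `myapp_http_requests_total` and the in-memory counter are illustrative assumptions. A real service would normally use the official `prometheus_client` library instead.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

_lock = threading.Lock()
_requests_total = {"200": 0, "500": 0}  # hypothetical counter keyed by status code


def record_request(status: str) -> None:
    with _lock:
        _requests_total[status] = _requests_total.get(status, 0) + 1


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    lines = [
        "# HELP myapp_http_requests_total Total HTTP requests handled.",
        "# TYPE myapp_http_requests_total counter",
    ]
    with _lock:
        for status, count in sorted(_requests_total.items()):
            lines.append(f'myapp_http_requests_total{{status="{status}"}} {count}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    # Bind all interfaces so Prometheus can reach the container by service name
    ThreadingHTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling `serve()` from the container's entrypoint would expose the endpoint that the `myapp` job scrapes.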
Docker Compose with Monitoring Stack
```yaml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:10.0.0
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"

volumes:
  prometheus-data:
  grafana-data:
```

Sample Grafana Dashboard
```text
Container Monitoring Dashboard
──────────────────────────────────────────────
CPU Usage %      ████████░░  78%
Memory Usage %   ███████░░░  65%

CPU Timeline
100% ┤               ╭──╮
 75% ┤     ╭─────────╯  ╰─────────
 50% ┤   ╭─╯
 25% ┤ ╭─╯
  0% ┼─╯
     00:00    04:00    08:00    12:00    16:00

Network I/O      RX 45.2 MB/s    TX 23.1 MB/s

Container: web-01 │ Image: nginx │ Status: Running
```

Hands-on Lab
Lab: Set Up Complete Monitoring Stack
In this hands-on lab, we’ll set up a complete monitoring stack with Prometheus and Grafana.
Prerequisites
- Docker and Docker Compose installed
- At least 2GB RAM available
Lab Steps
```bash
# Step 1: Create a monitoring directory
mkdir -p monitoring && cd monitoring

# Step 2: Create the Prometheus configuration
cat > prometheus.yml << 'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
EOF

# Step 3: Create docker-compose.yml
cat > docker-compose.yml << 'EOF'
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:v2.45.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:10.0.0
    volumes:
      - grafana-data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    ports:
      - "3000:3000"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.47.0
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring:
    driver: bridge
EOF

# Step 4: Start the monitoring stack (Compose v2 syntax)
docker compose up -d

# Step 5: Verify that all three services are running
docker compose ps

# Step 6: Access Grafana
# Open http://localhost:3000
# Username: admin
# Password: admin

# Step 7: Add Prometheus as a data source in Grafana
# Connections → Data sources → Add data source → Prometheus
# URL: http://prometheus:9090

# Step 8: Import a dashboard
# Dashboards → New → Import, then enter Grafana dashboard ID 193 (Docker Monitoring)

# Step 9: Run a test container
docker run -d --name test-nginx nginx

# Step 10: View cAdvisor metrics for the test container
curl -s http://localhost:8080/metrics | grep test-nginx

# Step 11: Clean up
docker stop test-nginx
docker rm test-nginx
docker compose down
```
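Step 10 greps the raw output. To work with the numbers instead, you can parse Prometheus' text exposition format. Below is a simplified Python sketch that extracts one container's samples by its `name` label. It ignores escape sequences inside label values and other edge cases, so it is not a complete parser. The sample text in the test is invented for illustration.

```python
import re
from typing import Dict, List, Tuple

# metric_name{label="value",...} value [timestamp]
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Tuple[str, Dict[str, str], float]]:
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def samples_for_container(text: str, container: str) -> List[Tuple[str, float]]:
    """(metric, value) pairs whose name label matches the container name."""
    return [(n, v) for n, labels, v in parse_exposition(text) if labels.get("name") == container]
```

For example, calling `samples_for_container(body, "test-nginx")` on the body returned by `curl -s http://localhost:8080/metrics` would list the lab container's series.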
Summary
Key Takeaways
- Three Pillars - Logs, metrics, and traces work together
- Centralized Logging - Aggregate logs for analysis
- Real-time Metrics - Monitor CPU, memory, network, disk
- Alert Proactively - Set up alerts before issues become critical
- Use Proper Tools - Prometheus, Grafana, ELK stack
- Dashboards - Visualize metrics for quick understanding
- Distributed Tracing - For microservices, use tracing tools
Quick Reference Commands
```bash
# View container logs
docker logs -f <container>

# Container stats
docker stats

# Inspect a container
docker inspect <container>

# Docker daemon's default logging driver
docker info | grep "Logging Driver"
```

Prometheus queries:
```promql
# Per-container CPU usage, in cores
rate(container_cpu_usage_seconds_total[5m])

# Memory usage as a fraction of the limit (containers with a limit only)
container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0)
```

Next Steps
In the next chapter, we’ll explore Advanced Docker Networking (Chapter 14), covering:
- Custom networks
- DNS and service discovery
- Load balancing
- Network plugins