Skip to content

Logging


Logging is the foundation of observability. Without logs, debugging production issues is like finding a needle in a haystack.

The Debugging Journey
====================
Without Proper Logging:
──────────────────────
User: "I can't place an order!"
Support: "Can you describe what happened?"
User: "Nothing! The page just doesn't work!"
Dev: *checks code* *guesses* *tries random fix*
Time: 4 hours
Result: Still broken, added more bugs
─────────────────────────────────────────
With Proper Logging:
────────────────────
User: "I can't place an order!"
Dev: *searches logs for user ID*
Found: "PaymentGatewayException: Card declined -
insufficient funds (order-123, user-456)"
Time: 2 minutes
Result: Fixed! Notified user about funds
ScenarioWithout LogsWith Logs
API failing”API is down""POST /checkout returning 500: DB connection timeout”
Slow response”It’s slow""GET /search took 8.2s: Full table scan on users table”
Security issue”Someone did something""Failed login attempt: user@evil.com from IP 1.2.3.4 (50th attempt)“
Data issue”Wrong total""Order total recalculated: 99.99 → 149.99 (coupon expired)“

Understanding and using correct log levels is crucial for effective logging.

Log Level Hierarchy
==================
┌─────────────────────────────────────────────────────────────┐
│ FATAL/CRITICAL - System crash, data loss imminent │
│ ─────────────────────────────────────────────────────────│
│ Examples: │
│ • Database connection permanently lost │
│ • Disk full │
│ • Uncaught exception causing service crash │
│ Color: RED │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ ERROR - Operation failed but system continues │
│ ─────────────────────────────────────────────────────────│
│ Examples: │
│ • API call to payment service failed │
│ • Database query failed │
│ • File not found │
│ Color: RED │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ WARN - Unexpected situation, may need attention │
│ ─────────────────────────────────────────────────────────│
│ Examples: │
│ • Using deprecated API │
│ • Retry attempt failed │
│ • Config using default value │
│ Color: YELLOW │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ INFO - Normal operation events │
│ ─────────────────────────────────────────────────────────│
│ Examples: │
│ • Server started on port 8080 │
│ • User logged in successfully │
│ • Order created: order-123 │
│ Color: GREEN/DEFAULT │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ DEBUG - Detailed debugging information │
│ ─────────────────────────────────────────────────────────│
│ Examples: │
│ • Function called with params: {a: 1, b: 2} │
│ • Loop iteration: i=5, value=abc │
│ • Cache miss for key: user:123 │
│ Color: GRAY │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ TRACE - Ultra-verbose, detailed flow │
│ ─────────────────────────────────────────────────────────│
│ Examples: │
│ • Entering function X │
│ • Exiting function Y │
│ • HTTP request: GET /api/users (10ms) │
│ Color: GRAY │
└─────────────────────────────────────────────────────────────┘
# Examples in Python
# FATAL - System can't continue
try:
db.connect()
except DatabaseUnavailable as e:
# Can't operate without DB - this is FATAL
logger.fatal(f"Cannot start: Database unavailable: {e}")
sys.exit(1)
# ERROR - Operation failed
try:
payment = stripe.charge(card, amount)
except StripeError as e:
# Operation failed, but we can handle it
logger.error(f"Payment failed for order {order_id}: {e}")
return ErrorResponse("Payment failed")
# WARN - Potential issue
if len(results) > 10000:
# This might be a performance issue
logger.warn(f"Large result set: {len(results)} rows
(consider pagination)")
# INFO - Normal operations
logger.info(f"User {user_id} created order {order_id}")
logger.info(f"Server started on port {PORT}")
# DEBUG - Debugging info
logger.debug(f"Processing request: {request_id},
params: {params}")
logger.debug(f"Cache hit: user_profile:{user_id}")
# TRACE - Detailed flow
logger.trace(f"Entering validate_order(), order_id={order_id}")

Structured logging transforms human-readable text into machine-parseable data.

Bad (Plain Text):
=================
2024-01-15 10:30:00 ERROR User login failed for john@example.com
from IP 1.2.3.4 - Invalid password
Problems:
• Can't search by user ID efficiently
• Can't filter by IP
• Can't parse error codes
• Can't build dashboards
─────────────────────────────────────────
Good (JSON Structured):
======================
{
"timestamp": "2024-01-15T10:30:00.000Z",
"level": "ERROR",
"logger": "auth-service",
"message": "User login failed",
"context": {
"userId": "user-123",
"email": "john@example.com",
"ip": "1.2.3.4",
"userAgent": "Mozilla/5.0...",
"reason": "invalid_password",
"attemptNumber": 2
},
"traceId": "abc-123-def",
"spanId": "span-456"
}
Benefits:
✓ Search by any field
✓ Filter by level, user, IP
✓ Build metrics dashboards
✓ Correlate with traces
{
"timestamp": "2024-01-15T10:30:00.000Z",
"level": "ERROR",
"message": "Order payment failed",
"service": "order-service",
"version": "1.2.3",
"environment": "production",
"context": {
"orderId": "ord-123",
"userId": "user-456",
"amount": 99.99,
"currency": "USD"
},
"error": {
"type": "PaymentDeclined",
"code": "INSUFFICIENT_FUNDS",
"message": "Card declined"
},
"trace": {
"traceId": "abc-123",
"spanId": "xyz-456",
"parentSpanId": "par-789"
}
}
# Python with structlog
import structlog
import logging
structlog.configure(
processors=[
structlog.stdlib.filter_by_level,
structlog.stdlib.add_logger_name,
structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.StackInfoRenderer(),
structlog.processors.format_exc_info,
structlog.processors.JSONRenderer()
],
wrapper_class=structlog.stdlib.BoundLogger,
context_class=dict,
logger_factory=structlog.stdlib.LoggerFactory(),
)
logger = structlog.get_logger()
# Usage
logger.info(
"order_created",
order_id="ord-123",
user_id="user-456",
amount=99.99
)
// Node.js with Pino
const pino = require('pino');
const logger = pino({
level: 'info',
timestamp: () => `,"timestamp":"${new Date().toISOString()}"`,
formatters: {
level: (label) => ({ level: label })
}
});
// Usage
logger.info({
orderId: 'ord-123',
userId: 'user-456',
amount: 99.99
}, 'Order created');
// Java with SLF4J + Logback
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
public class OrderService {
private static final Logger logger = LoggerFactory.getLogger(OrderService.class);
public void createOrder(Order order) {
try {
MDC.put("orderId", order.getId());
MDC.put("userId", order.getUserId());
logger.info("Order created: amount={} currency={}",
order.getAmount(),
order.getCurrency());
} finally {
MDC.clear();
}
}
}

# Good request logging
logger.info("api_request",
method="POST",
path="/api/orders",
query={"status": "pending"},
user_id=user_id,
request_id=request_id,
content_length=1024
)
# Response logging
logger.info("api_response",
method="POST",
path="/api/orders",
status=201,
duration_ms=150,
response_order_id="ord-123"
)
# Always include context!
try:
result = process_payment(order)
except PaymentException as e:
logger.error("payment_failed",
order_id=order.id,
amount=order.amount,
error_code=e.code,
error_message=str(e),
user_id=order.user_id,
card_last_four=order.card_last_four,
# Don't log: full card number, CVV, password!
exc_info=True # Includes stack trace
)
raise
# Log important business events
logger.info("user_registered",
user_id=user.id,
email=user.email,
referrer=user.referrer,
signup_method="google"
)
logger.info("order_placed",
order_id=order.id,
user_id=order.user_id,
item_count=len(order.items),
total=order.total,
coupon_used=order.coupon_code
)
logger.info("subscription_created",
user_id=user.id,
plan="premium",
billing_cycle="monthly",
amount=19.99
)

CRITICAL: Never Log These!
=========================
┌─────────────────────────────────────────────────────────────┐
│ PASSWORD/SECRETS │
│ ─────────────────────────────────────────────────────────│
│ ✗ User passwords │
│ ✗ API keys and tokens │
│ ✗ OAuth secrets │
│ ✗ Database connection strings │
│ ✗ Private keys │
│ │
│ ✓ Log: "Login attempt" (without password) │
│ ✓ Log: "API call failed" (without keys) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ PERSONAL IDENTIFIABLE INFORMATION (PII) │
│ ─────────────────────────────────────────────────────────│
│ ✗ Full credit card numbers │
│ ✗ Social Security Numbers │
│ ✗ Phone numbers (unless necessary) │
│ ✗ Home addresses │
│ ✗ Medical information │
│ │
│ ✓ Log: "Payment processed" (not card number) │
│ ✓ Log: User ID (not full name if not needed) │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ SENSITIVE BUSINESS DATA │
│ ─────────────────────────────────────────────────────────│
│ ✗ Full request/response bodies (can contain secrets) │
│ ✗ Customer data in bulk │
│ ✗ Internal system details that aid attackers │
└─────────────────────────────────────────────────────────────┘
import re
def mask_sensitive_data(message: str) -> str:
"""Mask sensitive data in log messages"""
# Mask credit cards
message = re.sub(r'\b\d{13,16}\b', '****-****-****-****', message)
# Mask SSN
message = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '***-**-****', message)
# Mask API keys (common patterns)
message = re.sub(r'(api[_-]?key["\']?\s*[:=]\s*["\']?)\w+', r'\1***', message)
return message

Log Aggregation Pipeline
========================
┌─────────────────────────────────────────────────────────────┐
│ Application Servers │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Server 1 │ │ Server 2 │ │ Server N │ │
│ │ app.log │ │ app.log │ │ app.log │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
└───────┼─────────────┼─────────────┼────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ Log Shippers (Filebeat, Fluentd, CloudWatch Agent) │
│ ─────────────────────────────────────────────────────────│
│ • Read log files │
│ • Parse/transform │
│ • Send to centralized system │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Log Storage & Analysis │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ ELK │ │ Splunk │ │ CloudWatch │ │
│ │ Stack │ │ │ │ Logs │ │
│ │(Elastic) │ │ │ │ │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ • Search & Query │
│ • Dashboards │
│ • Alerts │
└─────────────────────────────────────────────────────────────┘
ToolTypeProsCons
ELK StackOpen SourceFree, flexibleComplex setup
SplunkCommercialPowerful, enterpriseExpensive
CloudWatchAWS ManagedNative AWSAWS-only
DataDogSaaSGreat UICost
Sumo LogicSaaSSecurity focusPricing
Grafana + LokiOpen SourceCost effectiveNewer

Logging Best Practices Checklist
===============================
✓ Use structured JSON logging
✓ Include consistent context (request ID, user ID)
✓ Use appropriate log levels
✓ Log errors with full context and stack traces
✓ Never log passwords, secrets, or PII
✓ Implement log rotation
✓ Centralize logs for searching
✓ Set up alerts on error thresholds
✓ Monitor log volume to control costs
✓ Use correlation IDs for request tracing
─────────────────────────────────────────
Log Volume Guidelines:
─────────────────────
• INFO: Expected operations
• WARN: Things that need attention
• ERROR: Failures (always)
• DEBUG: Enable in dev, selective in prod
Target: 100MB - 1GB per instance per day
(varies by application)

  1. Log levels - FATAL > ERROR > WARN > INFO > DEBUG > TRACE
  2. Structured logging - Use JSON format
  3. Context - Always include request ID, user ID
  4. Security - Never log secrets or PII
  5. Centralize - Aggregate in ELK, Splunk, or cloud service
  6. Correlation - Use trace/request ID to track requests
  7. Rotation - Prevent disk overflow

Next: Chapter 27: Monitoring & Metrics