Chapter 96: DevOps and SysAdmin Best Practices

Production System Administration Guidelines

DevOps Principles
+------------------------------------------------------------------+
| |
| 1. Automate Everything |
| +----------------------------------------------------------+ |
| | • Manual processes are error-prone | |
| | • Scripts for all repetitive tasks | |
| | • Configuration management (Ansible, Puppet, Chef) | |
| +----------------------------------------------------------+ |
| |
| 2. Idempotent Configurations |
| +----------------------------------------------------------+ |
| | • Running multiple times produces the same result | |
| | • Ansible: most modules are idempotent by design | |
| | • command/shell tasks are not (use creates/removes) | |
| +----------------------------------------------------------+ |
| |
| 3. Infrastructure as Code |
| +----------------------------------------------------------+ |
| | • Version control infrastructure | |
| | • GitOps workflow | |
| | • Declarative definitions | |
| +----------------------------------------------------------+ |
| |
| 4. Immutable Infrastructure |
| +----------------------------------------------------------+ |
| | • Don't modify running systems | |
| | • Replace with new versions | |
| | • Containers, golden images | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
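To make idempotency concrete, here is a minimal Python sketch. The helper ensure_line is hypothetical (it is not part of Ansible), though it is similar in spirit to Ansible's lineinfile module: it describes a desired state and writes only when the file does not already match it, so a second run changes nothing.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Ensure `line` appears in the file at `path` (hypothetical helper).

    Returns True if the file was modified, False if it already matched the
    desired state. Calling it again with the same arguments is a no-op.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already reached: change nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


# Demo on a throwaway file: the first run changes it, the second does not.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "sshd_config"
    cfg.write_text("Port 22\n")
    print(ensure_line(cfg, "PermitRootLogin no"))  # True
    print(ensure_line(cfg, "PermitRootLogin no"))  # False
```

A non-idempotent version that blindly appended the line would duplicate it on every run, which is exactly the kind of drift idempotent tooling is meant to prevent.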

Key Metrics to Monitor
# Infrastructure Metrics
- CPU usage (per core, overall)
- Memory usage (used, free, cached, swap)
- Disk I/O (IOPS, throughput, latency)
- Network (bandwidth, packets, errors)
- Disk space usage
# Application Metrics
- Request rate (RPM/RPS)
- Response time (p50, p95, p99)
- Error rate (5xx, exceptions)
- Active connections
- Queue depth
# Business Metrics
- User signups
- Transactions
- Revenue
- API calls
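Response-time percentiles (p50, p95, p99) expose tail latency that an average hides. The Python sketch below uses the nearest-rank definition of a percentile; this is one of several definitions, and monitoring systems often estimate percentiles from histograms instead, so their figures can differ. The latency values are made up for illustration.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it (0 < p <= 100)."""
    if not samples:
        raise ValueError("percentile of an empty sample set")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to limit floating-point rounding surprises.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative response times in milliseconds (made-up data).
latencies_ms = [12, 15, 11, 14, 250, 13, 16, 12, 18, 900]

print(f"mean: {sum(latencies_ms) / len(latencies_ms):.1f} ms")  # mean: 126.1 ms
for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)} ms")
# p50: 14 ms
# p95: 900 ms
# p99: 900 ms
```

The mean (126.1 ms) describes neither the typical request (p50 is 14 ms) nor the slowest ones. With only ten samples, p95 and p99 both collapse to the maximum; high percentiles need many samples to be meaningful.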

Alert Best Practices
+------------------------------------------------------------------+
| |
| 1. Signal-to-Noise Ratio |
| +----------------------------------------------------------+ |
| | • Only alert on actionable issues | |
| | • Avoid alert fatigue | |
| +----------------------------------------------------------+ |
| |
| 2. Severity Levels |
| +----------------------------------------------------------+ |
| | • Critical (immediate action) | |
| | • Warning (investigate soon) | |
| | • Info (no action needed) | |
| +----------------------------------------------------------+ |
| |
| 3. Runbooks |
| +----------------------------------------------------------+ |
| | • Document how to respond to each alert | |
| | • Include escalation paths | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
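The severity levels and the runbook rule can be expressed as a small routing policy. This Python sketch is illustrative only: the Alert type, the route function, and the destination names ("page", "ticket", "log") are hypothetical and not taken from any particular alerting tool.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "critical"  # immediate action
    WARNING = "warning"    # investigate soon
    INFO = "info"          # no action needed


@dataclass
class Alert:
    name: str
    severity: Severity
    runbook_url: Optional[str] = None


# Hypothetical destinations for each severity level.
DESTINATIONS = {
    Severity.CRITICAL: "page",   # notify the on-call engineer now
    Severity.WARNING: "ticket",  # handled during working hours
    Severity.INFO: "log",        # recorded, nobody is notified
}


def route(alert: Alert) -> str:
    """Return where an alert should go, enforcing the runbook rule."""
    if alert.severity is not Severity.INFO and not alert.runbook_url:
        # An actionable alert with no documented response is rejected.
        raise ValueError(f"{alert.name}: actionable alert has no runbook")
    return DESTINATIONS[alert.severity]


print(route(Alert("DiskAlmostFull", Severity.WARNING,
                  "https://wiki.example.com/runbooks/disk-full")))  # ticket
```

In practice a check like this fits best in CI over the alert definitions, so a missing runbook is caught before the alert ever fires.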

Security Layers
+------------------------------------------------------------------+
| |
| 1. Network Security |
| +----------------------------------------------------------+ |
| | • Firewalls (host, network) | |
| | • Segmentation (VPCs, VLANs) | |
| | • WAF for web applications | |
| +----------------------------------------------------------+ |
| |
| 2. System Hardening |
| +----------------------------------------------------------+ |
| | • Principle of least privilege | |
| | • Regular patching and updates | |
| | • Disable unnecessary services | |
| +----------------------------------------------------------+ |
| |
| 3. Data Security |
| +----------------------------------------------------------+ |
| | • Encryption at rest | |
| | • Encryption in transit (TLS) | |
| | • Key and secrets management (e.g., HashiCorp Vault) | |
| +----------------------------------------------------------+ |
| |
| 4. Monitoring and Response |
| +----------------------------------------------------------+ |
| | • Audit logging | |
| | • Intrusion detection | |
| | • Incident response plan | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
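Least privilege applies to files as well as accounts: a secret such as a private key or an API token file should be readable only by its owner. OpenSSH, for example, refuses to use a private key file that other users can read. The Python sketch below (POSIX systems only; is_private is a hypothetical helper) checks that no group or other permission bits are set.

```python
import os
import stat
import tempfile


def is_private(path: str) -> bool:
    """True if the file grants no permissions to group or others."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demo on a temporary file standing in for a secret (POSIX permissions).
fd, secret_path = tempfile.mkstemp()
os.close(fd)
os.chmod(secret_path, 0o644)    # owner rw, everyone else r: too permissive
print(is_private(secret_path))  # False
os.chmod(secret_path, 0o600)    # owner read/write only
print(is_private(secret_path))  # True
os.remove(secret_path)
```

A check like this can run as part of a periodic hardening audit; the fix for a failing file is to restrict it to mode 600 (or stricter).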

Backup Strategy
+------------------------------------------------------------------+
| |
| 3-2-1 Rule: |
| 3 copies of your data |
| 2 different storage media |
| 1 copy offsite |
| |
| Testing: |
| +----------------------------------------------------------+ |
| | • Test restores regularly | |
| | • Document recovery procedures | |
| | • Automate recovery testing | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+
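The 3-2-1 rule is simple enough to check automatically against a backup inventory. In this Python sketch, BackupCopy and meets_3_2_1 are hypothetical names, and the production data counts as one of the three copies (the common reading of the rule). Passing the check says nothing about whether the copies can actually be restored, which is why restore testing remains a separate step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    location: str  # where the copy lives (illustrative names below)
    media: str     # storage type, e.g. "disk", "tape", "object-storage"
    offsite: bool  # stored away from the primary site?


def meets_3_2_1(copies: list[BackupCopy]) -> bool:
    """Check an inventory (production data included) against 3-2-1."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


inventory = [
    BackupCopy("primary-dc", "disk", offsite=False),             # production
    BackupCopy("primary-dc", "tape", offsite=False),             # local backup
    BackupCopy("cloud-bucket", "object-storage", offsite=True),  # offsite backup
]
print(meets_3_2_1(inventory))      # True
print(meets_3_2_1(inventory[:2]))  # False: only 2 copies, none offsite
```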

What to Document
# System Documentation
- Architecture diagrams
- Network topology
- IP addressing scheme
- Service dependencies
# Runbooks
- Deployment procedures
- Troubleshooting guides
- Emergency contacts
- Rollback procedures
# Configuration
- Current configuration of every system
- Rationale for each change
- Change approval records

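Review Questions

1. What is Infrastructure as Code?
   • Provisioning and managing infrastructure through machine-readable, version-controlled definition files instead of manual configuration.
2. What is the 3-2-1 backup rule?
   • Keep 3 copies of your data, on 2 different storage media, with 1 copy offsite.
3. What is the principle of least privilege?
   • Give each user, process, or service only the minimum access it needs to do its job.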

Quick Reference
+------------------------------------------------------------------+
| |
| Key Principles: |
| +----------------------------------------------------------+ |
| | Automate everything | |
| | Monitor proactively | |
| | Defense in depth | |
| | Test backups regularly | |
| | Document everything | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+