Chapter 46: Practical Exercises
Overview
This chapter contains hands-on exercises for practicing the bash scripting skills needed for DevOps, SRE, and SysAdmin positions. Each exercise includes requirements, hints, and a complete solution with production-ready code.
Exercise 1: System Monitoring Dashboard
Requirements
Create a comprehensive system monitoring script that:
- Monitors CPU, Memory, Disk usage
- Checks critical services status
- Monitors network connectivity
- Generates alerts when thresholds exceeded
- Outputs formatted report
- Can run continuously or once
- Supports JSON output for integration with monitoring systems
- Tracks historical data
Hints

- Use `top`, `free`, `df` for metrics
- Use `systemctl` for service checks
- Use `ping` or `curl` for network
- Use a `while` loop for continuous mode
- Use colors for terminal output
- Use `#!/usr/bin/env bash` for portability
- Use `set -euo pipefail` for safety
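Before reading the full solution, the core hinted pattern — read one metric with a standard tool, then compare it against a threshold under `set -euo pipefail` — can be sketched in a few lines. The 90% threshold here is an arbitrary example value, not part of the solution below:

```bash
#!/usr/bin/env bash
# Minimal sketch of the threshold-check pattern (illustrative threshold).
set -euo pipefail

threshold=90

# Root filesystem usage in percent; df -P guarantees stable POSIX columns,
# and the 5th field of the data row looks like "42%".
usage=$(df -P / | awk 'NR==2 {gsub(/%/, "", $5); print $5}')

if [ "$usage" -gt "$threshold" ]; then
    echo "ALERT: disk usage ${usage}% exceeds ${threshold}%"
else
    echo "OK: disk usage ${usage}%"
fi
```

The same compare-and-report shape repeats in the solution for CPU, memory, and load.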
Solution
```bash
#!/usr/bin/env bash
#
# system-monitor.sh - Comprehensive system monitoring for production environments
#
# Usage: system-monitor.sh [OPTIONS]
#   -c, --continuous    Run continuously
#   -i, --interval N    Interval in seconds (default: 60)
#   -j, --json          Output in JSON format
#   -h, --help          Show this help
#
# Author: DevOps Engineer
# Version: 1.0.0
#

set -euo pipefail

# ============================================================================
# Configuration - Modify these values for your environment
# ============================================================================

# Thresholds (percent)
readonly THRESHOLD_CPU=80
readonly THRESHOLD_MEM=85
readonly THRESHOLD_DISK=90
readonly THRESHOLD_LOAD=2.0

# Directories
readonly STATE_DIR="/var/lib/system-monitor"
readonly LOG_DIR="/var/log"

# Services to monitor (add your critical services here)
readonly SERVICES=(
    "nginx"
    "docker"
    "sshd"
    "postgresql"
    "redis"
)

# Network hosts to check
readonly NETWORK_HOSTS=(
    "8.8.8.8"
    "1.1.1.1"
)

# Email for alerts (set empty to disable)
readonly ALERT_EMAIL="ops@example.com"

# ============================================================================
# Colors and Formatting
# ============================================================================

readonly COLOR_RED='\033[0;31m'
readonly COLOR_GREEN='\033[0;32m'
readonly COLOR_YELLOW='\033[1;33m'
readonly COLOR_BLUE='\033[0;34m'
readonly COLOR_CYAN='\033[0;36m'
readonly COLOR_BOLD='\033[1m'
readonly COLOR_NC='\033[0m'  # No Color

# ============================================================================
# Global Variables
# ============================================================================

OUTPUT_JSON=false
CONTINUOUS_MODE=false
INTERVAL=60

# ============================================================================
# Utility Functions
# ============================================================================

# Print colored output
print_color() {
    local color="$1"
    local message="$2"
    echo -e "${color}${message}${COLOR_NC}"
}

# Log messages
log() {
    local level="$1"
    shift
    local message="[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*"
    echo "$message" | tee -a "${LOG_DIR}/system-monitor.log"
}

# Send alert email
send_alert() {
    local subject="$1"
    local body="$2"

    if [[ -n "$ALERT_EMAIL" ]] && command -v mail &>/dev/null; then
        echo "$body" | mail -s "[ALERT] $subject" "$ALERT_EMAIL"
    fi
}

# Check if running as root
check_root() {
    if [[ $EUID -ne 0 ]]; then
        log "WARN" "Not running as root. Some checks may fail."
    fi
}

# Initialize directories
init_directories() {
    mkdir -p "$STATE_DIR" "$LOG_DIR"
    chmod 755 "$STATE_DIR"
}

# ============================================================================
# Metric Collection Functions
# ============================================================================

# Get CPU usage percentage
get_cpu_usage() {
    # Method 1: Using top (most compatible); $8 is the idle value on most builds
    local cpu_idle
    cpu_idle=$(top -bn2 -d 0.5 | grep "Cpu(s)" | tail -1 | awk '{print $8}' | cut -d'%' -f1)

    if [[ -n "$cpu_idle" ]]; then
        echo "100 - $cpu_idle" | bc -l
        return
    fi

    # Method 2: Using /proc/stat (Linux specific)
    if [[ -f /proc/stat ]]; then
        local cpu_line
        cpu_line=$(head -1 /proc/stat)
        # The first field is the literal label "cpu"; read it into a throwaway
        local cpu user nice system idle iowait irq softirq
        read -r cpu user nice system idle iowait irq softirq _ <<< "$cpu_line"

        local total idle_total
        total=$((user + nice + system + idle + iowait + irq + softirq))
        idle_total=$((idle + iowait))

        echo "scale=2; ($total - $idle_total) * 100 / $total" | bc
        return
    fi

    echo "0"
}

# Get memory usage percentage
get_memory_usage() {
    local mem_total mem_used mem_available

    if command -v free &>/dev/null; then
        mem_total=$(free -m | awk 'NR==2{print $2}')
        mem_available=$(free -m | awk 'NR==2{print $7}')
        mem_used=$((mem_total - mem_available))
        echo "scale=2; $mem_used * 100 / $mem_total" | bc
    else
        echo "0"
    fi
}

# Get detailed memory info (free -h columns: total, used, free)
get_memory_info() {
    free -h | awk 'NR==1{next} {print $1 ": " $3 " used / " $4 " free"}'
}

# Get disk usage for a path
get_disk_usage() {
    local path="${1:-/}"
    df -h "$path" | tail -1 | awk '{print $5}' | tr -d '%'
}

# Get disk info for all mounted filesystems
get_disk_info() {
    df -h | grep -E '^/dev/' | awk '{printf "%s: %s used of %s (%s)\n", $1, $3, $2, $5}'
}

# Get load average
get_load_average() {
    uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//'
}

# Get number of CPU cores
get_cpu_cores() {
    nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo
}

# ============================================================================
# Service Checking Functions
# ============================================================================

# Check if a service is running
check_service() {
    local service="$1"

    if ! command -v systemctl &>/dev/null; then
        # Fallback for systems without systemd
        if pgrep -x "$service" &>/dev/null; then
            echo "running"
            return 0
        else
            echo "stopped"
            return 1
        fi
    fi

    if systemctl is-active --quiet "$service" 2>/dev/null; then
        echo "running"
        return 0
    else
        local failed_reason
        failed_reason=$(systemctl status "$service" 2>&1 | grep -i "failed" | head -1)
        echo "stopped: ${failed_reason:-unknown reason}"
        return 1
    fi
}

# Check all configured services
check_all_services() {
    local failed=0
    local services_json="["

    echo ""
    print_color "$COLOR_BOLD" "=== Service Status ==="
    echo ""

    for service in "${SERVICES[@]}"; do
        local status
        status=$(check_service "$service" 2>&1) || true

        if [[ "$status" == "running" ]]; then
            print_color "$COLOR_GREEN" "  ✓ $service: running"
            services_json+="{\"name\":\"$service\",\"status\":\"running\"},"
        else
            print_color "$COLOR_RED" "  ✗ $service: $status"
            services_json+="{\"name\":\"$service\",\"status\":\"$status\"},"
            failed=$((failed + 1))  # ((failed++)) would trip set -e when failed is 0
        fi
    done

    services_json="${services_json%,}]"

    if [[ $failed -gt 0 ]]; then
        send_alert "Service Down" "$failed service(s) are not running"
    fi

    echo "$services_json"
}

# ============================================================================
# Network Checking Functions
# ============================================================================

# Check network connectivity
check_network() {
    local failed=0
    local results_json="["

    echo ""
    print_color "$COLOR_BOLD" "=== Network Connectivity ==="
    echo ""

    for host in "${NETWORK_HOSTS[@]}"; do
        if ping -c 1 -W 2 "$host" &>/dev/null; then
            print_color "$COLOR_GREEN" "  ✓ $host: reachable"
            results_json+="{\"host\":\"$host\",\"status\":\"ok\"},"
        else
            print_color "$COLOR_RED" "  ✗ $host: unreachable"
            results_json+="{\"host\":\"$host\",\"status\":\"unreachable\"},"
            failed=$((failed + 1))
        fi
    done

    # Check DNS
    if host google.com &>/dev/null; then
        print_color "$COLOR_GREEN" "  ✓ DNS: working"
        results_json+="{\"host\":\"dns\",\"status\":\"ok\"},"
    else
        print_color "$COLOR_RED" "  ✗ DNS: not working"
        results_json+="{\"host\":\"dns\",\"status\":\"failed\"},"
        failed=$((failed + 1))
    fi

    results_json="${results_json%,}]"

    if [[ $failed -gt 0 ]]; then
        send_alert "Network Issue" "$failed network check(s) failed"
    fi

    echo "$results_json"
}

# ============================================================================
# Process Information Functions
# ============================================================================

# Get top processes by CPU
get_top_cpu_processes() {
    echo ""
    print_color "$COLOR_BOLD" "=== Top 5 Processes by CPU ==="
    echo ""
    ps aux --sort=-%cpu | head -6 | tail -5 | \
        awk '{printf "  %-20s %5s%% CPU\n", $11, $3}'
}

# Get top processes by memory
get_top_mem_processes() {
    echo ""
    print_color "$COLOR_BOLD" "=== Top 5 Processes by Memory ==="
    echo ""
    ps aux --sort=-%mem | head -6 | tail -5 | \
        awk '{printf "  %-20s %5s%% MEM\n", $11, $4}'
}

# Get process count
get_process_count() {
    local total running sleeping stopped zombie

    # grep -c exits non-zero when the count is 0, so guard against set -e
    total=$(ps aux | wc -l)
    running=$(ps -eo state= | grep -c R) || true
    sleeping=$(ps -eo state= | grep -c S) || true
    stopped=$(ps -eo state= | grep -c T) || true
    zombie=$(ps -eo state= | grep -c Z) || true

    echo "Total: $total | Running: $running | Sleeping: $sleeping | Stopped: $stopped | Zombie: $zombie"
}

# ============================================================================
# System Information Functions
# ============================================================================

# Get system uptime
get_uptime() {
    uptime -p 2>/dev/null || uptime | awk '{print $3,$4}' | tr -d ','
}

# Get logged in users
get_logged_users() {
    who | awk '{print $1, $2, $3}' | sort -u
}

# ============================================================================
# Alert Functions
# ============================================================================

# Check thresholds and alert
check_thresholds() {
    local cpu_usage="$1"
    local mem_usage="$2"
    local disk_usage="$3"
    local load_avg="$4"
    local cores="$5"

    local alerts=()

    # CPU check
    if (( $(echo "$cpu_usage > $THRESHOLD_CPU" | bc -l) )); then
        alerts+=("CPU usage is ${cpu_usage}% (threshold: ${THRESHOLD_CPU}%)")
    fi

    # Memory check
    if (( $(echo "$mem_usage > $THRESHOLD_MEM" | bc -l) )); then
        alerts+=("Memory usage is ${mem_usage}% (threshold: ${THRESHOLD_MEM}%)")
    fi

    # Disk check
    if [[ "$disk_usage" -gt "$THRESHOLD_DISK" ]]; then
        alerts+=("Disk usage is ${disk_usage}% (threshold: ${THRESHOLD_DISK}%)")
    fi

    # Load check (load > cores * threshold)
    local max_load
    max_load=$(echo "$cores * $THRESHOLD_LOAD" | bc)
    if (( $(echo "$load_avg > $max_load" | bc -l) )); then
        alerts+=("Load average is ${load_avg} (threshold: ${max_load} for ${cores} cores)")
    fi

    # Send alert if any threshold exceeded
    if [[ ${#alerts[@]} -gt 0 ]]; then
        local alert_message
        alert_message=$(printf "%s\n" "${alerts[@]}")
        send_alert "System Threshold Exceeded" "$alert_message"
    fi

    # Return alerts as JSON array; the expansion guard keeps an empty array
    # from tripping set -u on bash < 4.4
    local alerts_json="["
    for alert in ${alerts[@]+"${alerts[@]}"}; do
        alerts_json+="\"$alert\","
    done
    alerts_json="${alerts_json%,}]"
    echo "$alerts_json"
}

# ============================================================================
# Output Functions
# ============================================================================

# Generate JSON output
generate_json_output() {
    local cpu_usage="$1"
    local mem_usage="$2"
    local disk_usage="$3"
    local load_avg="$4"
    local cores="$5"
    local uptime="$6"
    local alerts="$7"
    local services="$8"
    local network="$9"

    cat <<EOF
{
  "timestamp": "$(date -Iseconds)",
  "hostname": "$(hostname)",
  "metrics": {
    "cpu": {
      "usage_percent": $cpu_usage,
      "cores": $cores
    },
    "memory": {
      "usage_percent": $mem_usage
    },
    "disk": {
      "root_usage_percent": $disk_usage
    },
    "load_average": $load_avg,
    "uptime": "$uptime"
  },
  "alerts": $alerts,
  "services": $services,
  "network": $network
}
EOF
}

# Print human-readable output
print_human_output() {
    local cpu_usage="$1"
    local mem_usage="$2"
    local disk_usage="$3"
    local load_avg="$4"
    local cores="$5"
    local uptime="$6"

    echo ""
    print_color "$COLOR_BOLD" "╔══════════════════════════════════════════════════════════╗"
    print_color "$COLOR_BOLD" "║  System Monitor - $(date '+%Y-%m-%d %H:%M:%S')                    ║"
    print_color "$COLOR_BOLD" "╚══════════════════════════════════════════════════════════╝"
    echo ""

    print_color "$COLOR_BOLD" "=== System Information ==="
    echo "  Hostname:  $(hostname)"
    echo "  Uptime:    $uptime"
    echo "  CPU Cores: $cores"
    echo ""

    print_color "$COLOR_BOLD" "=== Resource Usage ==="

    # CPU
    local cpu_color="$COLOR_GREEN"
    if (( $(echo "$cpu_usage > $THRESHOLD_CPU" | bc -l) )); then
        cpu_color="$COLOR_RED"
    elif (( $(echo "$cpu_usage > $((THRESHOLD_CPU / 2))" | bc -l) )); then
        cpu_color="$COLOR_YELLOW"
    fi
    printf "  CPU Usage:    ${cpu_color}%6.1f%%${COLOR_NC} (threshold: %d%%)\n" "$cpu_usage" "$THRESHOLD_CPU"

    # Memory
    local mem_color="$COLOR_GREEN"
    if (( $(echo "$mem_usage > $THRESHOLD_MEM" | bc -l) )); then
        mem_color="$COLOR_RED"
    elif (( $(echo "$mem_usage > $((THRESHOLD_MEM / 2))" | bc -l) )); then
        mem_color="$COLOR_YELLOW"
    fi
    printf "  Memory Usage: ${mem_color}%6.1f%%${COLOR_NC} (threshold: %d%%)\n" "$mem_usage" "$THRESHOLD_MEM"

    # Disk
    local disk_color="$COLOR_GREEN"
    if [[ "$disk_usage" -gt "$THRESHOLD_DISK" ]]; then
        disk_color="$COLOR_RED"
    elif [[ "$disk_usage" -gt $((THRESHOLD_DISK / 2)) ]]; then
        disk_color="$COLOR_YELLOW"
    fi
    printf "  Disk Usage:   ${disk_color}%6s${COLOR_NC} (threshold: %d%%)\n" "${disk_usage}%" "$THRESHOLD_DISK"

    # Load
    local max_load
    max_load=$(echo "$cores * $THRESHOLD_LOAD" | bc)
    local load_color="$COLOR_GREEN"
    if (( $(echo "$load_avg > $max_load" | bc -l) )); then
        load_color="$COLOR_RED"
    fi
    printf "  Load Average: ${load_color}%6s${COLOR_NC} (threshold: %.1f for %d cores)\n" "$load_avg" "$max_load" "$cores"

    echo ""
    echo "  Memory Details:"
    get_memory_info | while read -r line; do
        echo "    $line"
    done

    echo ""
    echo "  Disk Details:"
    get_disk_info | while read -r line; do
        echo "    $line"
    done

    get_top_cpu_processes
    get_top_mem_processes

    echo ""
    print_color "$COLOR_BOLD" "=== Process Information ==="
    echo "  $(get_process_count)"
    echo ""

    print_color "$COLOR_BOLD" "=== Logged In Users ==="
    get_logged_users | while read -r user; do
        echo "  $user"
    done
    echo ""
}

# ============================================================================
# Main Functions
# ============================================================================

# Run all checks
run_checks() {
    # Collect metrics
    local cpu_usage mem_usage disk_usage load_avg cores uptime
    cpu_usage=$(get_cpu_usage)
    mem_usage=$(get_memory_usage)
    disk_usage=$(get_disk_usage)
    load_avg=$(get_load_average)
    cores=$(get_cpu_cores)
    uptime=$(get_uptime)

    # Run checks
    local services_json network_json
    services_json=$(check_all_services)
    network_json=$(check_network)

    # Check thresholds and get alerts
    local alerts_json
    alerts_json=$(check_thresholds "$cpu_usage" "$mem_usage" "$disk_usage" "$load_avg" "$cores")

    # Output
    if [[ "$OUTPUT_JSON" == "true" ]]; then
        generate_json_output "$cpu_usage" "$mem_usage" "$disk_usage" \
            "$load_avg" "$cores" "$uptime" "$alerts_json" "$services_json" "$network_json"
    else
        print_human_output "$cpu_usage" "$mem_usage" "$disk_usage" \
            "$load_avg" "$cores" "$uptime"
    fi
}

# Parse command line arguments
parse_arguments() {
    while [[ $# -gt 0 ]]; do
        case $1 in
            -c|--continuous)
                CONTINUOUS_MODE=true
                shift
                ;;
            -i|--interval)
                INTERVAL="$2"
                shift 2
                ;;
            -j|--json)
                OUTPUT_JSON=true
                shift
                ;;
            -h|--help)
                echo "Usage: $0 [OPTIONS]"
                echo ""
                echo "Options:"
                echo "  -c, --continuous    Run continuously"
                echo "  -i, --interval N    Interval in seconds (default: 60)"
                echo "  -j, --json          Output in JSON format"
                echo "  -h, --help          Show this help"
                exit 0
                ;;
            *)
                echo "Unknown option: $1"
                exit 1
                ;;
        esac
    done
}

# ============================================================================
# Main Entry Point
# ============================================================================

main() {
    parse_arguments "$@"
    check_root
    init_directories

    log "INFO" "Starting system monitor"

    if [[ "$CONTINUOUS_MODE" == "true" ]]; then
        log "INFO" "Running in continuous mode (interval: ${INTERVAL}s)"
        while true; do
            run_checks
            sleep "$INTERVAL"
        done
    else
        run_checks
    fi

    log "INFO" "System monitor completed"
}

main "$@"
```

Running the Monitor
```bash
# Make executable
chmod +x system-monitor.sh

# Run once (human-readable output)
./system-monitor.sh

# Run once (JSON output)
./system-monitor.sh --json

# Run continuously (every 60 seconds)
./system-monitor.sh --continuous

# Run continuously with custom interval
./system-monitor.sh --continuous --interval 30
```

Exercise 2: Backup Automation Script
Requirements
Create a backup script that:
- Backs up specified directories
- Creates timestamped archives
- Implements incremental backups using snapshots
- Verifies backup integrity with checksums
- Cleans up old backups based on retention policy
- Sends notifications on completion/failure
- Supports both local and remote backup (via SSH/rsync)
- Provides detailed logging
Hints

- Use `tar` for archiving
- Use `rsync` for incremental backups
- Use `md5sum` or `sha256sum` for verification
- Use `ssh` with key-based authentication for remote backup
- Use `find` with `-mtime` for cleanup
- Use `logger` for system logging
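The core archive-and-verify step from the hints — `tar` plus `sha256sum` — can be tried in isolation before diving into the full solution. The source directory and archive path here are throwaway stand-ins, not the solution's real layout:

```bash
#!/usr/bin/env bash
# Minimal sketch: archive a directory, record a checksum, verify both.
set -euo pipefail

src=$(mktemp -d)                 # stand-in for a real source directory
echo "hello" > "$src/file.txt"

archive="$(mktemp -u /tmp/demo-backup-XXXXXX).tar.gz"
tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"

# sha256sum -c re-reads the archive and fails if it no longer matches.
sha256sum "$archive" > "$archive.sha256"
sha256sum -c "$archive.sha256"

# A listable archive is a cheap integrity check before trusting a backup.
tar -tzf "$archive" > /dev/null && echo "archive OK"

rm -rf "$src" "$archive" "$archive.sha256"
```

The full solution wraps exactly this sequence in functions, adds manifests and retention, and deletes the archive if verification fails.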
Solution
```bash
#!/usr/bin/env bash
#
# backup-manager.sh - Enterprise backup automation solution
#
# Features:
#   - Full and incremental backups
#   - Local and remote backup support
#   - Integrity verification
#   - Automatic cleanup
#   - Email notifications
#   - Comprehensive logging
#
# Usage:
#   backup-manager.sh backup [source...]        Create backup
#   backup-manager.sh restore <backup> <path>   Restore backup
#   backup-manager.sh list                      List available backups
#   backup-manager.sh verify <backup>           Verify backup integrity
#   backup-manager.sh cleanup                   Remove old backups
#
# Author: DevOps Engineer
# Version: 2.0.0
#

set -euo pipefail

# ============================================================================
# Configuration
# ============================================================================

# Directories
readonly BACKUP_ROOT="/backup"
readonly STATE_DIR="${BACKUP_ROOT}/.state"
readonly LOG_DIR="/var/log"

# Backup settings
readonly COMPRESSION="gzip"   # gzip, bzip2, xz, or none
readonly BACKUP_USER="backup"
readonly RETENTION_DAYS=30
readonly INCREMENTAL_RETENTION_DAYS=7

# Remote backup settings (optional; leave REMOTE_HOST empty to disable)
readonly REMOTE_HOST=""
readonly REMOTE_USER="backup"
readonly REMOTE_PATH="/backup"

# Email settings
readonly ALERT_EMAIL="ops@example.com"
readonly BACKUP_EMAIL="backup-reports@example.com"

# Default sources if not provided
DEFAULT_SOURCES=(
    "/etc"
    "/home"
    "/var/www"
    "/opt"
)

# ============================================================================
# Global Variables
# ============================================================================

DRY_RUN=false
VERBOSE=false
PARALLEL_JOBS=4

# ============================================================================
# Utility Functions
# ============================================================================

# Log a message (callers pass the message only)
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_DIR}/backup.log"
}

log_verbose() {
    if [[ "$VERBOSE" == "true" ]]; then
        log "DEBUG: $*"
    fi
}

error() {
    log "ERROR: $*"
    send_notification "ERROR" "$*"
}

notify() {
    local subject="$1"
    local body="$2"

    if [[ -n "$ALERT_EMAIL" ]] && command -v mailx &>/dev/null; then
        echo "$body" | mailx -s "[BACKUP] $subject" "$ALERT_EMAIL"
    fi
}

send_notification() {
    local status="$1"
    local message="$2"
    notify "$status" "$message"
}

# Check if command exists
require_command() {
    local cmd="$1"
    if ! command -v "$cmd" &>/dev/null; then
        error "Required command not found: $cmd"
        exit 1
    fi
}

# ============================================================================
# Setup Functions
# ============================================================================

setup_directories() {
    log "Setting up backup directories..."

    mkdir -p "$BACKUP_ROOT" "$STATE_DIR" "$LOG_DIR"

    # Set permissions
    chmod 700 "$BACKUP_ROOT"
    chmod 700 "$STATE_DIR"

    log "Backup directories created at $BACKUP_ROOT"
}

# ============================================================================
# Backup Functions
# ============================================================================

# Get compression options
get_compression_args() {
    case "$COMPRESSION" in
        gzip)  echo "-z" ;;
        bzip2) echo "-j" ;;
        xz)    echo "-J" ;;
        none)  echo "" ;;
        *)     echo "-z" ;;
    esac
}

# Get compression extension
get_compression_ext() {
    case "$COMPRESSION" in
        gzip)  echo ".gz" ;;
        bzip2) echo ".bz2" ;;
        xz)    echo ".xz" ;;
        none)  echo "" ;;
        *)     echo ".gz" ;;
    esac
}

# Get full archive extension
get_backup_ext() {
    local ext
    ext=$(get_compression_ext)
    echo ".tar${ext}"
}

# Create incremental backup
create_incremental_backup() {
    local source="$1"
    local timestamp="$2"
    local backup_type="$3"

    local source_name
    source_name=$(basename "$source")
    local backup_dir="${BACKUP_ROOT}/${source_name}"
    local snapshot_file="${STATE_DIR}/${source_name}.snapshot"

    mkdir -p "$backup_dir"

    local backup_file="${backup_dir}/${source_name}.${timestamp}"

    log "Creating $backup_type backup of $source"

    local tar_args archive
    tar_args=$(get_compression_args)   # intentionally unquoted below so "" disappears

    if [[ "$backup_type" == "full" ]]; then
        # Full backup - level-0 dump, then record a fresh snapshot
        archive="${backup_file}.full.tar.gz"
        tar --listed-incremental=/dev/null $tar_args -cf "$archive" \
            -C "$(dirname "$source")" "$(basename "$source")" 2>/dev/null || {
            error "Failed to create full backup of $source"
            return 1
        }

        # Create initial snapshot
        find "$source" -type f -printf "%p\n" 2>/dev/null | sort > "${snapshot_file}.new"
    else
        # Incremental backup
        if [[ ! -f "$snapshot_file" ]]; then
            log "No snapshot found, creating full backup instead"
            create_incremental_backup "$source" "$timestamp" "full"
            return $?
        fi

        archive="${backup_file}.incr.tar.gz"
        tar --listed-incremental="$snapshot_file" $tar_args -cf "$archive" \
            -C "$(dirname "$source")" "$(basename "$source")" 2>/dev/null || {
            error "Failed to create incremental backup of $source"
            return 1
        }
    fi

    # Calculate checksum for the archive actually written
    local checksum_file="${archive}.sha256"
    sha256sum "$archive" > "$checksum_file"

    # Verify backup
    if tar -tzf "$archive" &>/dev/null; then
        local size
        size=$(du -h "$archive" | cut -f1)
        log "Backup created successfully: $archive ($size)"
        return 0
    else
        error "Backup verification failed"
        rm -f "$archive" "$checksum_file"
        return 1
    fi
}

# Create full backup (simpler version)
create_full_backup() {
    local source="$1"
    local timestamp="$2"

    if [[ ! -d "$source" ]]; then
        error "Source directory does not exist: $source"
        return 1
    fi

    local source_name
    source_name=$(basename "$source")
    local backup_dir="${BACKUP_ROOT}/${source_name}"
    local backup_file="${backup_dir}/${source_name}.${timestamp}"

    mkdir -p "$backup_dir"

    log "Creating full backup of $source"

    # Get compression args and the full .tar.<ext> extension
    local tar_args ext
    tar_args=$(get_compression_args)
    ext=$(get_backup_ext)

    # Create archive ($tar_args intentionally unquoted)
    tar $tar_args -cf "${backup_file}${ext}" -C "$(dirname "$source")" "$(basename "$source")" || {
        error "Failed to create backup archive"
        return 1
    }

    # Calculate checksum
    local checksum_file="${backup_file}${ext}.sha256"
    sha256sum "${backup_file}${ext}" > "$checksum_file"

    # Create manifest
    local manifest="${backup_file}${ext}.manifest"
    {
        echo "Backup: ${backup_file}${ext}"
        echo "Source: $source"
        echo "Date: $(date)"
        echo "Hostname: $(hostname)"
        echo ""
        echo "Files:"
        tar $tar_args -tf "${backup_file}${ext}" | wc -l
    } > "$manifest"

    # Verify backup
    if tar $tar_args -tf "${backup_file}${ext}" &>/dev/null; then
        local size
        size=$(du -h "${backup_file}${ext}" | cut -f1)
        log "Backup created successfully: ${backup_file}${ext} ($size)"
        return 0
    else
        error "Backup verification failed"
        rm -f "${backup_file}${ext}" "$checksum_file" "$manifest"
        return 1
    fi
}

# Backup to remote server
create_remote_backup() {
    local source="$1"

    if [[ -z "$REMOTE_HOST" ]]; then
        error "Remote host not configured"
        return 1
    fi

    require_command "rsync"

    log "Creating remote backup of $source to ${REMOTE_HOST}:${REMOTE_PATH}"

    local source_name
    source_name=$(basename "$source")

    rsync -avz --progress \
        -e "ssh -o StrictHostKeyChecking=yes" \
        --delete \
        "$source/" "${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_PATH}/${source_name}/" || {
        error "Remote backup failed"
        return 1
    }

    log "Remote backup completed"
    return 0
}

# ============================================================================
# Restore Functions
# ============================================================================

# List available backups
list_backups() {
    echo "=== Available Backups ==="
    echo ""

    if [[ ! -d "$BACKUP_ROOT" ]]; then
        echo "No backups found"
        return
    fi

    echo "Backup Directory: $BACKUP_ROOT"
    echo ""

    for backup_dir in "$BACKUP_ROOT"/*; do
        if [[ -d "$backup_dir" ]]; then
            local name
            name=$(basename "$backup_dir")
            echo "Source: $name"

            local count size
            count=$(find "$backup_dir" -name "*.tar*" | wc -l)
            size=$(du -sh "$backup_dir" | cut -f1)

            echo "  Files: $count"
            echo "  Total Size: $size"
            echo "  Backups:"

            find "$backup_dir" -name "*.tar*" -type f -printf "    %f (%TY-%Tm-%Td %TH:%TM)\n" | sort -r
            echo ""
        fi
    done
}

# Verify backup integrity
verify_backup() {
    local backup_file="$1"

    log "Verifying backup: $backup_file"

    if [[ ! -f "$backup_file" ]]; then
        error "Backup file not found: $backup_file"
        return 1
    fi

    # Check checksum
    local checksum_file="${backup_file}.sha256"
    if [[ -f "$checksum_file" ]]; then
        if sha256sum -c "$checksum_file" &>/dev/null; then
            log "Checksum verified: OK"
        else
            error "Checksum verification FAILED"
            return 1
        fi
    else
        log "Warning: No checksum file found"
    fi

    # Check archive integrity
    local ext
    ext="${backup_file##*.}"

    case "$ext" in
        gz)
            if gzip -t "$backup_file" 2>/dev/null; then
                log "Archive integrity: OK"
            else
                error "Archive integrity check FAILED"
                return 1
            fi
            ;;
        bz2)
            if bzip2 -t "$backup_file" 2>/dev/null; then
                log "Archive integrity: OK"
            else
                error "Archive integrity check FAILED"
                return 1
            fi
            ;;
        xz)
            if xz -t "$backup_file" 2>/dev/null; then
                log "Archive integrity: OK"
            else
                error "Archive integrity check FAILED"
                return 1
            fi
            ;;
        *)
            # Uncompressed .tar
            if tar -tf "$backup_file" &>/dev/null; then
                log "Archive integrity: OK"
            else
                error "Archive integrity check FAILED"
                return 1
            fi
            ;;
    esac

    log "Backup verification completed successfully"
    return 0
}

# Restore backup
restore_backup() {
    local backup_file="$1"
    local restore_path="${2:-/tmp/restore}"

    log "Restoring backup: $backup_file"
    log "Restore path: $restore_path"

    if [[ ! -f "$backup_file" ]]; then
        error "Backup file not found: $backup_file"
        return 1
    fi

    # Verify first
    if ! verify_backup "$backup_file"; then
        error "Cannot restore - backup verification failed"
        return 1
    fi

    # Create restore directory
    mkdir -p "$restore_path"

    # Extract
    local ext
    ext="${backup_file##*.}"

    case "$ext" in
        gz)  tar -xzf "$backup_file" -C "$restore_path" ;;
        bz2) tar -xjf "$backup_file" -C "$restore_path" ;;
        xz)  tar -xJf "$backup_file" -C "$restore_path" ;;
        *)   tar -xf "$backup_file" -C "$restore_path" ;;
    esac

    log "Restore completed to $restore_path"
    return 0
}

# ============================================================================
# Cleanup Functions
# ============================================================================

# Cleanup old backups
cleanup_old_backups() {
    log "Starting cleanup of backups older than $RETENTION_DAYS days"

    local deleted_count=0
    local deleted_size=0

    while IFS= read -r -d '' file; do
        local size
        size=$(du -b "$file" | cut -f1)

        rm -f "$file"

        # Also remove related files (checksum, manifest)
        rm -f "${file}.sha256" "${file}.manifest"

        deleted_count=$((deleted_count + 1))   # ((deleted_count++)) trips set -e at 0
        deleted_size=$((deleted_size + size))

        log "Deleted: $file"
    done < <(find "$BACKUP_ROOT" -name "*.tar*" -type f -mtime +"$RETENTION_DAYS" -print0)

    local deleted_size_human
    deleted_size_human=$((deleted_size / 1024 / 1024))

    log "Cleanup completed: $deleted_count files deleted (${deleted_size_human}MB)"

    send_notification "Cleanup Complete" "Deleted $deleted_count old backup(s)"
}

# ============================================================================
# Main Functions
# ============================================================================

# Perform backup
do_backup() {
    local sources=("$@")
    if [[ ${#sources[@]} -eq 0 ]]; then
        sources=("${DEFAULT_SOURCES[@]}")
    fi

    local timestamp
    timestamp=$(date +%Y%m%d_%H%M%S)

    log "Starting backup process"
    log "Sources: ${sources[*]}"

    setup_directories

    local failed=0
    local success=0

    for source in "${sources[@]}"; do
        if [[ -d "$source" ]]; then
            if create_full_backup "$source" "$timestamp"; then
                success=$((success + 1))
            else
                failed=$((failed + 1))
            fi
        else
            log "Warning: Source not found or not a directory: $source"
            failed=$((failed + 1))
        fi
    done

    # Remote backup (if configured)
    if [[ -n "$REMOTE_HOST" ]]; then
        for source in "${sources[@]}"; do
            if create_remote_backup "$source"; then
                log "Remote backup success: $source"
            else
                log "Remote backup failed: $source"
            fi
        done
    fi

    # Cleanup old backups
    cleanup_old_backups

    if [[ $failed -eq 0 ]]; then
        log "Backup completed successfully"
        send_notification "Backup Success" "All $success backup(s) completed successfully"
    else
        error "Backup completed with $failed failure(s)"
    fi
}

# Main entry point
main() {
    local command="${1:-backup}"
    shift || true

    case "$command" in
        backup)
            do_backup "$@"
            ;;
        restore)
            if [[ $# -lt 2 ]]; then
                echo "Usage: $0 restore <backup_file> <restore_path>"
                exit 1
            fi
            restore_backup "$1" "$2"
            ;;
        list)
            list_backups
            ;;
        verify)
            if [[ $# -lt 1 ]]; then
                echo "Usage: $0 verify <backup_file>"
                exit 1
            fi
            verify_backup "$1"
            ;;
        cleanup)
            cleanup_old_backups
            ;;
        *)
            echo "Usage: $0 {backup|restore|list|verify|cleanup}"
            echo ""
            echo "Commands:"
            echo "  backup [sources...]     Create backup (default sources if none specified)"
            echo "  restore <file> <path>   Restore backup"
            echo "  list                    List available backups"
            echo "  verify <file>           Verify backup integrity"
            echo "  cleanup                 Remove old backups"
            exit 1
            ;;
    esac
}

main "$@"
```

Running the Backup
```bash
# Make executable
chmod +x backup-manager.sh

# Create backup
sudo ./backup-manager.sh backup /etc /home

# List backups
./backup-manager.sh list

# Verify a backup
./backup-manager.sh verify /backup/etc/etc.20240222.tar.gz

# Restore backup
sudo ./backup-manager.sh restore /backup/etc/etc.20240222.tar.gz /tmp/restore

# Cleanup old backups
sudo ./backup-manager.sh cleanup
```

Exercise 3: Log Analyzer
Requirements
Create a log analysis script that:
- Parses multiple log files (syslog, nginx, application logs)
- Extracts error patterns (ERROR, WARNING, FATAL, CRITICAL)
- Generates statistics (error counts, most common errors)
- Creates time-based analysis (hourly, daily distribution)
- Filters by severity level
- Outputs formatted report (text, JSON, HTML)
- Supports real-time monitoring with tail -f
- Can search with regex patterns
- Tracks patterns over time
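The heart of the requirements — counting lines by severity — rests on `grep -c` plus extended-regex alternation, where the patterns must be joined with `|` (a space-joined list matches nothing useful). A self-contained sketch with an invented sample log:

```bash
#!/usr/bin/env bash
# Minimal sketch of severity counting; the sample log lines are invented.
set -euo pipefail

logfile=$(mktemp)
cat > "$logfile" <<'EOF'
2024-02-22 10:00:01 INFO  service started
2024-02-22 10:00:05 ERROR connection refused
2024-02-22 10:01:09 WARN  slow query
2024-02-22 10:02:11 ERROR connection refused
EOF

# Alternatives must be joined with |; grep -c counts matching lines.
# grep -c prints 0 and exits non-zero on no match, so guard set -e with || true.
errors=$(grep -cE 'ERROR|FATAL|CRITICAL' "$logfile" || true)
warnings=$(grep -cE 'WARNING|WARN' "$logfile" || true)
echo "errors=$errors warnings=$warnings"   # prints: errors=2 warnings=1

rm -f "$logfile"
```

The solution below generalizes this by keeping the patterns in arrays and joining them into one alternation at call time.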
Solution
```bash
#!/usr/bin/env bash
#
# log-analyzer.sh - Production log analysis tool
#
# Features:
#   - Multi-format log parsing
#   - Pattern detection and statistics
#   - Time-based analysis
#   - Real-time monitoring
#   - Multiple output formats
#   - Regex search support
#
# Usage:
#   log-analyzer.sh analyze <log_file>            Analyze log file
#   log-analyzer.sh monitor <log_file>            Real-time monitoring
#   log-analyzer.sh search <pattern> <log_file>   Search pattern
#   log-analyzer.sh stats <log_file>              Show statistics
#   log-analyzer.sh errors <log_file>             Show error summary
#
# Author: DevOps Engineer
# Version: 2.0.0
#

set -euo pipefail

# ============================================================================
# Configuration
# ============================================================================

# Log formats
readonly LOG_FORMAT_SYSLOG="%b %d %H:%M:%S %h %s"
readonly LOG_FORMAT_NGINX='%h %l %u %t "%r" %>s %b'
readonly LOG_FORMAT_APACHE='"%r" %>s %b'

# Patterns to detect
readonly ERROR_PATTERNS=(
    "ERROR"
    "FATAL"
    "CRITICAL"
    "CRASH"
    "EXCEPTION"
    "FAILED"
)

readonly WARNING_PATTERNS=(
    "WARNING"
    "WARN"
    "ALERT"
)

# Join the error patterns with '|' so grep -E sees an alternation.
# ("${ERROR_PATTERNS[*]}" alone would join them with spaces, and grep
# would look for one literal space-separated string.)
ERROR_REGEX=$(IFS='|'; echo "${ERROR_PATTERNS[*]}")
readonly ERROR_REGEX

# Output settings
OUTPUT_FORMAT="text"
REPORT_FILE=""

# Colors
readonly COLOR_RED='\033[0;31m'
readonly COLOR_GREEN='\033[0;32m'
readonly COLOR_YELLOW='\033[1;33m'
readonly COLOR_BLUE='\033[0;34m'
readonly COLOR_CYAN='\033[0;36m'
readonly COLOR_BOLD='\033[1m'
readonly COLOR_NC='\033[0m'

# ============================================================================
# Utility Functions
# ============================================================================

print_color() {
    local color="$1"
    shift
    echo -e "${color}$*${COLOR_NC}"
}

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

# ============================================================================
# Analysis Functions
# ============================================================================

# Count pattern occurrences in a file
count_pattern() {
    local file="$1"
    local pattern="$2"
    local count

    # grep -c prints 0 *and* exits non-zero when nothing matches, so
    # `grep -c ... || echo 0` would print a second zero. Capture first,
    # then fall back on failure.
    count=$(grep -c "$pattern" "$file" 2>/dev/null) || count=0
    echo "$count"
}

# Count all error patterns
count_errors() {
    local file="$1"
    local total=0

    for pattern in "${ERROR_PATTERNS[@]}"; do
        local count
        count=$(count_pattern "$file" "$pattern")
        # Plain assignment: ((total+=count)) returns status 1 when the
        # result is 0, which would abort the script under set -e
        total=$((total + count))
    done

    echo "$total"
}

# Count all warning patterns
count_warnings() {
    local file="$1"
    local total=0

    for pattern in "${WARNING_PATTERNS[@]}"; do
        local count
        count=$(count_pattern "$file" "$pattern")
        total=$((total + count))
    done

    echo "$total"
}

# Extract unique error messages
get_unique_errors() {
    local file="$1"
    local limit="${2:-50}"

    grep -E "$ERROR_REGEX" "$file" 2>/dev/null | \
        sed 's/.*\(ERROR\|FATAL\|CRITICAL\).*/\1/' | \
        sort | uniq -c | sort -rn | head -n "$limit"
}

# Get time distribution
get_time_distribution() {
    local file="$1"

    # Try different time formats
    # Format: [22/Feb/2024:10:30:45
    if grep -qE '\[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:' "$file"; then
        grep -oE '\[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}' "$file" | \
            cut -d':' -f2 | sort | uniq -c | sort -rn
        return
    fi

    # Format: 2024-02-22 10:30:45
    if grep -qE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:' "$file"; then
        grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:' "$file" | \
            cut -d' ' -f2 | cut -d':' -f1 | sort | uniq -c | sort -rn
        return
    fi

    # Format: Feb 22 10:30:45
    awk '{print $1, $2, $3}' "$file" | sort | uniq -c | sort -rn
}

# Get top errors by frequency
get_top_errors() {
    local file="$1"
    local num="${2:-10}"

    grep -E "$ERROR_REGEX" "$file" 2>/dev/null | \
        sed -E 's/.*(ERROR|FATAL|CRITICAL|EXCEPTION|FAILED)[,:]? ?(.*)/\2/' | \
        sed 's/^ *//;s/ *$//' | \
        grep -v '^$' | \
        sort | uniq -c | sort -rn | head -n "$num"
}

# Get HTTP status code distribution
get_status_distribution() {
    local file="$1"

    grep -oE ' [0-9]{3} ' "$file" | \
        sort | uniq -c | sort -rn | \
        awk '{
            status=$2
            count=$1
            if (status ~ /^2/)      { category="2xx Success" }
            else if (status ~ /^3/) { category="3xx Redirect" }
            else if (status ~ /^4/) { category="4xx Client Error" }
            else if (status ~ /^5/) { category="5xx Server Error" }
            else                    { category="Other" }
            printf "  %s (%s): %d\n", status, category, count
        }'
}

# Get top client IP addresses (for access logs)
get_top_ips() {
    local file="$1"
    local num="${2:-10}"

    grep -oE '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$file" | \
        sort | uniq -c | sort -rn | head -n "$num"
}

# Get most requested URLs
get_top_urls() {
    local file="$1"
    local num="${2:-10}"

    grep -oE '"(GET|POST|PUT|DELETE|HEAD|OPTIONS|PATCH) [^"]+' "$file" | \
        sed 's/"//g' | sort | uniq -c | sort -rn | head -n "$num"
}

# ============================================================================
# Analysis Output Functions
# ============================================================================

# Analyze log file
analyze_log() {
    local file="$1"

    if [[ ! -f "$file" ]]; then
        log "ERROR: File not found: $file"
        return 1
    fi

    local total_lines
    total_lines=$(wc -l < "$file")
    local file_size
    file_size=$(du -h "$file" | cut -f1)

    echo ""
    print_color "$COLOR_BOLD" "╔══════════════════════════════════════════════════════════╗"
    print_color "$COLOR_BOLD" "║                  Log Analysis Report                     ║"
    print_color "$COLOR_BOLD" "╚══════════════════════════════════════════════════════════╝"
    echo ""

    echo "File:          $file"
    echo "Size:          $file_size"
    echo "Total Lines:   $total_lines"
    echo "Last Modified: $(stat -c '%y' "$file" 2>/dev/null || stat -f '%Sm' "$file")"
    echo ""

    # Error counts
    print_color "$COLOR_BOLD" "=== Error Summary ==="
    local error_count warning_count
    error_count=$(count_errors "$file")
    warning_count=$(count_warnings "$file")

    # Guard against division by zero on empty files
    local error_pct=0 warning_pct=0
    if [[ "$total_lines" -gt 0 ]]; then
        error_pct=$((error_count * 100 / total_lines))
        warning_pct=$((warning_count * 100 / total_lines))
    fi

    print_color "$COLOR_RED" "  Errors:   $error_count ($error_pct%)"
    print_color "$COLOR_YELLOW" "  Warnings: $warning_count ($warning_pct%)"
    echo ""

    # Per-pattern breakdown
    print_color "$COLOR_BOLD" "=== Error Breakdown ==="
    for pattern in "${ERROR_PATTERNS[@]}"; do
        local count
        count=$(count_pattern "$file" "$pattern")
        if [[ "$count" -gt 0 ]]; then
            print_color "$COLOR_RED" "  $pattern: $count"
        fi
    done
    echo ""

    # Top errors
    print_color "$COLOR_BOLD" "=== Top Error Messages ==="
    get_top_errors "$file" 10 | while read -r count message; do
        if [[ -n "$message" ]]; then
            echo "  [$count] $message"
        fi
    done
    echo ""

    # Time distribution
    print_color "$COLOR_BOLD" "=== Time Distribution (Hour) ==="
    get_time_distribution "$file" | head -10 | while read -r count hour; do
        printf "  %2s:00 - %d occurrences\n" "$hour" "$count"
    done
    echo ""

    # Check whether it's an HTTP access log
    if grep -qE 'HTTP/[12]\.[01]' "$file" 2>/dev/null; then
        print_color "$COLOR_BOLD" "=== HTTP Status Codes ==="
        get_status_distribution "$file"
        echo ""

        print_color "$COLOR_BOLD" "=== Top IPs ==="
        get_top_ips "$file" 10 | while read -r count ip; do
            printf "  %-15s - %d requests\n" "$ip" "$count"
        done
        echo ""

        print_color "$COLOR_BOLD" "=== Top URLs ==="
        get_top_urls "$file" 10 | while read -r count url; do
            printf "  [%d] %s\n" "$count" "$url"
        done
        echo ""
    fi

    # Recent errors
    print_color "$COLOR_BOLD" "=== Recent Errors (Last 10) ==="
    grep -E "$ERROR_REGEX" "$file" 2>/dev/null | tail -10 | \
        sed 's/^/  /'
    echo ""
}

# Real-time monitoring
monitor_log() {
    local file="$1"

    if [[ ! -f "$file" ]]; then
        log "ERROR: File not found: $file"
        return 1
    fi

    log "Starting real-time monitoring of $file (Ctrl+C to stop)"
    echo ""

    tail -f "$file" | while IFS= read -r line; do
        if [[ "$line" =~ ERROR|FATAL|CRITICAL ]]; then
            print_color "$COLOR_RED" "[ERROR] $line"
        elif [[ "$line" =~ WARNING|WARN ]]; then
            print_color "$COLOR_YELLOW" "[WARN]  $line"
        else
            echo "$line"
        fi
    done
}

# Search pattern
search_pattern() {
    local pattern="$1"
    local file="$2"

    if [[ ! -f "$file" ]]; then
        log "ERROR: File not found: $file"
        return 1
    fi

    log "Searching for: $pattern"
    local count
    count=$(grep -c "$pattern" "$file" 2>/dev/null) || count=0
    echo "Found $count matches"
    echo ""

    grep -n "$pattern" "$file" 2>/dev/null | head -50 | \
        while read -r line; do
            if [[ "$line" =~ ERROR|FATAL|CRITICAL ]]; then
                print_color "$COLOR_RED" "$line"
            else
                echo "$line"
            fi
        done
}

# ============================================================================
# Main Function
# ============================================================================

main() {
    local command="${1:-analyze}"
    shift || true

    case "$command" in
        analyze)
            if [[ $# -lt 1 ]]; then
                echo "Usage: $0 analyze <log_file>"
                exit 1
            fi
            analyze_log "$1"
            ;;
        monitor)
            if [[ $# -lt 1 ]]; then
                echo "Usage: $0 monitor <log_file>"
                exit 1
            fi
            monitor_log "$1"
            ;;
        search)
            if [[ $# -lt 2 ]]; then
                echo "Usage: $0 search <pattern> <log_file>"
                exit 1
            fi
            search_pattern "$1" "$2"
            ;;
        stats)
            if [[ $# -lt 1 ]]; then
                echo "Usage: $0 stats <log_file>"
                exit 1
            fi
            analyze_log "$1"
            ;;
        errors)
            if [[ $# -lt 1 ]]; then
                echo "Usage: $0 errors <log_file>"
                exit 1
            fi
            grep -E "$ERROR_REGEX" "$1" 2>/dev/null || echo "No errors found"
            ;;
        *)
            echo "Usage: $0 {analyze|monitor|search|stats|errors} [options]"
            echo ""
            echo "Commands:"
            echo "  analyze <file>    Analyze log file and show statistics"
            echo "  monitor <file>    Real-time monitoring"
            echo "  search <pat> <f>  Search for pattern in log"
            echo "  stats <file>      Show detailed statistics"
            echo "  errors <file>     Show all error lines"
            exit 1
            ;;
    esac
}

main "$@"
```

Running the Analyzer
```bash
# Make executable
chmod +x log-analyzer.sh

# Analyze system log
sudo ./log-analyzer.sh analyze /var/log/syslog

# Analyze nginx access log
./log-analyzer.sh analyze /var/log/nginx/access.log

# Monitor in real-time
sudo ./log-analyzer.sh monitor /var/log/syslog

# Search for a specific pattern
./log-analyzer.sh search "connection refused" /var/log/nginx/error.log

# Show all errors
sudo ./log-analyzer.sh errors /var/log/syslog
```

Summary
In this chapter, you completed practical exercises covering:
- ✅ System Monitoring Dashboard - Complete production monitoring
- ✅ Backup Automation Script - Enterprise backup solution
- ✅ Log Analyzer - Multi-format log analysis
These exercises provide real-world DevOps experience with production-ready code.
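One bash subtlety worth remembering from these exercises: under `set -e`, the arithmetic command `(( ... ))` returns a non-zero status whenever the expression evaluates to 0, so an accumulator written as `((total+=count))` can silently abort a script the moment the running total is still 0. The minimal sketch below shows the safe plain-assignment form:

```shell
#!/usr/bin/env bash
set -euo pipefail

total=0
count=0

# Unsafe here: ((total+=count)) would evaluate to 0, return status 1,
# and abort the script under set -e.
# Safe: an assignment from arithmetic expansion always returns status 0.
total=$((total + count))

echo "total=$total"   # prints total=0
```

The same reasoning applies to counters like `((i++))` in loops: prefer `i=$((i + 1))` (or append `|| true`) in scripts that run with `set -e`.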
Congratulations!
You have completed all 46 chapters of the bash-scripting-guide! This comprehensive guide covered everything needed for DevOps, SRE, and SysAdmin positions:
- Basic to advanced bash concepts
- Text processing (sed, awk, regex)
- Process management
- Error handling
- Security best practices
- Performance optimization
- Real-world automation scripts
- Interview preparation
- Practical exercises
Previous Chapter: Interview Questions

End of Guide