
Practical Exercises

This chapter contains hands-on exercises for practicing bash scripting skills needed for DevOps, SRE, and SysAdmin positions. Each exercise includes requirements, hints, and complete solutions with production-ready code.


Exercise 1: System Monitoring Script

Create a comprehensive system monitoring script that:

  1. Monitors CPU, Memory, Disk usage
  2. Checks critical services status
  3. Monitors network connectivity
  4. Generates alerts when thresholds exceeded
  5. Outputs formatted report
  6. Can run continuously or once
  7. Supports JSON output for integration with monitoring systems
  8. Tracks historical data
Hints:

  • Use top, free, df for metrics
  • Use systemctl for service checks
  • Use ping or curl for network
  • Use while loop for continuous mode
  • Use colors for terminal output
  • Use #!/usr/bin/env bash for portability
  • Use set -euo pipefail for safety
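Before reading the solution, it helps to see what the last hint actually buys. A quick sketch, run in throwaway child shells so the deliberate failures do not kill your session:

```shell
# Each child shell exercises one flag; the parent reports what happened.
bash -c 'set -u; echo "$UNSET_VAR"' 2>/dev/null || echo "-u: aborted on unset variable"
bash -c 'set -e; false; echo unreachable' || echo "-e: aborted on failed command"
bash -c 'set -o pipefail; false | cat' || echo "pipefail: pipeline failure propagated"
```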
Solution:

#!/usr/bin/env bash
#
# system-monitor.sh - Comprehensive system monitoring for production environments
#
# Usage: system-monitor.sh [OPTIONS]
# -c, --continuous Run continuously
# -i, --interval N Interval in seconds (default: 60)
# -j, --json Output in JSON format
# -h, --help Show this help
#
# Author: DevOps Engineer
# Version: 1.0.0
#
set -euo pipefail
# ============================================================================
# Configuration - Modify these values for your environment
# ============================================================================
# Thresholds (percent)
readonly THRESHOLD_CPU=80
readonly THRESHOLD_MEM=85
readonly THRESHOLD_DISK=90
readonly THRESHOLD_LOAD=2.0
# Directories
readonly STATE_DIR="/var/lib/system-monitor"
readonly LOG_DIR="/var/log"
# Services to monitor (add your critical services here)
readonly SERVICES=(
"nginx"
"docker"
"sshd"
"postgresql"
"redis"
)
# Network hosts to check
readonly NETWORK_HOSTS=(
"8.8.8.8"
"1.1.1.1"
)
# Email for alerts (set empty to disable)
readonly ALERT_EMAIL="ops@example.com"
# ============================================================================
# Colors and Formatting
# ============================================================================
# Terminal colors
readonly COLOR_RED='\033[0;31m'
readonly COLOR_GREEN='\033[0;32m'
readonly COLOR_YELLOW='\033[1;33m'
readonly COLOR_BLUE='\033[0;34m'
readonly COLOR_CYAN='\033[0;36m'
readonly COLOR_BOLD='\033[1m'
readonly COLOR_NC='\033[0m' # No Color
# ============================================================================
# Global Variables
# ============================================================================
OUTPUT_JSON=false
CONTINUOUS_MODE=false
INTERVAL=60
# ============================================================================
# Utility Functions
# ============================================================================
# Print colored output
print_color() {
local color="$1"
local message="$2"
echo -e "${color}${message}${COLOR_NC}"
}
# Log messages
log() {
local level="$1"
shift
local message="[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*"
echo "$message" | tee -a "${LOG_DIR}/system-monitor.log"
}
# Send alert email
send_alert() {
local subject="$1"
local body="$2"
if [[ -n "$ALERT_EMAIL" ]] && command -v mail &>/dev/null; then
echo "$body" | mail -s "[ALERT] $subject" "$ALERT_EMAIL"
fi
}
# Check if running as root
check_root() {
if [[ $EUID -ne 0 ]]; then
log "WARN" "Not running as root. Some checks may fail."
fi
}
# Initialize directories
init_directories() {
mkdir -p "$STATE_DIR" "$LOG_DIR"
chmod 755 "$STATE_DIR"
}
# ============================================================================
# Metric Collection Functions
# ============================================================================
# Get CPU usage percentage
get_cpu_usage() {
# Method 1: Using top (most compatible)
local cpu_idle
cpu_idle=$(top -bn2 -d 0.5 | grep "Cpu(s)" | tail -1 | awk '{print $8}' | cut -d'%' -f1)
if [[ -n "$cpu_idle" ]]; then
echo "100 - $cpu_idle" | bc -l
return
fi
# Method 2: Using /proc/stat (Linux specific)
if [[ -f /proc/stat ]]; then
local cpu_line
cpu_line=$(head -1 /proc/stat)
local cpu_label user nice system idle iowait irq softirq
# First field of /proc/stat is the literal label "cpu"; skip it,
# and swallow any trailing fields (steal, guest) with _
read -r cpu_label user nice system idle iowait irq softirq _ <<< "$cpu_line"
local total idle_total
total=$((user + nice + system + idle + iowait + irq + softirq))
idle_total=$((idle + iowait))
echo "scale=2; ($total - $idle_total) * 100 / $total" | bc
return
fi
echo "0"
}
# Get memory usage percentage
get_memory_usage() {
local mem_total mem_used mem_available
if command -v free &>/dev/null; then
mem_total=$(free -m | awk 'NR==2{print $2}')
mem_available=$(free -m | awk 'NR==2{print $7}')
mem_used=$((mem_total - mem_available))
echo "scale=2; $mem_used * 100 / $mem_total" | bc
else
echo "0"
fi
}
# Get detailed memory info
get_memory_info() {
free -h | awk 'NR==1{next} {print $1 " " $3 " used, " $4 " free of " $2 " total"}'
}
# Get disk usage for a path
get_disk_usage() {
local path="${1:-/}"
df -h "$path" | tail -1 | awk '{print $5}' | tr -d '%'
}
# Get disk info for all mounted filesystems
get_disk_info() {
df -h | grep -E '^/dev/' | awk '{printf "%s: %s used of %s (%s)\n", $1, $3, $2, $5}'
}
# Get load average
get_load_average() {
uptime | awk -F'load average:' '{print $2}' | awk '{print $1}' | sed 's/,//'
}
# Get number of CPU cores
get_cpu_cores() {
nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo
}
# ============================================================================
# Service Checking Functions
# ============================================================================
# Check if a service is running
check_service() {
local service="$1"
if ! command -v systemctl &>/dev/null; then
# Fallback for systems without systemd
if pgrep -x "$service" &>/dev/null; then
echo "running"
return 0
else
echo "stopped"
return 1
fi
fi
if systemctl is-active --quiet "$service" 2>/dev/null; then
echo "running"
return 0
else
local failed_reason
failed_reason=$(systemctl status "$service" 2>&1 | grep -i "failed" | head -1)
echo "stopped: ${failed_reason:-unknown reason}"
return 1
fi
}
# Check all configured services
check_all_services() {
local failed=0
local services_json="["
# Human-readable status goes to stderr so that only the JSON reaches
# stdout when this function is called via command substitution
{
echo ""
print_color "$COLOR_BOLD" "=== Service Status ==="
echo ""
} >&2
for service in "${SERVICES[@]}"; do
local status
# check_service returns non-zero for stopped services; don't let
# that abort the script under set -e
status=$(check_service "$service" 2>&1) || true
if [[ "$status" == "running" ]]; then
print_color "$COLOR_GREEN" " ✓ $service: running" >&2
services_json+="{\"name\":\"$service\",\"status\":\"running\"},"
else
print_color "$COLOR_RED" " ✗ $service: $status" >&2
services_json+="{\"name\":\"$service\",\"status\":\"$status\"},"
failed=$((failed + 1))
fi
done
services_json="${services_json%,}]"
if [[ $failed -gt 0 ]]; then
send_alert "Service Down" "$failed service(s) are not running"
fi
echo "$services_json"
}
# ============================================================================
# Network Checking Functions
# ============================================================================
# Check network connectivity
check_network() {
local failed=0
local results_json="["
# Human-readable output to stderr; JSON to stdout (see check_all_services)
{
echo ""
print_color "$COLOR_BOLD" "=== Network Connectivity ==="
echo ""
} >&2
for host in "${NETWORK_HOSTS[@]}"; do
if ping -c 1 -W 2 "$host" &>/dev/null; then
print_color "$COLOR_GREEN" " ✓ $host: reachable" >&2
results_json+="{\"host\":\"$host\",\"status\":\"ok\"},"
else
print_color "$COLOR_RED" " ✗ $host: unreachable" >&2
results_json+="{\"host\":\"$host\",\"status\":\"unreachable\"},"
failed=$((failed + 1))
fi
done
# Check DNS resolution
if host google.com &>/dev/null; then
print_color "$COLOR_GREEN" " ✓ DNS: working" >&2
results_json+="{\"host\":\"dns\",\"status\":\"ok\"},"
else
print_color "$COLOR_RED" " ✗ DNS: not working" >&2
results_json+="{\"host\":\"dns\",\"status\":\"failed\"},"
failed=$((failed + 1))
fi
results_json="${results_json%,}]"
if [[ $failed -gt 0 ]]; then
send_alert "Network Issue" "$failed network check(s) failed"
fi
echo "$results_json"
}
# ============================================================================
# Process Information Functions
# ============================================================================
# Get top processes by CPU
get_top_cpu_processes() {
echo ""
print_color "$COLOR_BOLD" "=== Top 5 Processes by CPU ==="
echo ""
ps aux --sort=-%cpu | head -6 | tail -5 | \
awk '{printf " %-20s %5s%% CPU\n", $11, $3}'
}
# Get top processes by memory
get_top_mem_processes() {
echo ""
print_color "$COLOR_BOLD" "=== Top 5 Processes by Memory ==="
echo ""
ps aux --sort=-%mem | head -6 | tail -5 | \
awk '{printf " %-20s %5s%% MEM\n", $11, $4}'
}
# Get process count
get_process_count() {
local total running sleeping stopped zombie
total=$(ps aux | wc -l)
# grep -c prints 0 but exits 1 on zero matches, which would trip set -e
running=$(ps -eo state= | grep -c R || true)
sleeping=$(ps -eo state= | grep -c S || true)
stopped=$(ps -eo state= | grep -c T || true)
zombie=$(ps -eo state= | grep -c Z || true)
echo "Total: $total | Running: $running | Sleeping: $sleeping | Stopped: $stopped | Zombie: $zombie"
}
# ============================================================================
# System Information Functions
# ============================================================================
# Get system uptime
get_uptime() {
uptime -p 2>/dev/null || uptime | awk '{print $3,$4}' | tr -d ','
}
# Get logged in users
get_logged_users() {
who | awk '{print $1, $2, $3}' | sort -u
}
# ============================================================================
# Alert Functions
# ============================================================================
# Check thresholds and alert
check_thresholds() {
local cpu_usage="$1"
local mem_usage="$2"
local disk_usage="$3"
local load_avg="$4"
local cores="$5"
local alerts=()
# CPU check
if (( $(echo "$cpu_usage > $THRESHOLD_CPU" | bc -l) )); then
alerts+=("CPU usage is ${cpu_usage}% (threshold: ${THRESHOLD_CPU}%)")
fi
# Memory check
if (( $(echo "$mem_usage > $THRESHOLD_MEM" | bc -l) )); then
alerts+=("Memory usage is ${mem_usage}% (threshold: ${THRESHOLD_MEM}%)")
fi
# Disk check
if [[ "$disk_usage" -gt "$THRESHOLD_DISK" ]]; then
alerts+=("Disk usage is ${disk_usage}% (threshold: ${THRESHOLD_DISK}%)")
fi
# Load check (load > cores * threshold)
local max_load
max_load=$(echo "$cores * $THRESHOLD_LOAD" | bc)
if (( $(echo "$load_avg > $max_load" | bc -l) )); then
alerts+=("Load average is ${load_avg} (threshold: ${max_load} for ${cores} cores)")
fi
# Send alert if any threshold exceeded
if [[ ${#alerts[@]} -gt 0 ]]; then
local alert_message
alert_message=$(printf "%s\n" "${alerts[@]}")
send_alert "System Threshold Exceeded" "$alert_message"
fi
# Return alerts as JSON array
local alerts_json="["
# Guard the loop: expanding an empty array trips set -u on bash < 4.4
if [[ ${#alerts[@]} -gt 0 ]]; then
for alert in "${alerts[@]}"; do
alerts_json+="\"$alert\","
done
fi
alerts_json="${alerts_json%,}]"
echo "$alerts_json"
}
# ============================================================================
# Output Functions
# ============================================================================
# Generate JSON output
generate_json_output() {
local cpu_usage="$1"
local mem_usage="$2"
local disk_usage="$3"
local load_avg="$4"
local cores="$5"
local uptime="$6"
local alerts="$7"
local services="$8"
local network="$9"
cat <<EOF
{
"timestamp": "$(date -Iseconds)",
"hostname": "$(hostname)",
"metrics": {
"cpu": {
"usage_percent": $cpu_usage,
"cores": $cores
},
"memory": {
"usage_percent": $mem_usage
},
"disk": {
"root_usage_percent": $disk_usage
},
"load_average": $load_avg,
"uptime": "$uptime"
},
"alerts": $alerts,
"services": $services,
"network": $network
}
EOF
}
# Print human-readable output
print_human_output() {
local cpu_usage="$1"
local mem_usage="$2"
local disk_usage="$3"
local load_avg="$4"
local cores="$5"
local uptime="$6"
echo ""
print_color "$COLOR_BOLD" "╔══════════════════════════════════════════════════════════╗"
print_color "$COLOR_BOLD" "║ System Monitor - $(date '+%Y-%m-%d %H:%M:%S') ║"
print_color "$COLOR_BOLD" "╚══════════════════════════════════════════════════════════╝"
echo ""
print_color "$COLOR_BOLD" "=== System Information ==="
echo " Hostname: $(hostname)"
echo " Uptime: $uptime"
echo " CPU Cores: $cores"
echo ""
print_color "$COLOR_BOLD" "=== Resource Usage ==="
# CPU
local cpu_color="$COLOR_GREEN"
if (( $(echo "$cpu_usage > $THRESHOLD_CPU" | bc -l) )); then
cpu_color="$COLOR_RED"
elif (( $(echo "$cpu_usage > $((THRESHOLD_CPU / 2))" | bc -l) )); then
cpu_color="$COLOR_YELLOW"
fi
printf " CPU Usage: ${cpu_color}%6.1f%%${COLOR_NC} (threshold: %d%%)\n" "$cpu_usage" "$THRESHOLD_CPU"
# Memory
local mem_color="$COLOR_GREEN"
if (( $(echo "$mem_usage > $THRESHOLD_MEM" | bc -l) )); then
mem_color="$COLOR_RED"
elif (( $(echo "$mem_usage > $((THRESHOLD_MEM / 2))" | bc -l) )); then
mem_color="$COLOR_YELLOW"
fi
printf " Memory Usage: ${mem_color}%6.1f%%${COLOR_NC} (threshold: %d%%)\n" "$mem_usage" "$THRESHOLD_MEM"
# Disk
local disk_color="$COLOR_GREEN"
if [[ "$disk_usage" -gt "$THRESHOLD_DISK" ]]; then
disk_color="$COLOR_RED"
elif [[ "$disk_usage" -gt $((THRESHOLD_DISK / 2)) ]]; then
disk_color="$COLOR_YELLOW"
fi
printf " Disk Usage: ${disk_color}%6s${COLOR_NC} (threshold: %d%%)\n" "${disk_usage}%" "$THRESHOLD_DISK"
# Load
local max_load
max_load=$(echo "$cores * $THRESHOLD_LOAD" | bc)
local load_color="$COLOR_GREEN"
if (( $(echo "$load_avg > $max_load" | bc -l) )); then
load_color="$COLOR_RED"
fi
printf " Load Average: ${load_color}%6s${COLOR_NC} (threshold: %.1f for %d cores)\n" "$load_avg" "$max_load" "$cores"
echo ""
echo " Memory Details:"
get_memory_info | while read -r line; do
echo " $line"
done
echo ""
echo " Disk Details:"
get_disk_info | while read -r line; do
echo " $line"
done
get_top_cpu_processes
get_top_mem_processes
echo ""
print_color "$COLOR_BOLD" "=== Process Information ==="
echo " $(get_process_count)"
echo ""
print_color "$COLOR_BOLD" "=== Logged In Users ==="
get_logged_users | while read -r user; do
echo " $user"
done
echo ""
}
# ============================================================================
# Main Functions
# ============================================================================
# Run all checks
run_checks() {
# Collect metrics
local cpu_usage
cpu_usage=$(get_cpu_usage)
local mem_usage
mem_usage=$(get_memory_usage)
local disk_usage
disk_usage=$(get_disk_usage)
local load_avg
load_avg=$(get_load_average)
local cores
cores=$(get_cpu_cores)
local uptime
uptime=$(get_uptime)
# Run checks
local services_json
services_json=$(check_all_services)
local network_json
network_json=$(check_network)
# Check thresholds and get alerts
local alerts_json
alerts_json=$(check_thresholds "$cpu_usage" "$mem_usage" "$disk_usage" "$load_avg" "$cores")
# Output
if [[ "$OUTPUT_JSON" == "true" ]]; then
generate_json_output "$cpu_usage" "$mem_usage" "$disk_usage" \
"$load_avg" "$cores" "$uptime" "$alerts_json" "$services_json" "$network_json"
else
print_human_output "$cpu_usage" "$mem_usage" "$disk_usage" \
"$load_avg" "$cores" "$uptime"
fi
}
# Parse command line arguments
parse_arguments() {
while [[ $# -gt 0 ]]; do
case $1 in
-c|--continuous)
CONTINUOUS_MODE=true
shift
;;
-i|--interval)
INTERVAL="$2"
shift 2
;;
-j|--json)
OUTPUT_JSON=true
shift
;;
-h|--help)
echo "Usage: $0 [OPTIONS]"
echo ""
echo "Options:"
echo " -c, --continuous Run continuously"
echo " -i, --interval N Interval in seconds (default: 60)"
echo " -j, --json Output in JSON format"
echo " -h, --help Show this help"
exit 0
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
}
# ============================================================================
# Main Entry Point
# ============================================================================
main() {
parse_arguments "$@"
check_root
init_directories
log "INFO" "Starting system monitor"
if [[ "$CONTINUOUS_MODE" == "true" ]]; then
log "INFO" "Running in continuous mode (interval: ${INTERVAL}s)"
while true; do
run_checks
sleep "$INTERVAL"
done
else
run_checks
fi
log "INFO" "System monitor completed"
}
main "$@"
# Make executable
chmod +x system-monitor.sh
# Run once (human-readable output)
./system-monitor.sh
# Run once (JSON output)
./system-monitor.sh --json
# Run continuously (every 60 seconds)
./system-monitor.sh --continuous
# Run continuously with custom interval
./system-monitor.sh --continuous --interval 30
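One caveat about the solution above: it leans on `bc` for floating-point threshold comparisons, and `bc` is absent on many minimal images. A sketch of an `awk`-based stand-in (the function name `cpu_over` is illustrative, not part of the script):

```shell
# Float threshold compare without bc: awk exits 0 when usage exceeds
# the threshold, so the function slots directly into an if statement.
cpu_over() {
  awk -v u="$1" -v t="$2" 'BEGIN { exit !(u > t) }'
}

if cpu_over 87.5 80; then
  echo "ALERT: CPU over threshold"
fi
if ! cpu_over 42.0 80; then
  echo "OK: CPU under threshold"
fi
```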

Exercise 2: Backup Automation

Create a backup script that:

  1. Backs up specified directories
  2. Creates timestamped archives
  3. Implements incremental backups using snapshots
  4. Verifies backup integrity with checksums
  5. Cleans up old backups based on retention policy
  6. Sends notifications on completion/failure
  7. Supports both local and remote backup (via SSH/rsync)
  8. Provides detailed logging
Hints:

  • Use tar for archiving
  • Use rsync for incremental
  • Use md5sum or sha256sum for verification
  • Use ssh with key-based authentication for remote backup
  • Use find with -mtime for cleanup
  • Use logger for system logging
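The incremental-backup hint maps to GNU tar's `--listed-incremental` snapshot mechanism, which the solution below builds on. A minimal scratch-directory demo of how the snapshot file drives a level-0 versus a level-1 archive (all paths are throwaway):

```shell
# Scratch demo of tar snapshot-based incrementals.
work=$(mktemp -d)
mkdir "$work/src" && echo one > "$work/src/a.txt"

# Level 0: a fresh snapshot file records everything archived.
tar --listed-incremental="$work/state.snar" -czf "$work/full.tar.gz" -C "$work" src

# Level 1: only members changed since the snapshot are re-archived.
echo two > "$work/src/b.txt"
tar --listed-incremental="$work/state.snar" -czf "$work/incr.tar.gz" -C "$work" src

tar -tzf "$work/incr.tar.gz"   # src/b.txt is included; unchanged a.txt is not
```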
Solution:

#!/usr/bin/env bash
#
# backup-manager.sh - Enterprise backup automation solution
#
# Features:
# - Full and incremental backups
# - Local and remote backup support
# - Integrity verification
# - Automatic cleanup
# - Email notifications
# - Comprehensive logging
#
# Usage:
# backup-manager.sh backup [source...] Create backup
# backup-manager.sh restore <backup> <path> Restore backup
# backup-manager.sh list List available backups
# backup-manager.sh verify <backup> Verify backup integrity
# backup-manager.sh cleanup Remove old backups
#
# Author: DevOps Engineer
# Version: 2.0.0
#
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
# Directories
readonly BACKUP_ROOT="/backup"
readonly STATE_DIR="${BACKUP_ROOT}/.state"
readonly LOG_DIR="/var/log"
# Backup settings
readonly COMPRESSION="gzip" # gzip, bzip2, xz, or none
readonly BACKUP_USER="backup"
readonly RETENTION_DAYS=30
readonly INCREMENTAL_RETENTION_DAYS=7
# Remote backup settings (optional)
readonly REMOTE_HOST=""
readonly REMOTE_USER="backup"
readonly REMOTE_PATH="/backup"
# Email settings
readonly ALERT_EMAIL="ops@example.com"
readonly BACKUP_EMAIL="backup-reports@example.com"
# Default sources if not provided
DEFAULT_SOURCES=(
"/etc"
"/home"
"/var/www"
"/opt"
)
# ============================================================================
# Global Variables
# ============================================================================
DRY_RUN=false
VERBOSE=false
PARALLEL_JOBS=4
# ============================================================================
# Utility Functions
# ============================================================================
log() {
# Accepts an optional level word followed by the message
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_DIR}/backup.log"
}
log_verbose() {
if [[ "$VERBOSE" == "true" ]]; then
log "DEBUG" "$*"
fi
}
error() {
log "ERROR" "$*"
send_notification "ERROR" "$*"
}
notify() {
local subject="$1"
local body="$2"
if [[ -n "$ALERT_EMAIL" ]] && command -v mailx &>/dev/null; then
echo "$body" | mailx -s "[BACKUP] $subject" "$ALERT_EMAIL"
fi
}
send_notification() {
local status="$1"
local message="$2"
notify "$status" "$message"
}
# Check if command exists
require_command() {
local cmd="$1"
if ! command -v "$cmd" &>/dev/null; then
error "Required command not found: $cmd"
exit 1
fi
}
# ============================================================================
# Setup Functions
# ============================================================================
setup_directories() {
log "Setting up backup directories..."
mkdir -p "$BACKUP_ROOT"
mkdir -p "$STATE_DIR"
mkdir -p "$LOG_DIR"
# Set permissions
chmod 700 "$BACKUP_ROOT"
chmod 700 "$STATE_DIR"
log "Backup directories created at $BACKUP_ROOT"
}
# ============================================================================
# Backup Functions
# ============================================================================
# Get compression options
get_compression_args() {
case "$COMPRESSION" in
gzip)
echo "-z"
;;
bzip2)
echo "-j"
;;
xz)
echo "-J"
;;
none)
echo ""
;;
*)
echo "-z"
;;
esac
}
# Get compression extension
get_compression_ext() {
case "$COMPRESSION" in
gzip) echo ".gz" ;;
bzip2) echo ".bz2" ;;
xz) echo ".xz" ;;
none) echo "" ;;
*) echo ".gz" ;;
esac
}
# Get file extension
get_backup_ext() {
local ext
ext=$(get_compression_ext)
echo ".tar${ext}"
}
# Create incremental backup
create_incremental_backup() {
local source="$1"
local timestamp="$2"
local backup_type="$3"
local source_name
source_name=$(basename "$source")
local backup_dir="${BACKUP_ROOT}/${source_name}"
local snapshot_file="${STATE_DIR}/${source_name}.snapshot"
mkdir -p "$backup_dir"
local backup_file="${backup_dir}/${source_name}.${timestamp}"
log "Creating $backup_type backup of $source"
local tar_args
tar_args=$(get_compression_args)
local archive
if [[ "$backup_type" == "full" ]]; then
# Full backup - start a fresh tar snapshot so later incrementals
# are taken relative to this level-0 archive
archive="${backup_file}.full.tar.gz"
rm -f "$snapshot_file"
tar --listed-incremental="$snapshot_file" $tar_args -cf "$archive" -C "$(dirname "$source")" "$(basename "$source")" 2>/dev/null || {
error "Failed to create full backup of $source"
return 1
}
else
# Incremental backup
if [[ ! -f "$snapshot_file" ]]; then
log "No snapshot found, creating full backup instead"
create_incremental_backup "$source" "$timestamp" "full"
return $?
fi
archive="${backup_file}.incr.tar.gz"
tar --listed-incremental="$snapshot_file" $tar_args -cf "$archive" -C "$(dirname "$source")" "$(basename "$source")" 2>/dev/null || {
error "Failed to create incremental backup of $source"
return 1
}
fi
# Calculate checksum
local checksum_file="${archive}.sha256"
sha256sum "$archive" > "$checksum_file"
# Verify backup
if tar -tzf "$archive" &>/dev/null; then
local size
size=$(du -h "$archive" | cut -f1)
log "Backup created successfully: $archive ($size)"
return 0
else
error "Backup verification failed"
rm -f "$archive" "$checksum_file"
return 1
fi
}
# Create full backup (simpler version)
create_full_backup() {
local source="$1"
local timestamp="$2"
if [[ ! -d "$source" ]]; then
error "Source directory does not exist: $source"
return 1
fi
local source_name
source_name=$(basename "$source")
local backup_dir="${BACKUP_ROOT}/${source_name}"
local backup_file="${backup_dir}/${source_name}.${timestamp}"
mkdir -p "$backup_dir"
log "Creating full backup of $source"
# Get compression args
local tar_args
tar_args=$(get_compression_args)
local ext
ext=$(get_compression_ext)
# Create archive
tar $tar_args -cf "${backup_file}${ext}" -C "$(dirname "$source")" "$(basename "$source")" || {
error "Failed to create backup archive"
return 1
}
# Calculate checksum
local checksum_file="${backup_file}${ext}.sha256"
sha256sum "${backup_file}${ext}" > "$checksum_file"
# Create manifest
local manifest="${backup_file}${ext}.manifest"
{
echo "Backup: ${backup_file}${ext}"
echo "Source: $source"
echo "Date: $(date)"
echo "Hostname: $(hostname)"
echo ""
echo "Files:"
tar $tar_args -tf "${backup_file}${ext}" | wc -l
} > "$manifest"
# Verify backup
if tar $tar_args -tf "${backup_file}${ext}" &>/dev/null; then
local size
size=$(du -h "${backup_file}${ext}" | cut -f1)
log "Backup created successfully: ${backup_file}${ext} ($size)"
return 0
else
error "Backup verification failed"
rm -f "${backup_file}${ext}" "$checksum_file" "$manifest"
return 1
fi
}
# Backup to remote server
create_remote_backup() {
local source="$1"
if [[ -z "$REMOTE_HOST" ]]; then
error "Remote host not configured"
return 1
fi
require_command "rsync"
log "Creating remote backup of $source to ${REMOTE_HOST}:${REMOTE_PATH}"
local source_name
source_name=$(basename "$source")
rsync -avz --progress \
-e "ssh -o StrictHostKeyChecking=yes" \
--delete \
"$source/" "${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_PATH}/${source_name}/" || {
error "Remote backup failed"
return 1
}
log "Remote backup completed"
return 0
}
# ============================================================================
# Restore Functions
# ============================================================================
# List available backups
list_backups() {
echo "=== Available Backups ==="
echo ""
if [[ ! -d "$BACKUP_ROOT" ]]; then
echo "No backups found"
return
fi
echo "Backup Directory: $BACKUP_ROOT"
echo ""
for backup_dir in "$BACKUP_ROOT"/*; do
if [[ -d "$backup_dir" ]]; then
local name
name=$(basename "$backup_dir")
echo "Source: $name"
local count size
count=$(find "$backup_dir" -name "*.tar*" | wc -l)
size=$(du -sh "$backup_dir" | cut -f1)
echo " Files: $count"
echo " Total Size: $size"
echo " Backups:"
find "$backup_dir" -name "*.tar*" -type f -printf " %f (%TY-%Tm-%Td %TH:%TM)\n" | sort -r
echo ""
fi
done
}
# Verify backup integrity
verify_backup() {
local backup_file="$1"
log "Verifying backup: $backup_file"
if [[ ! -f "$backup_file" ]]; then
error "Backup file not found: $backup_file"
return 1
fi
# Check checksum
local checksum_file="${backup_file}.sha256"
if [[ -f "$checksum_file" ]]; then
if sha256sum -c "$checksum_file" &>/dev/null; then
log "Checksum verified: OK"
else
error "Checksum verification FAILED"
return 1
fi
else
log "Warning: No checksum file found"
fi
# Check archive integrity
local ext
ext="${backup_file##*.}"
case "$ext" in
gz)
if gzip -t "$backup_file" 2>/dev/null; then
log "Archive integrity: OK"
else
error "Archive integrity check FAILED"
return 1
fi
;;
bz2)
if bzip2 -t "$backup_file" 2>/dev/null; then
log "Archive integrity: OK"
else
error "Archive integrity check FAILED"
return 1
fi
;;
xz)
if xz -t "$backup_file" 2>/dev/null; then
log "Archive integrity: OK"
else
error "Archive integrity check FAILED"
return 1
fi
;;
esac
log "Backup verification completed successfully"
return 0
}
# Restore backup
restore_backup() {
local backup_file="$1"
local restore_path="${2:-/tmp/restore}"
log "Restoring backup: $backup_file"
log "Restore path: $restore_path"
if [[ ! -f "$backup_file" ]]; then
error "Backup file not found: $backup_file"
return 1
fi
# Verify first
if ! verify_backup "$backup_file"; then
error "Cannot restore - backup verification failed"
return 1
fi
# Create restore directory
mkdir -p "$restore_path"
# Extract
local ext
ext="${backup_file##*.}"
case "$ext" in
gz)
tar -xzf "$backup_file" -C "$restore_path"
;;
bz2)
tar -xjf "$backup_file" -C "$restore_path"
;;
xz)
tar -xJf "$backup_file" -C "$restore_path"
;;
*)
tar -xf "$backup_file" -C "$restore_path"
;;
esac
log "Restore completed to $restore_path"
return 0
}
# ============================================================================
# Cleanup Functions
# ============================================================================
# Cleanup old backups
cleanup_old_backups() {
log "Starting cleanup of backups older than $RETENTION_DAYS days"
local deleted_count=0
local deleted_size=0
while IFS= read -r -d '' file; do
local size
size=$(du -b "$file" | cut -f1)
rm -f "$file"
# Also remove related files (checksum, manifest)
rm -f "${file}.sha256" 2>/dev/null
rm -f "${file}.manifest" 2>/dev/null
# Plain assignment instead of ((var++)), which exits non-zero (and
# trips set -e) when the pre-increment value is 0
deleted_count=$((deleted_count + 1))
deleted_size=$((deleted_size + size))
log "Deleted: $file"
done < <(find "$BACKUP_ROOT" -name "*.tar*" -type f -mtime +"$RETENTION_DAYS" -print0)
local deleted_size_human
deleted_size_human=$((deleted_size / 1024 / 1024))
log "Cleanup completed: $deleted_count files deleted (${deleted_size_human}MB)"
send_notification "Cleanup Complete" "Deleted $deleted_count old backup(s)"
}
# ============================================================================
# Main Functions
# ============================================================================
# Perform backup
do_backup() {
local sources=("$@")
if [[ ${#sources[@]} -eq 0 ]]; then
sources=("${DEFAULT_SOURCES[@]}")
fi
local timestamp
timestamp=$(date +%Y%m%d_%H%M%S)
log "Starting backup process"
log "Sources: ${sources[*]}"
setup_directories
local failed=0
local success=0
for source in "${sources[@]}"; do
if [[ -d "$source" ]]; then
if create_full_backup "$source" "$timestamp"; then
success=$((success + 1))
else
failed=$((failed + 1))
fi
else
log "Warning: Source not found or not a directory: $source"
failed=$((failed + 1))
fi
done
# Remote backup (if configured)
if [[ -n "$REMOTE_HOST" ]]; then
for source in "${sources[@]}"; do
if create_remote_backup "$source"; then
log "Remote backup success: $source"
else
log "Remote backup failed: $source"
fi
done
fi
# Cleanup old backups
cleanup_old_backups
if [[ $failed -eq 0 ]]; then
log "Backup completed successfully"
send_notification "Backup Success" "All $success backup(s) completed successfully"
else
error "Backup completed with $failed failure(s)"
fi
}
# Main entry point
main() {
local command="${1:-backup}"
shift || true
case "$command" in
backup)
do_backup "$@"
;;
restore)
if [[ $# -lt 2 ]]; then
echo "Usage: $0 restore <backup_file> <restore_path>"
exit 1
fi
restore_backup "$1" "$2"
;;
list)
list_backups
;;
verify)
if [[ $# -lt 1 ]]; then
echo "Usage: $0 verify <backup_file>"
exit 1
fi
verify_backup "$1"
;;
cleanup)
cleanup_old_backups
;;
*)
echo "Usage: $0 {backup|restore|list|verify|cleanup}"
echo ""
echo "Commands:"
echo " backup [sources...] Create backup (default sources if none specified)"
echo " restore <file> <path> Restore backup"
echo " list List available backups"
echo " verify <file> Verify backup integrity"
echo " cleanup Remove old backups"
exit 1
;;
esac
}
main "$@"
# Make executable
chmod +x backup-manager.sh
# Create backup
sudo ./backup-manager.sh backup /etc /home
# List backups
./backup-manager.sh list
# Verify a backup
./backup-manager.sh verify /backup/etc/etc.20240222_020000.tar.gz
# Restore backup
sudo ./backup-manager.sh restore /backup/etc/etc.20240222_020000.tar.gz /tmp/restore
# Cleanup old backups
sudo ./backup-manager.sh cleanup
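The cleanup step depends on `find -mtime +N`, which matches files modified strictly more than N whole 24-hour periods ago. That off-by-one is easy to get wrong, so a scratch-directory sanity check is worth a minute (file names are illustrative):

```shell
# -mtime +30 matches files whose mtime is more than 30*24h in the past.
work=$(mktemp -d)
touch -d '40 days ago' "$work/old.tar.gz"   # should be swept by retention
touch "$work/new.tar.gz"                    # should survive
find "$work" -name '*.tar.gz' -mtime +30    # prints only old.tar.gz
```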

Exercise 3: Log Analysis

Create a log analysis script that:

  1. Parses multiple log files (syslog, nginx, application logs)
  2. Extracts error patterns (ERROR, WARNING, FATAL, CRITICAL)
  3. Generates statistics (error counts, most common errors)
  4. Creates time-based analysis (hourly, daily distribution)
  5. Filters by severity level
  6. Outputs formatted report (text, JSON, HTML)
  7. Supports real-time monitoring with tail -f
  8. Can search with regex patterns
  9. Tracks patterns over time
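Matching "any of these severities" means joining a pattern array into a single extended-regex alternation such as `ERROR|FATAL|CRITICAL`; matching against the space-joined `"${ARRAY[*]}"` directly would look for one literal string. A sketch of the join (array contents abbreviated):

```shell
# Join a bash array into one extended-regex alternation. The subshell
# keeps the temporary IFS change from leaking into the rest of the script.
patterns=(ERROR FATAL CRITICAL)
regex=$(IFS='|'; echo "${patterns[*]}")
echo "$regex"   # ERROR|FATAL|CRITICAL
printf '%s\n' "ok: started" "ERROR: disk full" "FATAL: oom" | grep -cE "$regex"   # 2
```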
Solution:

#!/usr/bin/env bash
#
# log-analyzer.sh - Production log analysis tool
#
# Features:
# - Multi-format log parsing
# - Pattern detection and statistics
# - Time-based analysis
# - Real-time monitoring
# - Multiple output formats
# - Regex search support
#
# Usage:
# log-analyzer.sh analyze <log_file> Analyze log file
# log-analyzer.sh monitor <log_file> Real-time monitoring
# log-analyzer.sh search <pattern> <log_file> Search pattern
# log-analyzer.sh stats <log_file> Show statistics
# log-analyzer.sh errors <log_file> Show error summary
#
# Author: DevOps Engineer
# Version: 2.0.0
#
set -euo pipefail
# ============================================================================
# Configuration
# ============================================================================
# Log formats
readonly LOG_FORMAT_SYSLOG="%b %d %H:%M:%S %h %s"
readonly LOG_FORMAT_NGINX='%h %l %u %t "%r" %>s %b'
readonly LOG_FORMAT_APACHE='"%r" %>s %b'
# Patterns to detect
readonly ERROR_PATTERNS=(
"ERROR"
"FATAL"
"CRITICAL"
"CRASH"
"EXCEPTION"
"FAILED"
)
readonly WARNING_PATTERNS=(
"WARNING"
"WARN"
"ALERT"
)
# Output settings
OUTPUT_FORMAT="text"
REPORT_FILE=""
# Colors
readonly COLOR_RED='\033[0;31m'
readonly COLOR_GREEN='\033[0;32m'
readonly COLOR_YELLOW='\033[1;33m'
readonly COLOR_BLUE='\033[0;34m'
readonly COLOR_CYAN='\033[0;36m'
readonly COLOR_BOLD='\033[1m'
readonly COLOR_NC='\033[0m'
# ============================================================================
# Utility Functions
# ============================================================================
print_color() {
local color="$1"
shift
echo -e "${color}$*${COLOR_NC}"
}
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}
# ============================================================================
# Analysis Functions
# ============================================================================
# Count patterns in file
count_pattern() {
local file="$1"
local pattern="$2"
local count
# grep -c prints the count but exits 1 on zero matches; don't let that
# (or a missing file) abort the script under set -e
count=$(grep -c "$pattern" "$file" 2>/dev/null) || true
echo "${count:-0}"
}
# Count all error patterns (plain assignment instead of ((total+=count)),
# which returns nonzero when the result is 0 and would trip set -e)
count_errors() {
local file="$1"
local total=0
for pattern in "${ERROR_PATTERNS[@]}"; do
local count
count=$(count_pattern "$file" "$pattern")
total=$((total + count))
done
echo "$total"
}
# Count all warning patterns
count_warnings() {
local file="$1"
local total=0
for pattern in "${WARNING_PATTERNS[@]}"; do
local count
count=$(count_pattern "$file" "$pattern")
total=$((total + count))
done
echo "$total"
}
# Summarize error levels by frequency (the pattern array is joined with |
# to form a valid alternation for grep -E; the sed keeps only the level)
get_unique_errors() {
local file="$1"
local limit="${2:-50}"
grep -E "$(IFS='|'; echo "${ERROR_PATTERNS[*]}")" "$file" 2>/dev/null | \
sed -E 's/.*(ERROR|FATAL|CRITICAL).*/\1/' | \
sort | uniq -c | sort -rn | head -n "$limit"
}
# Get time distribution
get_time_distribution() {
local file="$1"
# Try different time formats
# Format: [22/Feb/2024:10:30:45
if grep -qE '\[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:' "$file"; then
grep -oE '\[[0-9]{2}/[A-Za-z]{3}/[0-9]{4}:[0-9]{2}' "$file" | \
cut -d':' -f2 | sort | uniq -c | sort -rn
return
fi
# Format: 2024-02-22 10:30:45
if grep -qE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:' "$file"; then
grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:' "$file" | \
cut -d' ' -f2 | cut -d':' -f1 | sort | uniq -c | sort -rn
return
fi
# Format: Feb 22 10:30:45 (syslog) - take the hour from the time field
awk '{split($3, t, ":"); print t[1]}' "$file" | sort | uniq -c | sort -rn
}
# Get top errors by frequency (patterns joined with | to form a valid
# alternation for grep -E)
get_top_errors() {
local file="$1"
local num="${2:-10}"
grep -E "$(IFS='|'; echo "${ERROR_PATTERNS[*]}")" "$file" 2>/dev/null | \
sed -E 's/.*(ERROR|FATAL|CRITICAL|EXCEPTION|FAILED)[,:]? ?(.*)/\2/' | \
sed 's/^ *//;s/ *$//' | \
grep -v '^$' | \
sort | uniq -c | sort -rn | head -n "$num"
}
# Get HTTP status code distribution
get_status_distribution() {
local file="$1"
grep -oE ' [0-9]{3} ' "$file" | \
sort | uniq -c | sort -rn | \
awk '{
status=$2
count=$1
if(status ~ /^2/) { category="2xx Success" }
else if(status ~ /^3/) { category="3xx Redirect" }
else if(status ~ /^4/) { category="4xx Client Error" }
else if(status ~ /^5/) { category="5xx Server Error" }
else { category="Other" }
printf " %s (%s): %d\n", status, category, count
}'
}
# Get IP addresses (for access logs)
get_top_ips() {
local file="$1"
local num="${2:-10}"
grep -oE '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' "$file" | \
sort | uniq -c | sort -rn | head -n "$num"
}
# Get most requested URLs
get_top_urls() {
local file="$1"
local num="${2:-10}"
grep -oE '"(GET|POST|PUT|DELETE|HEAD|OPTIONS|PATCH) [^"]+' "$file" | \
sed 's/"//g' | sort | uniq -c | sort -rn | head -n "$num"
}
# ============================================================================
# Analysis Output Functions
# ============================================================================
# Analyze log file
analyze_log() {
local file="$1"
if [[ ! -f "$file" ]]; then
log "ERROR: File not found: $file"
return 1
fi
local total_lines
total_lines=$(wc -l < "$file")
local file_size
file_size=$(du -h "$file" | cut -f1)
echo ""
print_color "$COLOR_BOLD" "╔══════════════════════════════════════════════════════════╗"
print_color "$COLOR_BOLD" "║ Log Analysis Report ║"
print_color "$COLOR_BOLD" "╚══════════════════════════════════════════════════════════╝"
echo ""
echo "File: $file"
echo "Size: $file_size"
echo "Total Lines: $total_lines"
echo "Last Modified: $(stat -c '%y' "$file" 2>/dev/null || stat -f '%Sm' "$file")"
echo ""
# Error counts
print_color "$COLOR_BOLD" "=== Error Summary ==="
local error_count warning_count
error_count=$(count_errors "$file")
warning_count=$(count_warnings "$file")
local error_pct=0 warning_pct=0
if [[ "$total_lines" -gt 0 ]]; then
error_pct=$((error_count * 100 / total_lines))
warning_pct=$((warning_count * 100 / total_lines))
fi
print_color "$COLOR_RED" " Errors: $error_count ($error_pct%)"
print_color "$COLOR_YELLOW" " Warnings: $warning_count ($warning_pct%)"
echo ""
# Per-pattern breakdown
print_color "$COLOR_BOLD" "=== Error Breakdown ==="
for pattern in "${ERROR_PATTERNS[@]}"; do
local count
count=$(count_pattern "$file" "$pattern")
if [[ "$count" -gt 0 ]]; then
print_color "$COLOR_RED" " $pattern: $count"
fi
done
echo ""
# Top errors
print_color "$COLOR_BOLD" "=== Top Error Messages ==="
get_top_errors "$file" 10 | while read -r count message; do
if [[ -n "$message" ]]; then
echo " [$count] $message"
fi
done
echo ""
# Time distribution
print_color "$COLOR_BOLD" "=== Time Distribution (Hour) ==="
get_time_distribution "$file" | head -10 | while read -r count hour _; do
printf " %s:00 - %s occurrences\n" "$hour" "$count"
done
echo ""
# Check if it's an access log
if grep -qE 'HTTP/[12]\.[01]' "$file" 2>/dev/null; then
print_color "$COLOR_BOLD" "=== HTTP Status Codes ==="
get_status_distribution "$file"
echo ""
print_color "$COLOR_BOLD" "=== Top IPs ==="
get_top_ips "$file" 10 | while read -r count ip; do
printf " %-15s - %d requests\n" "$ip" "$count"
done
echo ""
print_color "$COLOR_BOLD" "=== Top URLs ==="
get_top_urls "$file" 10 | while read -r count url; do
printf " [%d] %s\n" "$count" "$url"
done
echo ""
fi
# Recent errors
print_color "$COLOR_BOLD" "=== Recent Errors (Last 10) ==="
grep -E "$(IFS='|'; echo "${ERROR_PATTERNS[*]}")" "$file" 2>/dev/null | tail -10 | \
sed 's/^/ /'
echo ""
}
# Real-time monitoring
monitor_log() {
local file="$1"
if [[ ! -f "$file" ]]; then
log "ERROR: File not found: $file"
return 1
fi
log "Starting real-time monitoring of $file (Ctrl+C to stop)"
echo ""
tail -f "$file" | while IFS= read -r line; do
if [[ "$line" =~ ERROR|FATAL|CRITICAL ]]; then
print_color "$COLOR_RED" "[ERROR] $line"
elif [[ "$line" =~ WARNING|WARN ]]; then
print_color "$COLOR_YELLOW" "[WARN] $line"
else
echo "$line"
fi
done
}
# Search pattern
search_pattern() {
local pattern="$1"
local file="$2"
if [[ ! -f "$file" ]]; then
log "ERROR: File not found: $file"
return 1
fi
log "Searching for: $pattern"
local count
count=$(grep -c "$pattern" "$file" 2>/dev/null) || count=0
echo "Found $count matches"
echo ""
grep -n "$pattern" "$file" 2>/dev/null | head -50 | \
while read -r line; do
if [[ "$line" =~ ERROR|FATAL|CRITICAL ]]; then
print_color "$COLOR_RED" "$line"
else
echo "$line"
fi
done
}
# ============================================================================
# Main Function
# ============================================================================
main() {
local command="${1:-analyze}"
shift || true
case "$command" in
analyze)
if [[ $# -lt 1 ]]; then
echo "Usage: $0 analyze <log_file>"
exit 1
fi
analyze_log "$1"
;;
monitor)
if [[ $# -lt 1 ]]; then
echo "Usage: $0 monitor <log_file>"
exit 1
fi
monitor_log "$1"
;;
search)
if [[ $# -lt 2 ]]; then
echo "Usage: $0 search <pattern> <log_file>"
exit 1
fi
search_pattern "$1" "$2"
;;
stats)
if [[ $# -lt 1 ]]; then
echo "Usage: $0 stats <log_file>"
exit 1
fi
analyze_log "$1" # stats is currently an alias for analyze
;;
errors)
if [[ $# -lt 1 ]]; then
echo "Usage: $0 errors <log_file>"
exit 1
fi
grep -E "$(IFS='|'; echo "${ERROR_PATTERNS[*]}")" "$1" 2>/dev/null || echo "No errors found"
;;
*)
echo "Usage: $0 {analyze|monitor|search|stats|errors} [options]"
echo ""
echo "Commands:"
echo " analyze <file> Analyze log file and show statistics"
echo " monitor <file> Real-time monitoring"
echo " search <pat> <f> Search for pattern in log"
echo " stats <file> Show detailed statistics"
echo " errors <file> Show all error lines"
exit 1
;;
esac
}
main "$@"
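One bash detail worth calling out when filtering with pattern arrays like `ERROR_PATTERNS`: expanding an array with `"${arr[*]}"` joins its elements with the first character of `IFS` (a space by default), which does not produce a usable `grep -E` alternation. Joining with `|` inside a command substitution keeps the `IFS` change confined to the subshell. A minimal sketch:

```shell
#!/usr/bin/env bash
# Join an array into a grep -E alternation pattern.
patterns=("ERROR" "FATAL" "CRITICAL")

# IFS is changed only inside the command substitution's subshell,
# so the rest of the script is unaffected.
regex=$(IFS='|'; echo "${patterns[*]}")
echo "$regex"   # ERROR|FATAL|CRITICAL

# Count matching lines in some sample input.
printf '%s\n' "INFO started" "FATAL disk full" "ERROR timeout" | grep -cE "$regex"   # 2
```

By contrast, `grep -E "${patterns[*]}"` would search for the literal string `ERROR FATAL CRITICAL` and match nothing useful.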
# Make executable
chmod +x log-analyzer.sh
# Analyze system log
sudo ./log-analyzer.sh analyze /var/log/syslog
# Analyze nginx access log
./log-analyzer.sh analyze /var/log/nginx/access.log
# Monitor in real-time
sudo ./log-analyzer.sh monitor /var/log/syslog
# Search for specific pattern
./log-analyzer.sh search "connection refused" /var/log/nginx/error.log
# Show all errors
sudo ./log-analyzer.sh errors /var/log/syslog
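To try the commands above without touching production logs, you can generate a small sample file and sanity-check the same error and warning patterns the script matches. The file path comes from `mktemp` and is purely illustrative:

```shell
#!/usr/bin/env bash
# Build a throwaway sample log to exercise the analyzer's patterns.
sample=$(mktemp)
cat > "$sample" <<'EOF'
2024-02-22 10:30:45 ERROR Database connection failed
2024-02-22 10:31:02 WARNING Cache miss rate high
2024-02-22 11:05:13 FATAL Out of memory
2024-02-22 11:06:40 INFO Checkpoint complete
EOF

# Same pattern families the script's ERROR_PATTERNS/WARNING_PATTERNS cover.
grep -cE 'ERROR|FATAL|CRITICAL' "$sample"   # 2
grep -cE 'WARNING|WARN'         "$sample"   # 1

rm -f "$sample"
```

Once the counts look right, run `./log-analyzer.sh analyze` against the sample file before pointing it at real logs.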

In this chapter, you completed practical exercises covering:

  • ✅ System Monitoring Dashboard - Complete production monitoring
  • ✅ Backup Automation Script - Enterprise backup solution
  • ✅ Log Analyzer - Multi-format log analysis

These exercises provide real-world DevOps experience with production-ready code.


You have completed all 46 chapters of the bash-scripting-guide! This comprehensive guide covered everything needed for DevOps, SRE, and SysAdmin positions:

  • Basic to advanced bash concepts
  • Text processing (sed, awk, regex)
  • Process management
  • Error handling
  • Security best practices
  • Performance optimization
  • Real-world automation scripts
  • Interview preparation
  • Practical exercises

Previous Chapter: Interview Questions

End of Guide