

Chapter 43: rsync and tar - Essential Backup Tools


rsync is a powerful and versatile file synchronization tool that is widely used for backups, data mirroring, and file transfer. It uses a smart delta-transfer algorithm to minimize data transfer by only sending the differences between source and destination files.

RSYNC DELTA TRANSFER ALGORITHM
+------------------------------------------------------------------+
| |
| SOURCE FILE DESTINATION FILE |
| (Original) (Previous Backup) |
| |
| ┌────────────────────┐ ┌────────────────────┐ |
| │ A B C D E F G H I │ │ A B C D E F G H I │ │
| │ 1 2 3 4 5 6 7 8 9 │ │ 1 2 3 4 5 6 7 8 9 │ |
| └────────────────────┘ └────────────────────┘ |
| │ │ │
| │ rsync compares │ │
| │ blocks │ │
| └──────────────┬───────────────┘ |
| │ |
| ▼ |
| ┌─────────────────────────────┐ |
| │ BLOCK COMPARISON │ │
│ │ │ |
| │ Block 1: A - Match ✓ │ │
| │ Block 2: B - Match ✓ │ │
| │ Block 3: C - Changed ✗ │ │
| │ Block 4: D - Match ✓ │ │
| │ Block 5: E - New ✗ │ │
| │ │ │
| └─────────────────────────────┘ |
| │ |
| ▼ |
| ┌─────────────────────────────┐ |
| │ TRANSFER ONLY CHANGES │ |
| │ │ |
| │ ┌───────────────────────┐ │ │
| │ │ Block 3: C → X │ │ │
| │ │ Block 5: (E F G H I) │ │ │
| │ │ (new blocks) │ │ │
| │ └───────────────────────┘ │ │
| │ │ |
| │ Result: Only ~20% sent │ │
| │ │ |
| └─────────────────────────────┘ |
| |
+------------------------------------------------------------------+
Category      Option              Description
────────      ──────              ───────────
Basic         -r                  Recursive
              -a                  Archive mode (preserves permissions, timestamps, etc.)
              -v                  Verbose output
              -z                  Compress during transfer
              -h                  Human-readable output
Sync          --delete            Delete files not on source
              --delete-before     Delete before transfer
              --delete-after      Delete after transfer
              --delete-excluded   Delete excluded files from dest
Filter        --exclude           Exclude patterns
              --exclude-from      Exclude patterns from file
              --include           Include patterns
              --filter            Custom filter rules
Performance   --bwlimit           Bandwidth limit (KB/s)
              -P                  Show progress (--partial --progress)
              --partial           Keep partial files
              --checksum          Use checksums, not time/size
Remote        -e                  Specify remote shell
              --rsync-path        Remote rsync path
# =============================================================================
# LOCAL FILE SYNCHRONIZATION
# =============================================================================
# Basic sync - mirror source to destination
rsync -av /source/ /destination/
# Mirror with deletion (remove files in destination not in source)
rsync -av --delete /source/ /destination/
# Sync specific files
rsync -av /source/file1.txt /source/file2.txt /destination/
# Sync only the directory structure (no files)
rsync -av --include='*/' --exclude='*' /source/ /destination/
# Equivalent using filter rules
rsync -av -f '+ */' -f '- *' /source/ /destination/
# =============================================================================
# EXCLUDING FILES AND DIRECTORIES
# =============================================================================
# Exclude single pattern
rsync -av --exclude='*.log' /source/ /destination/
# Exclude multiple patterns
rsync -av \
--exclude='*.log' \
--exclude='*.tmp' \
--exclude='.cache' \
/source/ /destination/
# Exclude patterns from file
# Create the exclude file
cat > /tmp/exclude.txt <<'EOF'
*.log
*.tmp
.cache/
.git/
node_modules/
__pycache__/
*.pyc
EOF
rsync -av --exclude-from='/tmp/exclude.txt' /source/ /destination/
# Exclude all but specific files
rsync -av --include='*/' --include='*.txt' --exclude='*' /source/ /destination/
# Complex filtering
rsync -av \
--include='dir1/' \
--include='dir1/*.txt' \
--exclude='*' \
/source/ /destination/
# =============================================================================
# DRY RUN AND PREVIEW
# =============================================================================
# Preview what would be transferred (no changes)
rsync -avn /source/ /destination/
# Show what would be deleted
rsync -avn --delete /source/ /destination/
# Verbose preview with file list
rsync -avn --itemize-changes /source/ /destination/
# Preview with checksum comparison
rsync -avnc /source/ /destination/
# =============================================================================
# PERFORMANCE OPTIMIZATION
# =============================================================================
# Compress during transfer (good for slow networks)
rsync -avz /source/ /destination/
# Skip based on checksum (more accurate but slower)
rsync -avc /source/ /destination/
# Limit transfers by file size
rsync -av --max-size=100M /source/ /destination/
rsync -av --min-size=1k /source/ /destination/
# Bandwidth limit (KB/s)
rsync -avz --bwlimit=1000 /source/ /destination/
# rsync has no built-in parallel option; to parallelize, run one rsync
# per top-level subdirectory, e.g. with xargs -P
find /source -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -P 4 -I{} rsync -az {} /destination/
# Preserve partial files (resume interrupted transfer)
rsync -avP /source/ /destination/
# =============================================================================
# PRESERVING ATTRIBUTES
# =============================================================================
# Archive mode: -a equals -rlptgoD, so it already preserves permissions,
# times, owner, group, devices/specials, and symlinks
rsync -av /source/ /destination/
# Preserve permissions (already implied by -a)
rsync -avp /source/ /destination/
# Preserve times (already implied by -a)
rsync -avt /source/ /destination/
# Preserve owner and group (already implied by -a)
rsync -avgo /source/ /destination/
# Preserve ACLs (not implied by -a; requires ACL support)
rsync -avA /source/ /destination/
# Preserve extended attributes (not implied by -a)
rsync -avX /source/ /destination/
# ACLs and extended attributes combined
rsync -avAX /source/ /destination/
# Preserve hard links (not implied by -a)
rsync -avH /source/ /destination/
# Preserve devices and special files (already implied by -a)
rsync -avD /source/ /destination/
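A quick sanity check of what archive mode buys you: compare timestamps after an `-a` copy versus a bare `-r` copy. The scratch paths below are illustrative, and GNU `touch`/`stat` are assumed:

```shell
# Sketch: -a preserves mtimes, plain -r does not
SRC=$(mktemp -d); A=$(mktemp -d); R=$(mktemp -d)
touch -d '2020-01-01 00:00:00' "$SRC/old.txt"
rsync -a "$SRC/" "$A/"     # archive copy keeps the 2020 timestamp
rsync -r "$SRC/" "$R/"     # bare recursive copy resets the mtime
ls -l "$A/old.txt" "$R/old.txt"
```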

The rsync daemon mode allows rsync to run as a service, providing better performance and easier access control for backup servers.

RSYNC DAEMON ARCHITECTURE
+------------------------------------------------------------------+
| |
| CLIENT RSYNC DAEMON |
| SYSTEM TCP/873 SERVER |
| |
| ┌─────────┐ ┌─────────────┐ |
| │ rsync │────────────────────────────→│ rsyncd │ |
| │ client │ Connection │ daemon │ │
| │ │ (encrypted via │ │ │
| │ │ SSH if needed) │ [backup] │ │
| └─────────┘ │ path=/backup │
| │ │ │
| CONFIG: │ [webroot] │ │
| /etc/rsyncd.conf │ path=/var/www │
| └─────────────┘ │
| |
+------------------------------------------------------------------+
/etc/rsyncd.conf
# Global settings
uid = nobody
gid = nobody
use chroot = yes
max connections = 10
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock
log file = /var/log/rsyncd.log
timeout = 300
# Module definitions
[backup]
path = /backup
comment = Main backup directory
read only = false
write only = false
list = true
auth users = backup_user
secrets file = /etc/rsyncd.secrets
hosts allow = 192.168.1.0/24
hosts deny = *
exclude = .cache/ tmp/ temp/
[webroot]
path = /var/www
comment = Web server files
read only = true
list = false
auth users = web_backup
secrets file = /etc/rsyncd.secrets
exclude = .git/ node_modules/
[database]
path = /var/lib/mysql
comment = MySQL database files
read only = true
list = false
auth users = db_backup
secrets file = /etc/rsyncd.secrets
exclude = mysql/ tmp/
# /etc/rsyncd.secrets (chmod 600)
backup_user:backup_password123
web_backup:web_pass456
db_backup:db_pass789
# Start rsync daemon
sudo systemctl enable rsyncd
sudo systemctl start rsyncd
# Connect to rsync daemon
rsync -av /data/ rsync://backup_user@localhost/backup/
# Connect non-interactively with a password file (must be chmod 600)
rsync -av --password-file=/etc/rsync.pass /data/ rsync://backup_user@backup.example.com/backup/
# Use SSH tunnel (more secure)
rsync -av -e ssh /data/ backup_user@backup.example.com:/backup/

tar (Tape Archive) is one of the most fundamental Unix/Linux utilities for creating archive files. It can combine multiple files and directories into a single archive file, with optional compression.

TAR OPERATION MODES
+------------------------------------------------------------------+
| |
| ┌─────────────────────────────────────────────────────────┐ │
| │ CREATE MODE (-c) │ │
| │ │ │
| │ file1.txt ─┐ │ │
| │ file2.txt ─┼──→ tar ──→ archive.tar.gz │ │
| │ dir/ ─┘ │ │
| │ │ │
| └─────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────┐ │
| │ EXTRACT MODE (-x) │ │
| │ │ │
| │ archive.tar.gz ──→ tar ──→ file1.txt │ │
| │ ──→ file2.txt │ │
| │ ──→ dir/ │ │
| │ │ │
| └─────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────┐ │
| │ LIST MODE (-t) │ │
| │ │ │
| │ archive.tar.gz ──→ tar ──→ Shows file list │ │
| │ │ │
| └─────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+
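The three modes chain together naturally; here is a minimal round trip using scratch directories (paths are illustrative):

```shell
# Create, list, extract in one round trip
SRC=$(mktemp -d); DST=$(mktemp -d)
echo demo > "$SRC/file1.txt"
tar -czf "$SRC.tar.gz" -C "$SRC" .   # create  (-c)
tar -tzf "$SRC.tar.gz"               # list    (-t): shows the file list
tar -xzf "$SRC.tar.gz" -C "$DST"     # extract (-x)
cat "$DST/file1.txt"                 # demo
```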
COMPRESSION COMPARISON
+------------------------------------------------------------------+
| |
| Algorithm Extension Compression Decompression Use |
| ───────── ───────── ────────── ───────────── ──── │
| none .tar 0% Fastest Speed |
| gzip .tar.gz ~70% Fast Common │
| bzip2 .tar.bz2 ~75% Medium Better │
| xz .tar.xz ~80% Slow Best │
| lzma .tar.lzma ~82% Slowest Max │
| |
| TIME COMPARISON (1GB data): |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ gzip: ████████████░░░░░░░░░░░░░ 10s compress │ │
│ │ bzip2: ██████████████████░░░░░░ 20s compress │ │
│ │ xz: ██████████████████████████ 45s compress │ │
│ │ │ │
│ │ gzip: ████████░░░░░░░░░░░░░░░░ 5s decompress │ │
│ │ bzip2: ████████████░░░░░░░░░░░░░ 10s decompress │ │
│ │ xz: █████████████████░░░░░░░ 15s decompress │ │
│ └─────────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+
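The percentages above are rough guides; actual ratios depend heavily on the data. A quick informal check on sample data (scratch paths; the repetitive text here compresses far better than the table's ~70%):

```shell
# Compare uncompressed vs gzip archive sizes on sample data
DIR=$(mktemp -d)
yes "repetitive sample line" | head -n 20000 > "$DIR/data.txt"
tar -cf  "$DIR/sample.tar"    -C "$DIR" data.txt
tar -czf "$DIR/sample.tar.gz" -C "$DIR" data.txt
ls -l "$DIR"/sample.tar "$DIR"/sample.tar.gz
```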
# =============================================================================
# CREATE BASIC ARCHIVES
# =============================================================================
# Create uncompressed tar
tar -cvf archive.tar /directory/
# Create gzip compressed archive
tar -cvzf archive.tar.gz /directory/
# Create bzip2 compressed archive
tar -cvjf archive.tar.bz2 /directory/
# Create xz compressed archive (best compression)
tar -cvJf archive.tar.xz /directory/
# Create archive with timestamp in filename
tar -cvzf "backup-$(date +%Y%m%d).tar.gz" /directory/
# Create archive with custom name
tar -cvzf /backup/website-$(hostname)-$(date +%Y%m%d).tar.gz /var/www/
# =============================================================================
# LIST ARCHIVE CONTENTS
# =============================================================================
# List all files
tar -tvf archive.tar.gz
# List with details (permissions, size, date)
tar -tvf archive.tar.gz | less
# List specific directory
tar -tvf archive.tar.gz | grep 'path/to/dir'
# List only file names
tar -tf archive.tar.gz
# =============================================================================
# EXTRACT ARCHIVES
# =============================================================================
# Extract to current directory
tar -xvzf archive.tar.gz
# Extract to specific directory
tar -xvzf archive.tar.gz -C /restore/
# Extract specific file
tar -xvzf archive.tar.gz -C /restore/ path/to/file.txt
# Extract with full path preserved
tar -xvzf archive.tar.gz -C /
# Extract without overwriting existing files
tar -xvzkf archive.tar.gz -C /restore/
# Extract, keeping existing files that are newer than the archive copies
tar -xvzf archive.tar.gz --keep-newer-files
# =============================================================================
# WORKING WITH PATTERNS
# =============================================================================
# Extract only .txt files
tar -xvzf archive.tar.gz --wildcards '*.txt'
# Extract only files in specific directory (GNU tar needs --wildcards)
tar -xvzf archive.tar.gz -C /restore/ --wildcards 'var/log/*'
# List only .conf files
tar -tvf archive.tar.gz --wildcards '*.conf'
# =============================================================================
# INCREMENTAL BACKUPS WITH TAR
# =============================================================================
# First full backup
tar -cvzf full-backup.tar.gz \
--listed-incremental=/backup/snapshot.snar \
/data/
# Second incremental backup (only changes since full)
tar -cvzf incremental-1.tar.gz \
--listed-incremental=/backup/snapshot.snar \
/data/
# Third incremental backup
tar -cvzf incremental-2.tar.gz \
--listed-incremental=/backup/snapshot.snar \
/data/
# RESTORE: First restore full, then incremental in order
tar -xvzf full-backup.tar.gz --listed-incremental=/dev/null
tar -xvzf incremental-1.tar.gz --listed-incremental=/dev/null
tar -xvzf incremental-2.tar.gz --listed-incremental=/dev/null
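The snapshot workflow above can be exercised end to end in scratch directories; the same full-then-incrementals restore order applies (GNU tar; `-g` is short for `--listed-incremental`):

```shell
# Miniature full + incremental cycle with restore verification
DATA=$(mktemp -d); BK=$(mktemp -d); REST=$(mktemp -d)
echo one > "$DATA/a.txt"
tar -czf "$BK/full.tar.gz"  -g "$BK/snap.snar" -C "$DATA" .   # level 0
echo two > "$DATA/b.txt"
tar -czf "$BK/incr1.tar.gz" -g "$BK/snap.snar" -C "$DATA" .   # only b.txt
# Restore: full archive first, then each incremental in order
tar -xzf "$BK/full.tar.gz"  -g /dev/null -C "$REST"
tar -xzf "$BK/incr1.tar.gz" -g /dev/null -C "$REST"
ls "$REST"
```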
# =============================================================================
# EXCLUDE FILES
# =============================================================================
# Exclude specific patterns
tar -cvzf backup.tar.gz \
--exclude='*.log' \
--exclude='*.tmp' \
--exclude='.cache' \
/data/
# Exclude from file
tar -cvzf backup.tar.gz -X exclude-list.txt /data/
# Exclude directories
tar -cvzf backup.tar.gz \
--exclude='/data/tmp' \
--exclude='/data/cache' \
/data/
# Include only specific patterns (GNU tar has no --include option;
# build the file list with find and pass it via -T)
find /data \( -name '*.conf' -o -name '*.sh' \) -print0 |
tar -cvzf backup.tar.gz --null -T -
# =============================================================================
# SPLIT AND JOIN ARCHIVES
# =============================================================================
# Split into multiple files (100MB each)
tar -cvzf - /data/ | split -b 100M - backup.tar.gz.
# Restore from split files
cat backup.tar.gz.* | tar -xvzf -
# Split with a custom prefix and numeric suffixes
tar -cvzf - /data/ | split -b 100M -d - backup_part_
# =============================================================================
# VERIFY ARCHIVES
# =============================================================================
# Test archive integrity
tar -tzf archive.tar.gz >/dev/null && echo "Archive OK"
# Verify files in archive against filesystem
tar -dvf archive.tar.gz
# Check compressed archive
gzip -t archive.tar.gz && echo "Gzip OK"
bzip2 -t archive.tar.bz2 && echo "Bzip2 OK"
# =============================================================================
# PRESERVE ATTRIBUTES
# =============================================================================
# Preserve all attributes
tar -cvpzf backup.tar.gz /data/
# Preserve ACLs (GNU tar built with ACL support)
tar --acls -cvpzf backup.tar.gz /data/
# Preserve SELinux contexts
tar --selinux -cvpzf backup.tar.gz /data/
# Preserve extended attributes
tar --xattrs -cvpzf backup.tar.gz /data/

#!/bin/bash
#===============================================================================
# Daily Backup Script
# Uses rsync for incremental and tar for archive
#===============================================================================
set -euo pipefail
# Configuration
BACKUP_ROOT="/backup"
SOURCE_DIRS=(
"/home"
"/etc"
"/var/www"
"/opt"
)
REMOTE_HOST="backup.example.com"
REMOTE_USER="backup"
REMOTE_PATH="/backups/$(hostname)"
RETENTION_DAYS=30
LOG_FILE="/var/log/backup.log"
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log_info() {
echo -e "${GREEN}[$(date '+%Y-%m-%d %H:%M:%S')] INFO:${NC} $1" | tee -a "$LOG_FILE"
}
log_warn() {
echo -e "${YELLOW}[$(date '+%Y-%m-%d %H:%M:%S')] WARN:${NC} $1" | tee -a "$LOG_FILE"
}
log_error() {
echo -e "${RED}[$(date '+%Y-%m-%d %H:%M:%S')] ERROR:${NC} $1" | tee -a "$LOG_FILE"
}
# Create backup directories
setup_directories() {
local timestamp
timestamp=$(date +%Y%m%d_%H%M%S)
BACKUP_DATE_DIR="${BACKUP_ROOT}/${timestamp}"
mkdir -p "$BACKUP_DATE_DIR"/{rsync,tar}
echo "$BACKUP_DATE_DIR"
}
# Rsync backup
rsync_backup() {
local source_dir="$1"
local backup_dir="$2"
local dirname
dirname=$(basename "$source_dir")
log_info "Running rsync for: $source_dir"
rsync -avz \
--delete \
--delete-excluded \
--exclude='*.log' \
--exclude='*.tmp' \
--exclude='.cache' \
--exclude='node_modules' \
--exclude='.git' \
--progress \
"$source_dir/" "$backup_dir/${dirname}/"
log_info "Rsync completed for: $source_dir"
}
# Tar archive backup
tar_backup() {
local source_dir="$1"
local backup_dir="$2"
local dirname
dirname=$(basename "$source_dir")
local timestamp
timestamp=$(date +%Y%m%d)
log_info "Creating tar archive for: $source_dir"
tar -cvzf "${backup_dir}/${dirname}_${timestamp}.tar.gz" \
--exclude='*.log' \
--exclude='*.tmp' \
--exclude='.cache' \
"$source_dir" 2>&1 | tee -a "$LOG_FILE"
log_info "Tar archive created: ${backup_dir}/${dirname}_${timestamp}.tar.gz"
}
# Remote sync
remote_sync() {
log_info "Syncing to remote server..."
rsync -avz \
-e "ssh -o StrictHostKeyChecking=no" \
--delete \
"${BACKUP_ROOT}/" \
"${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_PATH}/"
log_info "Remote sync completed"
}
# Cleanup old backups
cleanup() {
log_info "Cleaning up backups older than $RETENTION_DAYS days..."
# Find and delete old rsync directories
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +$RETENTION_DAYS -exec rm -rf {} \; 2>/dev/null || true
# Find and delete old tar files
find "$BACKUP_ROOT" -name "*.tar.gz" -mtime +$RETENTION_DAYS -delete 2>/dev/null || true
log_info "Cleanup completed"
}
# Verify backup
verify_backup() {
local backup_dir="$1"
log_info "Verifying backup: $backup_dir"
# Check directory exists and has content
if [ -d "$backup_dir" ]; then
local size
size=$(du -sh "$backup_dir" | cut -f1)
log_info "Backup verified: $backup_dir (Size: $size)"
return 0
else
log_error "Backup verification failed: $backup_dir"
return 1
fi
}
# Main function
main() {
local start_time
start_time=$(date +%s)
log_info "=========================================="
log_info "Starting backup process"
log_info "=========================================="
# Setup
BACKUP_DATE_DIR=$(setup_directories)
# Rsync backups
for dir in "${SOURCE_DIRS[@]}"; do
if [ -d "$dir" ]; then
rsync_backup "$dir" "${BACKUP_DATE_DIR}/rsync" || log_warn "Rsync failed for: $dir"
else
log_warn "Directory not found: $dir"
fi
done
# Tar backups
for dir in "${SOURCE_DIRS[@]}"; do
if [ -d "$dir" ]; then
tar_backup "$dir" "${BACKUP_DATE_DIR}/tar" || log_warn "Tar failed for: $dir"
fi
done
# Verify
verify_backup "$BACKUP_DATE_DIR" || true
# Remote sync
if [ -n "$REMOTE_HOST" ]; then
remote_sync || log_warn "Remote sync failed"
fi
# Cleanup
cleanup
# Summary
local end_time
end_time=$(date +%s)
local duration=$((end_time - start_time))
log_info "=========================================="
log_info "Backup completed in ${duration} seconds"
log_info "Backup location: $BACKUP_DATE_DIR"
log_info "=========================================="
}
main "$@"
#!/bin/bash
#===============================================================================
# Incremental Backup Script using rsync with hard links
# Preserves space by using hard links for unchanged files
#===============================================================================
set -euo pipefail
# Configuration
SOURCE="/data"
BACKUP_ROOT="/backup/incremental"
REMOTE="backup@remote.example.com:/backups"
RETENTION_DAYS=14
LOG="/var/log/incremental_backup.log"
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG"
}
# Create backup directory with timestamp
backup_dir() {
local timestamp
timestamp=$(date +%Y%m%d_%H%M)
echo "${BACKUP_ROOT}/backup_${timestamp}"
}
# Get previous backup
previous_backup() {
ls -td "${BACKUP_ROOT}"/backup_* 2>/dev/null | head -1 || true
}
# Main backup function
main() {
local current_backup
current_backup=$(backup_dir)
local previous_backup
previous_backup=$(previous_backup)
log "Starting incremental backup..."
log "Current backup: $current_backup"
# Create current backup directory
mkdir -p "$current_backup"
# If previous backup exists, link to it for hard links
if [ -n "$previous_backup" ] && [ -d "$previous_backup" ]; then
log "Using previous backup for hard links: $previous_backup"
# Use rsync with link-dest to create incremental backup
rsync -avh \
--link-dest="$previous_backup" \
--delete \
--delete-excluded \
--progress \
"$SOURCE/" "$current_backup/"
else
# First backup - full copy
log "First backup - full copy"
rsync -avh \
--delete \
--delete-excluded \
--progress \
"$SOURCE/" "$current_backup/"
fi
# Get backup size
local size
size=$(du -sh "$current_backup" | cut -f1)
log "Backup completed. Size: $size"
# Sync to remote
if [ -n "$REMOTE" ]; then
log "Syncing to remote: $REMOTE"
rsync -avh -e ssh --delete "$current_backup/" "$REMOTE/current/"
fi
# Cleanup old backups
log "Cleaning up backups older than $RETENTION_DAYS days"
find "$BACKUP_ROOT" -mindepth 1 -maxdepth 1 -type d -mtime +$RETENTION_DAYS -exec rm -rf {} \;
log "Incremental backup completed successfully"
}
main "$@"

# =============================================================================
# SSH KEY SETUP FOR BACKUP
# =============================================================================
# Generate dedicated backup key (no passphrase for automation)
ssh-keygen -t ed25519 -f ~/.ssh/backup_key -N ""
# Or use RSA for wider compatibility
ssh-keygen -t rsa -b 4096 -f ~/.ssh/backup_key -N ""
# Copy public key to backup server
ssh-copy-id -i ~/.ssh/backup_key.pub backup@backup-server
# Or manually
cat ~/.ssh/backup_key.pub | ssh backup@backup-server "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
# Set proper permissions
chmod 600 ~/.ssh/backup_key
chmod 700 ~/.ssh
# Test connection
ssh -i ~/.ssh/backup_key backup@backup-server "echo 'Connection OK'"
#!/bin/bash
#===============================================================================
# Remote Backup via rsync over SSH
#===============================================================================
# Configuration
SOURCE_DIRS=("/home" "/etc" "/var/www")
REMOTE_USER="backup"
REMOTE_HOST="backup.example.com"
REMOTE_BASE="/backups/$(hostname)"
SSH_KEY="/root/.ssh/backup_key"
SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=10"
log() {
echo "[$(date)] $1"
}
# Backup each directory
for dir in "${SOURCE_DIRS[@]}"; do
dirname=$(basename "$dir")
timestamp=$(date +%Y%m%d_%H%M)
log "Backing up $dir to $REMOTE_HOST:$REMOTE_BASE/${dirname}_${timestamp}/"
rsync -avz \
-e "ssh -i $SSH_KEY $SSH_OPTS" \
--delete \
--delete-excluded \
--exclude='*.log' \
--exclude='.cache' \
--progress \
"$dir/" \
"$REMOTE_USER@$REMOTE_HOST:$REMOTE_BASE/${dirname}_${timestamp}/"
# Create symlink to latest
ssh $SSH_OPTS -i $SSH_KEY "$REMOTE_USER@$REMOTE_HOST" \
"ln -sfn ${dirname}_${timestamp} $REMOTE_BASE/${dirname}_latest"
done
log "Remote backup completed"

WHEN TO USE RSYNC VS TAR
+------------------------------------------------------------------+
| |
| START │
| │ │
| ▼ │
| ┌─────────────────────────────┐ │
| │ Need real-time or │ │
| │ frequent sync? │ │
| └─────────────────────────────┘ │
│ │ │ │
│ YES │ │ NO │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ │
| │ RSYNC │ │ Need │ │
| │ │ │ archive │ │
| └─────────────┘ └─────────────┘ │
│ │ │ │
│ YES │ │ NO │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ TAR │ │ OTHER │ │
│ │ │ │ TOOLS │ │
│ └──────────┘ └──────────┘ │
| |
+------------------------------------------------------------------+
RSYNC AND TAR BEST PRACTICES CHECKLIST
+------------------------------------------------------------------+
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ RSYNC │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Always use --dry-run before first sync │ │
| │ □ Use --checksum for critical data │ │
| │ □ Implement --delete carefully (can cause data loss) │ │
| │ □ Use -e ssh for encrypted transfer │ │
| │ □ Set up SSH key authentication for automation │ │
| │ □ Use --partial for large file transfers │ │
| │ □ Monitor bandwidth with --bwlimit │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ TAR │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Test archive integrity after creation │ │
| │ □ Use -p to preserve permissions │ │
| │ □ Choose right compression (gzip for speed, xz for size) │ │
| │ □ Document exclude patterns │ │
| │ □ Implement incremental backups for large datasets │ │
| │ □ Store snapshot files securely │ │
| │ □ Test restore procedures regularly │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
| ┌─────────────────────────────────────────────────────────────┐ │
| │ SECURITY │ │
| ├─────────────────────────────────────────────────────────────┤ │
| │ □ Encrypt sensitive backups │ │
| │ □ Secure transfer with SSH │ │
| │ □ Protect backup files with proper permissions │ │
| │ □ Store backup keys securely │ │
| │ □ Implement offsite backup │ │
| └─────────────────────────────────────────────────────────────┘ │
| |
+------------------------------------------------------------------+

rsync and tar are fundamental backup tools in DevOps:

rsync and tar in DevOps Backup
+------------------------------------------------------------------+
| |
| Incremental Backups: |
| +----------------------------------------------------------+ |
| | rsync -> Only transfers changed files | |
| | tar -g -> GNU incremental snapshots | |
| | Efficient bandwidth usage | |
| +----------------------------------------------------------+ |
| |
| Cloud Integration: |
| +----------------------------------------------------------+ |
| | rsync to S3 -> rclone | |
| | rsync to Azure Blob -> azcopy | |
| | rsync to Google Cloud -> gsutil | |
| +----------------------------------------------------------+ |
| |
| Container Backups: |
| +----------------------------------------------------------+ |
| | Docker volumes -> tar backup | |
| | Kubernetes -> Velero for volume snapshots | |
| +----------------------------------------------------------+ |
| |
+------------------------------------------------------------------+

Practical Impact:

  • Efficient incremental backups saving bandwidth
  • Versatile backup solution for various scenarios
  • Foundation for cloud-native backup strategies

# WRONG: no trailing slash copies the source directory itself
rsync -av /source /dest
# Creates /dest/source/...
# CORRECT: a trailing slash on the source copies its contents
rsync -av /source/ /dest
# Copies contents to /dest/...
# WRONG: Over-compressing already compressed files
rsync -avz /data/*.zip /backup
# CORRECT: keep -z but skip already-compressed suffixes
# (the suffix list is slash-separated; recent rsync skips common ones by default)
rsync -avz --skip-compress=zip/gz/mp4 /data /backup
# WRONG: Never testing tar files
tar -cvf backup.tar /data
# Corrupted archive not detected until restore
# CORRECT: Test after creation
tar -cvf backup.tar /data && tar -tvf backup.tar > /dev/null

  1. What is the difference between rsync and scp?
  2. How does rsync’s delta algorithm work?
  3. What are the advantages of tar over rsync?
  4. How do you create incremental backups with tar?
  5. Explain rsync bandwidth limiting.


End of Chapter 43: rsync and tar