
Chapter 43: Performance Optimization in Bash


This chapter covers techniques for optimizing bash script performance. These techniques help you write efficient scripts for DevOps automation, CI/CD pipelines, and system administration in production environments.


Performance in bash scripting involves:

┌────────────────────────────────────────────────────────────────┐
│ Performance Optimization Areas                                 │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ 1. Subshell Optimization                                       │
│    - Avoid subshells in loops                                  │
│    - Use built-ins instead of external commands                │
│                                                                │
│ 2. I/O Optimization                                            │
│    - Reduce file operations                                    │
│    - Use efficient parsing                                     │
│    - Buffer I/O operations                                     │
│                                                                │
│ 3. Algorithm Optimization                                      │
│    - Choose right data structures                              │
│    - Optimize loops                                            │
│    - Pre-compile patterns                                      │
│                                                                │
│ 4. Parallel Processing                                         │
│    - Use background processes                                  │
│    - Use xargs -P for parallelism                              │
│    - Consider GNU Parallel                                     │
│                                                                │
│ 5. Memory Optimization                                         │
│    - Use arrays instead of strings                             │
│    - Release resources promptly                                │
│    - Avoid memory leaks in loops                               │
│                                                                │
└────────────────────────────────────────────────────────────────┘
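Before optimizing any of these areas, measure. A minimal sketch using bash's built-in `time` keyword; the two functions are illustrative placeholders, not part of this chapter's examples:

```shell
#!/bin/bash
# Compare two approaches with bash's built-in `time` keyword.

with_subshell() {
    local i out
    for ((i = 0; i < 1000; i++)); do
        out=$(echo "$i")            # forks a subshell per iteration
    done
}

without_subshell() {
    local i out
    for ((i = 0; i < 1000; i++)); do
        printf -v out '%s' "$i"     # pure built-in, no fork
    done
}

time with_subshell
time without_subshell
```

On most systems the subshell version is one to two orders of magnitude slower, which is why the sections below focus on eliminating forks first.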

#!/bin/bash
# Subshell performance comparison

# ❌ Bad - subshell created for EACH iteration
# Very slow for large datasets
for item in "${items[@]}"; do
    result=$(process "$item")   # each iteration = new subshell
done

# ✓ Good - process in current shell
for item in "${items[@]}"; do
    process "$item"             # same shell, faster
done

# ✓ Better - batch processing
process_items() {
    while read -r item; do
        process "$item"
    done < <(list_items)
}

#!/bin/bash
# Built-ins vs external commands

# ❌ Bad - external command
value=$(echo "$var" | tr '[:lower:]' '[:upper:]')
# ✓ Good - built-in (bash 4+)
value="${var^^}"

# ❌ Bad - external commands for counting files
count=$(ls -1 | wc -l)
# ✓ Good - bash built-in globbing
shopt -s nullglob
files=(*)
count=${#files[@]}

# ❌ Bad - external command
upper=$(echo "$var" | awk '{print toupper($0)}')
# ✓ Good - built-in
upper="${var^^}"
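The same substitutions apply to basename, dirname, and simple sed edits. A small sketch; the path is made up for illustration:

```shell
#!/bin/bash
path="/var/log/app/error.log"

# ❌ basename/dirname fork a process per call
# base=$(basename "$path"); dir=$(dirname "$path")
# ✓ parameter-expansion equivalents
base="${path##*/}"              # error.log
dir="${path%/*}"                # /var/log/app
# (unlike dirname, ${path%/*} assumes the path contains a slash)

# ❌ sed for a simple substitution
# renamed=$(echo "$path" | sed 's/error/access/')
# ✓ built-in replacement of the first match
renamed="${path/error/access}"  # /var/log/app/access.log

echo "$base $dir $renamed"
```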

#!/bin/bash
# Reduce the number of file opens

# ❌ Bad - opens the file three times
echo "$line1" > file.txt
echo "$line2" >> file.txt
echo "$line3" >> file.txt

# ✓ Good - single file open
{
    echo "$line1"
    echo "$line2"
    echo "$line3"
} > file.txt

# ✓ Even better - accumulate and write once
output=""
for item in "${items[@]}"; do
    output+="$(process "$item")"$'\n'
done
printf '%s' "$output" > file.txt
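A third option, useful when the writes are scattered across a script, is to open a file descriptor once with exec and close it when done:

```shell
#!/bin/bash
# Open the output file once on FD 3, write many times, close once.
out=$(mktemp)           # demo target; any path works

exec 3> "$out"          # open once
for i in 1 2 3; do
    echo "line $i" >&3  # each write reuses the open descriptor
done
exec 3>&-               # close once

wc -l < "$out"
rm -f "$out"
```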
#!/bin/bash
# Efficient parsing

# ❌ Bad - extra wc process just to count
grep "error" log.txt | grep -v "warning" | wc -l
# ✓ Good - let grep count (-c replaces wc -l)
grep "error" log.txt | grep -vc "warning"

# ❌ Bad - awk for a simple field
awk '{print $1}' file.txt
# ✓ Good - cut is lighter for a simple field
cut -d' ' -f1 file.txt

# ❌ Bad - useless cat plus a sed chain
cat file.txt | sed 's/a/b/' | sed 's/c/d/' | sort
# ✓ Good - combined sed in a single pipeline
sed -e 's/a/b/' -e 's/c/d/' file.txt | sort
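When a pipeline keeps growing, one awk pass can often replace several stages. A sketch with made-up log lines:

```shell
#!/bin/bash
log=$(mktemp)
printf '%s\n' "ERROR db timeout" "INFO startup ok" "ERROR net down" > "$log"

# ❌ Three processes: filter, extract, sort
grep '^ERROR' "$log" | cut -d' ' -f2 | sort

# ✓ One awk process does the filter and the extract
awk '/^ERROR/ { print $2 }' "$log" | sort

rm -f "$log"
```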

#!/bin/bash
# ❌ Bad - word-splitting a string
data="one two three four five"
for item in $data; do
    echo "$item"            # breaks if items contain spaces
done

# ✓ Good - use an array
data=(one two three four five)
for item in "${data[@]}"; do
    echo "$item"            # handles spaces correctly
done

# ❌ Bad - repeated string concatenation
result=""
for item in "${items[@]}"; do
    result="$result$item"   # copies the whole string each iteration
done

# ✓ Good - append to an array, print once
results=()
for item in "${items[@]}"; do
    results+=("$(process "$item")")
done
printf '%s\n' "${results[@]}"

#!/bin/bash
# ❌ Bad - one variable per field
user1_name="admin"
user1_role="admin"
user2_name="john"
user2_role="user"

# ✓ Good - associative array (bash 4+)
declare -A user_info
user_info[admin_name]="admin"
user_info[admin_role]="admin"
user_info[john_name]="john"
user_info[john_role]="user"

# Fast lookup
username="john"
echo "${user_info[${username}_role]}"   # user
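Associative arrays also make counting cheap, replacing a sort | uniq -c pipeline with hash updates in the current shell. A sketch with made-up sample data:

```shell
#!/bin/bash
# Frequency table via an associative array (bash 4+).
declare -A count
words=(apple banana apple cherry banana apple)

for w in "${words[@]}"; do
    (( count[$w]++ ))    # unset keys start at 0 in arithmetic context
done

echo "apple=${count[apple]} banana=${count[banana]} cherry=${count[cherry]}"
# apple=3 banana=2 cherry=1
```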

#!/bin/bash
# Run tasks in parallel

# Start all tasks in the background
for item in "${items[@]}"; do
    process "$item" &
done

# Wait for all to complete
wait
echo "All tasks completed"
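The loop above starts every job at once, which can overload the machine for large inputs. A bounded sketch, assuming bash 4.3+ for wait -n; `process` and the item list are placeholders:

```shell
#!/bin/bash
MAX_JOBS=4
items=(a b c d e f g h)

process() { sleep 0.1; }          # placeholder task

for item in "${items[@]}"; do
    # Throttle: when MAX_JOBS are already running, wait for one to exit
    while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
        wait -n
    done
    process "$item" &
done
wait                              # drain the remaining jobs
```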
#!/bin/bash
# Process in parallel with xargs
find . -name "*.txt" | xargs -P 4 -I {} process {}

# NUL-delimited - safe for filenames with spaces or newlines
find . -name "*.txt" -print0 | xargs -0 -P 4 -I {} process {}

# Optimal parallelism - match the number of CPU cores
CPU_CORES=$(nproc)
find . -name "*.txt" | xargs -P "$CPU_CORES" -I {} process {}
#!/bin/bash
# Install (Arch): sudo pacman -S parallel

# Simple parallel - pass files as arguments, don't parse ls
parallel process ::: *.txt

# With more options: one job per core, abort on first failure
parallel -j+0 --halt now,fail=1 process ::: *.txt

# Remote execution across servers
parallel -S server1,server2 process ::: *.txt

#!/bin/bash
# ❌ Bad - scan the files once per question
if grep -q "pattern" file1; then
    echo "Found in file1"
fi
if grep -q "pattern" file2; then
    echo "Found in file2"
fi
if grep -q "pattern" file3; then
    echo "Found in file3"
fi

# ✓ Good - one scan, then check the cached list of matching files
matches=$(grep -l "pattern" file1 file2 file3)
for f in file1 file2 file3; do
    if grep -Fqx "$f" <<<"$matches"; then
        echo "Found in $f"
    fi
done

# Use an associative array for caching
declare -A cache
get_value() {
    local key="$1"
    if [[ -z "${cache[$key]:-}" ]]; then
        cache[$key]=$(expensive_command "$key")
    fi
    echo "${cache[$key]}"
}
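A quick demonstration that the cache works. The names are placeholders, and a temp-file tally is used because the command substitution runs expensive_command in a subshell, so a plain counter variable would not survive:

```shell
#!/bin/bash
declare -A cache
tally=$(mktemp)
expensive_command() { echo run >> "$tally"; echo "value-for-$1"; }

get_value() {
    local key="$1"
    if [[ -z "${cache[$key]:-}" ]]; then
        cache[$key]=$(expensive_command "$key")
    fi
    echo "${cache[$key]}"
}

get_value host1 >/dev/null
get_value host1 >/dev/null      # cache hit - no new run
get_value host2 >/dev/null
runs=$(( $(wc -l < "$tally") )) # normalize wc output
rm -f "$tally"
echo "expensive_command ran $runs times"   # 2
```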
#!/bin/bash
# ❌ Bad - literal pattern repeated inside the test
for line in "${lines[@]}"; do
    if [[ "$line" =~ ^[0-9]+$ ]]; then
        : # process
    fi
done

# ✓ Good - keep the pattern in a variable
# (avoids quoting pitfalls and keeps the pattern in one place;
#  the speed difference is usually small)
regex='^[0-9]+$'
for line in "${lines[@]}"; do
    if [[ "$line" =~ $regex ]]; then
        : # process
    fi
done

#!/bin/bash
# ❌ Bad - useless cat at the front of the pipeline
result=$(cat file.txt | head -n 10 | sort | uniq)
# ✓ Good - let head open the file, fold sort | uniq into sort -u
result=$(head -n 10 file.txt | sort -u)

# ✓ Even better - read into bash directly
mapfile -t lines < file.txt
printf '%s\n' "${lines[@]:0:10}" | sort -u

# Avoid UUOC (Useless Use of Cat)
# ❌ cat file.txt | grep pattern
# ✓ grep pattern file.txt

In this chapter, you learned:

  • ✅ Performance basics and optimization areas
  • ✅ Subshell optimization techniques
  • ✅ I/O optimization strategies
  • ✅ Data structure optimization
  • ✅ Parallel processing methods
  • ✅ Command optimization tips
  • ✅ Pipeline optimization
  • ✅ Real-world DevOps examples

Continue to the next chapter to learn about Security Considerations.


Previous Chapter: Process Substitution Next Chapter: Security Considerations