
Chapter 43: Performance Optimization in Bash


This chapter covers techniques for optimizing bash script performance. These techniques help you write efficient scripts for DevOps automation, CI/CD pipelines, and system administration in production environments.


Performance in bash scripting involves:

┌────────────────────────────────────────────────────────────────┐
│ Performance Optimization Areas                                 │
├────────────────────────────────────────────────────────────────┤
│                                                                │
│ 1. Subshell Optimization                                       │
│    - Avoid subshells in loops                                  │
│    - Use built-ins instead of external commands                │
│                                                                │
│ 2. I/O Optimization                                            │
│    - Reduce file operations                                    │
│    - Use efficient parsing                                     │
│    - Buffer I/O operations                                     │
│                                                                │
│ 3. Algorithm Optimization                                      │
│    - Choose right data structures                              │
│    - Optimize loops                                            │
│    - Pre-compile patterns                                      │
│                                                                │
│ 4. Parallel Processing                                         │
│    - Use background processes                                  │
│    - Use xargs -P for parallelism                              │
│    - Consider GNU Parallel                                     │
│                                                                │
│ 5. Memory Optimization                                         │
│    - Use arrays instead of strings                             │
│    - Release resources promptly                                │
│    - Avoid memory leaks in loops                               │
│                                                                │
└────────────────────────────────────────────────────────────────┘
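Before optimizing any of these areas, measure. A minimal sketch using bash's built-in `time` keyword; the two functions are illustrative placeholders, not part of this chapter's examples:

```shell
#!/bin/bash
# Compare two approaches with bash's built-in `time` keyword.

with_subshell() {
    local i out
    for ((i = 0; i < 1000; i++)); do
        out=$(echo "$i")            # forks a subshell per iteration
    done
}

without_subshell() {
    local i out
    for ((i = 0; i < 1000; i++)); do
        printf -v out '%s' "$i"     # pure built-in, no fork
    done
}

time with_subshell
time without_subshell
```

On most systems the subshell version is one to two orders of magnitude slower, which is why the sections below focus on eliminating forks first.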

#!/bin/bash
# Subshell performance comparison

# ❌ Bad - subshell created for EACH iteration
# Very slow for large datasets
for item in "${items[@]}"; do
    result=$(process "$item")   # each iteration = new subshell
done

# ✓ Good - process in current shell
for item in "${items[@]}"; do
    process "$item"             # same shell, faster
done

# ✓ Better - batch processing
process_items() {
    while read -r item; do
        process "$item"
    done < <(list_items)
}

#!/bin/bash
# Built-ins vs external commands

# ❌ Bad - external command
value=$(echo "$var" | tr '[:lower:]' '[:upper:]')
# ✓ Good - built-in (bash 4+)
value="${var^^}"

# ❌ Bad - external commands for counting files
count=$(ls -1 | wc -l)
# ✓ Good - bash built-in globbing
shopt -s nullglob
files=(*)
count=${#files[@]}

# ❌ Bad - external command
upper=$(echo "$var" | awk '{print toupper($0)}')
# ✓ Good - built-in
upper="${var^^}"
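The same substitutions apply to basename, dirname, and simple sed edits. A small sketch; the path is made up for illustration:

```shell
#!/bin/bash
path="/var/log/app/error.log"

# ❌ basename/dirname fork a process per call
# base=$(basename "$path"); dir=$(dirname "$path")
# ✓ parameter-expansion equivalents
base="${path##*/}"              # error.log
dir="${path%/*}"                # /var/log/app
# (unlike dirname, ${path%/*} assumes the path contains a slash)

# ❌ sed for a simple substitution
# renamed=$(echo "$path" | sed 's/error/access/')
# ✓ built-in replacement of the first match
renamed="${path/error/access}"  # /var/log/app/access.log

echo "$base $dir $renamed"
```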

#!/bin/bash
# Reduce the number of file opens

# ❌ Bad - opens the file three times
echo "$line1" > file.txt
echo "$line2" >> file.txt
echo "$line3" >> file.txt

# ✓ Good - single file open
{
    echo "$line1"
    echo "$line2"
    echo "$line3"
} > file.txt

# ✓ Even better - accumulate and write once
output=""
for item in "${items[@]}"; do
    output+="$(process "$item")"$'\n'
done
printf '%s' "$output" > file.txt
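A third option, useful when the writes are scattered across a script, is to open a file descriptor once with exec and close it when done:

```shell
#!/bin/bash
# Open the output file once on FD 3, write many times, close once.
out=$(mktemp)           # demo target; any path works

exec 3> "$out"          # open once
for i in 1 2 3; do
    echo "line $i" >&3  # each write reuses the open descriptor
done
exec 3>&-               # close once

wc -l < "$out"
rm -f "$out"
```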
#!/bin/bash
# Efficient parsing

# ❌ Bad - extra wc process just to count
grep "error" log.txt | grep -v "warning" | wc -l
# ✓ Good - let grep count (-c replaces wc -l)
grep "error" log.txt | grep -vc "warning"

# ❌ Bad - awk for a simple field
awk '{print $1}' file.txt
# ✓ Good - cut is lighter for a simple field
cut -d' ' -f1 file.txt

# ❌ Bad - useless cat plus a sed chain
cat file.txt | sed 's/a/b/' | sed 's/c/d/' | sort
# ✓ Good - combined sed in a single pipeline
sed -e 's/a/b/' -e 's/c/d/' file.txt | sort
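When a pipeline keeps growing, one awk pass can often replace several stages. A sketch with made-up log lines:

```shell
#!/bin/bash
log=$(mktemp)
printf '%s\n' "ERROR db timeout" "INFO startup ok" "ERROR net down" > "$log"

# ❌ Three processes: filter, extract, sort
grep '^ERROR' "$log" | cut -d' ' -f2 | sort

# ✓ One awk process does the filter and the extract
awk '/^ERROR/ { print $2 }' "$log" | sort

rm -f "$log"
```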

#!/bin/bash
# ❌ Bad - word-splitting a string
data="one two three four five"
for item in $data; do
    echo "$item"            # breaks if items contain spaces
done

# ✓ Good - use an array
data=(one two three four five)
for item in "${data[@]}"; do
    echo "$item"            # handles spaces correctly
done

# ❌ Bad - repeated string concatenation
result=""
for item in "${items[@]}"; do
    result="$result$item"   # copies the whole string each iteration
done

# ✓ Good - append to an array, print once
results=()
for item in "${items[@]}"; do
    results+=("$(process "$item")")
done
printf '%s\n' "${results[@]}"

#!/bin/bash
# ❌ Bad - one variable per field
user1_name="admin"
user1_role="admin"
user2_name="john"
user2_role="user"

# ✓ Good - associative array (bash 4+)
declare -A user_info
user_info[admin_name]="admin"
user_info[admin_role]="admin"
user_info[john_name]="john"
user_info[john_role]="user"

# Fast lookup
username="john"
echo "${user_info[${username}_role]}"   # user
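Associative arrays also make counting cheap, replacing a sort | uniq -c pipeline with hash updates in the current shell. A sketch with made-up sample data:

```shell
#!/bin/bash
# Frequency table via an associative array (bash 4+).
declare -A count
words=(apple banana apple cherry banana apple)

for w in "${words[@]}"; do
    (( count[$w]++ ))    # unset keys start at 0 in arithmetic context
done

echo "apple=${count[apple]} banana=${count[banana]} cherry=${count[cherry]}"
# apple=3 banana=2 cherry=1
```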

#!/bin/bash
# Run tasks in parallel

# Start all tasks in the background
for item in "${items[@]}"; do
    process "$item" &
done

# Wait for all to complete
wait
echo "All tasks completed"
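The loop above starts every job at once, which can overload the machine for large inputs. A bounded sketch, assuming bash 4.3+ for wait -n; `process` and the item list are placeholders:

```shell
#!/bin/bash
MAX_JOBS=4
items=(a b c d e f g h)

process() { sleep 0.1; }          # placeholder task

for item in "${items[@]}"; do
    # Throttle: when MAX_JOBS are already running, wait for one to exit
    while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
        wait -n
    done
    process "$item" &
done
wait                              # drain the remaining jobs
```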
#!/bin/bash
# Process in parallel with xargs
find . -name "*.txt" | xargs -P 4 -I {} process {}

# NUL-delimited - safe for filenames with spaces or newlines
find . -name "*.txt" -print0 | xargs -0 -P 4 -I {} process {}

# Optimal parallelism - match the number of CPU cores
CPU_CORES=$(nproc)
find . -name "*.txt" | xargs -P "$CPU_CORES" -I {} process {}
#!/bin/bash
# Install (Arch): sudo pacman -S parallel

# Simple parallel - pass files as arguments, don't parse ls
parallel process ::: *.txt

# With more options: one job per core, abort on first failure
parallel -j+0 --halt now,fail=1 process ::: *.txt

# Remote execution across servers
parallel -S server1,server2 process ::: *.txt

#!/bin/bash
# ❌ Bad - scan the files once per question
if grep -q "pattern" file1; then
    echo "Found in file1"
fi
if grep -q "pattern" file2; then
    echo "Found in file2"
fi
if grep -q "pattern" file3; then
    echo "Found in file3"
fi

# ✓ Good - one scan, then check the cached list of matching files
matches=$(grep -l "pattern" file1 file2 file3)
for f in file1 file2 file3; do
    if grep -Fqx "$f" <<<"$matches"; then
        echo "Found in $f"
    fi
done

# Use an associative array for caching
declare -A cache
get_value() {
    local key="$1"
    if [[ -z "${cache[$key]:-}" ]]; then
        cache[$key]=$(expensive_command "$key")
    fi
    echo "${cache[$key]}"
}
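A quick demonstration that the cache works. The names are placeholders, and a temp-file tally is used because the command substitution runs expensive_command in a subshell, so a plain counter variable would not survive:

```shell
#!/bin/bash
declare -A cache
tally=$(mktemp)
expensive_command() { echo run >> "$tally"; echo "value-for-$1"; }

get_value() {
    local key="$1"
    if [[ -z "${cache[$key]:-}" ]]; then
        cache[$key]=$(expensive_command "$key")
    fi
    echo "${cache[$key]}"
}

get_value host1 >/dev/null
get_value host1 >/dev/null      # cache hit - no new run
get_value host2 >/dev/null
runs=$(( $(wc -l < "$tally") )) # normalize wc output
rm -f "$tally"
echo "expensive_command ran $runs times"   # 2
```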
#!/bin/bash
# ❌ Bad - literal pattern repeated inside the test
for line in "${lines[@]}"; do
    if [[ "$line" =~ ^[0-9]+$ ]]; then
        : # process
    fi
done

# ✓ Good - keep the pattern in a variable
# (avoids quoting pitfalls and keeps the pattern in one place;
#  the speed difference is usually small)
regex='^[0-9]+$'
for line in "${lines[@]}"; do
    if [[ "$line" =~ $regex ]]; then
        : # process
    fi
done

#!/bin/bash
# ❌ Bad - useless cat at the front of the pipeline
result=$(cat file.txt | head -n 10 | sort | uniq)
# ✓ Good - let head open the file, fold sort | uniq into sort -u
result=$(head -n 10 file.txt | sort -u)

# ✓ Even better - read into bash directly
mapfile -t lines < file.txt
printf '%s\n' "${lines[@]:0:10}" | sort -u

# Avoid UUOC (Useless Use of Cat)
# ❌ cat file.txt | grep pattern
# ✓ grep pattern file.txt

In this chapter, you learned:

  • ✅ Performance basics and optimization areas
  • ✅ Subshell optimization techniques
  • ✅ I/O optimization strategies
  • ✅ Data structure optimization
  • ✅ Parallel processing methods
  • ✅ Command optimization tips
  • ✅ Pipeline optimization
  • ✅ Real-world DevOps examples

Continue to the next chapter to learn about Security Considerations.


Previous Chapter: Process Substitution Next Chapter: Security Considerations