# Chapter 22: awk - Pattern Scanning and Processing

## Overview

awk is a powerful text-processing language designed for pattern scanning and reporting. It reads input line by line and splits each line into fields, which makes it ideal for log analysis, data extraction, and reporting in DevOps environments.
## How awk Works

```
awk Processing Flow
───────────────────

Input line:   field1  field2  field3  field4
                 ↓       ↓       ↓       ↓
                $1      $2      $3      $4  ...  $NF (last field)

Program structure:

  BEGIN { }        runs once, before any input is read
  /pattern/ { }    runs for each line that matches pattern
  { }              runs for every line
  END { }          runs once, after all input is processed
```
## Basic Syntax

```bash
# Basic syntax
awk 'pattern { action }' file.txt

# Using -F for the field separator
awk -F',' '{ print $1 }' file.csv

# Using -v to pass variables
awk -v name=value '{ print name, $1 }' file.txt

# Using -f for a program file
awk -f script.awk file.txt
```
## Field Variables

### $0 vs $1, $2, …

```bash
# $0 - entire line
echo "hello world" | awk '{print $0}'
# hello world

# $1 - first field
echo "hello world" | awk '{print $1}'
# hello

# $2 - second field
echo "hello world" | awk '{print $2}'
# world

# $NF - last field
echo "one two three four" | awk '{print $NF}'
# four

# $(NF-1) - second to last
echo "one two three four" | awk '{print $(NF-1)}'
# three
```

### Number of Fields (NF)

```bash
# Print the number of fields
echo "a b c d e" | awk '{print NF}'
# 5

# Print the last field
echo "a b c d e" | awk '{print $NF}'
# e
```
## Field Separator

### Default Field Separator

```bash
# The default separator is any run of whitespace
echo "one two three" | awk '{print $1, $2}'
# one two

# Multiple spaces are handled correctly
echo "one    two    three" | awk '{print $1, $2}'
# one two
```

### Custom Field Separator

```bash
# Using -F
awk -F',' '{print $1, $2}' file.csv

# Using FS in BEGIN
awk 'BEGIN {FS=","} {print $1, $2}' file.csv

# Regex separator (any of : , or |)
awk 'BEGIN {FS="[:,|]"} {print $1, $2}' file.txt

# Output separator
awk 'BEGIN {OFS=" - "} {print $1, $2}' file.txt
```
## Patterns

### Relational Patterns

```bash
# Lines where $1 equals "error"
awk '$1 == "error" {print}' logfile

# Lines where $3 > 100
awk '$3 > 100 {print}' data.txt

# Lines where $1 matches "warning"
awk '$1 ~ /warning/ {print}' logfile

# Lines where $1 does NOT match "debug"
awk '$1 !~ /debug/ {print}' logfile
```

### Pattern Examples

```bash
# Match lines containing "ERROR"
awk '/ERROR/ {print}' logfile

# Match lines starting with "2024"
awk '/^2024/ {print}' logfile

# Match lines ending with "failed"
awk '/failed$/ {print}' logfile

# Combined patterns
awk '/ERROR/ || /WARN/ {print}' logfile
```
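A regex pattern and a field condition can be combined in a single rule. A small self-contained sketch (the three sample lines and their format are invented for illustration):

```bash
# Print the path ($3) of ERROR lines whose status field ($2) is 5xx
printf '%s\n' \
  'ERROR 500 /api/users' \
  'INFO 200 /health' \
  'ERROR 404 /missing' |
awk '/ERROR/ && $2 >= 500 {print $3}'
# /api/users
```

Only the first line satisfies both conditions: the 404 line matches the regex but fails the numeric test.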
## Built-in Variables

### Common Variables

| Variable | Description |
|---|---|
| FS | Input field separator |
| OFS | Output field separator |
| RS | Input record separator |
| ORS | Output record separator |
| NF | Number of fields in current record |
| NR | Current record number |
| FNR | Record number in current file |
| FILENAME | Current filename |
| ARGC | Number of command-line arguments |
| ARGV | Array of command-line arguments |
### Examples

```bash
# Prefix each line with its line number
awk '{print NR, $0}' file.txt

# Custom input and output field separators
awk 'BEGIN {FS=","; OFS=" | "} {print $1, $2}' file.csv

# Different record separators for input and output
awk 'BEGIN {RS="\n\n"; ORS="\n\n"} {print}' file.txt
```
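The difference between NR and FNR only shows up with multiple input files. A quick demonstration (the /tmp file names are arbitrary):

```bash
printf 'a\nb\n' > /tmp/nr_demo1.txt
printf 'c\n'    > /tmp/nr_demo2.txt

# NR counts records across all files; FNR restarts at 1 in each file
awk '{print FILENAME, NR, FNR}' /tmp/nr_demo1.txt /tmp/nr_demo2.txt
# /tmp/nr_demo1.txt 1 1
# /tmp/nr_demo1.txt 2 2
# /tmp/nr_demo2.txt 3 1

rm -f /tmp/nr_demo1.txt /tmp/nr_demo2.txt
```

The classic idiom `NR == FNR` uses this to detect "still reading the first file" in two-file scripts.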
## Actions and Expressions

### Print Statement

```bash
# Print the entire line
awk '{print}' file.txt

# Print specific fields
awk '{print $1, $3}' file.txt

# Print with literal text
awk '{print "User:", $1, "Status:", $2}' file.txt

# Print without a newline
awk '{printf "%s ", $1}' file.txt
```

### Printf Statement

```bash
# Formatted output
awk '{printf "%-10s %5d\n", $1, $2}' file.txt

# Numbers with two decimal places
awk '{printf "%.2f\n", $3}' file.txt

# Align columns
awk '{printf "%-20s %10s %10s\n", $1, $2, $3}' file.txt
```
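Combining a header printed in BEGIN with a per-line printf produces a fixed-width table. A minimal sketch with two invented rows:

```bash
printf '%s\n' 'alice 93' 'bob 7' |
awk 'BEGIN {printf "%-10s %5s\n", "NAME", "SCORE"}
           {printf "%-10s %5d\n", $1, $2}'
```

`%-10s` left-aligns the names in a 10-column field and `%5d` right-aligns the scores, so the rows line up under the header regardless of value length.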
## Variables and Operators

### Arithmetic Operators

```bash
# Addition
awk '{print $1 + $2}' file.txt

# Multiplication
awk '{print $1 * $2}' file.txt

# Division
awk '{print $1 / $2}' file.txt

# Modulo
awk '{print $1 % $2}' file.txt
```

### String Operators

```bash
# Concatenation (juxtaposition joins strings)
awk '{print $1 "-" $2}' file.txt

# String length
awk '{print length($1)}' file.txt

# Substring (start position, length)
awk '{print substr($1, 1, 5)}' file.txt
```
## Control Flow

### If-Else

```bash
awk '{
  if ($2 > 50) {
    print "High:", $0
  } else {
    print "Low:", $0
  }
}' file.txt

# One-liner
awk '{if ($2 > 50) print "High"; else print "Low"}' file.txt
```

### Loops

```bash
# For loop
awk '{
  for (i = 1; i <= NF; i++) {
    print $i
  }
}' file.txt

# While loop
awk '{
  i = 1
  while (i <= NF) {
    print $i
    i++
  }
}' file.txt
```
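As a loop exercise, the for loop can also walk the fields backwards. This sketch prints them in reverse order on one line:

```bash
echo "one two three" |
awk '{
  for (i = NF; i >= 1; i--)
    printf "%s%s", $i, (i > 1 ? " " : "\n")
}'
# three two one
```

The ternary expression emits a space between fields and a newline after the last one.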
## Arrays

### Basic Arrays

```bash
# Count occurrences
awk '{count[$1]++} END {for (word in count) print word, count[word]}' file.txt

# Sum values
awk '{sum += $1} END {print sum}' file.txt

# Average
awk '{sum += $1; count++} END {print sum/count}' file.txt
```

### Associative Arrays

```bash
# Using string keys
awk '{ip_count[$1]++} END {for (ip in ip_count) print ip, ip_count[ip]}' access.log

# Multiple keys (joined internally with SUBSEP)
awk '{count[$1, $2]++} END {for (k in count) print k, count[k]}' file.txt
```
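Multi-key subscripts are joined with SUBSEP, which split() can undo in the END block. A sketch summing a third column per (method, path) pair; the sample lines are invented:

```bash
printf '%s\n' 'GET /a 1' 'POST /a 2' 'GET /a 3' |
awk '{ sum[$1, $2] += $3 }
     END {
       for (k in sum) {
         split(k, parts, SUBSEP)   # recover the two key components
         print parts[1], parts[2], sum[k]
       }
     }'
```

The iteration order of `for (k in sum)` is unspecified, so pipe through `sort` when you need stable output.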
## Functions

### Built-in Functions

```bash
# String functions
awk '{print toupper($1)}' file.txt
awk '{print tolower($1)}' file.txt
awk '{print length($0)}' file.txt
awk '{print substr($1, 2, 5)}' file.txt
awk '{print index($1, "test")}' file.txt
awk '{print gsub(/old/, "new", $1)}' file.txt   # gsub returns the substitution count
awk '{print split($1, arr, "-")}' file.txt      # split returns the number of pieces

# Math functions
awk '{print sqrt($1)}' file.txt
awk '{print int($1)}' file.txt
awk '{print sin($1)}' file.txt
awk 'BEGIN {srand(); print rand()}'
```

### User-Defined Functions

```bash
awk '
function max(a, b) {
    return (a > b) ? a : b
}
{print max($1, $2)}' file.txt
```
Section titled “Practical DevOps Examples”1. Log Analysis
Section titled “1. Log Analysis”# Count HTTP status codesawk '{print $9}' access.log | sort | uniq -c | sort -rn
# Find 5xx errorsawk '$9 ~ /^[56][0-9][0-9]/ {print}' access.log
# Extract top IPsawk '{print $1}' access.log | sort | uniq -c | sort -rn | head -10
# Response time analysisawk '{sum+=$NF; count++} END {print "Average:", sum/count}' access.log2. Parse System Logs
### 2. Parse System Logs

```bash
# Extract failed SSH login attempts
awk '/Failed password/ {print $1, $2, $3, $11}' /var/log/auth.log

# Count errors by hour
awk '/ERROR/ {print $2}' app.log | cut -d: -f1 | sort | uniq -c

# System resource alerts
awk '$5 > 90 {print "ALERT:", $0}' system.log
```
### 3. CSV Processing

```bash
# Print specific columns
awk -F',' '{print $1, $3, $5}' data.csv

# Skip the header row
awk -F',' 'NR > 1 {print}' data.csv

# Sum a column
awk -F',' 'NR > 1 {sum += $4} END {print sum}' data.csv

# Filter by column value
awk -F',' '$3 > 1000 {print}' data.csv
```
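For CSVs it is often safer to address columns by header name than by position. A sketch that builds a name-to-index map from the first row (sample data invented; works for simple CSVs without quoted commas):

```bash
printf '%s\n' 'name,region,cost' 'db1,us-east,120' 'db2,eu-west,80' |
awk -F',' '
NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # map header -> column
        { print $col["name"], $col["cost"] }'
# db1 120
# db2 80
```

If a column is later reordered in the file, the script keeps working without edits.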
### 4. Kubernetes Logs

```bash
# Extract key=value tokens such as level=...
kubectl logs app | awk '{for (i = 1; i <= NF; i++) if ($i ~ /^level=/) print $i}'

# Group lines by timestamp fields
kubectl logs app | awk '{print $1, $2}' | sort | uniq -c

# Error aggregation
kubectl logs app | awk '/ERROR/ {errors[$NF]++} END {for (e in errors) print e, errors[e]}'
```
### 5. Docker Stats

```bash
# Parse docker stats output
docker stats --no-stream | awk '{print $2, $3, $4}'

# Container resource usage
docker stats --no-stream --format "{{.Name}} {{.CPUPerc}} {{.MemUsage}}" | \
  awk '$2 ~ /[0-9]+\.[0-9]+%/ {print}'
```
### 6. AWS CLI Output

```bash
# List EC2 instances with jq
aws ec2 describe-instances | \
  jq -r '.Reservations[].Instances[] | "\(.InstanceId) \(.State.Name) \(.PublicIpAddress)"'

# Using awk after jq
aws ec2 describe-instances | \
  jq -r '.Reservations[].Instances[] | "\(.Tags[] | select(.Key=="Name").Value) \(.InstanceType)"' | \
  awk '{print $1, $2}'
```
### 7. Network Analysis

```bash
# Count connections by state
netstat -ant | awk '{print $6}' | sort | uniq -c

# Parse ss output
ss -tunap | awk 'NR > 1 {print $1, $5, $6}'

# Firewall counters
iptables -L -v -n | awk 'NR > 2 {print $2, $3, $9}'
```
### 8. Disk Usage Analysis

```bash
# Find large files and directories
du -sh /* 2>/dev/null | awk '{print $1, $2}' | sort -h

# Disk usage by directory
du -h /var | awk '{print $1, $2}' | sort -h

# Percentage usage with the % sign stripped
df -h | awk 'NR > 1 {gsub(/%/, "", $5); print $1, $5}'
```
## BEGIN and END Blocks

### Using BEGIN and END

```bash
# BEGIN runs once before input, END once after
awk 'BEGIN {print "Starting..."} {print} END {print "Done"}' file.txt

# Initialize variables
awk 'BEGIN {sum=0; count=0} {sum+=$1; count++} END {print "Average:", sum/count}' file.txt

# Custom headers and footers
awk 'BEGIN {print "Name\tScore\tGrade"} {print} END {print "---"}' file.txt
```

### Multi-line Processing

```bash
# Summarize a log in a single pass
awk 'BEGIN {print "=== Report ==="}
     /ERROR/ {errors++}
     /WARN/  {warnings++}
     END {print "Errors:", errors+0, "Warnings:", warnings+0}' logfile
```
## Complex Examples

### Report Generation

```bash
#!/usr/bin/env bash
# Generate a system report

echo "=== System Report ==="
echo "Date: $(date)"
echo

echo "Top Processes by Memory:"
ps aux --sort=-%mem | awk 'NR > 1 && NR <= 6 {printf "%-30s %10s %10s\n", $11, $6"KB", $3"%"}'

echo
echo "Disk Usage:"
df -h | awk 'NR > 1 {printf "%-20s %10s %10s\n", $1, $3, $5}'

echo
echo "Network Connections:"
netstat -ant | awk '{print $6}' | sort | uniq -c | sort -rn
```

### Log Aggregation
Section titled “Log Aggregation”#!/usr/bin/env bash# Aggregate application logs
LOG_DIR="/var/log/myapp"OUTPUT="/tmp/log_summary.txt"
echo "Log Summary - $(date)" > "$OUTPUT"echo "===================" >> "$OUTPUT"
# Count by levelecho "" >> "$OUTPUT"echo "Messages by Level:" >> "$OUTPUT"grep -h "level=" "$LOG_DIR"/*.log 2>/dev/null | \ awk -F'level=' '{print $2}' | cut -d, -f1 | sort | uniq -c | sort -rn >> "$OUTPUT"
# Top errorsecho "" >> "$OUTPUT"echo "Top 10 Errors:" >> "$OUTPUT"grep -h "ERROR" "$LOG_DIR"/*.log 2>/dev/null | \ awk '{print $NF}' | sort | uniq -c | sort -rn | head -10 >> "$OUTPUT"
cat "$OUTPUT"Performance Tips
## Performance Tips

### Efficiency

```bash
# Filter with a pattern (awk skips the action for non-matching lines)
awk '/ERROR/ {print}' largefile.log

# Reference only the fields you need
awk '{print $1, $2}' largefile.log

# Keep loops bounded; an unnecessary per-field loop on a large file
# can dominate the run time
awk '{for (i = 1; i <= NF; i++) total += $i} END {print total}' largefile.log
```

### Best Practices
```bash
# Set FS once (in BEGIN or with -F) rather than per record
awk 'BEGIN {FS=","} {print $1}' file.csv

# Close files you write to when you are done with them
awk '{print > "output.txt"} END {close("output.txt")}' file.txt

# Use next to skip records early
awk 'NR == 1 {next} {print}' file.txt
```
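Splitting one stream into many files shows why close() matters: without it, every distinct key keeps a file descriptor open. A sketch with invented /tmp paths:

```bash
rm -f /tmp/out_a.txt /tmp/out_b.txt

# Append each record's value to a per-key file, closing the handle each time
printf '%s\n' 'a 1' 'b 2' 'a 3' |
awk '{ f = "/tmp/out_" $1 ".txt"; print $2 >> f; close(f) }'

cat /tmp/out_a.txt
# 1
# 3
rm -f /tmp/out_a.txt /tmp/out_b.txt
```

Note the `>>`: after close(), reopening with `>` would truncate the file, losing earlier records for that key.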
## Summary

In this chapter, you learned:
- ✅ How awk processes text (fields and records)
- ✅ Field variables ($0, $1, $NF, etc.)
- ✅ Field separators (FS, OFS)
- ✅ Patterns and pattern matching
- ✅ Built-in variables (NR, NF, etc.)
- ✅ Actions and expressions
- ✅ Control flow (if-else, loops)
- ✅ Arrays and associative arrays
- ✅ Built-in and user-defined functions
- ✅ BEGIN and END blocks
- ✅ Practical DevOps examples
## Next Steps

Continue to the next chapter to learn about Process Management in Bash.

Previous Chapter: sed - Stream Editor | Next Chapter: Process Management