Monitoring_tools


CPU Performance Metrics
+------------------------------------------------------------------+
|                                                                  |
|  Key CPU Metrics:                                                |
|                                                                  |
|   +----------------------------------------------------------+   |
|   | %usr     | User space CPU usage                          |   |
|   | %nice    | User space CPU usage by niced processes       |   |
|   | %sys     | Kernel space CPU usage                        |   |
|   | %iowait  | Waiting for I/O                               |   |
|   | %irq     | Hardware interrupt handling                   |   |
|   | %softirq | Soft interrupt handling                       |   |
|   | %steal   | Virtual CPU waiting (VM)                      |   |
|   | %guest   | Running guest VM                              |   |
|   | %idle    | Idle time                                     |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  Load Average:                                                   |
|   +----------------------------------------------------------+   |
|   | 1 min | 5 min | 15 min averages of runnable tasks        |   |
|   | (Linux also counts tasks in uninterruptible I/O sleep)   |   |
|   | CPU count matters: load of 4 on 4-core = 100% utilized   |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  CPU States:                                                     |
|   +----------------------------------------------------------+   |
|   | Running | Runnable | Blocked (I/O, sleep)                |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+
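The "CPU count matters" rule can be checked by dividing the load average by the number of cores. The following is a minimal sketch; `load_ratio` is a hypothetical helper name, not part of any standard tool, and the live call assumes `/proc/loadavg` and `nproc` are available.

```shell
#!/bin/bash
# load_ratio LOAD CORES: print load per core with two decimals.
# A value near 1.00 means roughly one runnable task per core.
load_ratio() {
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Live check (assumption: Linux with /proc/loadavg and nproc present)
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    load_ratio "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
fi
```

Values well above 1.00 for a sustained period suggest tasks are queuing for CPU or, on Linux, waiting on I/O.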
# Basic usage
top # Interactive mode
top -d 1 # Update every 1 second
top -n 5 # Run 5 iterations
top -u username # Show specific user
top -p 1234 # Monitor specific PID
# Interactive commands
# P - Sort by CPU (default)
# M - Sort by memory
# T - Sort by time (cumulative)
# k - Kill process
# r - Renice process
# f - Field manager
# o - Add a filter (procps-ng top; e.g. COMMAND=nginx)
# < > - Move the sort field left/right
# 1 - Show per-CPU breakdown
# l - Toggle load average
# t - Toggle CPU info
# m - Toggle memory info
# W - Save configuration
# q - Quit
# Sorting and batch mode
top -o %MEM # Sort by memory
top -b -n 1 > output.txt # Batch mode for scripting
top -b -n 1 -p 1234 # Single PID in batch mode
# Installation
sudo apt install htop # Debian/Ubuntu
sudo yum install htop # RHEL/CentOS (may need the EPEL repository)
sudo pacman -S htop # Arch Linux
# Usage
htop # Interactive
htop -u username # Filter by user
htop -p 1234,5678 # Monitor specific PIDs
htop -d 10 # Delay between updates in tenths of a second (10 = 1 s)
# Interactive keys
# F1 - Help
# F2 - Setup (columns, colors)
# F3 - Search process
# F4 - Filter processes
# F5 - Tree view
# F6 - Sort by (column)
# F7 - Nice - (increase priority)
# F8 - Nice + (decrease priority)
# F9 - Kill (send signal)
# Space - Tag process
# U - Untag all
# Ctrl+L - Refresh
# Ctrl+C - Quit (same as F10 or q)
# Tree view shows parent-child relationships
# Color coding by CPU/memory usage
# Function keys for quick actions
# Install (sysstat package)
sudo apt install sysstat
# All CPUs, 1 second interval, 5 samples
mpstat -P ALL 1 5
# Specific CPU
mpstat -P 0 1 5 # CPU 0 only
# Interval and count
mpstat 1 # Continuous, 1 second
# Key output columns:
# %usr - User time
# %nice - Nice time
# %sys - System time
# %iowait - I/O wait
# %irq - IRQ time
# %soft - Soft IRQ
# %steal - Steal time
# %guest - Guest time
# %idle - Idle time
# Install
sudo apt install sysstat
# CPU usage from today's log
sar -u # Samples recorded so far today
sar -u 1 5 # Real-time 5 samples
sar -u -s 10:00:00 -e 12:00:00 # Time range
# Historical CPU data
# Data stored in /var/log/sa/ or /var/log/sysstat/
# Other sar options
sar -q # Load average
sar -r # Memory usage
sar -S # Swap space
sar -d # Block device I/O
sar -n DEV # Network devices
sar -n SOCK # Sockets
sar -b # I/O and transfer rate
sar -B # Paging statistics
sar -W # Swapping statistics
# Create reports
sar -u -s 08:00:00 -e 18:00:00 -f /var/log/sa/sa20 > report.txt # -f reads a saved daily file (here, day 20)
# Basic usage
vmstat 1 5 # 1 second, 5 samples
vmstat 1 # Continuous
# Key columns:
# r - Runnable processes (running or waiting for CPU)
# b - Processes in uninterruptible sleep (usually I/O)
# si - Swap in
# so - Swap out
# bi - Blocks in (from disk)
# bo - Blocks out (to disk)
# us - User CPU
# sy - System CPU
# id - Idle CPU
# wa - Wait CPU
# st - Stolen CPU (VM)
# Memory columns:
# swpd - Swap space in use
# free - Free memory
# buff - Buffer memory
# cache - Cache memory
# Detailed mode
vmstat -s # Detailed statistics
vmstat -m # Slab info
vmstat -d # Disk statistics
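The `id` column above gives a quick "CPU busy" figure (100 minus idle). A minimal sketch that reads `vmstat` output on stdin and locates the `id` column from the header rather than assuming a fixed position; `cpu_busy_from_vmstat` is a hypothetical helper name.

```shell
#!/bin/bash
# Print 100 - id for the last sample in vmstat output read from stdin.
cpu_busy_from_vmstat() {
    awk '
        $1 == "r" {                       # column header line: find "id"
            for (i = 1; i <= NF; i++) if ($i == "id") col = i
            next
        }
        col && $1 ~ /^[0-9]+$/ { busy = 100 - $col }   # data line
        END { if (col) print busy }
    '
}

# Usage: vmstat 1 2 | cpu_busy_from_vmstat
```

Taking the last of two samples avoids the first `vmstat` line, which reports averages since boot.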

Linux Memory Types
+------------------------------------------------------------------+
|                                                                  |
|  Memory Categories:                                              |
|                                                                  |
|   +----------------------------------------------------------+   |
|   | MemTotal     | Total physical RAM                        |   |
|   | MemFree      | Completely unused RAM                     |   |
|   | MemAvailable | Estimate usable without swapping          |   |
|   | Buffers      | Block device cache                        |   |
|   | Cached       | Page cache (file-backed)                  |   |
|   | SwapCached   | Swapped pages still in memory             |   |
|   | SwapTotal    | Total swap space                          |   |
|   | SwapFree     | Free swap space                           |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  Important: Linux uses otherwise idle memory for buffers/cache   |
|  This memory can be reclaimed when applications need it          |
|                                                                  |
|  MemAvailable is roughly MemFree + reclaimable cache (estimate)  |
|                                                                  |
+------------------------------------------------------------------+
# Basic usage
free # Default output (kibibytes)
free -b # Bytes
free -k # Kilobytes (default)
free -m # Megabytes
free -g # Gigabytes
free -h # Human-readable (best)
# Continuous monitoring
free -s 5 # Update every 5 seconds
watch -n 1 free # Watch continuously
# Extended view
free -l # Show low/high memory
free -t # Show totals
free -s 5 -c 10 # 10 iterations, 5 second intervals
# Understanding output:
# total - Total physical RAM
# used - Used memory (formula varies by procps version; newer releases report total - available)
# free - Unused RAM
# shared - tmpfs (shmem)
# buffers - Block device buffers (separate column only with free -w)
# available - What apps can use (kernel 3.14+)
# buff/cache - Buffers + Cache combined
# View detailed memory info
cat /proc/meminfo
# Key fields:
# MemTotal: 16384032 kB
# MemFree: 2341232 kB
# MemAvailable: 12567890 kB # Important!
# Buffers: 234567 kB
# Cached: 4567890 kB
# SwapCached: 1234 kB
# Active(anon): 3456789 kB
# Inactive(anon): 1234567 kB
# Active(file): 5678901 kB
# Inactive(file): 2345678 kB
# SReclaimable: 567890 kB # Can be reclaimed
# SwapTotal: 8388604 kB
# SwapFree: 7654321 kB
# Filter specific field
grep -E "MemTotal|MemFree|MemAvailable" /proc/meminfo
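Because `MemAvailable` is the field that matters most, a small helper can turn it into a percentage of `MemTotal`. This is a sketch only; `mem_available_pct` is a hypothetical name, and the optional file argument exists so the function can be tried against a saved copy of `/proc/meminfo`.

```shell
#!/bin/bash
# Print MemAvailable as a percentage of MemTotal (one decimal place).
# $1: path to a meminfo-style file (defaults to /proc/meminfo).
mem_available_pct() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END { if (total > 0) printf "%.1f\n", avail * 100 / total }
    ' "${1:-/proc/meminfo}"
}

# Usage: mem_available_pct
```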
# ps with memory sorting
ps aux --sort=-%mem | head -10 # Top memory users
ps -eo pid,user,%mem,vsz,comm --sort=-%mem | head
# pmap - Process memory map
pmap -x 1234 # Extended view
pmap -X 1234 # Even more detail
# smem - Memory reporting
sudo apt install smem
smem # All processes
smem -u # By user
smem -k # Human-readable unit suffixes
smem -t # Totals
smem -p # Show percentages
# slabtop - Kernel slab cache
sudo slabtop
# Shows kernel memory usage (dentry, inode caches)
# /dev/shm (Shared Memory)
df -h /dev/shm
ls -la /dev/shm
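Per-process listings can hide a service that runs many workers. A minimal sketch that sums resident memory per command name from `ps -eo rss,comm` output on stdin; `rss_by_command` is a hypothetical helper, and command names containing spaces are truncated to their first word.

```shell
#!/bin/bash
# Sum RSS (KiB, column 1) per command name (column 2) and print in MB,
# largest first. The first input line is treated as the ps header.
rss_by_command() {
    awk 'NR > 1 { kb[$2] += $1 }
         END { for (c in kb) printf "%.1f MB %s\n", kb[c] / 1024, c }' |
        sort -rn
}

# Usage: ps -eo rss,comm | rss_by_command | head
```

Note that RSS counts shared pages once per process, so the totals overstate real usage; `smem` reports PSS for a fairer split.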

# Disk space overview
df -h # Human-readable
df -h -T # Include filesystem type
df -i # Show inodes
df -Bg # Blocks in GB
df -T /home # Specific mount point
# Find largest directories
du -sh /var/* # Directories in /var
du -h --max-depth=1 # One level deep
du -ah /home/user | sort -rh | head -10 # Top 10 largest files and directories
# ncdu - Interactive disk usage
sudo apt install ncdu
ncdu / # Interactive
ncdu -q # Quiet mode: slower screen refresh while scanning (useful over SSH)
# Baobab (GUI disk analyzer)
sudo apt install baobab
baobab
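For alerting, the `df` output can be filtered down to filesystems above a usage threshold. A sketch under the assumption that input is in `df -P` (POSIX) format and that mount points contain no spaces; `over_threshold` is a hypothetical helper name.

```shell
#!/bin/bash
# Read `df -P` output on stdin and print "mountpoint usage%" for every
# filesystem whose usage is at or above the given percentage.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            use = $5
            sub(/%/, "", use)         # "95%" -> "95"
            if (use + 0 >= limit) print $6 " " $5
        }'
}

# Usage: df -P | over_threshold 90
```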
# iostat - I/O statistics
iostat -x 1 5 # Extended, 1 sec, 5 times
iostat -x -k # In KB
iostat -m -x # In MB
iostat -p sda # Per partition
# Key columns:
# r/s - Read requests/second
# w/s - Write requests/second
# rkB/s - Read KB/second
# wkB/s - Write KB/second
# await - Average wait time (ms); newer sysstat splits this into r_await/w_await
# svctm - Average service time (ms); deprecated and removed in recent sysstat
# %util - Device utilization %
# iotop - Per-process I/O
sudo iotop # Interactive
sudo iotop -o # Only active I/O
sudo iotop -P # Show processes
sudo iotop -b # Batch mode
sudo iotop -a # Accumulated I/O
# pidstat - Per-process statistics
pidstat -d 1 5 # Disk I/O
pidstat -r 1 5 # Memory
pidstat -w 1 5 # Context switches
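The column layout of `iostat -x` differs between sysstat versions, so a filter on `%util` is safer if it finds the column from the header. A minimal sketch; `busy_devices` is a hypothetical helper name and the threshold is arbitrary.

```shell
#!/bin/bash
# Read `iostat -x` output on stdin and print "device %util" for devices
# whose utilization is at or above the given limit.
busy_devices() {
    awk -v limit="$1" '
        $1 == "avg-cpu:" { col = 0; next }     # CPU block: ignore until next device header
        $1 ~ /^Device/ {                       # device header: locate %util
            for (i = 1; i <= NF; i++) if ($i == "%util") col = i
            next
        }
        col && NF >= col && $col + 0 >= limit { print $1, $col }
    '
}

# Usage: iostat -x 1 2 | busy_devices 80
```

A high `%util` on a device that serves requests in parallel (SSD, RAID) does not by itself mean saturation, so treat the result as a hint.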
# SMART status
sudo smartctl -a /dev/sda # All info
sudo smartctl -H /dev/sda # Health
sudo smartctl -t short /dev/sda # Short test
sudo smartctl -t long /dev/sda # Long test
sudo smartctl -l selftest /dev/sda # Results
# SMART attributes to watch:
# 5 - Reallocated Sector Count
# 187 - Reported Uncorrectable Errors
# 197 - Current Pending Sector Count
# 198 - Offline Uncorrectable
# 196 - Reallocation Event Count
# Disk partition info
lsblk # Tree view
lsblk -f # With filesystem
blkid # UUIDs and types
# I/O scheduler
cat /sys/block/sda/queue/scheduler
# Common: mq-deadline, bfq, none
# Set scheduler
echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

# Interface statistics
ip -s link
ip -s link show eth0
netstat -i
# Connection tracking
ss -tunap # All TCP/UDP sockets with owning processes
ss -tunapH | wc -l # Socket count (-H omits the header line)
ss -tunap state established # Established connections only
# Network statistics
sar -n DEV 1 5 # Device stats
sar -n SOCK 1 5 # Socket stats
sar -n TCP 1 5 # TCP stats
sar -n EDEV 1 5 # Error stats
# netstat (deprecated but useful)
netstat -tunapl
netstat -s # Per-protocol stats
netstat -r # Routing table
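A raw connection count says little without the state breakdown. A minimal sketch that tallies sockets per state from `ss -tan` output on stdin (where the first column is the state); `count_states` is a hypothetical helper name.

```shell
#!/bin/bash
# Count sockets per state from `ss -tan` output (first line is the header),
# most frequent state first.
count_states() {
    awk 'NR > 1 { n[$1]++ }
         END { for (s in n) print n[s], s }' |
        sort -rn
}

# Usage: ss -tan | count_states
```

With `-u` added, `ss` prints a leading Netid column, so the state would be the second field instead.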
# iftop - Bandwidth by host
sudo iftop # Interactive
sudo iftop -i eth0 # Interface
sudo iftop -n # No DNS resolution
sudo iftop -B # Bytes not bits
# nethogs - Per-process bandwidth
sudo nethogs # Interactive
sudo nethogs eth0 # Interface
sudo nethogs -d 1 # 1 second delay
# iptraf-ng - Traffic monitor
sudo iptraf-ng
# vnstat - Network traffic logger
sudo apt install vnstat
vnstat # Summary
vnstat -h # Hourly
vnstat -d # Daily
vnstat -m # Monthly
vnstat -l # Live
# nload - Real-time traffic
nload # All interfaces (arrow keys switch device)
nload eth0 # Specific interface
# tcpdump - Packet capture
sudo tcpdump -i eth0
sudo tcpdump -i eth0 port 80
sudo tcpdump -i eth0 host 192.168.1.1
sudo tcpdump -i eth0 -w capture.pcap
sudo tcpdump -r capture.pcap
# tshark - Wireshark CLI
sudo tshark -i eth0
sudo tshark -r capture.pcap
# ngrep - Network grep
sudo ngrep -d eth0 password # -d selects the interface (-i means ignore case)

# Basic process listing
ps aux # BSD style
ps -ef # System V (UNIX) style
ps -ejH # Process tree
ps -eo pid,ppid,user,%cpu,%mem,cmd
# Find processes
pgrep -a nginx # Find by name
pgrep -u username # By user
pidof nginx # Get PIDs
# Process tree
pstree # Tree view
pstree -p # With PIDs
pstree username # Trees rooted at that user's processes (-u only marks UID changes)
# Process details
ps -fp 1234 # Full info for PID
ls -la /proc/1234/ # Process info in /proc
cat /proc/1234/status # Detailed status
# lsof - List open files
lsof # All files
lsof -p 1234 # Files for PID
lsof -u username # User's files
lsof -i # Network files
lsof -i :80 # Port 80
lsof -i TCP:80 # TCP port 80
lsof -i @192.168.1.1 # To specific host
lsof +D /var/log # Files in directory
# fuser - What process using file/port
fuser 80/tcp # What's using port 80
fuser -k 80/tcp # Kill processes using port
fuser /var/log/syslog # Using file
# strace - System calls
strace -p 1234 # Attach to process
strace -c -p 1234 # Count syscalls
strace -e openat -p 1234 # Filter syscalls
strace -f -p 1234 # Follow forks
strace -o output.txt -p 1234 # To file
strace -tt -p 1234 # Timestamps
# ltrace - Library calls
ltrace -p 1234
ltrace -c -p 1234 # Count calls
# /proc filesystem
tr '\0' ' ' < /proc/1234/cmdline; echo # Command line (arguments are NUL-separated)
tr '\0' '\n' < /proc/1234/environ # Environment (NUL-separated; needs same user or root)
ls /proc/1234/fd/ # File descriptors
cat /proc/1234/maps # Memory maps
cat /proc/1234/statm # Memory stats
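Several `/proc/PID` files (`cmdline`, `environ`) separate entries with NUL bytes, so plain `cat` prints them run together. A small sketch that prints one entry per line; `nul_to_lines` is a hypothetical helper name.

```shell
#!/bin/bash
# Print a NUL-separated file (e.g. /proc/PID/environ) one entry per line.
nul_to_lines() {
    tr '\0' '\n' < "$1"
}

# Usage: nul_to_lines /proc/1234/environ | grep '^PATH='
```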

# View current limits
ulimit -a # All limits
ulimit -n # Open files
ulimit -u # Max processes
# Set limits (current session)
ulimit -n 65535 # Max open files
ulimit -u 4096 # Max processes
ulimit -f 1024 # Max file size (KB)
# Persistent limits
# /etc/security/limits.conf
# Examples:
# * soft nofile 65535
# * hard nofile 65535
# root soft nofile 65535
# root hard nofile 65535
# @student soft nproc 100
# @student hard nproc 200
# Also /etc/security/limits.d/*.conf
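To see how close a running process is to its `nofile` limit, the open descriptors in `/proc/PID/fd` can be compared with the soft limit in `/proc/PID/limits`. A minimal sketch; `fd_usage` is a hypothetical helper, and the optional second argument (an alternative proc root) exists only to make it easy to try against a copied directory tree.

```shell
#!/bin/bash
# fd_usage PID [PROC_ROOT]: print "<open fds> <soft nofile limit>".
fd_usage() {
    local pid="$1" root="${2:-/proc}"
    local open limit
    # One directory entry per open file descriptor
    open=$(( $(ls "$root/$pid/fd" | wc -l) ))
    # Fourth field of the "Max open files" row is the soft limit
    limit=$(awk '/^Max open files/ { print $4 }' "$root/$pid/limits")
    echo "$open $limit"
}

# Usage (reading another user's process requires root): fd_usage 1234
```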
# systemd resource limits
# See service unit file:
# MemoryMax=1G
# CPUQuota=50%
# TasksMax=100
# cgroups v2
ls /sys/fs/cgroup/
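# Create cgroup (run as root; the cpu and memory controllers must be
# enabled in the parent's cgroup.subtree_control)
mkdir -p /sys/fs/cgroup/mygroup
echo "50000 100000" > /sys/fs/cgroup/mygroup/cpu.max # "quota period" in microseconds: 50% of one CPU
echo 500000000 > /sys/fs/cgroup/mygroup/memory.max # About 500 MB
echo $$ > /sys/fs/cgroup/mygroup/cgroup.procs # Move the current shell into the cgroup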
# View cgroup of process
cat /proc/1234/cgroup
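The cgroup v2 `cpu.max` file holds two values, a quota and a period in microseconds, which together give the same kind of figure as systemd's `CPUQuota=`. A minimal sketch of the conversion; `cpu_max_percent` is a hypothetical helper name.

```shell
#!/bin/bash
# cpu_max_percent QUOTA PERIOD: convert cpu.max values to a percentage
# of one CPU. QUOTA is "max" (no limit) or microseconds per PERIOD.
cpu_max_percent() {
    if [ "$1" = "max" ]; then
        echo "unlimited"
    else
        awk -v q="$1" -v p="$2" 'BEGIN { printf "%.0f%%\n", q * 100 / p }'
    fi
}

# Usage: cpu_max_percent $(cat /sys/fs/cgroup/mygroup/cpu.max)
```

For example, `50000 100000` corresponds to 50%, and `200000 100000` to 200% (two full CPUs).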

# Kernel and OS
uname -a # All info
uname -r # Kernel version
uname -m # Architecture
cat /etc/os-release # OS details
# Hardware
lspci # PCI devices
lsusb # USB devices
lscpu # CPU info
lsscsi # SCSI devices
lsmem # Memory devices
# Kernel modules
lsmod # Loaded modules
modinfo module_name # Module info
sudo modprobe module_name # Load module
sudo modprobe -r module_name # Unload
# Device info
lsblk # Block devices
cat /proc/cpuinfo # CPU details
cat /proc/meminfo # Memory details

system_health.sh
#!/bin/bash
echo "=== System Health Check ==="
echo ""
echo "--- Uptime ---"
uptime
echo ""
echo "--- Load Average ---"
cat /proc/loadavg
echo ""
echo "--- Memory ---"
free -h
echo ""
echo "--- Disk Usage ---"
df -h | grep -v tmpfs
echo ""
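echo "--- CPU Usage ---"
# Second field of top's summary line is user-space CPU%; layout varies by top version
top -bn1 | grep "Cpu(s)" | awk '{print "CPU (user): " $2 "%"}'
echo ""
echo "--- Top Memory Users ---"
# NR > 1 skips the ps header row; column 6 is RSS in KiB
ps aux --sort=-%mem | head -6 | awk 'NR > 1 {printf "%.1f MB - %s\n", $6/1024, $11}'
echo ""
echo "--- Top CPU Users ---"
ps aux --sort=-%cpu | head -6 | awk 'NR > 1 {print $3 "% - " $11}'
echo ""
echo "--- Network Connections ---"
# -H omits the header so it is not counted
echo "$(ss -tunaH | wc -l) connections"
echo ""
echo "--- Failed Login Attempts ---"
# lastb reads /var/log/btmp, which normally requires root
lastb 2>/dev/null | head -5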

  1. How do you check CPU usage in Linux?

    • top, htop, mpstat, sar
  2. What is load average?

    • Average number of runnable processes (on Linux, plus those in uninterruptible sleep) over 1/5/15 minutes
  3. How do you check memory usage?

    • free -h, /proc/meminfo, top
  4. What is the difference between buffers and cache?

    • Buffers: block device I/O; Cache: page cache for files
  5. How do you find which process is using most memory?

    • ps aux --sort=-%mem, htop
  1. What is iowait and what causes it?

    • CPU waiting for I/O; caused by slow disk or high I/O load
  2. How do you monitor disk I/O per process?

    • iotop, pidstat -d
  3. What is the difference between iostat %util and svctm?

    • %util: device utilization; svctm: service time (deprecated)
  4. How do you trace system calls of a process?

    • strace -p PID
  5. What is the difference between ss and netstat?

    • ss is newer, faster; netstat is deprecated
  1. Explain Linux memory management

    • Page cache, buffers, swap, available memory concept
  2. What is the relationship between load average and CPU usage?

    • Load counts running + waiting processes (on Linux, including uninterruptible I/O waits), so it can be high while CPU% is low; CPU% shows actual execution
  3. How do you diagnose high iowait?

    • Check iostat, iotop, identify I/O-heavy processes
  4. What are some methods to reduce system load?

    • Optimize processes, reduce I/O, add hardware, tune kernel params
  5. Explain the different CPU states in mpstat

    • usr, nice, sys, iowait, irq, softirq, steal, guest, idle

Quick Reference
+------------------------------------------------------------------+
|                                                                  |
|  CPU Monitoring:                                                 |
|   +----------------------------------------------------------+   |
|   | top, htop         | Real-time process monitor            |   |
|   | mpstat -P ALL     | Per-CPU statistics                   |   |
|   | sar -u            | Historical CPU data                  |   |
|   | vmstat            | Virtual memory + CPU                 |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  Memory Monitoring:                                              |
|   +----------------------------------------------------------+   |
|   | free -h           | Memory usage summary                 |   |
|   | cat /proc/meminfo | Detailed memory info                 |   |
|   | smem              | Per-process memory                   |   |
|   | slabtop           | Kernel slab cache                    |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  Disk Monitoring:                                                |
|   +----------------------------------------------------------+   |
|   | df -h             | Disk space                           |   |
|   | du -sh            | Directory sizes                      |   |
|   | iostat -x         | I/O statistics                       |   |
|   | iotop             | Per-process I/O                      |   |
|   | smartctl          | Disk health                          |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  Network Monitoring:                                             |
|   +----------------------------------------------------------+   |
|   | ss -tunapl        | Socket statistics                    |   |
|   | iftop             | Bandwidth by host                    |   |
|   | nethogs           | Per-process bandwidth                |   |
|   | tcpdump           | Packet capture                       |   |
|   +----------------------------------------------------------+   |
|                                                                  |
|  Process Monitoring:                                             |
|   +----------------------------------------------------------+   |
|   | ps aux            | Process listing                      |   |
|   | lsof              | Open files                           |   |
|   | strace            | System call tracing                  |   |
|   | pstree            | Process tree                         |   |
|   +----------------------------------------------------------+   |
|                                                                  |
+------------------------------------------------------------------+