# Profiling Tools and Techniques
Profiling is the process of measuring program performance to identify bottlenecks and optimization opportunities. This chapter covers essential profiling tools for C++ applications.
## Introduction to Profiling

Profiling helps you understand:
- Hot spots: Where your program spends most of its time
- Memory usage: How much memory is being used and where
- Call patterns: Function call frequencies and relationships
- Cache performance: Cache hit/miss rates
### Why Profile?

```cpp
// DON'T guess - profile!
// Common mistakes:
// 1. Optimizing without measurement
// 2. Focusing on the wrong bottlenecks
// 3. Making code less readable without benefit

// Example: which is faster?
int sum1 = 0;
for (int i = 0; i < n; i++) {
    sum1 += i * i;
}

// vs
int sum2 = 0;
for (int i = 0; i < n; i++) {
    for (int j = 0; j < i; j++) {
        sum2 += j;
    }
}

// Profile and find out!
```

## GPROF - GNU Profiler
The classic Unix profiling tool, included with GCC.
### Basic Usage

```sh
# 1. Compile with the profiling flag
g++ -pg -g -O2 -o myprogram myprogram.cpp utils.cpp

# 2. Run your program (creates gmon.out)
./myprogram

# 3. Analyze the profile
gprof myprogram gmon.out > profile.txt
```

### Understanding Output
```
  %   cumulative    self              self     total
 time   seconds    seconds    calls  ms/call  ms/call  name
25.00      0.50       0.25                             main [1]
20.00      0.90       0.20      100     2.00     5.00  process_data() [2]
15.00      1.20       0.15       50     3.00     3.00  calculate() [3]
10.00      1.40       0.10     1000     0.10     0.10  helper() [4]
```

### Key Fields Explained
| Field | Meaning |
|---|---|
| % time | Percentage of total execution time |
| cumulative seconds | Total time including called functions |
| self seconds | Time spent in this function only |
| calls | Number of times function was called |
| ms/call | Average milliseconds per call |
### Limitations

gprof has several well-known limitations:

1. It relies on sampling, so short-running or rarely called functions may be missed
2. Inlined functions are not attributed accurately
3. Poor support for multi-threaded programs
4. Code in shared libraries is typically not profiled

Better alternatives include Valgrind (Callgrind), perf (Linux), Intel VTune, and Google Performance Tools (gperftools).

## Valgrind
A comprehensive memory debugging and profiling framework, primarily for Linux.
### Installation

```sh
# Ubuntu/Debian
sudo apt-get install valgrind

# macOS (note: support for recent macOS versions is limited)
brew install valgrind

# Fedora/RHEL
sudo dnf install valgrind
```

### Memcheck - Memory Error Detection
```sh
# Basic memory leak check
valgrind --leak-check=full ./myprogram

# Show detailed leak information
valgrind --leak-check=full --show-leak-kinds=all ./myprogram

# Track origins of uninitialized values
valgrind --track-origins=yes ./myprogram
```

### Common Valgrind Errors
```cpp
// 1. Invalid write
int* p = new int[10];
p[10] = 5;              // out of bounds!

// 2. Use after free
int* q = new int;
delete q;
*q = 5;                 // use after free!

// 3. Memory leak
void leak() {
    int* r = new int[100];
    // missing delete[]!
}

// 4. Mismatched new/delete
int* s = new int[10];
free(s);                // wrong! use delete[]
```
### Callgrind - Performance Profiling

```sh
# Generate a call graph
valgrind --tool=callgrind ./myprogram

# This creates callgrind.out.<pid>, e.g. callgrind.out.1234
# View it with KCachegrind:
kcachegrind callgrind.out.1234
```

### More Options
```sh
# Cache simulation
valgrind --tool=cachegrind ./myprogram

# Heap profiler (the tool is called massif)
valgrind --tool=massif ./myprogram
ms_print massif.out.<pid>

# Thread error detector
valgrind --tool=helgrind ./myprogram
```

## Perf - Linux Performance Counters
A powerful Linux profiler built on CPU hardware performance counters.
### Installation

```sh
# Ubuntu/Debian
sudo apt-get install linux-tools-common linux-tools-generic

# Verify
perf --version
```

### Basic Commands
```sh
# Record a CPU profile with call graphs
perf record -g ./myprogram

# View results (interactive)
perf report

# Live monitoring
perf top

# Annotate hot instructions with source
perf annotate
```

### Common Events
```sh
# CPU cycles (default)
perf record ./myprogram

# Cache events
perf record -e cache-references -e cache-misses ./myprogram

# Branch prediction
perf record -e branch-instructions -e branch-misses ./myprogram

# Context switches
perf record -e context-switches ./myprogram
```
### Perf Stat - Summary Statistics

```sh
# Overall statistics
perf stat ./myprogram

# Average over multiple runs
perf stat -r 5 ./myprogram

# Detailed output
perf stat -d ./myprogram
```

Example output:
```
 Performance counter stats for './myprogram':

           500,000      cycles
           100,000      instructions        #    0.20  instructions per cycle
            50,000      cache-references
             5,000      cache-misses        #   10.00% cache miss rate

       1.234567 seconds time elapsed
```
### Perf with Flame Graphs

```sh
# Record with stack traces at 99 Hz
perf record -g -F 99 ./myprogram

# Generate a flame graph (requires the FlameGraph scripts from
# https://github.com/brendangregg/FlameGraph)
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
```

## Google Performance Tools
A comprehensive profiling suite from Google (gperftools).
### Installation

```sh
# Ubuntu/Debian
sudo apt-get install google-perftools libgoogle-perftools-dev

# From source
git clone https://github.com/gperftools/gperftools
cd gperftools
./configure && make && sudo make install
```

### CPU Profiling
```cpp
#include <gperftools/profiler.h>

int main() {
    ProfilerStart("cpu.prof");

    // Your code here
    for (int i = 0; i < 1000; i++) {
        process_data(i);
    }

    ProfilerStop();
    return 0;
}
```

```sh
# Compile and link against the profiler
g++ -o myprogram myprogram.cpp -lprofiler

# Run - creates cpu.prof
./myprogram

# Analyze
google-pprof --text ./myprogram cpu.prof
google-pprof --pdf ./myprogram cpu.prof > profile.pdf
```

### Heap Profiling
```cpp
#include <gperftools/heap-profiler.h>
#include <vector>

int main() {
    HeapProfilerStart("heap.prof");

    // Code that allocates memory
    std::vector<int> v;
    for (int i = 0; i < 10000; i++) {
        v.push_back(i);
    }

    HeapProfilerDump("snapshot1");
    v.clear();
    HeapProfilerDump("snapshot2");

    HeapProfilerStop();
    return 0;
}
```

## Custom Timing with std::chrono
For quick performance measurements directly in code.
### Basic Timer Class

```cpp
#include <chrono>
#include <iostream>

class Timer {
public:
    void start() {
        start_time = std::chrono::high_resolution_clock::now();
    }

    void stop() {
        end_time = std::chrono::high_resolution_clock::now();
    }

    double elapsed_ms() const {
        auto duration = std::chrono::duration_cast<std::chrono::microseconds>(
            end_time - start_time);
        return duration.count() / 1000.0;
    }

private:
    std::chrono::high_resolution_clock::time_point start_time;
    std::chrono::high_resolution_clock::time_point end_time;
};

// Usage
int main() {
    Timer timer;
    timer.start();

    // Code to measure
    intensive_computation();

    timer.stop();
    std::cout << "Time: " << timer.elapsed_ms() << " ms" << std::endl;

    return 0;
}
```

### Benchmark Macro
```cpp
#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

#define BENCHMARK(code) \
    do { \
        auto start = std::chrono::high_resolution_clock::now(); \
        code; \
        auto end = std::chrono::high_resolution_clock::now(); \
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(end - start); \
        std::cout << #code << ": " << us.count() << " μs" << std::endl; \
    } while (0)

int main() {
    std::vector<int> vec(1000000, 1);

    BENCHMARK(std::sort(vec.begin(), vec.end()));
    BENCHMARK(std::stable_sort(vec.begin(), vec.end()));

    return 0;
}
```

### RAII Timer
```cpp
#include <chrono>
#include <iostream>

class ScopedTimer {
public:
    ScopedTimer(const char* name) : name_(name) {
        start_ = std::chrono::high_resolution_clock::now();
    }

    ~ScopedTimer() {
        auto end = std::chrono::high_resolution_clock::now();
        auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start_);
        std::cout << name_ << ": " << ns.count() << " ns" << std::endl;
    }

private:
    const char* name_;
    std::chrono::high_resolution_clock::time_point start_;
};

// Usage - automatic timing and cleanup
void process_data() {
    ScopedTimer timer("process_data");

    // Function body is timed automatically
    for (int i = 0; i < 1000000; i++) {
        // do work
    }
}
```

## Intel VTune Profiler
A commercial-grade profiler with excellent visualization.
### Basic Usage

```sh
# Install (Ubuntu; requires Intel's oneAPI apt repository)
sudo apt-get install intel-oneapi-vtune

# Hotspots analysis
vtune -collect hotspots ./myprogram

# Memory access analysis
vtune -collect memory-access ./myprogram

# Threading analysis
vtune -collect threading ./myprogram

# Generate a report
vtune -report summary
```

### GUI Mode
```sh
# Launch the VTune GUI
vtune-gui
```

Then:

1. Click “New Project”
2. Configure the target application
3. Choose an analysis type (Hotspots, Memory Access, etc.)
4. Click “Start”
## Visual Studio Profiler (Windows)

### Using Visual Studio
Section titled “Using Visual Studio”- Debug → Performance Profiler (or Alt+F2)
- Select profiling type:
- CPU Usage: Find hot functions
- Memory Usage: Detect memory issues
- GPU Usage: For DirectX applications
- Click “Start”
- Interact with your application
- View results
### Command Line

Using the classic VSPerfCmd sampling workflow:

```
VSPerfCmd /start:sample /output:results.vsp
VSPerfCmd /launch:myprogram.exe
VSPerfCmd /shutdown
```

## Choosing the Right Tool
| Tool | Best For | Platform | Cost |
|---|---|---|---|
| gprof | Simple profiling | Unix | Free |
| Valgrind | Memory debugging | Linux (limited macOS) | Free |
| perf | Hardware counters | Linux | Free |
| gperftools | CPU/Heap profiling | Linux/FreeBSD | Free |
| VTune | Advanced analysis | Linux/Windows | Free (part of oneAPI) |
| Visual Studio | Windows development | Windows | Commercial |
## Best Practices

1. Profile in release mode: use `-O2` or `-O3` for realistic results
2. Use real data: profile with realistic input that matches production
3. Multiple runs: run several times to account for variance
4. Focus on hotspots: don’t optimize code that doesn’t matter
5. Measure before/after: always measure before and after optimization
6. Don’t guess: use profiling data, not intuition
7. Consider all factors: CPU, memory, cache, I/O
## Profiling Workflow

1. Identify the problem: “my program is slow”
2. Profile to find the bottleneck: use perf or gprof to find hot functions
3. Analyze the code: understand why it is slow
4. Optimize: apply an optimization technique
5. Verify: profile again to confirm the improvement
6. Repeat: find the next bottleneck

## Key Takeaways
- Profiling identifies performance bottlenecks objectively
- Use multiple tools for comprehensive analysis
- Profile with realistic workloads in release mode
- Measure before and after optimizations
- Focus on the biggest gains first
- Don’t optimize code that doesn’t matter