Memory Layout and Cache Optimization

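How data is laid out in memory determines how well a program uses the CPU caches, and in data-heavy loops cache misses often cost more time than the arithmetic itself.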

Memory Hierarchy

┌────────────────────────────────────────┐
│           Registers (fastest)          │
├────────────────────────────────────────┤
│           L1 Cache                     │
├────────────────────────────────────────┤
│           L2 Cache                     │
├────────────────────────────────────────┤
│           L3 Cache                     │
├────────────────────────────────────────┤
│           Main Memory (RAM)            │
├────────────────────────────────────────┤
│           Disk (slowest)               │
└────────────────────────────────────────┘

Cache Line

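Typical cache line: 64 bytes (common on x86-64 and many ARM CPUs; some CPUs use other sizes)
Memory moves between RAM and cache in whole lines, so once a line is loaded, accesses to other bytes in it are fast
Touching a line that is not yet in cache causes a miss, which is much slower than a hit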
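
The line arithmetic above can be sketched in a few helper functions. This is a minimal illustration that assumes a 64-byte line; kCacheLineSize, cache_line_of, elems_per_line, and lines_touched are hypothetical names introduced here, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. Typical, but not guaranteed on every CPU.
constexpr std::size_t kCacheLineSize = 64;

// Index of the cache line that contains the byte at address p.
inline std::uintptr_t cache_line_of(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kCacheLineSize;
}

// How many objects of type T fit in one cache line.
template <typename T>
constexpr std::size_t elems_per_line() {
    return kCacheLineSize / sizeof(T);
}

// Number of distinct cache lines covered by n contiguous objects starting at p.
template <typename T>
std::size_t lines_touched(const T* p, std::size_t n) {
    assert(p != nullptr);
    if (n == 0) return 0;
    const char* first = reinterpret_cast<const char*>(p);
    const char* last = first + n * sizeof(T) - 1;  // last byte of the range
    return static_cast<std::size_t>(cache_line_of(last) - cache_line_of(first)) + 1;
}
```

With 4-byte floats, 16 consecutive elements of a 64-byte-aligned array share one line, so a sequential scan pays for roughly one miss per 16 elements rather than one per element.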

Data Layout Optimization

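#include <vector>

// Array of Structures (AoS): the x, y, z of one point sit next to each other.
// Fine when each access uses all three fields; wasteful when a loop reads only one,
// because every cache line loaded also carries the unused fields.
struct Point { float x, y, z; };
std::vector<Point> points;

// Structure of Arrays (SoA): all x values are contiguous, likewise y and z.
// Usually better when a loop touches one field at a time, and easier to vectorize.
struct Points {
    std::vector<float> x, y, z;
};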
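
To make the difference concrete, here is a small sketch that sums only the x coordinate under both layouts. The helper names (push_back, sum_x_aos, sum_x_soa) are illustrative additions, not part of the original snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y, z; };  // AoS element: 12 bytes, only 4 of them are x

struct Points {                   // SoA container: one contiguous array per field
    std::vector<float> x, y, z;

    void push_back(float px, float py, float pz) {
        x.push_back(px);
        y.push_back(py);
        z.push_back(pz);
    }

    std::size_t size() const {
        assert(x.size() == y.size() && y.size() == z.size());
        return x.size();
    }
};

// AoS: each cache line loaded also brings in y and z, which this loop never reads.
float sum_x_aos(const std::vector<Point>& pts) {
    float sum = 0.0f;
    for (const Point& p : pts) sum += p.x;
    return sum;
}

// SoA: every byte of every cache line loaded is an x value.
float sum_x_soa(const Points& pts) {
    float sum = 0.0f;
    for (float v : pts.x) sum += v;
    return sum;
}
```

Both functions return the same result. With 64-byte lines, the SoA loop gets 16 useful floats per line, while the AoS loop gets about 5, since two thirds of each line holds y and z. If most loops use x, y, and z together, AoS may be the better fit.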

Cache-Friendly Access

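// Cache-friendly: C and C++ arrays are row-major, so the inner loop over j
// walks consecutive addresses within one row
for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
        process(matrix[i][j]);

// Cache-unfriendly: the inner loop over i jumps a whole row (M elements)
// between accesses, so each access may land on a different cache line
for (int j = 0; j < M; j++)
    for (int i = 0; i < N; i++)
        process(matrix[i][j]);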
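
The two loop orders can be compared on a flat, row-major matrix. The sketch below assumes a hypothetical Matrix type that stores element (i, j) at index i * cols + j; the loops above do not specify how matrix is declared, so this is one possible layout rather than the only one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix: element (i, j) lives at data[i * cols + j].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& at(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
    const double& at(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }
};

// Inner loop over j: stride of 1 element, sequential in memory.
double sum_row_order(const Matrix& m) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m.rows; ++i)
        for (std::size_t j = 0; j < m.cols; ++j)
            sum += m.at(i, j);
    return sum;
}

// Inner loop over i: stride of `cols` elements, one new row per access.
double sum_column_order(const Matrix& m) {
    double sum = 0.0;
    for (std::size_t j = 0; j < m.cols; ++j)
        for (std::size_t i = 0; i < m.rows; ++i)
            sum += m.at(i, j);
    return sum;
}
```

Both functions visit the same elements; only the order differs. On matrices much larger than the cache, the row-order version is typically faster, though the size of the gap depends on the hardware and on how well the prefetcher handles the stride.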

Prefetching

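// GCC/Clang builtin: asks the CPU to start loading this address into cache
// before it is needed. It is only a hint and may be ignored.
__builtin_prefetch(&data[next_index]);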
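
Software prefetching tends to help most when the access pattern is irregular, since hardware prefetchers already handle sequential scans well. The sketch below sums elements selected through an index array and prefetches a few iterations ahead. gather_sum is a hypothetical function, and the distance of 8 is an arbitrary starting value that would need tuning by measurement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum data[indices[k]] for all k, prefetching the element needed a few
// iterations later. Assumes every index is within data's bounds.
double gather_sum(const std::vector<double>& data,
                  const std::vector<std::size_t>& indices) {
    // Assumption: 8 iterations ahead. The best distance depends on the CPU
    // and on how much work each iteration does.
    constexpr std::size_t kPrefetchDistance = 8;

    double sum = 0.0;
    const std::size_t n = indices.size();
    for (std::size_t k = 0; k < n; ++k) {
#if defined(__GNUC__) || defined(__clang__)
        if (k + kPrefetchDistance < n) {
            // Second argument 0 = prefetch for reading;
            // third argument 1 = low temporal locality.
            __builtin_prefetch(&data[indices[k + kPrefetchDistance]], 0, 1);
        }
#endif
        assert(indices[k] < data.size());
        sum += data[indices[k]];
    }
    return sum;
}
```

The prefetch does not change the result, only (possibly) the timing, and on compilers without the builtin the loop falls back to a plain gather. Prefetching too early or too late can make things slower, so it is worth benchmarking with and without it.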