Memory layout

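Understanding how data is laid out in memory helps you write code that uses the CPU caches well, which is often a large factor in performance. Each level of the hierarchy below is larger but slower than the one above it.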

┌────────────────────────────────────────┐
│ Registers (fastest)                    │
├────────────────────────────────────────┤
│ L1 Cache                               │
├────────────────────────────────────────┤
│ L2 Cache                               │
├────────────────────────────────────────┤
│ L3 Cache                               │
├────────────────────────────────────────┤
│ Main Memory (RAM)                      │
├────────────────────────────────────────┤
│ Disk (slowest)                         │
└────────────────────────────────────────┘
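  • Typical cache line: 64 bytes (common on x86-64; some CPUs use other sizes)
  • Memory is moved into the cache one whole line at a time, so once a line is loaded, accessing other data in that same line is fast
  • Accessing data in a line that is not yet cached causes a cache miss, which is much slower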
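
To make the cache-line points above concrete, here is a minimal sketch that counts how many distinct cache lines a strided access pattern touches. The names `kCacheLineBytes` and `lines_touched` are illustrative, not part of any library, and the sketch assumes a 64-byte line and a buffer that starts on a cache-line boundary.

```cpp
#include <cstddef>

// Assumed cache-line size. 64 bytes is typical but not universal.
constexpr std::size_t kCacheLineBytes = 64;

// Counts the distinct cache lines touched when reading `count` elements
// whose starting addresses are `stride_bytes` apart. Assumes the first
// element sits at the start of a cache line and no element straddles two lines.
std::size_t lines_touched(std::size_t count, std::size_t stride_bytes) {
    std::size_t lines = 0;
    std::size_t last_line = static_cast<std::size_t>(-1);  // sentinel: no line seen yet
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t line = (i * stride_bytes) / kCacheLineBytes;
        if (line != last_line) {  // offsets only increase, so a change means a new line
            ++lines;
            last_line = line;
        }
    }
    return lines;
}
```

Under these assumptions, 16 floats read with a 4-byte stride fit in a single line, while the same 16 floats read with a 64-byte stride touch 16 different lines.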
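
#include <vector>

// Array of Structures (AoS): each point's x, y, z sit next to each other.
// Fine when all fields are used together; wasteful when a loop reads one field.
struct Point { float x, y, z; };
std::vector<Point> points;

// Structure of Arrays (SoA): all x values are contiguous, then all y, then all z.
// Often better when a loop touches only one or two fields.
struct Points {
    std::vector<float> x, y, z;
};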
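
As a rough illustration of why the layout matters, the sketch below sums only the x coordinate under both layouts. The helper names `sum_x_aos` and `sum_x_soa` are hypothetical. With AoS, every cache line fetched also carries y and z values that the loop never reads; with SoA, every byte fetched is an x value.

```cpp
#include <vector>

struct Point { float x, y, z; };                // AoS element
struct Points { std::vector<float> x, y, z; };  // SoA container

// AoS: consecutive x values are sizeof(Point) bytes apart (typically 12),
// so about two thirds of each loaded cache line is unused here.
float sum_x_aos(const std::vector<Point>& points) {
    float total = 0.0f;
    for (const Point& p : points) total += p.x;
    return total;
}

// SoA: x values are packed contiguously, so each loaded cache line
// is made up entirely of values the loop actually uses.
float sum_x_soa(const Points& points) {
    float total = 0.0f;
    for (float x : points.x) total += x;
    return total;
}
```

Whether SoA actually wins depends on the access pattern: code that always reads x, y and z together may do just as well with AoS, so it is worth measuring before restructuring.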
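
// Cache-friendly: C and C++ arrays are row-major, so with j in the inner
// loop, consecutive accesses are adjacent in memory.
for (int i = 0; i < N; i++)
    for (int j = 0; j < M; j++)
        process(matrix[i][j]);

// Cache-unfriendly: with i in the inner loop, the traversal walks down a
// column, so consecutive accesses are a full row (M elements) apart.
for (int j = 0; j < M; j++)
    for (int i = 0; i < N; i++)
        process(matrix[i][j]);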
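
The same two traversal orders can be written against a matrix stored in one contiguous buffer, which makes the stride explicit. This is a minimal sketch: the function names and the flat `std::vector<double>` storage are assumptions made for illustration, not taken from the snippets above.

```cpp
#include <cstddef>
#include <vector>

// The matrix is stored row-major in one vector: element (i, j) is at i * cols + j.

// Inner loop advances j, so consecutive reads are 1 element apart.
double sum_row_order(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t i = 0; i < rows; ++i)
        for (std::size_t j = 0; j < cols; ++j)
            total += m[i * cols + j];
    return total;
}

// Inner loop advances i, so consecutive reads are `cols` elements apart.
// For large `cols`, each read may land on a different cache line.
double sum_column_order(const std::vector<double>& m, std::size_t rows, std::size_t cols) {
    double total = 0.0;
    for (std::size_t j = 0; j < cols; ++j)
        for (std::size_t i = 0; i < rows; ++i)
            total += m[i * cols + j];
    return total;
}
```

Both functions visit every element exactly once; only the order of the visits differs, and that order is what determines how well the cache is used.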
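
// Hint to the CPU to start loading this address into the cache before it is needed.
// __builtin_prefetch is a GCC/Clang builtin, not standard C++.
__builtin_prefetch(&data[next_index]);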
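
Here is one way a prefetch hint might be used in practice: a gather loop whose access pattern is driven by an index array, which hardware prefetchers usually cannot predict. The function name `gather_sum` and the prefetch distance of 8 are assumptions chosen for illustration; a useful distance depends on the hardware and the workload and should be found by measurement.

```cpp
#include <cstddef>
#include <vector>

// Sums data[indices[k]] for every k, prefetching a few iterations ahead.
// Every value in `indices` must be a valid index into `data`.
long long gather_sum(const std::vector<int>& data,
                     const std::vector<std::size_t>& indices) {
    constexpr std::size_t kPrefetchDistance = 8;  // assumed tuning value
    long long total = 0;
    for (std::size_t k = 0; k < indices.size(); ++k) {
#if defined(__GNUC__) || defined(__clang__)
        // Only look ahead while indices[k + kPrefetchDistance] is in bounds.
        if (k + kPrefetchDistance < indices.size()) {
            // Arguments: address, 0 = prefetch for reading, 1 = low temporal locality.
            __builtin_prefetch(&data[indices[k + kPrefetchDistance]], 0, 1);
        }
#endif
        total += data[indices[k]];
    }
    return total;
}
```

A prefetch is only a hint and does not change the result. For plain sequential scans the hardware prefetcher generally does this work already, so a manual hint is unlikely to help there.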