Summation Method Performance Analysis

A performance comparison of three summation methods: direct (register-only), vector (sequential array), and indirect (random array traversal), run on CPU nodes of the Perlmutter supercomputer at NERSC. Direct and vector summation achieved ~1100–1200 MFLOP/s, while indirect summation reached only ~11–12 MFLOP/s. That gap of roughly 100x confirms that random memory access is about two orders of magnitude slower than sequential access, a difference attributed to cache-miss overhead.
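To make the three access patterns concrete, here is a minimal sketch in Python. It is not the benchmark code behind the numbers above: the function names, the index-chasing form of the indirect sum, and the use of a single-cycle permutation (built with Sattolo's algorithm) are all assumptions made for illustration. Python's interpreter overhead would also mask most of the cache effect, so this shows only what each loop reads, not how fast it runs.

```python
import random


def direct_sum(n):
    """Direct: accumulate the loop counter itself; no array is read."""
    total = 0
    for i in range(n):
        total += i
    return total


def vector_sum(a):
    """Vector: read the array in index order (sequential access)."""
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total


def indirect_sum(a):
    """Indirect: each value read is the next index to visit, so the
    traversal jumps around the array (hypothetical formulation)."""
    total = 0
    idx = 0
    for _ in range(len(a)):
        idx = a[idx]
        total += idx
    return total


def random_cycle(n, seed=0):
    """Return a permutation of 0..n-1 that forms one cycle of length n
    (Sattolo's algorithm), so index chasing touches every slot once."""
    rng = random.Random(seed)
    a = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i-1], never i itself
        a[i], a[j] = a[j], a[i]
    return a


if __name__ == "__main__":
    n = 1_000_000
    a = random_cycle(n)
    # All three loops do n additions and produce the same total.
    print(direct_sum(n), vector_sum(a), indirect_sum(a))
```

Because the array is a single-cycle permutation, all three functions perform the same number of additions and return the same total, so any difference in a compiled version's throughput would come from the memory access pattern rather than from the arithmetic.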