Skip to content

Latest commit

 

History

History

memory-hierarchy

loop-order

To compile: cc -g loop-order.c.

To generate cachegrind.out.<pid>:

$ valgrind --tool=cachegrind --cache-sim=yes ./a.out
=21339== Cachegrind, a cache and branch-prediction profiler
==21339== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==21339== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==21339== Command: ./a.out
==21339== 
--21339-- warning: L3 cache found, using its data for the LL simulation.
==21339== 
==21339== I refs:        480,189,794
==21339== I1  misses:          1,066
==21339== LLi misses:          1,058
==21339== I1  miss rate:        0.00%
==21339== LLi miss rate:        0.00%
==21339== 
==21339== D refs:        224,073,258  (192,054,705 rd   + 32,018,553 wr)
==21339== D1  misses:     17,001,492  (      1,167 rd   + 17,000,325 wr)
==21339== LLd misses:      2,001,358  (      1,041 rd   +  2,000,317 wr)
==21339== D1  miss rate:         7.6% (        0.0%     +       53.1%  )
==21339== LLd miss rate:         0.9% (        0.0%     +        6.2%  )
==21339== 
==21339== LL refs:        17,002,558  (      2,233 rd   + 17,000,325 wr)
==21339== LL misses:       2,002,416  (      2,099 rd   +  2,000,317 wr)
==21339== LL miss rate:          0.3% (        0.0%     +        6.2%  )

Then you can analyze it with:

$ cg_annotate cachegrind.out.<pid>
--------------------------------------------------------------------------------
-- Metadata
--------------------------------------------------------------------------------
Invocation:       /usr/bin/cg_annotate cachegrind.out.21339
I1 cache:         32768 B, 64 B, 8-way associative
D1 cache:         49152 B, 64 B, 12-way associative
LL cache:         25165824 B, 64 B, 12-way associative
Command:          ./a.out
Events recorded:  Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Events shown:     Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Event sort order: Ir I1mr ILmr Dr D1mr DLmr Dw D1mw DLmw
Threshold:        0.1%
Annotation:       on

--------------------------------------------------------------------------------
-- Summary
--------------------------------------------------------------------------------
Ir__________________ I1mr__________ ILmr__________ Dr__________________ D1mr__________ DLmr__________ Dw_________________ D1mw_______________ DLmw______________ 

480,189,794 (100.0%) 1,066 (100.0%) 1,058 (100.0%) 192,054,705 (100.0%) 1,167 (100.0%) 1,041 (100.0%) 32,018,553 (100.0%) 17,000,325 (100.0%) 2,000,317 (100.0%)  PROGRAM TOTALS

--------------------------------------------------------------------------------
-- File:function summary
--------------------------------------------------------------------------------
  Ir__________________________ I1mr__________ ILmr__________ Dr__________________________ D1mr__________ DLmr__________ Dw_________________________ D1mw_______________________ DLmw______________________  file:function

< 480,056,029 (100.0%, 100.0%) 3 (0.3%, 0.3%) 3 (0.3%, 0.3%) 192,024,008 (100.0%, 100.0%) 2 (0.2%, 0.2%) 2 (0.2%, 0.2%) 32,008,007 (100.0%, 100.0%) 17,000,000 (100.0%, 100.0%) 2,000,000 (100.0%, 100.0%)  /home/rei/csi/intro-systems/memory-hierarchy/loop-order.c:
  240,028,010  (50.0%)         1 (0.1%)       1 (0.1%)        96,012,003  (50.0%)         1 (0.1%)       1 (0.1%)       16,004,002  (50.0%)         16,000,000  (94.1%)         1,000,000  (50.0%)            option_two
  240,028,010  (50.0%)         1 (0.1%)       1 (0.1%)        96,012,003  (50.0%)         1 (0.1%)       1 (0.1%)       16,004,002  (50.0%)          1,000,000   (5.9%)         1,000,000  (50.0%)            option_one

--------------------------------------------------------------------------------
-- Function:file summary
--------------------------------------------------------------------------------
  Ir_________________________ I1mr__________ ILmr__________ Dr________________________ D1mr__________ DLmr__________ Dw________________________ D1mw______________________ DLmw_____________________  function:file

> 240,028,010 (50.0%,  50.0%) 1 (0.1%, 0.1%) 1 (0.1%, 0.1%) 96,012,003 (50.0%,  50.0%) 1 (0.1%, 0.1%) 1 (0.1%, 0.1%) 16,004,002 (50.0%,  50.0%) 16,000,000 (94.1%,  94.1%) 1,000,000 (50.0%,  50.0%)  option_two:/home/rei/csi/intro-systems/memory-hierarchy/loop-order.c

> 240,028,010 (50.0%, 100.0%) 1 (0.1%, 0.2%) 1 (0.1%, 0.2%) 96,012,003 (50.0%, 100.0%) 1 (0.1%, 0.2%) 1 (0.1%, 0.2%) 16,004,002 (50.0%, 100.0%)  1,000,000  (5.9%, 100.0%) 1,000,000 (50.0%, 100.0%)  option_one:/home/rei/csi/intro-systems/memory-hierarchy/loop-order.c

--------------------------------------------------------------------------------
-- Annotated source file: /home/rei/csi/intro-systems/memory-hierarchy/loop-order.c
--------------------------------------------------------------------------------
Ir_________________ I1mr____ ILmr____ Dr________________ D1mr____ DLmr____ Dw________________ D1mw______________ DLmw_____________ 

-- line 2 ----------------------------------------
          .         .        .                 .         .        .                 .                  .                 .          
          .         .        .                 .         .        .                 .                  .                 .          Two different ways to loop over an array of arrays.
          .         .        .                 .         .        .                 .                  .                 .          
          .         .        .                 .         .        .                 .                  .                 .          Spotted at:
          .         .        .                 .         .        .                 .                  .                 .          http://stackoverflow.com/questions/9936132/why-does-the-order-of-the-loops-affect-performance-when-iterating-over-a-2d-arra
          .         .        .                 .         .        .                 .                  .                 .          
          .         .        .                 .         .        .                 .                  .                 .          */
          .         .        .                 .         .        .                 .                  .                 .          
          2  (0.0%) 0        0                 0         0        0                 1  (0.0%)          0                 0          void option_one() {
          .         .        .                 .         .        .                 .                  .                 .            int i, j;
          .         .        .                 .         .        .                 .                  .                 .            static int x[4000][4000];
     12,004  (0.0%) 1 (0.1%) 1 (0.1%)      8,001  (0.0%) 0        0                 1  (0.0%)          0                 0            for (i = 0; i < 4000; i++) {
 48,016,000 (10.0%) 0        0        32,004,000 (16.7%) 0        0             4,000  (0.0%)          0                 0              for (j = 0; j < 4000; j++) {
192,000,000 (40.0%) 0        0        64,000,000 (33.3%) 0        0        16,000,000 (50.0%)  1,000,000  (5.9%) 1,000,000 (50.0%)        x[i][j] = i + j;
          .         .        .                 .         .        .                 .                  .                 .              }
          .         .        .                 .         .        .                 .                  .                 .            }
          4  (0.0%) 0        0                 2  (0.0%) 1 (0.1%) 1 (0.1%)          0                  0                 0          }
          .         .        .                 .         .        .                 .                  .                 .          
          2  (0.0%) 1 (0.1%) 1 (0.1%)          0         0        0                 1  (0.0%)          0                 0          void option_two() {
          .         .        .                 .         .        .                 .                  .                 .            int i, j;
          .         .        .                 .         .        .                 .                  .                 .            static int x[4000][4000];
     12,004  (0.0%) 0        0             8,001  (0.0%) 0        0                 1  (0.0%)          0                 0            for (i = 0; i < 4000; i++) {
 48,016,000 (10.0%) 0        0        32,004,000 (16.7%) 0        0             4,000  (0.0%)          0                 0              for (j = 0; j < 4000; j++) {
192,000,000 (40.0%) 0        0        64,000,000 (33.3%) 0        0        16,000,000 (50.0%) 16,000,000 (94.1%) 1,000,000 (50.0%)        x[j][i] = i + j;
          .         .        .                 .         .        .                 .                  .                 .              }
          .         .        .                 .         .        .                 .                  .                 .            }
          4  (0.0%) 0        0                 2  (0.0%) 1 (0.1%) 1 (0.1%)          0                  0                 0          }
          .         .        .                 .         .        .                 .                  .                 .          
          2  (0.0%) 1 (0.1%) 1 (0.1%)          0         0        0                 1  (0.0%)          0                 0          int main() {
          2  (0.0%) 0        0                 0         0        0                 1  (0.0%)          0                 0            option_one();
          2  (0.0%) 0        0                 0         0        0                 1  (0.0%)          0                 0            option_two();
          1  (0.0%) 0        0                 0         0        0                 0                  0                 0            return 0;
          2  (0.0%) 0        0                 2  (0.0%) 0        0                 0                  0                 0          }

--------------------------------------------------------------------------------
-- Annotation summary
--------------------------------------------------------------------------------
Ir__________________ I1mr_________ ILmr_________ Dr__________________ D1mr_________ DLmr_________ Dw_________________ D1mw_______________ DLmw______________ 

480,056,029 (100.0%)     3  (0.3%)     3  (0.3%) 192,024,008 (100.0%)     2  (0.2%)     2  (0.2%) 32,008,007 (100.0%) 17,000,000 (100.0%) 2,000,000 (100.0%)    annotated: files known & above threshold & readable, line numbers known
          0              0             0                   0              0             0                  0                   0                  0             annotated: files known & above threshold & readable, line numbers unknown
          0              0             0                   0              0             0                  0                   0                  0           unannotated: files known & above threshold & two or more non-identical
          0              0             0                   0              0             0                  0                   0                  0           unannotated: files known & above threshold & unreadable 
    133,654   (0.0%) 1,047 (98.2%) 1,040 (98.3%)      30,670   (0.0%) 1,161 (99.5%) 1,035 (99.4%)     10,533   (0.0%)        324   (0.0%)       316   (0.0%)  unannotated: files known & below threshold
        111   (0.0%)    16  (1.5%)    15  (1.4%)          27   (0.0%)     4  (0.3%)     4  (0.4%)         13   (0.0%)          1   (0.0%)         1   (0.0%)  unannotated: files unknown

matrix-multiply

Command to benchmark: cc -Wall matrix-multiply.c benchmark.c && ./a.out 1000.

Swapping two internal loops (j and k):

Naive: 3.575s
Fast: 2.300s
1.55x speedup