I'm trying to use perf and ocperf to find bottlenecks in my code. When I run a detailed perf stat on my binary, two statistics are reported in red text, which I assume means they are too high.

L1-dcache-load-misses is in red at 28.60%

iTLB-load-misses is in red at 425.89%

# ~bram/src/pmu-tools/ocperf.py stat -d -d -d -d -d ./bench ray
perf stat -d -d -d -d -d ./bench ray
Loaded 455 primitives.
Testing ray against 455 primitives.

 Performance counter stats for './bench ray':

       9031.444612      task-clock (msec)         #    1.000 CPUs utilized          
                15      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
               292      page-faults               #    0.032 K/sec                  
    28,786,063,163      cycles                    #    3.187 GHz                      (61.47%)
   <not supported>      stalled-cycles-frontend  
   <not supported>      stalled-cycles-backend   
    55,742,952,563      instructions              #    1.94  insns per cycle          (69.18%)
     3,717,242,560      branches                  #  411.589 M/sec                    (69.18%)
        18,097,580      branch-misses             #    0.49% of all branches          (69.18%)
    10,230,376,136      L1-dcache-loads           # 1132.751 M/sec                    (69.17%)
     2,926,349,754      L1-dcache-load-misses     #   28.60% of all L1-dcache hits    (69.21%)
       145,843,523      LLC-loads                 #   16.148 M/sec                    (69.32%)
            49,512      LLC-load-misses           #    0.07% of all LL-cache hits     (69.33%)
   <not supported>      L1-icache-loads          
           260,144      L1-icache-load-misses     #    0.029 M/sec                    (69.34%)
    10,230,376,830      dTLB-loads                # 1132.751 M/sec                    (69.34%)
             1,197      dTLB-load-misses          #    0.00% of all dTLB cache hits   (61.59%)
             2,294      iTLB-loads                #    0.254 K/sec                    (61.55%)
             9,770      iTLB-load-misses          #  425.89% of all iTLB cache hits   (61.51%)
   <not supported>      L1-dcache-prefetches     
   <not supported>      L1-dcache-prefetch-misses

       9.032234014 seconds time elapsed

My questions:

  • What is a reasonable figure for L1 data cache load misses?

  • What is a reasonable figure for iTLB-load-misses?

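  • Why can iTLB-load-misses exceed 100%? In other words, why is the iTLB-load-misses count larger than the iTLB-loads count? I have even seen it spike as high as 568%.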
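
As a sanity check on how I am reading these percentages: I am assuming perf simply divides the miss count by the corresponding load count. That formula is my guess from the numbers, not something I have confirmed in perf's source, and the helper name miss_percent is made up for this sketch. Using the raw counts from the output above:

```python
def miss_percent(misses: int, loads: int) -> float:
    """Return misses as a percentage of loads (assumed formula, see above)."""
    return 100.0 * misses / loads


# Raw counts copied from the perf stat output in this post.
l1_pct = miss_percent(2_926_349_754, 10_230_376_136)  # L1-dcache-load-misses / L1-dcache-loads
itlb_pct = miss_percent(9_770, 2_294)                 # iTLB-load-misses / iTLB-loads

print(f"L1-dcache-load-misses: {l1_pct:.2f}%")
print(f"iTLB-load-misses:      {itlb_pct:.2f}%")
```

Both results match what perf printed (28.60% and 425.89%), so the figure above 100% seems to come straight from the miss counter being larger than the load counter, not from a display glitch.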

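Also, my machine has a Haswell CPU, so I would have expected the stalled-cycles statistics to be included. Why are they reported as not supported?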