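Daniel Lemire's post "Measuring the system clock frequency using loops (Intel and ARM)" explains how to use assembly instructions to measure the system clock frequency. Taking an Intel x86 processor as an example (the principle is similar on ARM):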
; initialize 'counter' with the desired number
label:
dec counter ; decrement counter
jnz label ; goes to label if counter is not zero
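When this actually runs, a modern Intel x86 processor "fuses" the dec and jnz instructions into a single instruction and executes it in one clock cycle, so each loop iteration costs roughly one cycle. As long as we know how long a given number of iterations took, we can therefore compute an approximation of the current system clock frequency.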
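As a rough illustration of that arithmetic, here is a minimal sketch. The function name `estimate_ghz` and its parameters are hypothetical and do not come from Daniel Lemire's code; the sketch simply assumes one loop iteration per clock cycle, so iterations per nanosecond equals GHz.

```cpp
#include <cstdint>

// Hypothetical helper (not from the original post): turns an iteration count
// and an elapsed time in nanoseconds into a frequency estimate in GHz.
// Assumption: the fused dec/jnz pair retires one loop iteration per cycle.
double estimate_ghz(std::uint64_t iterations, double elapsed_ns) {
    if (elapsed_ns <= 0.0) {
        return 0.0;  // no usable timing, report "no estimate"
    }
    // cycles per nanosecond is the same quantity as GHz
    return static_cast<double>(iterations) / elapsed_ns;
}
```

For instance, 65536 iterations finishing in about 19320 ns would correspond to roughly 3.39 GHz under this assumption.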
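In the code, Daniel Lemire uses a technique called "measure-twice-and-subtract". Suppose the loop count is 65536, and each experiment runs the loop twice. The first run executes 65536 * 2 iterations and takes nanoseconds1; the second run executes 65536 iterations and takes nanoseconds2. This gives us 3 estimates of the time needed for 65536 iterations: nanoseconds1 / 2, nanoseconds1 - nanoseconds2, and nanoseconds2. The experiment is considered valid only if the differences among these three values stay below a threshold: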
......
double nanoseconds = (nanoseconds1 - nanoseconds2);
if ((fabs(nanoseconds - nanoseconds1 / 2) > 0.05 * nanoseconds) or
(fabs(nanoseconds - nanoseconds2) > 0.05 * nanoseconds)) {
return 0;
}
......
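To try the check in isolation, the excerpt can be wrapped in a small function. This is only a sketch: the name `accept_measurement` and the choice to return the accepted time on success are assumptions, since the excerpt above only shows the rejection path (`return 0`).

```cpp
#include <cmath>

// Hypothetical wrapper around the check quoted above.
// nanoseconds1: time for 2 * N iterations; nanoseconds2: time for N iterations.
// Returns the accepted time for N iterations, or 0 if the three estimates
// disagree by more than 5% (returning the time on success is an assumption).
double accept_measurement(double nanoseconds1, double nanoseconds2) {
    double nanoseconds = nanoseconds1 - nanoseconds2;
    if (std::fabs(nanoseconds - nanoseconds1 / 2) > 0.05 * nanoseconds ||
        std::fabs(nanoseconds - nanoseconds2) > 0.05 * nanoseconds) {
        return 0;  // inconsistent timings, discard this experiment
    }
    return nanoseconds;
}
```

With nanoseconds1 = 40000 and nanoseconds2 = 20000 all three estimates agree and the measurement is kept; with nanoseconds1 = 50000 and nanoseconds2 = 20000 the difference (30000) is too far from nanoseconds1 / 2 (25000), so it is rejected.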
Finally, the valid measurements are sorted and the median is taken:
......
std::cout << "Got " << freqs.size() << " measures." << std::endl;
std::sort(freqs.begin(),freqs.end());
std::cout << "Median frequency detected: " << freqs[freqs.size() / 2] << " GHz" << std::endl;
......
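The median step can also be sketched as a standalone function. The name `median_frequency` is hypothetical, and the guard for an empty vector is an addition of this sketch rather than something shown in the excerpt; like the excerpt, it picks the element at index `size() / 2`, which is the upper of the two middle values when the count is even.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper mirroring the excerpt above: sort the valid
// measurements and return the element at index size() / 2.
// Returns 0.0 when no measurement passed validation (an added guard).
double median_frequency(std::vector<double> freqs) {
    if (freqs.empty()) {
        return 0.0;
    }
    std::sort(freqs.begin(), freqs.end());
    return freqs[freqs.size() / 2];
}
```

Taking the vector by value keeps the caller's data in its original order, at the cost of one copy.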
On my system, lscpu reports the following CPU clock frequency:
$ lscpu
......
CPU MHz: 1000.007
CPU max MHz: 3700.0000
CPU min MHz: 1000.0000
......
Actual measurement results:
$ ./loop.sh
g++ -O2 -o reportfreq reportfreq.cpp -std=c++11 -Wall -lm
measure using a tight loop:
Got 9544 measures.
Median frequency detected: 3.39196 GHz
measure using an unrolled loop:
Got 9591 measures.
Median frequency detected: 3.39231 GHz
measure using a tight loop:
Got 9553 measures.
Median frequency detected: 3.39196 GHz
measure using an unrolled loop:
Got 9511 measures.
Median frequency detected: 3.39231 GHz
measure using a tight loop:
Got 9589 measures.
Median frequency detected: 3.39213 GHz
measure using an unrolled loop:
Got 9540 measures.
Median frequency detected: 3.39196 GHz
.......