C++ - Benchmark exceeding CPU frequency
I'm using benchmark.js to clock two versions of a function, one in JS and one in C++ (a Node.js binding).
The C++ version is a loop around a single compiler intrinsic (2-cycle latency + 0.5-cycle throughput):
for (size_t i = 0; i < arrlen; i++) {
#if defined(_MSC_VER)
    (*events)[i] = _byteswap_ushort((*events)[i]);
#elif defined(__GNUC__)
    (*events)[i] = __builtin_bswap16((*events)[i]);
#endif
}
I expected it to be fast ... but it's clocking faster than the CPU frequency (4.0 GHz). How can this be happening? (I have tested that the function works outside of the benchmark suite.)
native: 17,253,787,071 elements/sec (10k elements in array * 1,725,379 calls/sec)
js: 846,298,297 elements/sec (10k elements in array * 84,630 calls/sec)
// both ~90 runs sampled
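As a rough sanity check on those figures, here is the throughput converted to elements per clock cycle. This is only a sketch: it assumes the clock stays at the nominal 4.0 GHz for the whole run, and the helper name elementsPerCycle is made up for illustration.

```cpp
// Elements processed per clock cycle, given a measured throughput and an
// assumed constant clock frequency (both expressed per second).
constexpr double elementsPerCycle(double elementsPerSec, double clockHz) {
    return elementsPerSec / clockHz;
}

// Figures taken from the results above; 4.0 GHz is the nominal frequency.
constexpr double kNativePerCycle = elementsPerCycle(17253787071.0, 4.0e9); // about 4.3
constexpr double kJsPerCycle     = elementsPerCycle(846298297.0, 4.0e9);   // about 0.21
```

If the intrinsic really sustains one result per 0.5 cycle, a purely scalar loop would top out near 2 elements per cycle, so roughly 4.3 per cycle suggests more than one element is being handled per instruction.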
It's hard to say without more context, but it's likely one or more of the following:
The compiler is using instructions such as pshufb to byte-swap more than one element at a time. (pshufb can potentially swap 16 words at a time on processors with AVX2 support.)
Pipelining effects are allowing the processor to handle multiple iterations of the loop simultaneously.
There is a problem with the benchmark that is allowing the entire calculation to be optimized away. (Unlikely, but worth mentioning.)
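To make the first point concrete, here is a minimal, portable sketch of swapping several 16-bit elements with one set of operations. It is not pshufb and not what the compiler actually emits. It packs four elements into a 64-bit integer and swaps them with masks and shifts, which is the same idea on a smaller scale. The names bswap16x4 and swapAll are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Byte-swap four packed 16-bit lanes held in one 64-bit word.
inline std::uint64_t bswap16x4(std::uint64_t v) {
    const std::uint64_t lowBytes = 0x00FF00FF00FF00FFULL;
    // Move each low byte up and each high byte down within its own lane.
    return ((v & lowBytes) << 8) | ((v >> 8) & lowBytes);
}

// Hypothetical helper: swaps every 16-bit element in place, four at a time,
// then finishes any leftover elements with a scalar loop.
void swapAll(std::uint16_t* data, std::size_t len) {
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        std::uint64_t chunk;
        std::memcpy(&chunk, data + i, sizeof chunk);  // load 4 elements
        chunk = bswap16x4(chunk);
        std::memcpy(data + i, &chunk, sizeof chunk);  // store 4 elements
    }
    for (; i < len; ++i) {
        data[i] = static_cast<std::uint16_t>((data[i] << 8) | (data[i] >> 8));
    }
}
```

Each pass through the main loop handles four elements, so the element rate can exceed the rate at which loop iterations retire. A 128-bit or 256-bit vector instruction widens that to 8 or 16 elements per operation.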