c++ - Benchmark exceeding CPU frequency


I'm using benchmark.js to clock two versions of a function, one in JS and one in C++ (a Node.js binding).

The C++ version is a loop around a single compiler intrinsic (2-cycle latency, 0.5-cycle throughput):

    for (size_t i = 0; i < arrlen; i++) {
    #if defined(_MSC_VER)
        (*events)[i] = _byteswap_ushort((*events)[i]);
    #elif defined(__GNUC__)
        (*events)[i] = __builtin_bswap16((*events)[i]);
    #endif
    }

I expect it to be fast ... but it's clocking faster than the CPU frequency (4.0 GHz). How can this be happening? (I have tested that the function works outside of the benchmark suite.)

    native: 17,253,787,071 elements/sec (10k elements in array * 1,725,379 calls/sec)
    js:        846,298,297 elements/sec (10k elements in array * 84,630 calls/sec)
    // both ~90 runs sampled
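To put those figures in terms of clock cycles, here is a quick sanity check. It assumes the core really runs at the quoted 4.0 GHz for the whole benchmark (turbo or throttling would shift the result), and `elements_per_cycle` is just a hypothetical helper name for the division.

```cpp
#include <cassert>
#include <cstdio>

// Elements processed per clock cycle = throughput (elements/s) / clock (cycles/s).
// Assumption: clock_hz is the frequency the core actually ran at.
inline double elements_per_cycle(double elements_per_sec, double clock_hz) {
    assert(clock_hz > 0.0);
    return elements_per_sec / clock_hz;
}

// Prints the per-cycle rates for the two measurements quoted above.
inline void print_report() {
    const double clock_hz = 4.0e9;  // 4.0 GHz, as stated in the question
    std::printf("native: %.2f elements/cycle\n",
                elements_per_cycle(17253787071.0, clock_hz));  // about 4.31
    std::printf("js:     %.2f elements/cycle\n",
                elements_per_cycle(846298297.0, clock_hz));    // about 0.21
}
```

So "faster than the CPU frequency" here means roughly 4.3 elements per cycle for the native version. A 0.5-cycle throughput would already allow 2 elements per cycle for the intrinsic alone, so a result above 1 is not impossible in itself.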

It's hard to say without more context, but it's probably one or more of the following:

  • The compiler is using instructions such as pshufb to byte-swap more than one element at a time. (pshufb can potentially swap 16 words at a time on processors with AVX2 support.)

  • Pipelining effects are allowing the processor to handle multiple iterations of the loop simultaneously.

  • There is a problem with the benchmark that is allowing the entire calculation to be optimized away. (Unlikely, but worth mentioning.)
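The first point can be illustrated without any vendor intrinsics. This is only a sketch, and not what the compiler actually emits for the loop in the question: `bswap16_x4` and `bswap16_array` are hypothetical names, and the mask-and-shift trick on a 64-bit integer stands in for a real SIMD shuffle such as pshufb. It shows how one operation can byte-swap four 16-bit elements at once, so elements per cycle can exceed 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Scalar reference: swap the two bytes of one 16-bit word.
inline std::uint16_t bswap16_scalar(std::uint16_t v) {
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Swap the bytes inside each of the four 16-bit lanes of a 64-bit value.
// Low bytes move up by 8 bits, high bytes move down by 8 bits.
inline std::uint64_t bswap16_x4(std::uint64_t v) {
    const std::uint64_t low_bytes = 0x00FF00FF00FF00FFULL;
    return ((v & low_bytes) << 8) | ((v >> 8) & low_bytes);
}

// Byte-swap every element in place: four at a time, then a scalar tail.
inline void bswap16_array(std::uint16_t* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        std::uint64_t chunk;
        std::memcpy(&chunk, data + i, sizeof chunk);  // avoids aliasing issues
        chunk = bswap16_x4(chunk);
        std::memcpy(data + i, &chunk, sizeof chunk);
    }
    for (; i < len; ++i) {
        data[i] = bswap16_scalar(data[i]);
    }
}
```

Calling `bswap16_array(v.data(), v.size())` on a `std::vector<std::uint16_t>` gives the same result as the scalar loop. A vectorizing compiler can go further with 128-bit or 256-bit registers, which is where the 16-words-at-a-time figure comes from.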

