Same function, different performance results with Google Benchmark

  c++, google-benchmark

I was trying to familiarize myself with the Google Benchmark framework and decided to run a test with the famous pre/post increments. However, I found that functions containing literally the same code give different time measurements.

My test consists of three functions:

  • incrementA, just a for-loop with nothing special
  • incrementB which is a copy of incrementA
  • increment that calls incrementA

With these three functions, I wrote a fixture and then registered the tests.

#include <assert.h>
#include <stdint.h>

#include <benchmark/benchmark.h>

//---------------------------------------------------------------------

void incrementA(int COUNT) {
    volatile int a[COUNT+1];
    int i = 0;
    for (int j = 0; j < 1000; j++) {
        i = 0;
        for (int k = 0; k < COUNT; k++) {
            a[i++] = k + j;
        }
    }
}

void incrementB(int COUNT) {
    volatile int a[COUNT+1];
    int i = 0;
    for (int j = 0; j < 1000; j++) {
        i = 0;
        for (int k = 0; k < COUNT; k++) {
            a[i++] = k + j;
        }
    }
}

void increment(int COUNT) {
    incrementA(COUNT);
}

//---------------------------------------------------------------------

class PrePostIncrement : public ::benchmark::Fixture
{
public:
    void SetUp(const ::benchmark::State& st)
    {
        size = st.range(0);
    }

    void TearDown(const ::benchmark::State&)
    {
    }

    static void CustomArguments(benchmark::internal::Benchmark* b)
    {
        size_t minSize = 8;
        for (int i = 0; (1 << (i + minSize)) < (1 << 20); ++i)
            b->Arg(1 << (i + minSize));
    }
    int size;
};


//---------------------------------------------------------------------


#define REGISTER_TEST(IncrementFunction)                                                \
    using IncrementFunction##_Test = PrePostIncrement;                                  \
    BENCHMARK_DEFINE_F(IncrementFunction##_Test, Obj)(benchmark::State& state)          \
    {                                                                                   \
        while (state.KeepRunning())                                                     \
        {                                                                               \
            IncrementFunction(size);                                                    \
        }                                                                               \
    }                                                                                   \
    BENCHMARK_REGISTER_F(IncrementFunction##_Test, Obj)->Apply(IncrementFunction##_Test::CustomArguments)->Unit(benchmark::kMillisecond);


REGISTER_TEST(incrementA);
REGISTER_TEST(incrementB);
REGISTER_TEST(increment);

BENCHMARK_MAIN();

Compiled with:

$ g++ increment_benchmark.cpp -std=gnu++14 -march=native -pthread -O3 -I/home/user/software/benchmark/include -L/home/user/software/benchmark/build/src -Wl,-rpath=/home/user/software/benchmark/build/src -lbenchmark

and the results are inconsistent: for example, swapping the order of the tests changes the timings.

---------------------------------------------------------------------
Benchmark                           Time             CPU   Iterations
---------------------------------------------------------------------
incrementA_Test/Obj/256         0.125 ms        0.125 ms         5499
incrementA_Test/Obj/512         0.244 ms        0.244 ms         2868
incrementA_Test/Obj/1024        0.482 ms        0.482 ms         1439
incrementA_Test/Obj/2048        0.971 ms        0.971 ms          715
incrementA_Test/Obj/4096         1.91 ms         1.91 ms          361
incrementA_Test/Obj/8192         3.82 ms         3.82 ms          180
incrementA_Test/Obj/16384        7.77 ms         7.77 ms           90
incrementA_Test/Obj/32768        15.6 ms         15.6 ms           45
incrementA_Test/Obj/65536        30.5 ms         30.5 ms           23
incrementA_Test/Obj/131072       61.7 ms         61.7 ms           11
incrementA_Test/Obj/262144        122 ms          122 ms            6
incrementA_Test/Obj/524288        245 ms          245 ms            3
incrementB_Test/Obj/256         0.084 ms        0.084 ms         8246
incrementB_Test/Obj/512         0.166 ms        0.166 ms         4212
incrementB_Test/Obj/1024        0.321 ms        0.321 ms         2175
incrementB_Test/Obj/2048        0.629 ms        0.629 ms         1109
incrementB_Test/Obj/4096         1.23 ms         1.23 ms          564
incrementB_Test/Obj/8192         2.42 ms         2.42 ms          288
incrementB_Test/Obj/16384        4.84 ms         4.84 ms          142
incrementB_Test/Obj/32768        9.63 ms         9.63 ms           72
incrementB_Test/Obj/65536        20.3 ms         20.3 ms           34
incrementB_Test/Obj/131072       40.8 ms         40.8 ms           17
incrementB_Test/Obj/262144       81.7 ms         81.7 ms            8
incrementB_Test/Obj/524288        164 ms          164 ms            4
increment_Test/Obj/256          0.126 ms        0.126 ms         5551
increment_Test/Obj/512          0.244 ms        0.244 ms         2861
increment_Test/Obj/1024         0.482 ms        0.482 ms         1453
increment_Test/Obj/2048         0.958 ms        0.958 ms          721
increment_Test/Obj/4096          1.91 ms         1.91 ms          364
increment_Test/Obj/8192          3.82 ms         3.82 ms          183
increment_Test/Obj/16384         7.63 ms         7.63 ms           91
increment_Test/Obj/32768         15.2 ms         15.2 ms           46
increment_Test/Obj/65536         30.5 ms         30.5 ms           23
increment_Test/Obj/131072        61.0 ms         61.0 ms           11
increment_Test/Obj/262144         122 ms          122 ms            6
increment_Test/Obj/524288         244 ms          244 ms            3
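
The twelve sizes per benchmark in the table follow from the loop in CustomArguments. As a minimal sketch, the same loop can be written so that it collects the values instead of passing them to b->Arg(). The helper name argumentSizes and its two parameters are hypothetical and only used for illustration; they are not part of the original code or of Google Benchmark.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the loop in PrePostIncrement::CustomArguments.
// It returns every power of two from (1 << minShift) up to, but not including,
// (1 << maxShift), which is what the original loop passes to b->Arg().
std::vector<int> argumentSizes(int minShift, int maxShift) {
    assert(minShift >= 0 && maxShift <= 30);  // keep 1 << shift inside int range
    std::vector<int> sizes;
    for (int shift = minShift; shift < maxShift; ++shift) {
        sizes.push_back(1 << shift);
    }
    return sizes;
}
```

With the values from the question, argumentSizes(8, 20) yields 256, 512, ..., 524288, which matches the rows reported for each benchmark above.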

Initially I thought the CPU frequency scaling governor (powersave) might be influencing the results, but after changing it to performance, the results were the same.

Just for reference, I compiled Google Benchmark myself (bf585a2 [v1.5.2]), and my library and compiler versions are:

$ ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1.2) 2.27
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
$ g++ --version
g++ (Ubuntu 9.2.1-17ubuntu1~18.04.1) 9.2.1 20191102
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I am pretty sure there are different ways of writing this same test, and I welcome any suggestions, but my main interest is to know what is wrong with my code and why I get different results.
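
One possible cross-check, sketched here under stated assumptions, is to time two identical copies of the loop with std::chrono outside the framework and see whether the gap persists. The names fillA, fillB, FillFn and bestTimeMs are hypothetical. The sketch also writes into a std::vector instead of the variable-length stack array used above, so it does not measure exactly the same thing as the original functions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Two deliberately identical copies of the loop from incrementA/incrementB.
// Assumption: they write into caller-provided storage, not a stack array.
void fillA(volatile int* a, int count) {
    for (int j = 0; j < 1000; j++) {
        int i = 0;
        for (int k = 0; k < count; k++) {
            a[i++] = k + j;
        }
    }
}

void fillB(volatile int* a, int count) {
    for (int j = 0; j < 1000; j++) {
        int i = 0;
        for (int k = 0; k < count; k++) {
            a[i++] = k + j;
        }
    }
}

using FillFn = void (*)(volatile int*, int);

// Calls fn `repeats` times and returns the smallest wall-clock time of a
// single call, in milliseconds, measured with std::chrono::steady_clock.
double bestTimeMs(FillFn fn, int count, int repeats) {
    assert(count >= 0 && repeats > 0);
    std::vector<int> storage(static_cast<std::size_t>(count) + 1);
    double best = -1.0;
    for (int r = 0; r < repeats; ++r) {
        auto start = std::chrono::steady_clock::now();
        fn(storage.data(), count);
        auto stop = std::chrono::steady_clock::now();
        double ms =
            std::chrono::duration<double, std::milli>(stop - start).count();
        if (best < 0.0 || ms < best) {
            best = ms;
        }
    }
    return best;
}
```

Calling bestTimeMs(fillA, n, r) and bestTimeMs(fillB, n, r) in both orders for a few of the sizes above would indicate whether a difference follows the function or the order of execution. It would not by itself explain the cause.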

Source: Windows Questions C++