Porting google/benchmark to OpenBSD

I wanted to use google/benchmark on OpenBSD, but found that it supports many platforms and not OpenBSD (the code is here):

#if defined(__CYGWIN__)
  #define BENCHMARK_OS_CYGWIN 1
#elif defined(_WIN32)
  #define BENCHMARK_OS_WINDOWS 1
#elif defined(__APPLE__)
  #define BENCHMARK_OS_APPLE 1
  #include "TargetConditionals.h"
  #if defined(TARGET_OS_MAC)
    #define BENCHMARK_OS_MACOSX 1
    #if defined(TARGET_OS_IPHONE)
      #define BENCHMARK_OS_IOS 1
    #endif
  #endif
#elif defined(__FreeBSD__)
  #define BENCHMARK_OS_FREEBSD 1
#elif defined(__NetBSD__)
  #define BENCHMARK_OS_NETBSD 1
#elif defined(__linux__)
  #define BENCHMARK_OS_LINUX 1
#elif defined(__native_client__)
  #define BENCHMARK_OS_NACL 1
#elif defined(EMSCRIPTEN)
  #define BENCHMARK_OS_EMSCRIPTEN 1
#elif defined(__rtems__)
  #define BENCHMARK_OS_RTEMS 1
#elif defined(__Fuchsia__)
  #define BENCHMARK_OS_FUCHSIA 1
#elif defined (__SVR4) && defined (__sun)
  #define BENCHMARK_OS_SOLARIS 1
#endif

It builds successfully on OpenBSD, but “make test” reports some failures:

# make test

......

91% tests passed, 5 tests failed out of 54

Total Test time (real) =  40.18 sec

The following tests FAILED:
          1 - benchmark (Child aborted)
         38 - options_benchmarks (Child aborted)
         39 - basic_benchmark (Child aborted)
         43 - fixture_test (Child aborted)
         47 - reporter_output_test (Child aborted)
Errors while running CTest
*** Error 8 in /root/Project/benchmark/build (Makefile:130 'test': /usr/local/bin/ctest --force-new-ctest-process --exclude-regex "CMake.Fil...)

Check the following simple test file:

# cat test.cc
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();

Compile and run it:

# c++ -I/usr/local/include -L/usr/local/lib -std=c++11 test.cc -o test -lbenchmark
root:/root/Project# ./test
failed to open /proc/cpuinfo
2018-05-02 17:14:11
Running ./test
Run on (-1 X 2545.25 MHz CPU )
***WARNING*** Library was built as DEBUG. Timings may be affected.
---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_StringCreation         40 ns         40 ns   17597275
BM_StringCopy             13 ns         13 ns   53511385

“failed to open /proc/cpuinfo”? “-1 X 2545.25 MHz CPU”? The output is messy, so I decided to port it to OpenBSD:

(1) The first thing to do is to define BENCHMARK_OS_OPENBSD in src/internal_macros.h:

......
#elif defined(__OpenBSD__)
  #define BENCHMARK_OS_OPENBSD 1
......
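
As a rough illustration of how this kind of detection works, here is a minimal sketch built on the same compiler-predefined macros. DetectedOs is a hypothetical helper name used only for this example; it is not part of google/benchmark.

```cpp
// Sketch only: DetectedOs() is a hypothetical helper, not a benchmark API.
// The compiler predefines exactly one of these macros on each platform,
// so after preprocessing the function body is a single return statement.
constexpr const char* DetectedOs() {
#if defined(__OpenBSD__)
  return "OpenBSD";
#elif defined(__FreeBSD__)
  return "FreeBSD";
#elif defined(__NetBSD__)
  return "NetBSD";
#elif defined(__linux__)
  return "Linux";
#elif defined(__APPLE__)
  return "Apple";
#elif defined(_WIN32)
  return "Windows";
#else
  return "unknown";
#endif
}
```

Because the choice is made by the preprocessor, a platform that is missing from the chain silently falls through to the last branch, which is what happened to OpenBSD before the macro was added.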

(2) The second task is to fill in the values of CPUInfo’s members:

struct CPUInfo {
  ......
  int num_cpus;
  double cycles_per_second;
  std::vector<CacheInfo> caches;
  bool scaling_enabled;
  ......
};

Check CPUInfo’s constructor:

CPUInfo::CPUInfo()
    : num_cpus(GetNumCPUs()),
      cycles_per_second(GetCPUCyclesPerSecond()),
      caches(GetCacheSizes()),
      scaling_enabled(CpuScalingEnabled(num_cpus)) {}

So I need to implement GetNumCPUs(), GetCPUCyclesPerSecond(), etc. For FreeBSD and NetBSD, benchmark uses the sysctlbyname function:

......
if (sysctlbyname(Name.c_str(), nullptr, &CurBuffSize, nullptr, 0) == -1)
    return ValueUnion();
......

Unfortunately, OpenBSD doesn’t support sysctlbyname, so I use sysctl to get the number of CPUs and their speed:

if ((Name == "hw.ncpu") || (Name == "hw.cpuspeed")) {
  ValueUnion buff(sizeof(int));
  ......
  if (sysctl(mib, 2, buff.data(), &buff.Size, nullptr, 0) == -1) {
    return ValueUnion();
  }
  return buff;
}
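
For readers who want to try the sysctl call outside the benchmark code base, here is a minimal self-contained sketch. It assumes the two-level MIB names CTL_HW/HW_NCPU and CTL_HW/HW_CPUSPEED from OpenBSD's <sys/sysctl.h>. ReadHwInt, NumCpus and CpuSpeedMHz are hypothetical helper names, not functions from the patch. On other systems the sketch falls back to sysconf() for the CPU count and reports -1 for the speed.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__OpenBSD__)
#include <sys/types.h>
#include <sys/sysctl.h>

// Hypothetical helper: reads one integer-valued hw.* entry, or returns -1.
static int ReadHwInt(int second_level_name) {
  int mib[2] = {CTL_HW, second_level_name};
  int value = 0;
  std::size_t len = sizeof(value);
  if (sysctl(mib, 2, &value, &len, nullptr, 0) == -1) return -1;
  assert(len == sizeof(value));  // both entries are plain ints
  return value;
}
#else
#include <unistd.h>
#endif

// Number of CPUs, or -1 on failure.
int NumCpus() {
#if defined(__OpenBSD__)
  return ReadHwInt(HW_NCPU);
#else
  // Assumption for this sketch: a POSIX system that provides this sysconf name.
  return static_cast<int>(sysconf(_SC_NPROCESSORS_ONLN));
#endif
}

// CPU speed in MHz, or -1 if it is not available in this sketch.
int CpuSpeedMHz() {
#if defined(__OpenBSD__)
  return ReadHwInt(HW_CPUSPEED);
#else
  return -1;
#endif
}
```

Calling NumCpus() and CpuSpeedMHz() from a small main() and printing the results is enough to compare them with the "Run on (N X M MHz CPU s)" line that benchmark prints.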

As for cache information, OpenBSD doesn’t provide it ready-made, and I don’t think it is worth using a workaround to get it, e.g., the CPUID instruction on x86 architectures (if you really want to know it, lscpu will give you a hand on x86 platforms). As for whether the CPU supports scaling, I couldn’t find any documentation for OpenBSD, so I just left it as it is.

The whole patch is here. I wasn’t sure whether Google would merge it (Update: it is already merged), but at least all test cases pass on OpenBSD now:

# make test

......

100% tests passed, 0 tests failed out of 54

Total Test time (real) =  17.54 sec

And the test program also prints normal output:

# ./test
2018-05-02 14:49:03
Running ./a.out
Run on (2 X 2534 MHz CPU s)
***WARNING*** Library was built as DEBUG. Timings may be affected.
---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_StringCreation         42 ns         42 ns   16761725
BM_StringCopy             13 ns         13 ns   51990267

P.S., if you want to use google/benchmark on OpenBSD, you can consider importing my patch. 🙂

First taste of google/benchmark

Today, I tried google/benchmark. The build process is the usual one:

# git clone https://github.com/google/benchmark.git
# git clone https://github.com/google/googletest.git benchmark/googletest
# cd benchmark/
# mkdir build
# cd build/
# cmake ..
# make

But “make test” generates an error (please refer to this issue). Write a simple test.cc:

# cat test.cc
#include <benchmark/benchmark.h>

static void BM_StringCreation(benchmark::State& state) {
  for (auto _ : state)
    std::string empty_string;
}
// Register the function as a benchmark
BENCHMARK(BM_StringCreation);

// Define another benchmark
static void BM_StringCopy(benchmark::State& state) {
  std::string x = "hello";
  for (auto _ : state)
    std::string copy(x);
}
BENCHMARK(BM_StringCopy);

BENCHMARK_MAIN();

Build and run it:

# g++ -pthread test.cc -o test -lbenchmark
# ./test
2018-04-30 18:14:59
Running ./test
Run on (2 X 2394 MHz CPU s)
CPU Caches:
  L1 Data 32K (x2)
  L1 Instruction 32K (x2)
  L2 Unified 4096K (x1)
***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be noisy and will incur extra overhead.
***WARNING*** Library was built as DEBUG. Timings may be affected.
---------------------------------------------------------
Benchmark                  Time           CPU Iterations
---------------------------------------------------------
BM_StringCreation         10 ns         10 ns   71084124
BM_StringCopy             52 ns         52 ns   10000000

Maybe I will dive deeper into this project.

Small examples showing copy elision in C++

“Return value optimization” is a common form of copy elision whose goal is to eliminate unnecessary copying of objects. Check the following example:

#include <iostream>
using namespace std;

class Foo {
public:
    Foo() {cout<<"default constructor is called"<<endl;}
    Foo(const Foo& other) {cout<<"copy constructor is called"<<endl;}
    Foo(Foo&& other) {cout<<"move constructor is called"<<endl;}
};

Foo func()
{
    Foo f;
    return f;
}

int main()
{
    Foo a = func();
    return 0;
}

The compiler is clang 5.0.1:

# c++ --version
OpenBSD clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
Target: amd64-unknown-openbsd6.3
Thread model: posix
InstalledDir: /usr/bin

Build and execute it:

# c++ -std=c++11 test.cpp
# ./a.out
default constructor is called

You may expect Foo’s copy constructor to be called at least once:

Foo a = func();

However, in reality the compiler may be clever enough to know that the content of the local variable f of func() will eventually be copied to a, so it reserves the storage for a first and passes its address into func(), which constructs f directly there. Conceptually:

Foo a;
func(&a);
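
The two lines above are only a conceptual picture. The sketch below spells the idea out by hand with placement new. It is an illustration of the transformation under my own naming (func_lowered and CopiesInLoweredCall are hypothetical), not what any particular compiler literally emits.

```cpp
#include <cassert>
#include <new>

struct Foo {
  static int default_ctor, copy_ctor, move_ctor;
  Foo() { ++default_ctor; }
  Foo(const Foo&) { ++copy_ctor; }
  Foo(Foo&&) { ++move_ctor; }
};
int Foo::default_ctor = 0;
int Foo::copy_ctor = 0;
int Foo::move_ctor = 0;

// Hand-written stand-in for `Foo func() { Foo f; return f; }` after the
// optimization: the caller supplies the storage of the result object.
Foo* func_lowered(void* result_slot) {
  assert(result_slot != nullptr);
  return new (result_slot) Foo();  // "f" is built directly in the caller's storage
}

// Caller side of `Foo a = func();`: returns how many copies or moves happened.
int CopiesInLoweredCall() {
  alignas(Foo) unsigned char storage[sizeof(Foo)];  // raw storage for "a"
  Foo* a = func_lowered(storage);
  a->~Foo();  // "a" goes out of scope
  return Foo::copy_ctor + Foo::move_ctor;
}
```

Only the default constructor runs in this version, which matches the single line of output printed by the original program.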

Let’s modify the program:

#include <iostream>
using namespace std;

class Foo {
public:
    Foo() {cout<<"default constructor is called"<<endl;}
    Foo(const Foo& other) {cout<<"copy constructor is called"<<endl;}
    Foo(Foo&& other) {cout<<"move constructor is called"<<endl;}
};

Foo func(Foo f)
{
    return f;
}

int main()
{
    Foo a;
    Foo b = func(a);
    return 0;
}

This time, the local variable f of func() is a parameter. Build and run it:

# c++ -std=c++11 test.cpp
# ./a.out
default constructor is called
copy constructor is called
move constructor is called

The parameter f of func() is constructed by the copy constructor:

Foo b = func(a);

In the above statement, func(a) returns the parameter f. Copy elision is not permitted for a function parameter, but f is treated as an rvalue in the return statement, so Foo’s move constructor is used to construct b. If Foo doesn’t define a move constructor:

Foo(Foo&& other) {cout<<"move constructor is called"<<endl;}

Then “Foo b = func(a);” will invoke the copy constructor to initialize b.
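
Both cases can be checked side by side by counting constructor calls instead of printing. This is a minimal sketch. Movable, CopyOnly, PassThrough and RunBoth are hypothetical names for this example, and the exact number of moves can be higher if the compiler is told not to elide the remaining temporary.

```cpp
#include <cassert>

// Has both a copy and a move constructor, like Foo in the example above.
struct Movable {
  static int copies, moves;
  Movable() {}
  Movable(const Movable&) { ++copies; }
  Movable(Movable&&) { ++moves; }
};
int Movable::copies = 0;
int Movable::moves = 0;

// Declares only a copy constructor, so no move constructor exists at all.
struct CopyOnly {
  static int copies;
  CopyOnly() {}
  CopyOnly(const CopyOnly&) { ++copies; }
};
int CopyOnly::copies = 0;

// Same shape as func(Foo f): the by-value parameter is returned.
template <typename T>
T PassThrough(T f) {
  return f;
}

void RunBoth() {
  Movable a;
  Movable b = PassThrough(a);   // copy into f, then move out of f
  CopyOnly c;
  CopyOnly d = PassThrough(c);  // copy into f, then copy out of f
  (void)b;
  (void)d;
}
```

After RunBoth(), Movable::copies is 1 and Movable::moves is at least 1, while CopyOnly::copies is at least 2 because the return falls back to the copy constructor.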

A performance issue about copy constructor

Over the past two days, I debugged a performance issue related to a copy constructor: class A has a member b of type NTL::ZZX:

class A
{
    enum class type {zzx_t, ...} t;
    NTL::ZZX b;
    ......
};

When member t’s value is zzx_t, b is valid. Otherwise, b’s content is stale and should not be used.

There are 2 ways to implement A’s copy constructor:
(1)

A(const A& other) : t(other.t), b(other.b)
{
    ......
}

With this method, NTL::ZZX’s copy constructor is always called, regardless of t’s value.

(2)

A(const A& other) : t(other.t)
{
    ......
    if (t == type::zzx_t)  // type is an enum class, so the enumerator must be qualified
    {
        b = other.b;
    }
    ......
}

In this case, NTL::ZZX’s default constructor is called first. NTL::ZZX’s copy assignment operator is invoked only if the “t == zzx_t” condition is met.

NTL::ZZX’s default constructor does almost nothing, and the copy constructor does roughly the same work as the copy assignment operator. But in our scenario, t’s value is not zzx_t about 80 percent of the time, so the second implementation of the copy constructor gets a big performance boost compared to the first one.
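
The difference can be made visible without NTL by counting deep copies of a stand-in member. This is a sketch under assumptions: Payload is a hypothetical replacement for NTL::ZZX (a cheap default constructor, an expensive copy), and Kind, Eager, Lazy and DeepCopiesFor are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for NTL::ZZX: cheap to default-construct, costly to copy.
struct Payload {
  static int deep_copies;
  std::vector<long> coeffs;
  Payload() {}
  explicit Payload(std::size_t n) : coeffs(n, 1) {}
  Payload(const Payload& other) : coeffs(other.coeffs) { ++deep_copies; }
  Payload& operator=(const Payload& other) {
    coeffs = other.coeffs;
    ++deep_copies;
    return *this;
  }
};
int Payload::deep_copies = 0;

enum class Kind { zzx_t, other_t };

// Method (1): the member is always copied in the initializer list.
struct Eager {
  Kind t;
  Payload b;
  Eager(Kind kind, std::size_t n) : t(kind), b(n) {}
  Eager(const Eager& other) : t(other.t), b(other.b) {}
};

// Method (2): the member is default-constructed and copied only when valid.
struct Lazy {
  Kind t;
  Payload b;
  Lazy(Kind kind, std::size_t n) : t(kind), b(n) {}
  Lazy(const Lazy& other) : t(other.t) {
    if (t == Kind::zzx_t) b = other.b;
  }
};

// Returns how many deep copies one copy construction of Holder performs.
template <typename Holder>
int DeepCopiesFor(Kind kind) {
  Holder original(kind, 1000);
  Payload::deep_copies = 0;
  Holder copy(original);
  assert(copy.t == kind);
  return Payload::deep_copies;
}
```

With a non-zzx_t object, DeepCopiesFor<Eager> reports one deep copy and DeepCopiesFor<Lazy> reports none, which is where the saving in the common case comes from.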

Install vtop on OpenBSD

vtop is a tool which gives you an intuitive view of CPU/memory usage over a period of time. This post describes how to install it on OpenBSD (assuming you are working as the root account):

(1) Install node.js (refer here):

# pkg_add node

(2) Install vtop:

# npm install -g vtop

(3) Since vtop uses Unicode braille characters to draw CPU and memory charts, I need to switch OpenBSD to UTF-8 encoding (refer here and here):

export LC_CTYPE="en_US.UTF-8"

Now, vtop works well on my Cygwin terminal:

[Screenshot: vtop running in the terminal]