Beware that NDEBUG controls assert

According to the assert documentation, if NDEBUG is defined, assert does nothing:

#ifdef NDEBUG
#define assert(condition) ((void)0)
#else
#define assert(condition) /*implementation defined*/
#endif

So if your program doesn't trigger assert as you expect, check whether NDEBUG is defined or not. E.g., if you use cmake to manage the build process, setting CMAKE_BUILD_TYPE to Release will define NDEBUG:

# cmake -DCMAKE_BUILD_TYPE=Release ..
# make VERBOSE=1
......
/opt/cuda/bin/nvcc ...... \"-Wall\",\"-fPIC\",\"-O3\",\"-DNDEBUG\" ...... 
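
If you are not sure whether NDEBUG is in effect for a given build, one quick check (a minimal sketch) is to print it from a small program:

#include <cstdio>

int main()
{
#ifdef NDEBUG
    std::puts("NDEBUG is defined: assert() compiles to a no-op");
#else
    std::puts("NDEBUG is not defined: assert() is active");
#endif
    return 0;
}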

Warning: don't put statements with side effects inside assert, like this:

assert(in >> n);

This causes “in >> n” to be executed only in debug builds, not in release builds.
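
If the expression must always run, one option (a minimal sketch, assuming a stream read like the one above) is to execute it unconditionally and assert only on its result:

#include <cassert>
#include <sstream>

int main()
{
    std::istringstream in("42");
    int n = 0;

    bool ok = static_cast<bool>(in >> n); // the read happens in every build mode
    assert(ok);                           // compiled out when NDEBUG is defined
    (void)ok;                             // avoid an unused-variable warning in release builds

    return 0;
}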

Small examples show copy elision in C++

“Return value optimization” is a common form of copy elision whose goal is to eliminate unnecessary copying of objects. Check the following example:

#include <iostream>
using namespace std;

class Foo {
public:
    Foo() {cout<<"default constructor is called"<<endl;}
    Foo(const Foo& other) {cout<<"copy constructor is called"<<endl;}
    Foo(Foo&& other) {cout<<"move constructor is called"<<endl;}
};

Foo func()
{
    Foo f;
    return f;
}

int main()
{
    Foo a = func();
    return 0;
}

The compiler is clang 5.0.1:

# c++ --version
OpenBSD clang version 5.0.1 (tags/RELEASE_501/final) (based on LLVM 5.0.1)
Target: amd64-unknown-openbsd6.3
Thread model: posix
InstalledDir: /usr/bin

Build and execute it:

# c++ -std=c++11 test.cpp
# ./a.out
default constructor is called

You may expect Foo's copy constructor to be called at least once, in this statement:

Foo a = func();

However, the compiler may be clever enough to see that the content of the local variable f of func() will ultimately end up in a, so it constructs a first and effectively passes a into func(), conceptually like this:

Foo a;      // construct a directly
func(&a);   // pseudo-code: f is built in a's storage

Let’s modify the program:

#include <iostream>
using namespace std;

class Foo {
public:
    Foo() {cout<<"default constructor is called"<<endl;}
    Foo(const Foo& other) {cout<<"copy constructor is called"<<endl;}
    Foo(Foo&& other) {cout<<"move constructor is called"<<endl;}
};

Foo func(Foo f)
{
    return f;
}

int main()
{
    Foo a;
    Foo b = func(a);
    return 0;
}

This time, f is a parameter of func() rather than a local variable. Build and run it:

# c++ -std=c++11 test.cpp
# ./a.out
default constructor is called
copy constructor is called
move constructor is called

The parameter f of func() is copy-constructed from a in this statement:

Foo b = func(a);

In the above statement, func(a) returns a temporary object, which is an rvalue, so Foo's move constructor is used to construct b. If Foo didn't define the move constructor:

Foo(Foo&& other) {cout<<"move constructor is called"<<endl;}

Then “Foo b = func(a);” would trigger the copy constructor to initialize b instead.
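
A quick way to confirm this (a minimal sketch: the class above with its move constructor removed) is:

#include <iostream>
using namespace std;

class Foo {
public:
    Foo() {cout<<"default constructor is called"<<endl;}
    Foo(const Foo& other) {cout<<"copy constructor is called"<<endl;}
    // no move constructor: rvalues now bind to the copy constructor
};

Foo func(Foo f)
{
    return f;
}

int main()
{
    Foo a;
    Foo b = func(a);
    return 0;
}

On this setup you would expect the copy constructor to be reported where the move constructor was before; the exact number of calls still depends on how much elision the compiler performs. Another way to observe calls that elision hides is to compile with -fno-elide-constructors, which both gcc and clang accept.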

A performance issue with a copy constructor

These two days, I debugged a performance issue related to a copy constructor: class A has a member b of type NTL::ZZX:

class A
{
    enum class type {zzx_t, ...} t;
    NTL::ZZX b;
    ......
};

When member t's value is type::zzx_t, b is valid; otherwise b's content is stale and should not be used.

There are two methods of implementing A's copy constructor:
(1)

A(const A& other) : t(other.t), b(other.b)
{
    ......
}

In this method, NTL::ZZX's copy constructor is called unconditionally.

(2)

A(const A& other) : t(other.t)
{
    ......
    if (t == type::zzx_t)
    {
        b = other.b;
    }
    .....
}

In this case, NTL::ZZX's default constructor is called first, and NTL::ZZX's copy assignment operator is invoked only if the “t == type::zzx_t” condition is met.

NTL::ZZX's default constructor does almost nothing, and its copy constructor does roughly the same amount of work as its copy assignment operator. But in our scenario, t's value is not zzx_t about 80 percent of the time, so the second implementation of the copy constructor gets a big performance boost compared to the first one.
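
The same pattern can be reproduced without NTL (a minimal sketch; std::vector<long> stands in for the expensive-to-copy NTL::ZZX member, and all names here are illustrative):

#include <vector>

class A
{
public:
    enum class type { zzx_t, other_t };

    A() = default;

    // Copy the heavy member only when it actually carries meaningful data.
    A(const A& other) : t(other.t)
    {
        if (t == type::zzx_t)
        {
            b = other.b;
        }
    }

private:
    type t = type::other_t;
    std::vector<long> b; // stand-in for NTL::ZZX
};

When most objects are not in the zzx_t state, skipping the copy of b avoids the dominant cost, which matches the 80-percent observation above.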

Performance comparison between string::at and string::operator[] in C++

Check the following testPalindrome_index.cpp program, which uses string::operator[]:

#include <string>

bool isPalindrome(
    std::string& s,
    std::string::size_type start,
    std::string::size_type end) {

        auto count = (end - start + 1) / 2;
        for (std::string::size_type i = 0; i < count; i++) {
            if (s[start] != s[end]) {
                return false;
            }
            start++;
            end--;
        }

        return true;
}

int main() {
        std::string s(1'000'000'000, 'a');

        isPalindrome(s, 0, s.size() - 1);
        return 0;
}

My compiler is clang++. Measure the execution time without and with optimization:

# c++ -std=c++14 testPalindrome_index.cpp -o index
# time ./index
    0m13.84s real     0m12.77s user     0m01.06s system
# c++ -std=c++14 -O2 testPalindrome_index.cpp -o index
# time ./index
    0m01.44s real     0m00.42s user     0m01.01s system

We can see the time difference is huge (13.84s vs 1.44s)!

Then change the code to use string::at:

#include <string>

bool isPalindrome(
    std::string& s,
    std::string::size_type start,
    std::string::size_type end) {

        auto count = (end - start + 1) / 2;
        for (std::string::size_type i = 0; i < count; i++) {
            if (s.at(start) != s.at(end)) {
                return false;
            }
            start++;
            end--;
        }

        return true;
}

int main() {
        std::string s(1'000'000'000, 'a');

        isPalindrome(s, 0, s.size() - 1);
        return 0;
}

Compile and test again:

# c++ -std=c++14 testPalindrome_at.cpp -o at
# time ./at
    0m07.31s real     0m06.36s user     0m00.96s system
# c++ -std=c++14 -O2 testPalindrome_at.cpp -o at
# time ./at
    0m06.42s real     0m05.45s user     0m00.97s system

We can see the gap is nearly 1 second this time, not as dramatic as in the first case. But the time with “-O2” optimization is 6.42s, far larger than the 1.44s of the string::operator[] version.

The conclusion is that if the string is long enough, the performance difference between string::operator[] and string::at is significant, so this factor should be considered when deciding which function to use. The gap comes from the bounds check: string::at validates the index on every call and throws std::out_of_range on failure, while string::operator[] does no checking.
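
A minimal sketch of the behavioral difference that the bounds check buys:

#include <iostream>
#include <stdexcept>
#include <string>

int main()
{
    std::string s = "abc";

    std::cout << s[1] << std::endl;        // operator[]: no bounds check

    try {
        std::cout << s.at(3) << std::endl; // at(): index 3 is out of range, so it throws
    } catch (const std::out_of_range& e) {
        std::cout << "out_of_range caught: " << e.what() << std::endl;
    }

    return 0;
}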

P.S., the full code is here.

Benchmark C++ ifstream and mmap

After reading Which is fastest: read, fread, ifstream or mmap?, I tried to benchmark C++ ifstream and mmap myself.

The test file is number.txt, and the size is 4GiB:

# ls -alt number.txt
-rw-r--r-- 1 root root 4294967296 Apr  2 13:51 number.txt

test_ifstream.cpp is like this:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

char chunk[1048576];

int main(void)
{
    std::ifstream ifs(FILE_NAME, std::ios_base::binary);
    if (!ifs) {
        std::cerr << "Error opeing " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opeing " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        unsigned long long res = 0;
        ifs.seekg(0);
        auto begin = std::chrono::system_clock::now();

        for (size_t j = 0; j < 4096; j++) {
            ifs.read(chunk, sizeof(chunk));
            for (size_t k = 0; k < sizeof(chunk); k++) {
                res += chunk[k];
            }
        }
        ofs << res;

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;
    return 0;
}

The program reads 1 MiB (1024 * 1024 = 1048576 bytes) at a time, 4096 times in total. Build with -O2 optimization:

# clang++ -O2 test_ifstream.cpp -o test_ifstream
# ./test_ifstream
57208
57085
57061
57105
57069
Average exec time: 57105
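
That is roughly 4096 MiB / 57.1 s ≈ 72 MiB/s of processed data per run (a back-of-the-envelope figure derived from the numbers above).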

The average execution time is 57105 ms. From the htop output:

[htop screenshot showing test_ifstream's memory usage]

We can see test_ifstream occupies very little memory.

The following is test_mmap.cpp:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>


const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

int main(void)
{
    int fd = ::open(FILE_NAME.c_str(), O_RDONLY);
    if (fd < 0) {
        std::cerr << "Error opeing " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opeing " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    auto file_size = lseek(fd, 0, SEEK_END);

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        lseek(fd, 0, SEEK_SET);
        unsigned long long res = 0;
        auto begin = std::chrono::system_clock::now();

        char *chunk = reinterpret_cast<char*>(mmap(NULL, file_size, PROT_READ, MAP_FILE | MAP_SHARED, fd, 0));
        if (chunk == MAP_FAILED) {
            std::cerr << "Error mapping " << FILE_NAME << std::endl;
            exit(1);
        }
        char *addr = chunk;

        for (size_t j = 0; j < file_size; j++) {
            res += *chunk++;
        }
        ofs << res;

        munmap(addr, file_size);

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;

    ::close(fd);
    return 0;
}

Still use -O2 optimization:

# clang++ -O2 test_mmap.cpp -o test_mmap
# ./test_mmap
57241
57095
57038
57008
57175
Average exec time: 57111

We can see the execution time of test_mmap is similar to that of test_ifstream, whereas test_mmap uses much more memory, because the pages of the mapped file become part of the process's resident memory as they are touched:

[htop screenshot showing test_mmap's memory usage]

P.S., the full code is here.