nanxiao | Nan Xiao's Blog

A scenario in which clang will generate ud2 instruction while gcc not

Check following main.cpp:

# cat main.cpp
#include <functional>
#include <thread>

int main()
{
    int a = 2;

    std::function<void ()> work = [&]()
    {
        if (a < 1) {
            return 1;
        }
        while (1)
        {
            if (a > 1)
            {
                break;
            }
        }
    };


    std::thread t(mergeWork);
    t.join();
    return 0;
}

The prototype of work is void (), while in a < 1 branch, it actually has a return value:

if (a < 1) {
    return 1;
}

Compile the program with g++, and execute it:

# g++ -g -pthread main.cpp
# ./a.out
#

It runs smoothly. Switch to clang++:

# clang++ -g -pthread main.cpp
main.cpp:20:5: warning: control may reach end of non-void lambda [-Wreturn-type]
    };
    ^
1 warning generated.
# ./a.out
Illegal instruction (core dumped)

A warning is generated during building and “Illegal instruction” occurs during running. Check the assembly code of work:

 disassemble
Dump of assembler code for function main::$_0::operator()() const:
   0x0000555555555100 <+0>:     push   %rbp
   0x0000555555555101 <+1>:     mov    %rsp,%rbp
   0x0000555555555104 <+4>:     mov    %rdi,-0x8(%rbp)
   0x0000555555555108 <+8>:     mov    -0x8(%rbp),%rdi
=> 0x000055555555510c <+12>:    mov    (%rdi),%rax
   0x000055555555510f <+15>:    cmpl   $0x1,(%rax)
   0x0000555555555112 <+18>:    mov    %rdi,-0x10(%rbp)
   0x0000555555555116 <+22>:    jge    0x555555555123 <main::$_0::operator()() const+35>
   0x000055555555511c <+28>:    mov    $0x1,%eax
   0x0000555555555121 <+33>:    pop    %rbp
   0x0000555555555122 <+34>:    retq
   0x0000555555555123 <+35>:    jmpq   0x555555555128 <main::$_0::operator()() const+40>
   0x0000555555555128 <+40>:    mov    -0x10(%rbp),%rax
   0x000055555555512c <+44>:    mov    (%rax),%rcx
   0x000055555555512f <+47>:    cmpl   $0x1,(%rcx)
   0x0000555555555132 <+50>:    jle    0x55555555513d <main::$_0::operator()() const+61>
   0x0000555555555138 <+56>:    jmpq   0x555555555142 <main::$_0::operator()() const+66>
   0x000055555555513d <+61>:    jmpq   0x555555555128 <main::$_0::operator()() const+40>
   0x0000555555555142 <+66>:    ud2
End of assembler dump.

We can find an ud2 instruction at the end of lambda function. Modify the a < 1 branch:

if (a < 1) {
     return;
 }

This time, the program runs OK.

Upgrade OpenBSD from 6.2 to 6.3

Since OpenBSD 6.3 is released, it is time to upgrade 6.2.

The upgrade manual is here. But for newbies like me, I think the most challenge step is booting from ramdisk kernel, bsd.rd: Download and copy it into root file system:

# mv bsd.rd /

Then reboot machine, during prompting boot>, input boot /bsd.rd:

boot> boot /bsd.rd

Then upgrade OpenBSD according to the instructions.

For me, there are 2 important aspects of OpenBSD 6.3:
(1) The vim is upgrade to 8.0.1589, and the strange display issue is fixed when using terminal (Please refer this thread).
(2) My lscpu is included into ports since OpenBSD 6.3, so you can install it on x86 architectures:

# pkg_add lscpu
# lscpu
Architecture:            amd64
Byte Order:              Little Endian
Active CPU(s):           2
Total CPU(s):            2
......

Benchmark C++ ifstream and mmap

After reading Which is fastest: read, fread, ifstream or mmap?, I try to benchmark C++ ifstream and mmap myself.

The test file is number.txt, and the size is 4GiB:

# ls -alt number.txt
-rw-r--r-- 1 root root 4294967296 Apr  2 13:51 number.txt

The test_ifstream.cpp is like this:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

char chunk[1048576];

int main(void)
{
    std::ifstream ifs(FILE_NAME, std::ios_base::binary);
    if (!ifs) {
        std::cerr << "Error opeing " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opeing " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        unsigned long long res = 0;
        ifs.seekg(0);
        auto begin = std::chrono::system_clock::now();

        for (size_t j = 0; j < 4096; j++) {
            ifs.read(chunk, sizeof(chunk));
            for (size_t k = 0; k < sizeof(chunk); k++) {
                res += chunk[k];
            }
        }
        ofs << res;

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;
    return 0;
}

The program reads 1MiB(1024 * 1024 = 1048576) every time (the total count is 4096). Use -O2 optimization:

# clang++ -O2 test_ifstream.cpp -o test_ifstream
# ./test_ifstream
57208
57085
57061
57105
57069
Average exec time: 57105

The average execution time is 57105 ms. From the htop output:

We can see test_ifstream occupies very little memory.

The following is test_mmap file:

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>


const std::string FILE_NAME = "number.txt";
const std::string RESULT_FILE_NAME = "result.txt";

int main(void)
{
    int fd = ::open(FILE_NAME.c_str(), O_RDONLY);
    if (fd < 0) {
        std::cerr << "Error opeing " << FILE_NAME << std::endl;
        exit(1);
    }
    std::ofstream ofs(RESULT_FILE_NAME, std::ios_base::binary);
    if (!ofs) {
        std::cerr << "Error opeing " << RESULT_FILE_NAME << std::endl;
        exit(1);
    }

    auto file_size = lseek(fd, 0, SEEK_END);

    std::vector<std::chrono::milliseconds> duration_vec(5);
    for (std::vector<std::chrono::milliseconds>::size_type i = 0; i < duration_vec.size(); i++) {
        lseek(fd, 0, SEEK_SET);
        unsigned long long res = 0;
        auto begin = std::chrono::system_clock::now();

        char *chunk = reinterpret_cast<char*>(mmap(NULL, file_size, PROT_READ, MAP_FILE | MAP_SHARED, fd, 0));
        char *addr = chunk;

        for (size_t j = 0; j < file_size; j++) {
            res += *chunk++;
        }
        ofs << res;

        munmap(addr, file_size);

        duration_vec[i] = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::system_clock::now() - begin);
        std::cout<< duration_vec[i].count() << std::endl;
    }

    std::chrono::milliseconds total_time{0};
    for (auto const& v : duration_vec) {
        total_time += v;
    }
    std::cout << "Average exec time: " << total_time.count() / duration_vec.size() << std::endl;

    ::close(fd);
    return 0;
}

Still use -O2 optimization:

# clang++ -O2 test_mmap.cpp -o test_mmap
# ./test_mmap
57241
57095
57038
57008
57175
Average read time: 57111

We can see the execution time of test_mmap is similar as test_ifstream, whereas test_mmap uses more memory:

P.S., the full code is here.

What is a good software engineer job?

After working as a professional software engineer for 10 years, I want to share what I think is a good software job here.

(1) The opportunity of trying different stuff.
You have the chance to touch the different technology, not necessarily the newest. E.g., after working as an application engineer for one or two years, you can make your hands dirty on kernel. Be bored of using C++, it is time to have a taste of rust or Go. All in all, keep freshment is an important factor to motivate you.

(2) Contribute to FOSS.
No one will deny that both the tech giants and start-ups depend on FOSS (Free/Open Source Software) greatly now (Even Microsoft “loves” Linux). If your employer encourage you to contribute back to the project rather than only exploit it, it is really beneficial for community, yourself and company. “A stone kill three birds!”

(3) Sponsor you attend tech conferences and training.
Attending tech conference can expand your horizon, know more new people, and be aware of what other companies are doing and concentrating on in your area. If you can give a talk on some famous conferences, it will earn reputation for both you and your company. Besides this, life-long studying is important for software engineer, so it will be fantastic if your boss is willing to pay money for your enrichment.

(4) Encourage innovation and knowledge sharing.
It will be a great favor if you have ~10% work time for your own project. Actually, the company can select and cultivate seed from these projects, and it may bring potential success. The classical example is gmail. On top of that, Knowledge sharing is a good section in team meeting in which you can learn skills from your colleagues.

Besides the aspects mentioned above, I think there are some other factors. For example, salary, welfare, promotion, etc. Since these are not software job exclusive, and a little complicated among different industries, companies and even countries, I won’t elaborate it here.

Pay attention to “ECC” in using NVIDIA GPU

Today, I tested on an old machine with NVIDIA GPU k80 installed, but found strange issue, such as creating stream failed. Using nvidia-smi command and find the 3rd card’s ECC is 2, not 0:

It seems there is no better method. Just restart the server to reset GPU, and it worked normally.