The IO stream’s state when EOF occurs

Check the following simple C++ program:

#include <iostream>
using namespace std;

int
main()
{
    char ch;

    while (cin >> ch)
    {
        cout << ch << '\n';
    }

    cout << "bad: " << cin.bad() << ", eof: " << cin.eof() << ", fail: " << cin.fail() << '\n';

    return 0;
}

Compile and run it, then press Ctrl+D to send EOF:

$ c++ foo.cpp -o foo
$ ./foo
bad: 0, eof: 1, fail: 1

We can see both the fail and eof bits are set to 1. From this table, we can see that whenever the fail bit is set, the stream’s operator bool returns false, which is what terminates the loop.

Using volatile variables in multi-threaded programming

Regarding volatile variables, I think Section 4.3.4.2 “A Volatile Solution” from Is Parallel Programming Hard, And, If So, What Can You Do About It? is a good reference. The following is excerpted from it:

To summarize, the volatile keyword can prevent load tearing and store tearing in cases where the loads and stores are machine-sized and properly aligned. It can also prevent load fusing, store fusing, invented loads, and invented stores. However, although it does prevent the compiler from reordering volatile accesses with each other, it does nothing to prevent the CPU from reordering these accesses. Furthermore, it does nothing to prevent either compiler or CPU from reordering non-volatile accesses with each other or with volatile accesses. Preventing these types of reordering requires the techniques described in the next section.

So if your variable satisfies the following conditions:

(1) Its loads and stores are machine-sized and properly aligned;
(2) Code reordering doesn’t affect the logic of the program.

You can use volatile.

BTW, there is a code sample for reference.

Display a running process’s thread IDs on Linux

On Linux, “ps -T” can show the thread information of a running process:

# ps -T 2739
  PID  SPID TTY      STAT   TIME COMMAND
 2739  2739 pts/0    Sl     0:00 ./spawn_threads
 2739  2740 pts/0    Sl     0:00 ./spawn_threads
 2739  2741 pts/0    Sl     0:00 ./spawn_threads

In the proc pseudo filesystem, there is a task directory which records thread information:

# ls -lt /proc/2739/task
total 0
dr-xr-xr-x 7 root root 0 Jun 28 14:55 2739
dr-xr-xr-x 7 root root 0 Jun 28 14:55 2740
dr-xr-xr-x 7 root root 0 Jun 28 14:55 2741

Since C++17, the standard library provides a filesystem library for accessing the file system, and I leverage it to traverse the /proc/$pid/task folder to get the thread IDs of a process:

    ......
    std::filesystem::path p{"/proc"};
    p /= argv[1];
    p /= "task";
    ......
    uint64_t thread_num{};
    std::vector<std::string> thread_id;

    std::filesystem::directory_iterator d_it(p);
    for (const auto& it : d_it)
    {
        thread_num++;
        thread_id.push_back(it.path().filename().string());
    }

    std::cout << "Process ID (" << argv[1] << ") has " << thread_num << " threads, and ids are:\n";
    for (const auto& v : thread_id)
    {
        std::cout << v << '\n';
    }
    ......

Build and run it:

# ./show_thread_ids 2739
Process ID (2739) has 3 threads, and ids are:
2739
2740
2741

P.S., the full code is here.

Different RVO behaviors between gcc and clang

Regarding RVO (Return Value Optimization), I think this video gives a really good explanation. Let’s cut to the chase and check the following code:

# cat rvo.cpp
#include <iostream>

class Foo
{
public:
        Foo(){std::cout << "Foo default constructor!\n";}
        ~Foo(){std::cout << "Foo destructor!\n";}
        Foo(const Foo&){std::cout << "Foo copy constructor!\n";}
        Foo& operator=(const Foo&){std::cout << "Foo assignment!\n"; return *this;}
        Foo(Foo&&){std::cout << "Foo move constructor!\n";}
        Foo& operator=(Foo&&){std::cout << "Foo move assignment!\n"; return *this;}
};

Foo func(bool flag)
{
        Foo temp;
        if (flag)
        {
                std::cout << "if\n";
        }
        else
        {
                std::cout << "else\n";
        }
        return temp;
}

int main()
{
        auto f = func(true);
        return 0;
}

On my Arch Linux platform, the gcc version is 8.2.1 and the clang version is 8.0.0. I tried -std=c++11, -std=c++14, -std=c++17 and -std=c++2a with both compilers; all generated the same output:

Foo default constructor!
if
Foo destructor!

So both compilers are clever enough to realize there is no need to create the “Foo temp” variable (please refer to Small examples show copy elision in C++). Modify func a little:

Foo func(bool flag)
{
        if (flag)
        {
                Foo temp;
                std::cout << "if\n";
                return temp;
        }
        else
        {
                Foo temp;
                std::cout << "else\n";
                return temp;
        }
}

This time, for the clang compiler (with all four options: -std=c++11, -std=c++14, -std=c++17 and -std=c++2a), the program generated the same output as above:

Foo default constructor!
if
Foo destructor!

But for gcc (with all four options: -std=c++11, -std=c++14, -std=c++17 and -std=c++2a), the program generated different output:

Foo default constructor!
if
Foo move constructor!
Foo destructor!
Foo destructor!

So gcc created both objects: “Foo temp” and “auto f”. This not only means clang does better optimization than gcc in this scenario, but also that if you do something in the move constructor and expect it to be called, your program’s logic will depend on the compiler: it works under gcc but not under clang. Beware of this trap, otherwise it may bite you one day, like it bit me today!

Modifying memory pool helps me find a hidden bug

My project has a CUDA memory pool built on C++’s std::queue. Allocation takes from the head of the queue:

ptr = q.front();
q.pop(); 

while freeing memory inserts it at the tail of the queue:

q.push(ptr);  

I changed the implementation from std::queue to std::deque, so that both allocating and freeing occur at the front of the deque:

ptr = q.front();
q.pop_front();
......
q.push_front(ptr);

This modification helped me find a hidden bug: memory was being released too early. In the original code, freed memory was inserted at the end of the queue, so there was an interval before other threads reused it, and the work could still complete correctly as long as the memory was not reused in that window. But after switching to std::deque, freed memory is handed to the very next allocation, which exposed the bug.