First taster of Standard C++ Thread Library on OpenBSD

Today I tried Standard C++ Thread Library on OpenBSD, since it requires the compiler to support C++11 standard, and the default c++ only support C++98 (please refer here), so I need to switch to clang++. The program is just a classic “Hello World”:

#include <thread>
#include <iostream>

void hello()
{
    std::cout << "Hello World!\n";
}

int main(void)
{
    std::thread t(hello);
    t.join();
    return 0;
}

Built and run it:

# clang++ -std=c++11 hello.cpp
root:/root/Project# ./a.out
terminate called after throwing an instance of 'std::system_error'
  what():  Enable multithreading to use std::thread: Operation not permitted
Abort trap (core dumped)

Whoops! The program crashed. After reading this post, adding -pthread during compilation fixed this issue:

# clang++ -pthread -std=c++11 hello.cpp
# ./a.out
Hello World!

The pitfalls of using OpenMP parallel for-loops

According to this discussion:

#pragma omp parallel for
for (...)
{
}

is a shortcut of

#pragma omp parallel
{ 
#pragma omp for
    for (...)
    {
    }
}

and it seems more convenient of using “#pragma omp parallel for“. But there are some pitfalls which you should pay attention to:

(1) You can’t assume the number of threads will be equal to for-loops iteration counts even it is very small. For example (The machine has only cores.):

#include <omp.h>
#include <stdio.h>

int main(void) {
#pragma omp parallel for
    for (int i = 0; i < 5; i++) {
        printf("thread is %d\n", omp_get_thread_num());
    }
    return 0;
}

Build and run this program:

# gcc -fopenmp parallel.c
# ./a.out
thread is 0
thread is 0
thread is 0
thread is 1
thread is 1

We can see only 2 threads are generated. Run it in another 32-core machine:

# ./a.out
thread is 1
thread is 0
thread is 2
thread is 4
thread is 3

We can see 5 threads are launched.

(2) Use num_threads clause to modify the program as following:

#include <omp.h>
#include <stdio.h>

int main(void) {
#pragma omp parallel for num_threads(5)
    for (int i = 0; i < 5; i++) {
        printf("thread is %d\n", omp_get_thread_num());
    }
    return 0;
}

Rebuild and run it on original 2-core machine:

# gcc -fopenmp parallel.c
# ./a.out
thread is 2
thread is 3
thread is 4
thread is 1
thread is 0

We can see this time 5 threads are created. But you should notice the actual thread count depends the system resource. E.g., change the code like this:

#pragma omp parallel for num_threads(30000)
    for (int i = 0; i < 30000; i++) {
        printf("thread is %d\n", omp_get_thread_num());
    }

Execute it:

# ./a.out

libgomp: Thread creation failed: Resource temporarily unavailable

So we should notice the the created thread number.

P.S., if the iteration number is smaller than core number, the number of threads will be equal to core number (still in 32-core machine):

#include <omp.h>
#include <stdio.h>

int main(void) {
#pragma omp parallel for
    for (int i = 0; i < 4; i++) {
        if (0 == omp_get_thread_num()) {
            printf("thread number is %d\n", omp_get_num_threads());
        }
    }
    return 0;
}

The output is:

thread number is 32

(3) If you use C++ thread_local variable, you should take care:

#include <omp.h>
#include <stdio.h>

int main(void) {
    thread_local int array[5] = {0};
#pragma omp parallel for num_threads(5)
    for (int i = 0; i < 5; i++) {
        array[i] = i + 1;
    }

    for (int i = 0; i < 5; i++) {
        printf("array[%d] is %d\n", i, array[i]);
    }
    return 0;
}

Compile and run:

# g++ -fopenmp parallel.cpp
# ./a.out
array[0] is 1
array[1] is 0
array[2] is 0
array[3] is 0
array[4] is 0

We can see only the first element is changed, so it must be thread 0‘s work. Remove the thread_local qualifier, and rebuild. This time you get the wanted result:

# ./a.out
array[0] is 1
array[1] is 2
array[2] is 3
array[3] is 4
array[4] is 5

Build FFTW supporting multi-thread

According to FFTW document:

Starting from FFTW-3.3.5, FFTW supports a new API to make the planner thread-safe:

void fftwmakeplannerthreadsafe(void);

This call installs a hook that wraps a lock around all planner calls.

To use fftw_make_planner_thread_safe() API, you need to add --enable-threads in configuration stage:

$ ./configure --enable-threads
$ make  
$ sudo make install

There will be two additional libraries (libfftw3_threads.a and libfftw3_threads.la) generated compared with no --enable-threads option:

$ cd /usr/local/lib/
$ ls -alt libfftw*
-rw-r--r-- 1 root root   44528 Jan 16 09:18 libfftw3_threads.a
-rwxr-xr-x 1 root root     946 Jan 16 09:18 libfftw3_threads.la
-rw-r--r-- 1 root root 2222828 Jan 16 09:18 libfftw3.a
-rwxr-xr-x 1 root root     886 Jan 16 09:18 libfftw3.la

BTW, although fftw_make_planner_thread_safe() API was introduced in FFTW-3.3.5, it didn’t really work until FFTW-3.3.6 who contains an important patch. Please refer the discussion and release note.

A trick of building multithreaded application on Solaris

Firstly, Let’s see a simple multithreaded application:

#include <stdio.h>
#include <pthread.h>
#include <errno.h>

void *thread1_func(void *p_arg)
{
           errno = 0;
           sleep(3);
           errno = 1;
           printf("%s exit, errno is %d\n", (char*)p_arg, errno);
}

void *thread2_func(void *p_arg)
{
           errno = 0;
           sleep(5);
           printf("%s exit, errno is %d\n", (char*)p_arg, errno);
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread1_func, "Thread 1");
        pthread_create(&t2, NULL, thread2_func, "Thread 2");

        sleep(10);
        return;
}

What output do you expect from this program? Per my understanding, the errnoshould be a thread-safe variable. Though The thread1_func function changes theerrno, it should not affect errno in thread2_func function.

Let’s check it on Solaris 10:

bash-3.2# gcc -g -o a a.c -lpthread
bash-3.2# ./a
Thread 1 exit, errno is 1
Thread 2 exit, errno is 1

Oh! The errno in thread2_func function is also changed to 1. Why does it happen? Let’s find the root cause from the errno.h file:

/*
 * Error codes
 */

#include <sys/errno.h>

#ifdef  __cplusplus
extern "C" {
#endif

#if defined(_LP64)
/*
 * The symbols _sys_errlist and _sys_nerr are not visible in the
 * LP64 libc.  Use strerror(3C) instead.
 */
#endif /* _LP64 */

#if defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L
extern int *___errno();
#define errno (*(___errno()))
#else
extern int errno;
/* ANSI C++ requires that errno be a macro */
#if __cplusplus >= 199711L
#define errno errno
#endif
#endif  /* defined(_REENTRANT) || defined(_TS_ERRNO) */

#ifdef  __cplusplus
}
#endif

#endif  /* _ERRNO_H */

We can find the errno can be a thread-safe variable(#define errno (*(___errno()))) only when the following macros defined:

defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L

Let’s try it:

bash-3.2# gcc -D_POSIX_C_SOURCE=199506L -g -o a a.c -lpthread
bash-3.2# ./a
Thread 1 exit, errno is 1
Thread 2 exit, errno is 0

Yes, the output is right!

From Compiling a Multithreaded Application, we can see:

For POSIX behavior, compile applications with the -D_POSIX_C_SOURCE flag set >= 199506L. For Solaris behavior, compile multithreaded programs with the -D_REENTRANT flag.

So we should pay more attentions when building multithreaded application on Solaris.

Reference:
(1) Compiling a Multithreaded Application;
(2) What is the correct way to build a thread-safe, multiplatform C library?