Beware of OpenMP’s thread pool

For the sake of efficiency, OpenMP‘s implementation always uses thread pool to cache threads (please refer this topic). Check following simple code:

#include <unistd.h>
#include <stdio.h>
#include <omp.h>

int main(void){
        #pragma omp parallel for
        for(int i = 0; i < 256; i++)

        printf("Exit loop\n");

        while (1)

        return 0;

Mys server has 104 logical CPUs. Build and run it:

$ gcc -fopenmp test.c -o test
$ ./test
Exit loop

After “Exit loop” is printed, there is actually only master thread is active. Check number of threads:

$ ps --no-headers -T `pidof test` | wc -l

We can see all non-active threads are not destroyed and ready for future use (clang also uses thread pool inside).

The 103 non-active threads are not free; they consume resource and Operating System needs to take care of them. Sometimes they can encumber your process’s performance, especially on a system which already has heavy load. So when you write following code next time:

 #pragma omp parallel for

Try to answer following questions:
1) How many threads will be spawned?
2) Will these threads be actively used in future or only this time? If they are only valid for this time, is it possible that they become burden of the process? Please try to measure the performance of program. If the answer is yes, how about use other thread implementation instead?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.