An experiment about OpenMP parallel loop

From my testing, the OpenMP will launch threads number equals to “virtual CPU”, though it is not 100% guaranteed. Today I do a test about whether loop levels affect OpenMP performance.

Given the “virtual CPU” is 104 on my system, I define following constants and variables:

#define CPU_NUM (104)
#define LOOP_NUM (100)
#define ARRAY_SIZE (CPU_NUM * LOOP_NUM)

double a[ARRAY_SIZE], b[ARRAY_SIZE], c[ARRAY_SIZE];  

(1) Just one-level loop:

#pragma omp parallel
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        func(a, b, c, i);
    }

Execute 10 times consecutively:

$ cc -O2 -fopenmp parallel.c
$ ./a.out
Time consumed is 7.208773
Time consumed is 7.080540
Time consumed is 7.643123
Time consumed is 7.377163
Time consumed is 7.418053
Time consumed is 7.226235
Time consumed is 7.887611
Time consumed is 7.200167
Time consumed is 7.264515
Time consumed is 7.140937

(2) Use two-level loop:

for (int i = 0; i < LOOP_NUM; i++)
{
    #pragma omp parallel
        for (int j = 0; j < CPU_NUM; j++)
        {
            func(a, b, c, i * CPU_NUM + j);
        }
}

Execute 10 times consecutively:

$ cc -O2 -fopenmp parallel.c
$ ./a.out
Time consumed is 8.333529
Time consumed is 8.164226
Time consumed is 9.705631
Time consumed is 8.695201
Time consumed is 8.972555
Time consumed is 8.126084
Time consumed is 8.286818
Time consumed is 8.162565
Time consumed is 7.884917
Time consumed is 8.073982

At least from this test, one-level loop has a better performance. If you are interested, the source code is here.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.