“argument to variable-length array may be too large [-Wvla-larger-than=]” warning

I use gcc-9 from CentOS:

$ /opt/rh/devtoolset-9/root/usr/bin/gcc --version
gcc (GCC) 9.3.1 20200408 (Red Hat 9.3.1-2)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I found that when compiling with the -O3 option, gcc reports the following warning for some variable-length arrays in C:

warning: argument to variable-length array may be too large [-Wvla-larger-than=]
  596 |  uint8_t header[header_size];
      |          ^~~~~~~~~~~~~~~~~~

Without -O3, the warning isn’t generated, presumably because the check relies on value-range information that GCC only computes when optimizing.

Exit main thread and keep other threads running

In C, returning from main terminates the whole process, because a return from the initial call to main is equivalent to calling exit. To end only the main thread and keep the other threads running, call thrd_exit in main instead. Check the following code:

#include <stdio.h>
#include <threads.h>
#include <unistd.h>

int
print_thread(void *s)
{
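    (void)s; /* the argument is unused */
    /* Detach itself, so nobody has to thrd_join() this thread. */
    thrd_detach(thrd_current());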
    for (size_t i = 0; i < 5; i++)
    {
        sleep(1);
        printf("i=%zu\n", i);
    }
    thrd_exit(0);
}

int
main(void)
{
    thrd_t tid;
    if (thrd_success != thrd_create(&tid, print_thread, NULL)) {
        fprintf(stderr, "Create thread error\n");
        return 1;
    }
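    /* End only the main thread. The process terminates, as if by
     * exit(EXIT_SUCCESS), once the last thread has terminated. */
    thrd_exit(0);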
}

Run it:

$ ./main
i=0
i=1
i=2
i=3
i=4

As you can see, even though the main thread exited, the other thread kept running.

P.S., the code can be downloaded here.

The c99 program on Void Linux

Today I found there is a c99 program on Void Linux:

$ c99
cc: fatal error: no input files
compilation terminated.
$ which c99
/usr/bin/c99

Check what it is:

$ file /usr/bin/c99
/usr/bin/c99: POSIX shell script, ASCII text executable
$ cat /usr/bin/c99
#!/bin/sh
exec /usr/bin/cc -std=c99 "$@"

Um, it’s just a shell script that runs /usr/bin/cc -std=c99 with the given arguments. So check the cc program:

$ ll /usr/bin/cc
lrwxrwxrwx 1 root root 3 Jun  9 05:32 /usr/bin/cc -> gcc

Oh, it’s a symbolic link to gcc.

Cacheline-Orientated programming

From the CPU’s perspective, the memory hierarchy consists of registers, L1 cache, L2 cache, L3 cache, main memory, and so on. The smallest unit of cache is one cacheline, which is 64 bytes in most cases:

$ getconf LEVEL1_DCACHE_LINESIZE
64

To make your applications run efficiently, you need to take cachelines into account. Take the notorious cacheline false sharing as an example:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[14];
    };
    .....

The size of struct Foo is 64 bytes (assuming a 4-byte int), so it fits in one cacheline. If CPU 0 writes Foo.a while CPU 1 accesses Foo.b at the same time, the cacheline keeps bouncing between the two CPUs’ caches (“cacheline ping-ponging”), and performance degrades drastically, even though the two CPUs never touch the same field.

Another trick is to allocate memory aligned to the cacheline size. Using the struct Foo above as the example again, posix_memalign can guarantee that the whole struct Foo sits in one cacheline:

    void *p = NULL;
    /* posix_memalign() takes a void ** and returns 0 on success. */
    struct Foo *foo = (posix_memalign(&p, 64, sizeof(struct Foo)) == 0) ? p : NULL;

Here 64 is the requested alignment in bytes; POSIX requires it to be a power of two multiple of sizeof(void *).

Last but not least, sometimes padding is needed. E.g.:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
        int padding[2];
    };
    ......
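    void *p = NULL;
    /* Room for 10 elements, each starting on its own 64-byte cacheline. */
    struct Foo *foo = (posix_memalign(&p, 64, sizeof(struct Foo) * 10) == 0) ? p : NULL;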

Or use the compiler’s aligned attribute:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
    } __attribute__((aligned(64)));
    ......

Without padding, struct Foo is 56 bytes (assuming a 4-byte int); after padding (or through the compiler’s aligned attribute), it becomes 64 bytes and fits exactly in one cacheline. Now we can allocate a cacheline-aligned array of struct Foo and let each CPU process its own element, so no “cacheline ping-ponging” will occur.

The alignment of dynamically allocated memory

Check Notes from max_align_t:

Pointers returned by allocation functions such as malloc are suitably aligned for any object, which means they are aligned at least as strictly as max_align_t.

This means dynamically allocated memory is guaranteed to be aligned to at least alignof(max_align_t) bytes.

Check Notes from aligned_alloc:

Passing a size which is not an integral multiple of alignment or a alignment which is not valid or not supported by the implementation causes the function to fail and return a null pointer (C11, as published, specified undefined behavior in this case, this was corrected by DR 460).

This means the set of alignments aligned_alloc supports is implementation-defined: an unsupported alignment makes it return a null pointer.

I wrote a simple program to test aligned_alloc behavior on macOS and Linux (X86_64):

$ cat align.c
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("alignof(max_align_t)=%zu\n\n", alignof(max_align_t));

    size_t size = 1024;
    size_t align[] = {1, 2, 4, 8, 16, 32, 64};
    for (size_t i = 0; i < sizeof(align) / sizeof(align[0]); i++)
    {
        void *p = aligned_alloc(align[i], size);
        printf("align=%zu, pointer is %p\n", align[i], p);
        free(p);
    }
}

Build and run it on macOS:

$ cc align.c -o align
$ ./align
alignof(max_align_t)=16

align=1, pointer is 0x0
align=2, pointer is 0x0
align=4, pointer is 0x0
align=8, pointer is 0x7fbd48801600
align=16, pointer is 0x7fbd48801600
align=32, pointer is 0x7fbd48801600
align=64, pointer is 0x7fbd48801600

On Linux (X86_64):

$ cc align.c -o align
$ ./align
alignof(max_align_t)=16

align=1, pointer is 0x5645aec676b0
align=2, pointer is 0x5645aec676b0
align=4, pointer is 0x5645aec676b0
align=8, pointer is 0x5645aec676b0
align=16, pointer is 0x5645aec676b0
align=32, pointer is 0x5645aec67ac0
align=64, pointer is 0x5645aec67f40

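On both macOS and Linux (X86_64), memory allocated from the free store is aligned to 16 bytes, i.e. alignof(max_align_t). macOS’s aligned_alloc requires an alignment of at least 8 bytes and returns a null pointer for smaller ones, whilst Linux (X86_64) accepts them (and, for alignments up to 16, returns the same naturally 16-byte-aligned pointer).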

P.S., the code can be downloaded here.