__COUNTER__ macro in gcc/clang

Both gcc and clang define __COUNTER__ marco:

Defined to an integer value that starts at zero and is incremented each time the COUNTER macro is expanded.

Check following code:

# cat foo.c
#include <stdio.h>

void foo1(void)
{
    printf("%s:%d\n", __func__, __COUNTER__);
}

void foo2(void)
{
    printf("%s:%d\n", __func__, __COUNTER__);
}
# cat bar.c
#include <stdio.h>

void bar1(void)
{
    printf("%s:%d\n", __func__, __COUNTER__);
}

void bar2(void)
{
    printf("%s:%d\n", __func__, __COUNTER__);
}
# cat main.c
#include "foo.h"
#include "bar.h"

int main(void)
{
    foo1();
    foo2();
    bar1();
    bar2();
    return 0;
}

Run the program:

# ./main
foo_1:0
foo_2:1
bar_1:0
bar_2:1

You can see for every translate unit (.c) file, the __COUNTER__ begins at 0.

P.S., the code can be referenced here.

Cacheline-Orientated programming

From CPU’s perspective, the memory hierarchy is registers, L1 cache, L2 cache, L3 cache, main memory, among others. The smallest unit of cache is one cacheline, and it is 64 bytes in most cases:

$ getconf LEVEL1_DCACHE_LINESIZE
64

To make your applications run efficiently, you need to take cacheline into account. Take notorious cacheline fales sharing as an example:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[14];
    };
    .....

The size of struct Foo is 64 bytes, and it can be stored in one cacheline. If CPU 0 accesses Foo.a while CPU 1 accesses Foo.b at the same time, there will be “cacheline ping-ponging” between CPUs, and the performance will be downgraded drastically.

The other trick is to allocate memory cacheline size aligned. Still use above struct Foo as the example. To guarantee the whole struct Foo in one cacheline, posix_memalign can be used:

    struct Foo *foo;
    posix_memalign(&foo, 64, sizeof(struct Foo));

The 64 is the alignment requirement.

Last but not least, sometimes padding is needed. E.g.:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
        int padding[2];
    };
    ......
    struct Foo *foo;
    posix_memalign(&foo, 64, sizeof(struct Foo) * 10);

Or using compiler’s aligned attribute:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
    } __attribute__((aligned(64)));;
    ......

The original struct Foo‘s size is 56 bytes, after padding (or through compiler’s aligned attribure), it becomes 64 bytes, and can be loaded in one cacheline. Now we can allocate an array of struct Foo, and every CPU will process one element of the array, no “cacheline ping-ponging” will occur.

Be aware of AddressSanitizer’s shadow memory usage

The following is excerpted from AddressSanitizerAlgorithm:

AddressSanitizer maps 8 bytes of the application memory into 1 byte of the shadow memory.

It means if you use ASAN_POISON_MEMORY_REGION/ASAN_UNPOISON_MEMORY_REGION (please refer AddressSanitizerManualPoisoning), you should take shadow memory usage into account if application memory is huge. E.g., my application occupies ~216 GiB, the shadow memory will occupy about 216 / 8 = 27 GiB.

Fix “cannot find libasan_preinit.o” issue in Void Linux

In my Void Linux, using clang with -fsanitize=address option is OK:

$ clang -std=c11 -fsanitize=address test.c
$

Whilst gcc reports following error:

$ gcc -std=c11 -fsanitize=address test.c
/usr/bin/ld: cannot find libasan_preinit.o: No such file or directory
/usr/bin/ld: cannot find -lasan
collect2: error: ld returned 1 exit status

The solution is to install libsanitizer-devel:

$ sudo xbps-install -Su libsanitizer-devel
$ gcc -std=c11 -fsanitize=address test.c
$

Check the linked libraries of executalbe file:

$ ldd a.out
    linux-vdso.so.1 (0x00007fff1a58b000)
    libasan.so.5 => /usr/lib/libasan.so.5 (0x00007fc1f0f02000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007fc1f0d3f000)
    libdl.so.2 => /usr/lib/../lib/libdl.so.2 (0x00007fc1f0d3a000)
    librt.so.1 => /usr/lib/../lib/librt.so.1 (0x00007fc1f0d2f000)
    libpthread.so.0 => /usr/lib/../lib/libpthread.so.0 (0x00007fc1f0d0e000)
    libstdc++.so.6 => /usr/lib/../lib/libstdc++.so.6 (0x00007fc1f0a99000)
    libm.so.6 => /usr/lib/../lib/libm.so.6 (0x00007fc1f0952000)
    libgcc_s.so.1 => /usr/lib/../lib/libgcc_s.so.1 (0x00007fc1f0938000)
    /lib/ld-linux-x86-64.so.2 (0x00007fc1f193d000)

Reference:
How to use gcc with sanitizers in Void Linux?.

Empty struct in C

Empty struct in C is undefined behaviour (refer C17 spec, section 6.7.2.1):

If the struct-declaration-list does not contain any named members, either directly or via an anonymous structure or anonymous union, the behavior is undefined.

On my Linux, I use both gcc (v9.1.0) and clang (v8.0.0) to compile following code:

$ cat test.c
#include <stdio.h>

typedef struct A {} A;

int main()
{
    printf("%zu\n", sizeof(A));
}

No warnings, and both ouput:
0

Reference:
C empty struct — what does this mean/do?.