Cacheline-Orientated programming

From CPU’s perspective, the memory hierarchy is registers, L1 cache, L2 cache, L3 cache, main memory, among others. The smallest unit of cache is one cacheline, and it is 64 bytes in most cases:

$ getconf LEVEL1_DCACHE_LINESIZE
64

To make your applications run efficiently, you need to take cacheline into account. Take notorious cacheline fales sharing as an example:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[14];
    };
    .....

The size of struct Foo is 64 bytes, and it can be stored in one cacheline. If CPU 0 accesses Foo.a while CPU 1 accesses Foo.b at the same time, there will be “cacheline ping-ponging” between CPUs, and the performance will be downgraded drastically.

The other trick is to allocate memory cacheline size aligned. Still use above struct Foo as the example. To guarantee the whole struct Foo in one cacheline, posix_memalign can be used:

    struct Foo *foo;
    posix_memalign(&foo, 64, sizeof(struct Foo));

The 64 is the alignment requirement.

Last but not least, sometimes padding is needed. E.g.:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
        int padding[2];
    };
    ......
    struct Foo *foo;
    posix_memalign(&foo, 64, sizeof(struct Foo) * 10);

Or using compiler’s aligned attribute:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
    } __attribute__((aligned(64)));;
    ......

The original struct Foo‘s size is 56 bytes, after padding (or through compiler’s aligned attribure), it becomes 64 bytes, and can be loaded in one cacheline. Now we can allocate an array of struct Foo, and every CPU will process one element of the array, no “cacheline ping-ponging” will occur.

Be aware of AddressSanitizer’s shadow memory usage

The following is excerpted from AddressSanitizerAlgorithm:

AddressSanitizer maps 8 bytes of the application memory into 1 byte of the shadow memory.

It means if you use ASAN_POISON_MEMORY_REGION/ASAN_UNPOISON_MEMORY_REGION (please refer AddressSanitizerManualPoisoning), you should take shadow memory usage into account if application memory is huge. E.g., my application occupies ~216 GiB, the shadow memory will occupy about 216 / 8 = 27 GiB.

Fix “cannot find libasan_preinit.o” issue in Void Linux

In my Void Linux, using clang with -fsanitize=address option is OK:

$ clang -std=c11 -fsanitize=address test.c
$

Whilst gcc reports following error:

$ gcc -std=c11 -fsanitize=address test.c
/usr/bin/ld: cannot find libasan_preinit.o: No such file or directory
/usr/bin/ld: cannot find -lasan
collect2: error: ld returned 1 exit status

The solution is to install libsanitizer-devel:

$ sudo xbps-install -Su libsanitizer-devel
$ gcc -std=c11 -fsanitize=address test.c
$

Check the linked libraries of executalbe file:

$ ldd a.out
    linux-vdso.so.1 (0x00007fff1a58b000)
    libasan.so.5 => /usr/lib/libasan.so.5 (0x00007fc1f0f02000)
    libc.so.6 => /usr/lib/libc.so.6 (0x00007fc1f0d3f000)
    libdl.so.2 => /usr/lib/../lib/libdl.so.2 (0x00007fc1f0d3a000)
    librt.so.1 => /usr/lib/../lib/librt.so.1 (0x00007fc1f0d2f000)
    libpthread.so.0 => /usr/lib/../lib/libpthread.so.0 (0x00007fc1f0d0e000)
    libstdc++.so.6 => /usr/lib/../lib/libstdc++.so.6 (0x00007fc1f0a99000)
    libm.so.6 => /usr/lib/../lib/libm.so.6 (0x00007fc1f0952000)
    libgcc_s.so.1 => /usr/lib/../lib/libgcc_s.so.1 (0x00007fc1f0938000)
    /lib/ld-linux-x86-64.so.2 (0x00007fc1f193d000)

Reference:
How to use gcc with sanitizers in Void Linux?.

The alignment of dynamically allocating memory

Check Notes from max_align_t:

Pointers returned by allocation functions such as malloc are suitably aligned for any object, which means they are aligned at least as strictly as max_align_t.

It means the memory allocated dynamically is guaranteed to alignof(max_align_t) bytes aligned.

Check Notes from aligned_alloc:

Passing a size which is not an integral multiple of alignment or a alignment which is not valid or not supported by the implementation causes the function to fail and return a null pointer (C11, as published, specified undefined behavior in this case, this was corrected by DR 460).

It means the alignment for aligned_alloc is implementation dependent.

Write a simple program to test aligned_alloc behavior in macOS and Linux (X86_64):

$ cat align.c
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

int main()
{
    printf("alignof(max_align_t)=%zu\n\n", alignof(max_align_t));

    size_t size = 1024;
    size_t align[] = {1, 2, 4, 8, 16, 32, 64};
    for (size_t i = 0; i < sizeof(align) / sizeof(align[0]); i++)
    {
        void *p = aligned_alloc(align[i], size);
        printf("align=%zu, pointer is %p\n", align[i], p);
        free(p);
    }
}

Build and run it in macOS:

$ cc align.c -o align
$ ./align
alignof(max_align_t)=16

align=1, pointer is 0x0
align=2, pointer is 0x0
align=4, pointer is 0x0
align=8, pointer is 0x7fbd48801600
align=16, pointer is 0x7fbd48801600
align=32, pointer is 0x7fbd48801600
align=64, pointer is 0x7fbd48801600

In Linux (X86_64):

$ cc align.c -o align
$ ./align
alignof(max_align_t)=16

align=1, pointer is 0x5645aec676b0
align=2, pointer is 0x5645aec676b0
align=4, pointer is 0x5645aec676b0
align=8, pointer is 0x5645aec676b0
align=16, pointer is 0x5645aec676b0
align=32, pointer is 0x5645aec67ac0
align=64, pointer is 0x5645aec67f40

Both macOS and Linux (X86_64) have the same alignment of allocating memory from free storage: 16 bytes. macOS requires the alignment of aligned_alloc is at least 8 bytes; whilst Linux (X86_64) doesn’t have this requirement.

P.S., the code can be downloaded here.

Fix “Permission denied” issue when installing manual in Void Linux

Today, I built and installed concurrencykit in Void Linux. When opening manual, it prompted following errors:

$ man ck_pr_barrier
man: /usr/local/share/man/man3/ck_pr_barrier.3.gz: Permission denied
man: outdated mandoc.db contains bogus man3/ck_pr_barrier.3.gz entry, run makewhatis /usr/local/share/man
man: outdated mandoc.db lacks ck_pr_barrier(3) entry, run makewhatis /usr/local/share/man
man: ERROR: /usr/local/share/man/man3/ck_pr_barrier.3.gz: Permission denied

Check the permission of manual file:

$ ll /usr/local/share/man/man3/ck_pr_barrier.3.gz
-rw------- 1 root root 1057 Dec 23 20:07 /usr/local/share/man/man3/ck_pr_barrier.3.gz

It indeed lacked read permission. Check the original file permission under my folder:

$ ll doc/ck_pr_barrier*
-rw-r--r-- 1 nan nan 2212 Dec 23 17:59 doc/ck_pr_barrier
-rw-r--r-- 1 nan nan 1057 Dec 23 20:07 doc/ck_pr_barrier.3.gz

It had read permission. The solution is adding following field in /etc/sudoers via the visudo command:

Defaults umask = 022

Reference:
Did sudo behaviour change recently?.