The experience of fixing a memory corruption issue

I came across a program crash last week:

Program terminated with signal 11, Segmentation fault.
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#1  0x00007ffff606025c in memcpy (__len=<optimized out>, __src=0x0, __dest=0x0) at /usr/include/bits/string3.h:51
......
#5  0x0000000000000000 in ?? () 

The address of stack frame #5 is 0x0000000000000000, which doesn't look right. To debug it, get the register values first.

According to the x86_64 architecture, the value at memory address (%rbp) should be the previous %rbp value, and the value at (%rbp) + 8 should be the return address. I checked these two values and found they were both 0, which means the stack was corrupted.
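A rough sketch of the gdb session (the elided address is illustrative; the two zero quadwords are what the actual check showed):

(gdb) frame 0                 # select the innermost frame first
(gdb) info registers rbp rsp  # fetch the register values
(gdb) x/2gx $rbp              # (%rbp) = saved %rbp, (%rbp) + 8 = return address
0x...:  0x0000000000000000    0x0000000000000000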

The next thing to do is dump the memory between %rsp and %rbp, and refer to the assembly code of the function at the same time. With this, I can tell which part of the memory doesn't look correct, and review the code accordingly. Finally I found the root cause and fixed it.
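Again a sketch of the commands involved (the frame number and dump length are illustrative):

(gdb) frame 2        # select the suspect frame
(gdb) x/64gx $rsp    # dump the stack upwards from %rsp towards %rbp
(gdb) disassemble    # assembly of the selected frame's function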

P.S.: in an optimised build, some functions may be inlined, so be aware of this caveat.

The registers’ values in a core dump file on x86_64

On x86_64 platforms, some registers are “caller-saved” whilst others are “callee-saved” (refer to AMD64 Calling Conventions for Linux / Mac OSX), or, in the words of Optimizing subroutines in assembly language, section 4.1, Register usage, “Registers that can be used freely” (“caller-saved”) and “Registers that must be saved and restored” (“callee-saved”). When using gdb to display register values, the values are relative to the selected stack frame (refer to Registers):

Normally, register values are relative to the selected stack frame (see Selecting a Frame). This means that you get the value that the register would contain if all stack frames farther in were exited and their saved registers restored. In order to see the true contents of hardware registers, you must select the innermost frame (with ‘frame 0’).

......

Also, the more “outer” the frame is you’re looking at, the more likely a call-clobbered register’s value is to be wrong, in the sense that it doesn’t actually represent the value the register had just before the call.

So it means when using gdb to analyse a core dump file, you must pay attention to the register values, since they may not reflect the correct values for the current stack frame. Check the following diagram:

You can see that only RSP, RIP and the “callee-saved” registers are different among frames 0, 7 and 8.
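A sketch of how to compare them in gdb (the register lists follow the x86_64 calling convention; the frame numbers are illustrative):

(gdb) frame 0
(gdb) info registers rip rsp rbx rbp r12 r13 r14 r15   # rip, rsp and callee-saved
(gdb) frame 8
(gdb) info registers rax rcx rdx rsi rdi r8 r9 r10 r11 # caller-saved: likely stale here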

Cacheline-Orientated programming

From the CPU’s perspective, the memory hierarchy is registers, L1 cache, L2 cache, L3 cache, main memory, and so on. The smallest unit of cache is one cacheline, and it is 64 bytes in most cases:

$ getconf LEVEL1_DCACHE_LINESIZE
64

To make your applications run efficiently, you need to take the cacheline into account. Take the notorious cacheline false sharing as an example:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[14];
    };
    ......

The size of struct Foo is 64 bytes, so it can be stored in one cacheline. If CPU 0 accesses Foo.a while CPU 1 accesses Foo.b at the same time, there will be “cacheline ping-ponging” between the CPUs, and performance will degrade drastically.
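A minimal sketch that exhibits the problem (the thread functions and iteration count are made up for illustration; build with gcc -O0 -pthread so the loops are not optimised away):

    #include <pthread.h>
    #include <stdio.h>

    struct Foo
    {
        int a;
        int b;
        int c[14];
    };

    static struct Foo foo;

    static void *bump_a(void *arg)      /* runs on one CPU ...            */
    {
        (void)arg;
        for (long i = 0; i < 100000000; i++)
            foo.a++;
        return NULL;
    }

    static void *bump_b(void *arg)      /* ... while this runs on another */
    {
        (void)arg;
        for (long i = 0; i < 100000000; i++)
            foo.b++;                    /* foo.a and foo.b share a cacheline,
                                           so it ping-pongs between caches */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump_a, NULL);
        pthread_create(&t2, NULL, bump_b, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("a=%d b=%d\n", foo.a, foo.b);
        return 0;
    }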

The other trick is to allocate memory aligned to the cacheline size. Still using the above struct Foo as the example: to guarantee the whole struct Foo sits in one cacheline, posix_memalign can be used:

    struct Foo *foo;
    /* posix_memalign takes a void **, so a cast is needed; it returns
       non-zero on failure */
    if (posix_memalign((void **)&foo, 64, sizeof(struct Foo)) != 0)
        abort();

The second argument, 64, is the alignment requirement.

Last but not least, sometimes padding is needed. E.g.:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
        int padding[2];
    };
    ......
    struct Foo *foo;
    /* room for 10 aligned elements; again cast to void ** */
    if (posix_memalign((void **)&foo, 64, sizeof(struct Foo) * 10) != 0)
        abort();

Or using compiler’s aligned attribute:

    ......
    struct Foo
    {
        int a;
        int b;
        int c[12];
    } __attribute__((aligned(64)));
    ......

The original struct Foo’s size is 56 bytes; after padding (or through the compiler’s aligned attribute), it becomes 64 bytes and can be loaded into one cacheline. Now we can allocate an array of struct Foo, every CPU will process one element of the array, and no “cacheline ping-ponging” will occur.
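A minimal sketch of this layout (NR_CPUS and the worker hand-off are made up for illustration):

    #include <stdlib.h>

    #define NR_CPUS 4               /* illustrative worker count */

    struct Foo
    {
        int a;
        int b;
        int c[12];
    } __attribute__((aligned(64)));

    /* compile-time check that one element fills exactly one cacheline */
    _Static_assert(sizeof(struct Foo) == 64, "Foo must be one cacheline");

    int main(void)
    {
        struct Foo *foos;

        /* one aligned 64-byte element per CPU: each worker touches only
           foos[i], so no two CPUs ever share a cacheline */
        if (posix_memalign((void **)&foos, 64, sizeof(struct Foo) * NR_CPUS) != 0)
            return 1;
        /* ... hand foos[i] to worker i ... */
        free(foos);
        return 0;
    }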

The “***Exception: Illegal” error when running googletest

Today, a colleague told me his project’s test cases (using googletest) failed with the following errors:

1/6 Test #1: AddTest ....................***Exception: Illegal  1.70 sec
......

I am not familiar with his work, nor am I an expert on googletest, but I was interested in this exception. After some searching, I bumped into these words from this post:

Given that the GROMACS build system enabled AVX2 SIMD on your VM which seems to not support anything above SSE2, it’s not a surprise that the first math instruction crashes the run.

Immediate solution: set -DGMX_SIMD=SSE2 when configuring.

So my buddy seemed to have met a similar problem. After discussing with him, he confirmed that his server is also a virtual machine and his issue has the same root cause. To satisfy my own curiosity, I downloaded tfhe, whose test cases use AVX and FMA while my machine only supports SSE. Run “make test”:

# make test
Running tests...
Test project /root/Project/tfhe/build
    Start 1: unittests-nayuki-portable
1/4 Test #1: unittests-nayuki-portable ........   Passed    8.59 sec
    Start 2: unittests-nayuki-avx
2/4 Test #2: unittests-nayuki-avx .............***Exception: Illegal  2.88 sec
    Start 3: unittests-spqlios-avx
3/4 Test #3: unittests-spqlios-avx ............***Exception: Illegal  2.86 sec
    Start 4: unittests-spqlios-fma
4/4 Test #4: unittests-spqlios-fma ............***Exception: Illegal  2.85 sec

25% tests passed, 3 tests failed out of 4

Total Test time (real) =  17.19 sec

The following tests FAILED:
          2 - unittests-nayuki-avx (ILLEGAL)
          3 - unittests-spqlios-avx (ILLEGAL)
          4 - unittests-spqlios-fma (ILLEGAL)
Errors while running CTest
make: *** [Makefile:95: test] Error 8

We can see the “***Exception: Illegal” errors are reported again.
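To confirm which SIMD extensions a machine actually supports, check the CPU flags before running such tests (a Linux sketch; the output below is illustrative of an SSE-only box):

$ grep -m 1 flags /proc/cpuinfo | tr ' ' '\n' | grep -E '^(sse|avx|fma)' | sort -u
sse
sse2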