The caveat of thread name length in glibc

Recently, our team ran into an interesting bug: a process configured to spawn 16 threads actually ran only 10. The thread code looks like this:

static void *
stat_consumer_thread_run(void *data)
{
    stat_consumer_thread_t *thread = data;
    char thread_name[64];
    snprintf(thread_name, sizeof(thread_name), "stat.consumer.%d",
        thread->id);
    int rc = pthread_setname_np(pthread_self(), thread_name);
    if (rc != 0) {
        return NULL;
    }

    ......
    return NULL;
}

After checking the pthread_setname_np manual page, we found:

The thread name is a meaningful C language string, whose length is restricted to 16 characters, including the terminating null byte ('\0').

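So a thread name can hold at most 15 visible characters plus the terminating null byte. "stat.consumer.0" through "stat.consumer.9" are exactly 15 characters long and are set successfully, but "stat.consumer.10" through "stat.consumer.15" are 16 characters long, so pthread_setname_np fails with ERANGE. Because the thread function returns as soon as rc is non-zero, those six threads exit before doing any work.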
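The limit is easy to reproduce in isolation. The following is a minimal sketch, assuming Linux with glibc; the helper names `set_long_name` and `set_short_name` and the shorter "stat.cons." prefix are made up for illustration and are not from our code base.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Linux limit: 16 bytes, including the terminating '\0'. */
#define THREAD_NAME_MAX 16

/* Names the calling thread "stat.consumer.<id>", as in the buggy code.
 * Returns the result of pthread_setname_np(): 0 on success, or ERANGE
 * when the name does not fit into 16 bytes. */
int set_long_name(int id)
{
    char name[64];
    snprintf(name, sizeof(name), "stat.consumer.%d", id);
    return pthread_setname_np(pthread_self(), name);
}

/* One possible fix (hypothetical prefix): a shorter name, formatted into
 * a buffer of exactly THREAD_NAME_MAX bytes so that snprintf truncates
 * anything that would still be too long. */
int set_short_name(int id)
{
    char name[THREAD_NAME_MAX];
    snprintf(name, sizeof(name), "stat.cons.%d", id);
    return pthread_setname_np(pthread_self(), name);
}
```

Calling `set_long_name(9)` succeeds while `set_long_name(10)` returns ERANGE. Another option is to keep the long name but treat a failed rename as non-fatal, since a thread can do its work without a name.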

The experience of fixing a memory corruption issue

I came across a program crash last week:

Program terminated with signal 11, Segmentation fault.
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#1  0x00007ffff606025c in memcpy (__len=<optimized out>, __src=0x0, __dest=0x0) at /usr/include/bits/string3.h:51
......
#5  0x0000000000000000 in ?? () 

The address of frame #5 is 0x0000000000000000, which is clearly wrong. To debug it, I first inspected the register values.

On x86_64, when frame pointers are in use, the value stored at address (%rbp) should be the caller's saved %rbp, and the value stored at (%rbp) + 8 should be the return address. I checked both values and found they were all zeros, which means the stack was corrupted.
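The frame layout described above can be observed from C as well. This is only a sketch, assuming GCC or Clang on x86_64; `frame_record_t` and `frame_record_matches_return_address` are hypothetical names, and the builtins are compiler extensions rather than standard C.

```c
#include <stddef.h>

/* The two words that %rbp points at in a function that uses a frame
 * pointer on x86_64. */
typedef struct {
    void *saved_rbp;       /* value at (%rbp): the caller's %rbp   */
    void *return_address;  /* value at (%rbp) + 8: where to return */
} frame_record_t;

/* Reads the current frame record and compares the stored return address
 * with the one the compiler reports. Returns 1 when they agree. */
__attribute__((noinline))
int frame_record_matches_return_address(void)
{
    /* __builtin_frame_address(0) yields this function's frame pointer. */
    const frame_record_t *rec = __builtin_frame_address(0);
    return rec->return_address == __builtin_return_address(0);
}
```

In a healthy process the two addresses agree. In the crashed program, both words of the frame record were zero, which is what gave the corruption away.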

The next step is to dump the memory between %rsp and %rbp and read it side by side with the assembly code of the function. That shows which part of the stack looks wrong, so the related code can be reviewed. In the end I found the root cause and fixed it.

P.S. In optimised builds some functions may be inlined, so the stack frames may not map one-to-one onto the functions in the source code. Keep this caveat in mind.

Fix “The language server crashed 5 times in the last 3 minutes.” error for VS Code C/C++ extension

Since yesterday, my VS Code's C/C++ extension has kept reporting the error “The language server crashed 5 times in the last 3 minutes.” I tried everything I could think of: reinstalling the extension, restarting the machine, reinstalling VS Code, and so on. Unfortunately, none of it worked. Finally I installed VS Code - Insiders, and the problem went away.

Convert pcapng file to pcap format

Today I hit another error related to multiple interfaces in a pcapng file:

pcap_next_ex() [an interface has a type 0 different from the type of the first interface]. Trace index = 8122

Following my previous post, I tried to convert the file to pcap format with the tshark command. Unfortunately, it didn't work:

$ tshark -F pcap -r a.pcapng -w b.pcap
tshark: Frame 8122 of file "a.pcapng" has a network type that differs from the network type of earlier packets, which isn't supported in a "Wireshark/tcpdump/... - pcap" file.

I had no choice but to borrow code from PcapPlusPlus to implement a converter, and the final code can be downloaded here. Note that on my macOS system I made a small tweak to PcapPlusPlus.mk:

$ cat /opt/homebrew/Cellar/pcapplusplus/22.11/etc/PcapPlusPlus.mk
......
# libs
PCAPPP_LIBS_DIR := -L/opt/homebrew/Cellar/pcapplusplus/22.11/lib
......

The pitfall of upgrading a 3rd-party library

Today I debugged a tricky issue related to a 3rd-party library. When I used gdb to inspect a structure's values, I found that the last member was missing compared with the definition in the header file. I began to suspect the 3rd-party library, so I checked the upgrade log and found the root cause: when I compiled the code, the library's version was v1.1, but by the time I ran the program, someone else had upgraded the library to v1.2. My binary and the library therefore disagreed about the structure's layout, which caused this mysterious bug. The solution is simple: rebuild the code. But the debugging process was exhausting.
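To make the mismatch concrete, here is a minimal sketch with made-up types. `lib_conf_v1_1_t`, `lib_conf_v1_2_t`, the member names and `layout_is_compatible` are all hypothetical; the real library's structures are not shown in this post. It also shows one common guard, a size check, which I am not claiming the library actually provides.

```c
#include <stddef.h>

/* Hypothetical layout the application was compiled against (v1.1). */
typedef struct {
    int id;
    int flags;
} lib_conf_v1_1_t;

/* Hypothetical layout the upgraded library expects at run time (v1.2). */
typedef struct {
    int id;
    int flags;
    int timeout_ms; /* new last member, unknown to binaries built with v1.1 */
} lib_conf_v1_2_t;

/* Size handshake: the caller passes sizeof(its own structure) and the
 * library compares it with the size it expects. Returns 1 when the
 * caller's object is large enough for the library to use safely. */
int layout_is_compatible(size_t caller_size, size_t library_size)
{
    return caller_size >= library_size;
}
```

A binary built against v1.1 allocates only `sizeof(lib_conf_v1_1_t)` bytes, so a v1.2 library that touches `timeout_ms` reads or writes past the end of the object. A check like this turns that silent corruption into an explicit error, although rebuilding against the new headers remains the actual fix.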