The experience of fixing a memory corruption issue

I came across a program crash last week:

Program terminated with signal 11, Segmentation fault.
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#1  0x00007ffff606025c in memcpy (__len=<optimized out>, __src=0x0, __dest=0x0) at /usr/include/bits/string3.h:51
......
#5  0x0000000000000000 in ?? () 

Frame #5 has the address 0x0000000000000000, which does not look right. To debug it, I first got the register values.

On x86-64, when the code keeps a frame pointer, the value at memory address (%rbp) should be the caller's saved %rbp, and the value at memory address (%rbp) + 8 should be the return address. I checked these two values and found that both were 0, which means the stack was corrupted.
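The frame layout described above can also be observed from inside a running program. The following is a minimal sketch, assuming x86-64 and GCC or Clang (it relies on the `__builtin_frame_address` and `__builtin_return_address` builtins); the function name is hypothetical and is not taken from the crashed program.

```c
#include <stdint.h>

/*
 * Assumes x86-64 with GCC or Clang. Inside this function:
 *   fp[0] is the word at (%rbp):     the caller's saved %rbp
 *   fp[1] is the word at (%rbp) + 8: the return address
 * Returns 1 if fp[1] matches the return address reported by the
 * compiler, 0 otherwise.
 */
__attribute__((noinline))
int return_address_is_at_rbp_plus_8(void)
{
    uintptr_t *fp = (uintptr_t *)__builtin_frame_address(0);
    uintptr_t ret = (uintptr_t)__builtin_return_address(0);

    return fp[1] == ret;
}
```

Under these assumptions the function should return 1 in a healthy process. In the crash above, the corresponding two words were both 0, which is what pointed to a corrupted stack.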

The next step was to dump the memory between %rsp and %rbp and read it alongside the assembly code of the function. This showed which part of the memory looked wrong, so I could review the corresponding code. In the end I found the root cause and fixed it.

P.S. In an optimised build, some functions may be inlined, so be aware of this caveat when matching stack frames to source code.

Fix “The language server crashed 5 times in the last 3 minutes.” error for VS Code C/C++ extension

Since yesterday, my VS Code C/C++ extension kept reporting the “The language server crashed 5 times in the last 3 minutes.” error. I tried everything I could think of: reinstalling the extension, restarting the machine, reinstalling VS Code, and so on. Unfortunately, none of them worked. Finally I installed VS Code - Insiders, and the problem was gone.

Convert pcapng file to pcap format

Today I hit another error related to multiple interfaces in a pcapng file:

pcap_next_ex() [an interface has a type 0 different from the type of the first interface]. Trace index = 8122

Following my previous post, I tried to convert the file to pcap format with the tshark command, but unfortunately it didn’t work:

$ tshark -F pcap -r a.pcapng -w b.pcap
tshark: Frame 8122 of file "a.pcapng" has a network type that differs from the network type of earlier packets, which isn't supported in a "Wireshark/tcpdump/... - pcap" file.

I had no choice but to borrow code from PcapPlusPlus to implement a converter; the final code can be downloaded here. Note that on my macOS system I made a small tweak to PcapPlusPlus.mk:

$ cat /opt/homebrew/Cellar/pcapplusplus/22.11/etc/PcapPlusPlus.mk
......
# libs
PCAPPP_LIBS_DIR := -L/opt/homebrew/Cellar/pcapplusplus/22.11/lib
......

The pitfall of upgrading a 3rd-party library

Today I debugged a tricky bug related to a 3rd-party library. When I used gdb to inspect a structure’s values, I found that the last member was missing compared to the definition in the header file. I began to suspect the 3rd-party library, so I checked the upgrade log and found the root cause: when I compiled the code, the library’s version was v1.1, but by the time I ran the program, someone else had upgraded the library to v1.2, which caused this mysterious bug. The solution was simple: rebuild the code. But the debugging process was exhausting.
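To illustrate why such a version mismatch is dangerous, here is a minimal sketch. The two structure definitions are hypothetical stand-ins for the v1.1 and v1.2 headers (the real library and its structure are not shown in this post), and it assumes the upgrade appended one member at the end.

```c
#include <stddef.h>

/* Hypothetical layout from the v1.1 header the program was compiled with. */
struct widget_v1_1 {
    int id;
    int flags;
};

/* Hypothetical layout from the v1.2 header: one member appended. */
struct widget_v1_2 {
    int id;
    int flags;
    int priority;
};

/* Number of bytes in the v1.2 object that v1.1-compiled code knows nothing about. */
size_t bytes_unknown_to_old_code(void)
{
    return sizeof(struct widget_v1_2) - sizeof(struct widget_v1_1);
}

/* Returns 1 if the new member lies beyond the end of a v1.1-sized object. */
int new_member_is_out_of_bounds(void)
{
    return offsetof(struct widget_v1_2, priority) >= sizeof(struct widget_v1_1);
}
```

If the caller allocates the object with the v1.1 size while the upgraded library reads or writes `priority`, the access lands outside the object. A rebuild against the new header makes both sides agree on the layout again.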

Bisection assert is a good debugging methodology

Recently, I fixed an issue related to an uninitialised bit-field in C. Because the bit-field can be either 0 or 1, the bug occurs randomly. The good news is that the reproduction rate is very high, nearly 50%. Although I was not familiar with the code, I used bisection assert to help:

 {
  ......
  assert(obj->flag == 0);  /* obj->flag stands for the bit-field */
  ......
  assert(obj->flag == 0);
  ......
 }

If the first assert is not triggered but the second one is, I know the bug is in the code block between them. I then bisect that block and add asserts again, until the root cause is found.
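The same bisection idea can be automated when the suspect code can be split into steps and replayed from a clean state. The sketch below is an illustration under that assumption, not the code I actually debugged: `struct conn`, the `step_*` functions and `first_bad_step` are all hypothetical names, and `step_c` plays the role of the buggy code.

```c
#include <stddef.h>

/* Hypothetical structure with bit-fields; 'dirty' must stay 0. */
struct conn {
    unsigned ready : 1;
    unsigned dirty : 1;
};

typedef void (*step_fn)(struct conn *);

void step_a(struct conn *c) { c->ready = 1; }
void step_b(struct conn *c) { (void)c; }
void step_c(struct conn *c) { c->dirty = 1; } /* stands in for the bug */
void step_d(struct conn *c) { c->ready = 0; }

const step_fn demo_steps[] = { step_a, step_b, step_c, step_d };

/* Replay the first 'count' steps on a zeroed object and report whether
 * the invariant (dirty == 0) is broken afterwards. */
static int dirty_after(const step_fn *steps, size_t count)
{
    struct conn c = {0};
    for (size_t i = 0; i < count; i++)
        steps[i](&c);
    return c.dirty != 0;
}

/* Binary search for the index of the first step that breaks the invariant.
 * Returns n if the invariant still holds after all n steps. Assumes that
 * once the invariant is broken it stays broken. */
size_t first_bad_step(const step_fn *steps, size_t n)
{
    if (!dirty_after(steps, n))
        return n;

    size_t lo = 0, hi = n; /* clean after lo steps, dirty after hi steps */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (dirty_after(steps, mid))
            hi = mid;
        else
            lo = mid;
    }
    return hi - 1;
}
```

Calling `first_bad_step(demo_steps, 4)` narrows the search to `step_c` with two probes after the initial check, which mirrors moving the asserts by hand. For a bug caused by an uninitialised value, the manual asserts are still needed, because replaying from a zeroed object would hide the problem.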