The experience of fixing a memory corruption issue

I came across a program crash last week:

Program terminated with signal 11, Segmentation fault.
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#0  0x00007ffff365bd29 in __memcpy_ssse3_back () from /usr/lib64/libc.so.6
#1  0x00007ffff606025c in memcpy (__len=<optimized out>, __src=0x0, __dest=0x0) at /usr/include/bits/string3.h:51
......
#5  0x0000000000000000 in ?? () 

The address of frame #5 is 0x0000000000000000, which does not look right. To debug it, I first checked the register values (in gdb, info registers prints them).

On x86_64, when frame pointers are in use, the value at memory address (%rbp) should be the caller's saved %rbp, and the value at memory address (%rbp) + 8 should be the return address. I checked these two values and found that both were 0, which means the stack was corrupted.
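
To make that layout concrete, here is a minimal C sketch. It assumes GCC or Clang (it relies on their __builtin_frame_address and __builtin_return_address builtins) and an x86_64 target; struct frame_record and return_slot_is_intact are hypothetical names used only for illustration. The function reads the same two slots I inspected in gdb, but from inside a healthy program:

```c
#include <stddef.h>

/* The two stack slots examined above, as seen through the frame pointer.
 * Assumption: the function keeps a frame pointer in %rbp (x86_64). */
struct frame_record {
    struct frame_record *saved_fp; /* value at (%rbp): the caller's %rbp */
    void *return_address;          /* value at (%rbp) + 8                */
};

/* Hypothetical helper: returns 1 when the slot at (%rbp) + 8 of its own
 * frame really holds the address it will return to, 0 otherwise. */
__attribute__((noinline))
int return_slot_is_intact(void)
{
    const struct frame_record *fp = __builtin_frame_address(0);

    return fp != NULL &&
           fp->return_address == __builtin_return_address(0);
}
```

In a healthy process the function returns 1. In the crashed program both slots were 0, which is why gdb could not unwind any further and showed a frame at 0x0000000000000000.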

The next step was to dump the memory between %rsp and %rbp while referring to the assembly code of the function. This showed me which part of the memory did not look right, so I could review the corresponding code. Finally I found the root cause and fixed it.

P.S. In optimised builds, some functions may be inlined and therefore have no stack frame of their own, so keep this caveat in mind when matching the stack contents against the source code.

The gotcha of logging gdb output

By default, gdb appends to its log file rather than overwriting it. For example, debug the same program twice:

$ gdb foo
......
(gdb) set logging on
Copying output to gdb.txt.
Copying debug output to gdb.txt.
(gdb) r
......
$ ll gdb.txt
-rw-rw-r-- 1 nanxiao nanxiao 1067 Jul  9 18:06 gdb.txt
$ gdb foo
......
(gdb) set logging on
Copying output to gdb.txt.
Copying debug output to gdb.txt.
(gdb) r
......
$ ll gdb.txt
-rw-rw-r-- 1 nanxiao nanxiao 2134 Jul  9 18:08 gdb.txt

After the second session, the size of gdb.txt has doubled. To overwrite the output file instead, run set logging overwrite on before set logging on:

$ gdb foo
......
(gdb) set logging overwrite on
(gdb) set logging on
Copying output to gdb.txt.
Copying debug output to gdb.txt.
(gdb) r
......
$ ll gdb.txt
-rw-rw-r-- 1 nanxiao nanxiao 1067 Jul  9 18:10 gdb.txt

Run command when gdb breakpoint is hit

I needed to analyse a large pcap file and find the problematic packets, so I wanted gdb to print the packet index automatically whenever an error occurred. Below are the gdb commands:

(gdb) b packet.c:1430
Breakpoint 2 at 0x7ffff0f910aa: file packet.c, line 1430.
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>silent
>frame 10
>printf "frame.number == %zu ||\n", packet_index
>c
>end
(gdb) r

Or you can put them in a script:

b packet.c:1430
commands
 silent
 frame 10
 printf "frame.number == %zu ||\n", packet_index
 c
end

It is also handy to log the output so that it can be checked later:

(gdb) set logging on

Use gdb’s convenience functions

Today I tried to set a conditional breakpoint in my program that triggers when a string variable holds one specific value:

b foo.c:488 if (int)strcmp(foo, "foo") == 0

Unfortunately, gdb exited with the following error:

Unable to restore previously selected frame:
Selected thread is running.
terminate called after throwing an instance of 'gdb_exception_error'
Aborted

After searching Stack Overflow, I found gdb's convenience functions, so I used $_streq instead of strcmp:

b foo.c:488 if $_streq(foo, "foo")

Now gdb works like a charm!

Use libunwind to debug memory leak issue

In our project, there is a shared object with a reference counter, which is incremented when a user acquires the object and decremented when the object is released. Once the reference counter reaches 0, the shared object can be reaped. We then ran into a classic memory leak: the memory used by the shared object kept growing. To debug this issue, I used libunwind.

The principle is simple: print the stack trace of every increment/decrement operation. I borrowed code from "Programmatic access to the call stack in C++" and made some tweaks, mostly to format the stack traces and write them to a file. The output looks like this:

$ cat /tmp/backtrace.log
0x55ad59ec2556: (foo+0x9)
0x55ad59ec2562: (bar+0x9)
0x55ad59ec2579: (main+0x14)
0x7f941161ee0a: (__libc_start_main+0xea)
0x55ad59ec214a: (_start+0x2a)

A quick way to map an address to its position in the source code is through gdb: attach to the program, then use the “info line” command:

$ gdb program -p pid
......
(gdb) info line *0x55ad59ec2556
......
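
The logging principle itself can be sketched in a few lines of C. This is not the project's code, and it does not use libunwind: to stay free of external dependencies it uses glibc's backtrace() and backtrace_symbols() from <execinfo.h> as a stand-in for the libunwind stack walk, so the log format differs from the one shown above. struct shared_obj, obj_acquire, obj_release and log_backtrace are hypothetical names.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 64

/* Hypothetical shared object; the real type is not shown in this post. */
struct shared_obj {
    int refcount;
};

/* Write a tag line and the current call stack to `out`.
 * Returns the number of frames captured. */
static int log_backtrace(FILE *out, const char *tag, int refcount)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    char **symbols = backtrace_symbols(frames, n); /* malloc'ed, may be NULL */

    fprintf(out, "%s -> refcount=%d\n", tag, refcount);
    for (int i = 0; i < n; i++)
        fprintf(out, "  %s\n", symbols ? symbols[i] : "??");
    fprintf(out, "\n");

    free(symbols);
    return n;
}

/* Increment the counter and record who asked for it. */
int obj_acquire(struct shared_obj *obj, FILE *log)
{
    obj->refcount++;
    log_backtrace(log, "acquire", obj->refcount);
    return obj->refcount;
}

/* Decrement the counter and record who released it. */
int obj_release(struct shared_obj *obj, FILE *log)
{
    obj->refcount--;
    log_backtrace(log, "release", obj->refcount);
    return obj->refcount;
}
```

Pairing the "acquire" and "release" entries in the log then points at the call path that acquires without releasing. With glibc, linking the program with -rdynamic lets backtrace_symbols() resolve function names in the executable; otherwise only the raw addresses appear, and these can still be passed to info line as shown above.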

P.S. The code can be downloaded here.