I had a new requirement: duplicate the last packet in a pcap file, with the timestamp of the duplicate modified. It is not hard: just keep the previous packet's information while reading. Once the end of the file is reached, dump that packet again with a tweaked timestamp. The code is available here.
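The idea can be sketched as follows. This is a minimal illustration, not the linked code: it assumes the classic pcap file format (24-byte global header, 16-byte record headers) written in the host's byte order, parses the file directly instead of using libpcap, and the name dup_last_packet and the delta_sec parameter are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Per-packet record header of the classic pcap file format. */
struct rec_hdr {
    uint32_t ts_sec;   /* timestamp, seconds */
    uint32_t ts_usec;  /* timestamp, microseconds */
    uint32_t incl_len; /* bytes of packet data stored in the file */
    uint32_t orig_len; /* packet length on the wire */
};

/* Copy `in` to `out`, then write the last packet once more with its
 * timestamp moved forward by `delta_sec` seconds (hypothetical tweak).
 * Returns the number of packets written, or -1 on error. */
long dup_last_packet(FILE *in, FILE *out, uint32_t delta_sec)
{
    unsigned char global[24];   /* global header, copied unchanged */
    struct rec_hdr hdr, last;   /* `last` keeps the previous packet's header */
    unsigned char *buf = NULL;  /* keeps the previous packet's data */
    long count = 0;

    if (fread(global, sizeof global, 1, in) != 1 ||
        fwrite(global, sizeof global, 1, out) != 1)
        return -1;

    while (fread(&hdr, sizeof hdr, 1, in) == 1) {
        unsigned char *tmp = realloc(buf, hdr.incl_len ? hdr.incl_len : 1);
        if (tmp == NULL)
            goto fail;
        buf = tmp;
        if (hdr.incl_len && fread(buf, hdr.incl_len, 1, in) != 1)
            goto fail;          /* truncated packet */
        if (fwrite(&hdr, sizeof hdr, 1, out) != 1)
            goto fail;
        if (hdr.incl_len && fwrite(buf, hdr.incl_len, 1, out) != 1)
            goto fail;
        last = hdr;             /* remember it in case it is the last one */
        count++;
    }

    if (count > 0) {            /* end of file: dump the last packet again */
        last.ts_sec += delta_sec;
        if (fwrite(&last, sizeof last, 1, out) != 1)
            goto fail;
        if (last.incl_len && fwrite(buf, last.incl_len, 1, out) != 1)
            goto fail;
        count++;
    }
    free(buf);
    return count;

fail:
    free(buf);
    return -1;
}
```

A caller would open the input with fopen(path, "rb") and the output with fopen(path, "wb"), then call dup_last_packet(in, out, 1) to place the copy one second after the original.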
Category: Technology
Enhance libunwind on illumos
In my last post, I mentioned that I recently used libunwind to debug a memory leak issue. I also ran this program on illumos, but unfortunately met the following errors:
$ cat /tmp/backtrace.log
0x401b3b: -- error(unspecified (general) error): unable to obtain symbol name for this frame
0x401b47: -- error(unspecified (general) error): unable to obtain symbol name for this frame
0x401b5e: -- error(unspecified (general) error): unable to obtain symbol name for this frame
0x401757: -- error(unspecified (general) error): unable to obtain symbol name for this frame
0x4016b8: -- error(unspecified (general) error): unable to obtain symbol name for this frame
I used gdb to single-step through the code, and found that the libunwind illumos implementation just reuses the Linux APIs:
......
#include "os-linux.h" // using linux header for map_iterator implementation
......
On Linux, the map file is /proc/$pid/maps, but on illumos, the file is /proc/$pid/map. Hmm, the very first step is wrong, so there is no need to look any further.
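The path difference alone can be sketched like this. It is only an illustration, not the libunwind code: proc_map_path is a hypothetical helper, and it assumes the compiler predefines __sun on illumos.

```c
#include <stdio.h>
#include <sys/types.h>

/* Build the path of the address-space map file for `pid`.
 * Returns what snprintf() returns: the length of the full path,
 * or a negative value on error. */
int proc_map_path(char *buf, size_t len, pid_t pid)
{
#if defined(__sun)
    /* illumos: a binary file holding an array of prmap_t records */
    return snprintf(buf, len, "/proc/%ld/map", (long)pid);
#else
    /* Linux: a text file with one mapping per line */
    return snprintf(buf, len, "/proc/%ld/maps", (long)pid);
#endif
}
```

Fixing the path is not enough on its own, because the two files also differ in format.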
I tried to see what is in /proc/$pid/map:
$ cat /proc/$$/map
cat: input error on /proc/511/map: Value too large for defined data type
cat couldn’t help, so I resorted to vim:
$ vim /proc/$$/map
^@^@@^@......
Just garbage. Since it is not plain text, how can I decode it? Aha, pmap can display it correctly:
$ pmap $$
511: -bash
0000000000400000 828K r-x-- /usr/bin/bash
00000000004DE000 20K rw--- /usr/bin/bash
00000000004E3000 60K rw--- /usr/bin/bash
0000000000F09000 1872K rw--- [ heap ]
FFFFFC7FEC110000 4K rwx-- [ anon ]
......
Let me check the pmap implementation. After going through the pmap code, I found that I should use the libproc APIs to extract the related information. I referred to the code from pmap and implemented a totally new tdep_get_elf_image API, and it worked:
$ cat /tmp/backtrace.log
0x401b3b: (foo+0x9)
0x401b47: (bar+0x9)
0x401b5e: (main+0x14)
0x401757: (_start_crt+0x87)
0x4016b8: (_start+0x18)
I submitted a Pull Request as well, and hope it will eventually be merged.
Use libunwind to debug memory leak issue
In our project, there is a shared object with a reference counter, which is increased when others acquire it and decreased when they release it. Once the reference counter is 0, the shared object can be reaped. Then we hit the classic memory leak issue, i.e., the memory used by shared objects keeps growing. To debug this issue, I used libunwind.
The principle is simple: print the stack trace of every increment/decrement operation. I borrowed code from Programmatic access to the call stack in C++, and did some tweaks, mostly formatting the stack traces and writing them to a file. The output looks like this:
$ cat /tmp/backtrace.log
0x55ad59ec2556: (foo+0x9)
0x55ad59ec2562: (bar+0x9)
0x55ad59ec2579: (main+0x14)
0x7f941161ee0a: (__libc_start_main+0xea)
0x55ad59ec214a: (_start+0x2a)
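Where the tracing hooks into the reference counter can be sketched as below. This is a simplified stand-in, not the code used in the project: instead of a full libunwind stack trace it logs only the immediate call site via __FILE__ and __LINE__, and struct shared_obj, trace_op, OBJ_ACQUIRE and OBJ_RELEASE are hypothetical names. In the real setup, trace_op is where the stack would be walked and written out.

```c
#include <stdio.h>

/* Hypothetical shared object with a reference counter and a log file. */
struct shared_obj {
    int refcnt;
    FILE *log;
};

/* Record one counter operation together with the place it came from. */
static void trace_op(struct shared_obj *o, const char *op,
                     const char *file, int line)
{
    fprintf(o->log, "%s refcnt=%d at %s:%d\n", op, o->refcnt, file, line);
    fflush(o->log);   /* keep the log usable even if the program crashes */
}

/* Every increment/decrement goes through these macros, so no
 * operation can change the counter without leaving a log line. */
#define OBJ_ACQUIRE(o) \
    (++(o)->refcnt, trace_op((o), "acquire", __FILE__, __LINE__))
#define OBJ_RELEASE(o) \
    (--(o)->refcnt, trace_op((o), "release", __FILE__, __LINE__))
```

Pairing the acquire and release lines in the log then points at the caller that never released its reference.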
A quick way to find the corresponding position in the source code is through gdb: attach to the program, then use the “info line” command:
$ gdb program -p pid
......
(gdb) info line *0x55ad59ec2556
......
P.S., the code can be downloaded here.
Use “LC_ALL=C” to improve performance
Using “LC_ALL=C” can improve some programs’ performance. The following is a test of the join program without LC_ALL=C:
$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
$ time join 1.sorted 2.sorted > 1-2.sorted.aggregated
real 0m49.903s
user 0m48.427s
sys 0m0.786s
And this one uses “LC_ALL=C”:
$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
$ time LC_ALL=C join 1.sorted 2.sorted > 1-2.sorted.aggregated
real 0m12.752s
user 0m5.628s
sys 0m0.971s
Some good references on this topic are Speed up grep searches with LC_ALL=C and Everyone knows grep is faster in the C locale.
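One commonly cited reason for the gap is string comparison: in the C locale, collation is a plain byte-wise comparison, while a locale such as en_US.UTF-8 applies language-specific ordering rules. The sketch below only illustrates that difference and makes no claim about how join is implemented; collate_sign is a hypothetical helper.

```c
#include <locale.h>
#include <string.h>

/* Return the sign (-1, 0 or 1) of strcoll(a, b) under the named
 * collation locale, or 2 if that locale is not installed.
 * The collation locale is reset to "C" before returning. */
int collate_sign(const char *locale, const char *a, const char *b)
{
    int r;

    if (setlocale(LC_COLLATE, locale) == NULL)
        return 2;
    r = strcoll(a, b);            /* locale-aware comparison */
    setlocale(LC_COLLATE, "C");
    return (r > 0) - (r < 0);
}
```

In the C locale, collate_sign("C", "B", "a") is -1 because the byte 'B' (0x42) is smaller than 'a' (0x61). Under en_US.UTF-8 the order of such a pair may be reversed, and the extra work needed to decide it is paid on every comparison.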
Clear file system cache before doing I/O-intensive benchmark on Linux
If you run any I/O-intensive benchmark, please run the following command before it (otherwise you may get the wrong impression of the program's performance):
$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
sync writes dirty data from the cache to the file system (otherwise the dirty cache can’t be freed); “echo 3 > /proc/sys/vm/drop_caches” drops the clean page cache as well as reclaimable slab objects. Check the following example:
$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
$ time ./benchmark
real 0m12.434s
user 0m5.633s
sys 0m0.761s
$ time ./benchmark
real 0m6.291s
user 0m5.645s
sys 0m0.631s
The first run of the benchmark program takes ~12 seconds. Now that the files are cached, the second run takes about half of that: ~6 seconds.
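If the benchmark harness itself is a C program, the same two steps can be done from code. This is a minimal sketch: drop_caches is a hypothetical helper, the path is a parameter so it can be pointed at /proc/sys/vm/drop_caches, and writing to that file requires root.

```c
#include <stdio.h>
#include <unistd.h>

/* Flush dirty pages to disk, then write "3" to `path`
 * (normally /proc/sys/vm/drop_caches).
 * Returns 0 on success, -1 on failure. */
int drop_caches(const char *path)
{
    FILE *f;

    sync();                       /* dirty pages cannot be dropped */
    f = fopen(path, "w");
    if (f == NULL)
        return -1;                /* e.g. not running as root */
    if (fputs("3\n", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling drop_caches("/proc/sys/vm/drop_caches") before each timed run gives every run the same cold-cache starting point.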
References:
Why drop caches in Linux?;
/proc/sys/vm;
CLEAR_CACHES.