The core dump file in Arch Linux

Arch Linux uses systemd, and core dump files are stored in the /var/lib/systemd/coredump directory by default.
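
If you want to confirm that systemd-coredump really is the handler on your machine, check the kernel.core_pattern sysctl; the value should be a pipe into /usr/lib/systemd/systemd-coredump (the exact % specifiers after the binary differ between systemd versions):

$ sysctl kernel.core_pattern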

Let’s look at the following simple program:

int main(void) {
        int *p = 0;
        *p = 1;
}

Compile and run it:

$ gcc -g -o test test.c
$ ./test
Segmentation fault (core dumped)

As expected, the program crashes! Check the core dump files:

$ coredumpctl list
TIME                            PID   UID   GID SIG COREFILE EXE
......
Mon 2017-01-09 16:13:44 SGT    7307  1014  1014  11 present  /home/xiaonan/test
$ ls -alt /var/lib/systemd/coredump/core.test*
-rw-r-----+ 1 root root     22821 Jan  9 16:13 /var/lib/systemd/coredump/core.test.1014.183b4c57ccad464abd2eba2c104a47a8.7307.1483949624000000000000.lz4

From the file name and timestamp, we can find the core dump file that corresponds to our program. To debug it, we can use the “coredumpctl gdb PID” command:

$ coredumpctl gdb 7307
       PID: 7307 (test)
       UID: 1014 (xiaonan)
       GID: 1014 (xiaonan)
    Signal: 11 (SEGV)
 Timestamp: Mon 2017-01-09 16:13:44 SGT (6min ago)
Command Line: ./test
Executable: /home/xiaonan/test
 Control Group: /user.slice/user-1014.slice/session-c4.scope
          Unit: session-c4.scope
         Slice: user-1014.slice
       Session: c4
     Owner UID: 1014 (xiaonan)
       Boot ID: 183b4c57ccad464abd2eba2c104a47a8
    Machine ID: 25671e5feadb4ae4bbe2c9ee6de97d66
      Hostname: supermicro-sys-1028gq-trt
       Storage: /var/lib/systemd/coredump/core.test.1014.183b4c57ccad464abd2eba2c104a47a8.7307.1483949624000000000000.lz4
       Message: Process 7307 (test) of user 1014 dumped core.

            Stack trace of thread 7307:
            #0  0x00000000004004b6 main (test)
            #1  0x00007f67ba722291 __libc_start_main (libc.so.6)
            #2  0x00000000004003da _start (test)

GNU gdb (GDB) 7.12
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/xiaonan/test...done.
[New LWP 7307]
Core was generated by `./test'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00000000004004b6 in main () at test.c:3
3               *p = 1;
(gdb) bt
#0  0x00000000004004b6 in main () at test.c:3

BTW, if you omit the PID, “coredumpctl gdb” will open the newest core dump.
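
A couple of related invocations are also handy (assuming a reasonably recent systemd; see coredumpctl(1) on your system):

$ coredumpctl info 7307               # show the metadata above without launching gdb
$ coredumpctl dump 7307 -o core.7307  # extract the (decompressed) core file to ./core.7307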

Attention should be paid: the core dump size is limited to 2GiB, so if the dump is larger than that, it will be truncated and gdb will print a warning like this (please refer to this bug):

......
BFD: Warning: /var/tmp/coredump-ZrhAM4 is truncated: expected core file size >= 2591764480, found: 2147483648.
......
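
If you run into this limit, the processing and storage limits live in /etc/systemd/coredump.conf. Here is a sketch of raising them (key names from coredump.conf(5); whether values above 2GiB are honored depends on your systemd version and on the bug mentioned above):

# /etc/systemd/coredump.conf
[Coredump]
ProcessSizeMax=8G
ExternalSizeMax=8G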

References:
GDB and trouble with core dumps;
Core dump.

A brief intro to delve

delve is a debugger written in Golang that is dedicated to troubleshooting Golang programs (the home page is here). Though it is still in a pre-1.0 release, I think it is stable enough for daily use. BTW, if you find bugs, you can report them to the developers and help make delve even stronger! P.S., although the fmt.Print family is useful in most cases, I strongly recommend you try delve to inspect the internal mechanics of your code, because it can help you understand Golang more deeply, not just superficially.

Installing delve is very simple. Taking the Linux platform as an example, it is no different from setting up any other Golang project; a “go get” is enough:

go get github.com/derekparker/delve/cmd/dlv 

Now there is an extra dlv executable binary in $GOPATH/bin (notice: the project is named delve, while the executable is called dlv; I even made a foolish mistake about this when I began to use it). Run the dlv command, and it will show you a detailed manual of delve:

# dlv
Delve is a source level debugger for Go programs.

Delve enables you to interact with your program by controlling the execution of the process,
evaluating variables, and providing information of thread / goroutine state, CPU register state and more.

The goal of this tool is to provide a simple yet powerful interface for debugging Go programs.

Usage:
  dlv [command]

Available Commands:
  version     Prints version.
  run         Deprecated command. Use 'debug' instead.
  debug       Compile and begin debugging program.
......

Let’s check this contrived Hello.go program:

package main

import "fmt"

func main() {
        var s []byte
        s = append(s, []byte("Hello, Debugging!")...)
        fmt.Println(string(s))
}

Use delve to debug it:

# dlv debug Hello.go
Type 'help' for list of commands.
(dlv) help
The following commands are available:
    help (alias: h) ------------- Prints the help message.
    break (alias: b) ------------ Sets a breakpoint.
    trace (alias: t) ------------ Set tracepoint.
    restart (alias: r) ---------- Restart process.
    continue (alias: c) --------- Run until breakpoint or program termination.
    step (alias: s) ------------- Single step through program.
    step-instruction (alias: si)  Single step a single cpu instruction.
    next (alias: n) ------------- Step over to next source line.
    threads --------------------- Print out info for every traced thread.
    thread (alias: tr) ---------- Switch to the specified thread.
......

If you are familiar with gdb, you will find the commands very similar, and I promise you will master delve quickly.

An interesting difference is that delve doesn’t provide the start command that gdb offers, so you should set breakpoints first, then run the continue command:

(dlv) b Hello.go:8
Breakpoint 1 set at 0x4011ea for main.main() ./Hello.go:8
(dlv) c
> main.main() ./Hello.go:8 (hits goroutine(1):1 total:1) (PC: 0x4011ea)
     3: import "fmt"
     4:
     5: func main() {
     6:         var s []byte
     7:         s = append(s, []byte("Hello, Debugging!")...)
=>   8:         fmt.Println(string(s))
     9: }
(dlv) p s
[]uint8 len: 17, cap: 32, [72,101,108,108,111,44,32,68,101,98,117,103,103,105,110,103,33]
(dlv) goroutines
[4 goroutines]
* Goroutine 1 - User: ./Hello.go:8 main.main (0x4011ea)
  Goroutine 2 - User: /usr/local/go/src/runtime/proc.go:263 runtime.gopark (0x42a153)
  Goroutine 3 - User: /usr/local/go/src/runtime/proc.go:263 runtime.gopark (0x42a153)
  Goroutine 4 - User: /usr/local/go/src/runtime/mfinal.go:144 runtime.runfinq (0x413f80)

Cool, isn’t it? Now you can observe almost everything you want to know about your program.
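
A few more commands I find myself using constantly (names taken from the delve build I used; run help inside dlv for the authoritative list):

(dlv) locals          # print local variables of the current frame
(dlv) args            # print the arguments of the current function
(dlv) bt              # print the stack trace of the current goroutine
(dlv) goroutine 2     # switch to goroutine 2
(dlv) funcs main.*    # list functions matching a regular expression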

Happy Debugging! Happy delving!

 

Tips for debugging Mesos

During the past week, I was following this tutorial to build a “Kubernetes on Mesos” testbed. All went well, but the Mesos master kept complaining with the following words:

......
E1228 21:57:13.138357 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
......

At first, I tried to get help from the Mesos mailing list and Stack Overflow, but when no one could give a direct answer, I knew I had to depend on myself. After a tough debugging process, I worked out the root cause. Since I am a newbie to both Mesos and C++ (Mesos is implemented in C++, and the last time I touched C++ was 7 years ago), I think these experiences and tips may also be useful for other novices, so I summarize them below:

(1) LOG vs. VLOG

When you meet an issue, analyzing the logs should be the first step. Mesos uses google-glog to generate its logs, and the log format is explained here:

Log lines have this form:
    Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg...
where the fields are defined as follows:
    L                A single character, representing the log level (eg 'I' for INFO)
    mm               The month (zero padded; ie May is '05')
    dd               The day (zero padded)
    hh:mm:ss.uuuuuu  Time in hours, minutes and fractional seconds
    threadid         The space-padded thread ID as returned by GetTID()
    file             The file name
    line             The line number
    msg              The user-supplied message

Compared against the description above, you can easily understand this log line:

E1228 21:57:13.138357 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected

By default, Mesos doesn’t output logs generated by the VLOG function; you need to set GLOG_v=m if you want to see the information from VLOG calls (refer to this post):

$ sudo GLOG_v=3 ./bin/mesos-master.sh --ip=15.242.100.56 --work_dir=/var/lib/mesos
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1229 22:42:38.818521 11830 process.cpp:2426] Spawned process __gc__@15.242.100.56:5050
I1229 22:42:38.818613 11846 process.cpp:2436] Resuming __gc__@15.242.100.56:5050 at 2015-12-30 03:42:38.818540032+00:00
I1229 22:42:38.818749 11847 process.cpp:2436] Resuming __gc__@15.242.100.56:5050 at 2015-12-30 03:42:38.818712832+00:00
I1229 22:42:38.818802 11844 process.cpp:2436] Resuming help@15.242.100.56:5050 at 2015-12-30 03:42:38.818746112+00:00
......

You can also use the LOG function to add logs yourself at suspected locations.
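
For illustration, here is a minimal standalone glog sketch of my own showing the two macros (this is plain google-glog, not Mesos code; inside Mesos you simply use LOG()/VLOG() in the suspected function, since logging is already initialized there):

#include <glog/logging.h>

int main(int argc, char** argv) {
    // Only needed in a standalone program; Mesos initializes glog itself.
    google::InitGoogleLogging(argv[0]);
    FLAGS_logtostderr = true;  // print to stderr instead of log files

    LOG(INFO) << "always emitted at the INFO level";
    LOG(ERROR) << "emitted at the ERROR level";
    VLOG(2) << "only emitted when verbosity >= 2 (e.g. GLOG_v=2)";
    return 0;
}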

(2) Gdb

If logs can’t save you, it is time for the debugger to be your hero. Gdb is no doubt a great tool for debugging C/C++ programs. To use gdb, you should enable the --enable-debug configure option before compiling Mesos:

nan@ubuntu:~/mesos-0.25.0/build$ ../configure --enable-debug

You can set a breakpoint on a class member function like this:

(gdb) b process::SocketManager::close(int)
Breakpoint 1 at 0x7fd07857c162: file ../../../3rdparty/libprocess/src/process.cpp, line 1849.

You can also make use of gdb’s auto-completion. Type the incomplete function name:

(gdb) b process::SocketManager::cl  

Then press Tab:

(gdb) b process::SocketManager::close(int)
Breakpoint 1 at 0x7fd07857c162: file ../../../3rdparty/libprocess/src/process.cpp, line 1849.

Notice: if too many symbols match, completion may hang gdb, so try to narrow the scope as much as possible.

Additionally, since source file names are relative to the directory where the code was compiled (please refer to breakpoints in GDB), you can achieve the same effect with the “b file:line” command:

(gdb) b ../../../3rdparty/libprocess/src/process.cpp:1279
Breakpoint 2 at 0x7fd07857973a: file ../../../3rdparty/libprocess/src/process.cpp, line 1279.
(gdb) c
Continuing.
......
[Switching to Thread 0x7fd06b9d4700 (LWP 16677)]

Breakpoint 2, process::SocketManager::link_connect (this=0xca1a30, future=..., socket=0x7fd0500026d0, to=...)
    at ../../../3rdparty/libprocess/src/process.cpp:1279
1279      if (future.isDiscarded() || future.isFailed()) {

You can see the breakpoint is set on the process::SocketManager::link_connect(process::Future<Nothing> const&, process::network::Socket*, process::UPID const&) function.

P.S.: there are also handy out-of-the-box gdb wrapper scripts in the build/bin directory:

# ls bin/gdb-mesos-*
bin/gdb-mesos-local.sh  bin/gdb-mesos-master.sh  bin/gdb-mesos-slave.sh  bin/gdb-mesos-tests.sh

(3) Tcpdump and wireshark

Network packet analysis tools such as tcpdump and wireshark are essential for diagnosing programs that interact with other hosts. E.g., you can use the following command to capture what comes in and out of the Mesos master:

sudo tcpdump -A -s 0 'tcp port 5050' -i em1 -w capture.pcap
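
The saved capture.pcap can then be opened in wireshark, or replayed in the terminal with tcpdump itself:

$ tcpdump -r capture.pcap -A 'tcp port 5050'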

BTW, my issue was finally fixed by analyzing the following packet:

[screenshot of the captured packet]

(4) Pstack script

Personally, I find the pstack script useful for monitoring thread status; please refer to Use pstack to track threads on Linux for details.

Enjoy debugging!

 

Tips for using gdb to debug Golang programs

Although “GDB does not understand Go programs well” (from Debugging Go Code with GDB), gdb is sometimes still a useful tool for debugging Golang programs. In this post, I will show some small tips.

(1)

My OS is Ubuntu 14.04. When launching gdb, it prints the following:

......
warning: File "/usr/local/go/src/runtime/runtime-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/local/go/src/runtime/runtime-gdb.py
line to your configuration file "/home/nan/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/nan/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"
......

So I add “add-auto-load-safe-path /usr/local/go/src/runtime/runtime-gdb.py” to my .gdbinit file. My preferred .gdbinit looks like this:

$ cat ${HOME}/.gdbinit
add-auto-load-safe-path /usr/local/go/src/runtime/runtime-gdb.py
set confirm off
set print pretty on

(2)

If you want to set a breakpoint at the start of the main function, you should use “b main.main”:

......
(gdb) b main.main
Breakpoint 1 at 0x4021e0: file /home/nan/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubectl/kubectl.go, line 26.
(gdb) r
Starting program: /home/nan/kubernetes/_output/local/go/bin/kubectl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77f6700 (LWP 957)]
[New Thread 0x7ffff6fb5700 (LWP 958)]
[New Thread 0x7ffff5fb3700 (LWP 960)]
[New Thread 0x7ffff67b4700 (LWP 959)]

Breakpoint 1, main.main () at /home/nan/kubernetes/_output/local/go/src/k8s.io/kubernetes/cmd/kubectl/kubectl.go:26
26      func main() {
......

Not “b main”, unless you want to read mysterious assembly code:

......
(gdb) b main
Breakpoint 1 at 0x45e830: file /usr/local/go/src/runtime/rt0_linux_amd64.s, line 63.
(gdb) r
Starting program: /home/nan/kubernetes/_output/local/go/bin/kubectl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Breakpoint 1, main () at /usr/local/go/src/runtime/rt0_linux_amd64.s:63
63              MOVQ    $runtime·rt0_go(SB), AX
(gdb)
......

(3)

If you don’t know the symbol name of a function, you can use the readelf tool. E.g.:

$ readelf -s -W /home/nan/kubernetes/_output/local/go/bin/kubectl | grep NewCmdConfig
 ......
 14350: 0000000000766da0  3168 FUNC    LOCAL  DEFAULT   14 k8s.io/kubernetes/pkg/kubectl/cmd/config.NewCmdConfig
 ......
 14404: 00000000007745d0    48 FUNC    LOCAL  DEFAULT   14 k8s.io/kubernetes/pkg/kubectl/cmd/config.NewCmdConfig.func1

config.NewCmdConfig.func1 is the anonymous function defined inside the NewCmdConfig function:

func NewCmdConfig(pathOptions *PathOptions, out io.Writer) *cobra.Command {
    ......
        Run: func(cmd *cobra.Command, args []string) {
            cmd.Help()
    ......
}

Then you can set a breakpoint on the desired function:

......
(gdb) b k8s.io/kubernetes/pkg/kubectl/cmd/config.NewCmdConfig
Breakpoint 1 at 0x766da0: file /home/nan/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubectl/cmd/config/config.go, line 63.
(gdb) r
Starting program: /home/nan/kubernetes/_output/local/go/bin/kubectl
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff77f6700 (LWP 1416)]
[New Thread 0x7ffff6fb5700 (LWP 1417)]
[New Thread 0x7ffff67b4700 (LWP 1418)]
[New Thread 0x7ffff5fb3700 (LWP 1419)]

Breakpoint 1, k8s.io/kubernetes/pkg/kubectl/cmd/config.NewCmdConfig (pathOptions=0xc820178b90, out=..., ~r2=0x1)
    at /home/nan/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/kubectl/cmd/config/config.go:63
63      func NewCmdConfig(pathOptions *PathOptions, out io.Writer) *cobra.Command {
(gdb)
......

 

Use pstack to track threads on Linux

Red Hat Linux distros provide a pstack script which can dump the stack of every thread in a process; the script looks like this:

#!/bin/bash

if test $# -ne 1; then
    echo "Usage: `basename $0 .sh` <process-id>" 1>&2
    exit 1
fi

if test ! -r /proc/$1; then
    echo "Process $1 not found." 1>&2
    exit 1
fi

# GDB doesn't allow "thread apply all bt" when the process isn't
# threaded; need to peek at the process to determine if that or the
# simpler "bt" should be used.

backtrace="bt"
if test -d /proc/$1/task ; then
    # Newer kernel; has a task/ directory.
    if test `/bin/ls /proc/$1/task | /usr/bin/wc -l` -gt 1 2>/dev/null ; then
        backtrace="thread apply all bt"
    fi
elif test -f /proc/$1/maps ; then
    # Older kernel; go by it loading libpthread.
    if /bin/grep -e libpthread /proc/$1/maps > /dev/null 2>&1 ; then
        backtrace="thread apply all bt"
    fi
fi

GDB=${GDB:-/usr/bin/gdb}

if $GDB -nx --quiet --batch --readnever > /dev/null 2>&1; then
    readnever=--readnever
else
    readnever=
fi

# Run GDB, strip out unwanted noise.
$GDB --quiet $readnever -nx /proc/$1/exe $1 <<EOF 2>&1 |
$backtrace
EOF
/bin/sed -n \
    -e 's/^(gdb) //' \
    -e '/^#/p' \
    -e '/^Thread/p'

Copy it to SUSE, and use it (pstack process_ID):

linux-uibj:/usr/bin # pstack 1487
Thread 2 (Thread 0x7eff7ce91700 (LWP 1489)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x00007eff7f76a6b6 in gdbus_shared_thread_func ()
#4  0x00007eff7efaae15 in g_thread_proxy () from /usr/lib64/libglib-2.0.so.0
#5  0x00007eff7ed260a4 in start_thread () from /lib64/libpthread.so.0
#6  0x00007eff7ea5b7fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7eff7fbfd800 (LWP 1487)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x000000000040ab08 in ?? ()
#4  0x00007eff7e997b05 in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000405bd6 in ?? ()

How can we resolve the ?? () frames? We can use gdb directly: “gdb --quiet -nx --readnever /proc/$pid/exe $pid”. Take the above process ID (1487) as an example:

linux-uibj:/usr/bin # gdb --quiet -nx --readnever /proc/1487/exe 1487
Reading symbols from /proc/1487/exe...(no debugging symbols found)...done.
......
Missing separate debuginfos, use: zypper install gvfs-backends-debuginfo-1.18.3-3.28.x86_64 libgudev-1_0-0-debuginfo-210-44.1.x86_64
......

Gdb will tell you which debuginfo packages are missing; install them:

linux-uibj:/usr/bin # zypper install gvfs-backends-debuginfo-1.18.3-3.28.x86_64 libgudev-1_0-0-debuginfo-210-44.1.x86_64

Execute “pstack 1487” again:

linux-uibj:/usr/bin # pstack 1487
Thread 2 (Thread 0x7eff7ce91700 (LWP 1489)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x00007eff7f76a6b6 in gdbus_shared_thread_func ()
#4  0x00007eff7efaae15 in g_thread_proxy () from /usr/lib64/libglib-2.0.so.0
#5  0x00007eff7ed260a4 in start_thread () from /lib64/libpthread.so.0
#6  0x00007eff7ea5b7fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7eff7fbfd800 (LWP 1487)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x000000000040ab08 in g_vfs_proxy_volume_monitor_daemon_main ()
#4  0x00007eff7e997b05 in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000405bd6 in _start ()

Now all symbols are resolved.

P.S. You should execute the script with root privileges, e.g., modify the script as:

......
sudo $GDB --quiet $readnever -nx /proc/$1/exe $1 <<EOF 2>&1 |
......

Reference:
How to resolve function name through memory address?