Install the right “kernel-debuginfo” package on RHEL

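To debug the kernel with tools such as crash, you need the “kernel-debuginfo” package that exactly matches the running kernel. Check the running kernel with uname -r (rpm -q kernel only lists the installed kernels), then install the debuginfo packages with the same version. For example: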

# rpm -q kernel
kernel-3.10.0-229.el7.x86_64

# uname -r
3.10.0-229.el7.x86_64

# rpm -q kernel-debuginfo
package kernel-debuginfo is not installed

# rpm -ivh kernel-debuginfo-common-x86_64-3.10.0-229.el7.x86_64.rpm

# rpm -ivh kernel-debuginfo-3.10.0-229.el7.x86_64.rpm

# rpm -q kernel-debuginfo
kernel-debuginfo-3.10.0-229.el7.x86_64

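If the running kernel is the debug variant (its release ends in “.debug”), install the kernel-debug-debuginfo package instead of kernel-debuginfo, together with the same kernel-debuginfo-common package: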

# uname -r
3.10.0-229.el7.x86_64.debug

# rpm -ivh kernel-debuginfo-common-x86_64-3.10.0-229.el7.x86_64.rpm

# rpm -ivh kernel-debug-debuginfo-3.10.0-229.el7.x86_64.rpm

Reference:

[Crash-utility] How does crash find booted kernel?

 

Use pstack to track threads on Linux

Red Hat Linux distributions ship a pstack script that prints the stack backtrace of every thread in a process. The script looks like this:

#!/bin/bash

if test $# -ne 1; then
    echo "Usage: `basename $0 .sh` <process-id>" 1>&2
    exit 1
fi

if test ! -r /proc/$1; then
    echo "Process $1 not found." 1>&2
    exit 1
fi

# GDB doesn't allow "thread apply all bt" when the process isn't
# threaded; need to peek at the process to determine if that or the
# simpler "bt" should be used.

backtrace="bt"
if test -d /proc/$1/task ; then
    # Newer kernel; has a task/ directory.
    if test `/bin/ls /proc/$1/task | /usr/bin/wc -l` -gt 1 2>/dev/null ; then
        backtrace="thread apply all bt"
    fi
elif test -f /proc/$1/maps ; then
    # Older kernel; go by it loading libpthread.
    if /bin/grep -e libpthread /proc/$1/maps > /dev/null 2>&1 ; then
        backtrace="thread apply all bt"
    fi
fi

GDB=${GDB:-/usr/bin/gdb}

if $GDB -nx --quiet --batch --readnever > /dev/null 2>&1; then
    readnever=--readnever
else
    readnever=
fi

# Run GDB, strip out unwanted noise.
$GDB --quiet $readnever -nx /proc/$1/exe $1 <<EOF 2>&1 |
$backtrace
EOF
/bin/sed -n \
    -e 's/^(gdb) //' \
    -e '/^#/p' \
    -e '/^Thread/p'

Copy it to a SUSE system, make it executable, and run it as pstack <process-id>:

linux-uibj:/usr/bin # pstack 1487
Thread 2 (Thread 0x7eff7ce91700 (LWP 1489)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x00007eff7f76a6b6 in gdbus_shared_thread_func ()
#4  0x00007eff7efaae15 in g_thread_proxy () from /usr/lib64/libglib-2.0.so.0
#5  0x00007eff7ed260a4 in start_thread () from /lib64/libpthread.so.0
#6  0x00007eff7ea5b7fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7eff7fbfd800 (LWP 1487)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x000000000040ab08 in ?? ()
#4  0x00007eff7e997b05 in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000405bd6 in ?? ()

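How can the ?? () frames be resolved? They are addresses for which no symbol information is available, so first find out which debuginfo packages are missing by running gdb the same way the script does: “gdb --quiet -nx --readnever /proc/$pid/exe $pid”. Taking the process ID above (1487) as an example: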

linux-uibj:/usr/bin # gdb --quiet -nx --readnever /proc/1487/exe 1487
Reading symbols from /proc/1487/exe...(no debugging symbols found)...done.
......
Missing separate debuginfos, use: zypper install gvfs-backends-debuginfo-1.18.3-3.28.x86_64 libgudev-1_0-0-debuginfo-210-44.1.x86_64
......

gdb reports which debuginfo packages are missing; install them:

linux-uibj:/usr/bin # zypper install gvfs-backends-debuginfo-1.18.3-3.28.x86_64 libgudev-1_0-0-debuginfo-210-44.1.x86_64

Execute “pstack 1487” again:

linux-uibj:/usr/bin # pstack 1487
Thread 2 (Thread 0x7eff7ce91700 (LWP 1489)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x00007eff7f76a6b6 in gdbus_shared_thread_func ()
#4  0x00007eff7efaae15 in g_thread_proxy () from /usr/lib64/libglib-2.0.so.0
#5  0x00007eff7ed260a4 in start_thread () from /lib64/libpthread.so.0
#6  0x00007eff7ea5b7fd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7eff7fbfd800 (LWP 1487)):
#0  0x00007eff7ea533cd in poll () from /lib64/libc.so.6
#1  0x00007eff7ef86454 in g_main_context_iterate.isra ()
#2  0x00007eff7ef868ba in g_main_loop_run () from /usr/lib64/libglib-2.0.so.0
#3  0x000000000040ab08 in g_vfs_proxy_volume_monitor_daemon_main ()
#4  0x00007eff7e997b05 in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000405bd6 in _start ()

Now all symbols are resolved.

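P.S. The script must run with root privileges, because gdb needs permission to attach to the target process. Either run it as root, or modify the gdb line in the script to use sudo: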

......
sudo $GDB --quiet $readnever -nx /proc/$1/exe $1 <<EOF 2>&1 |
......

Reference:
How to resolve function name through memory address?

 

A brief introduction to zypper

After installing SUSE (my version is SLES 12), the installation CD/DVD is added as the default repository:

linux-uibj:~ # zypper repos -d
# | Alias       | Name        | Enabled | Refresh | Priority | Type  | URI                                                         | Service
--+-------------+-------------+---------+---------+----------+-------+-------------------------------------------------------------+--------
1 | SLES12-12-0 | SLES12-12-0 | Yes     | No      |   99     | yast2 | cd:///?devices=/dev/disk/by-id/ata-VBOX_CD-ROM_VB2-01700376 |       

So as long as the installation ISO is in the CD-ROM drive, you can search for software with zypper se (short for search) and install it with zypper in (short for install):

linux-uibj:~ # zypper se systemtap
Loading repository data...
Reading installed packages...

S | Name              | Summary                              | Type
--+-------------------+--------------------------------------+-----------
  | systemtap         | Instrumentation System               | package
  | systemtap         | Instrumentation System               | srcpackage
  | systemtap-docs    | Documents and examples for systemtap | package
  | systemtap-docs    | Documents and examples for systemtap | srcpackage
  | systemtap-runtime | Runtime environment for systemtap    | package
  | systemtap-server  | Systemtap server                     | package
linux-uibj:~ # zypper in systemtap
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 3 NEW packages are going to be installed:
  libebl1 systemtap systemtap-runtime

3 new packages to install.
Overall download size: 1.4 MiB. Already cached: 0 B  After the operation, additional 5.7 MiB will be used.
Continue? [y/n/? shows all options] (y):

After the repository is removed with zypper rr (short for removerepo), zypper in no longer works, even though the ISO is still in the CD-ROM drive:

linux-uibj:~ # zypper rr 1
Removing repository 'SLES12-12-0' ...................................................................................................[done]
Repository 'SLES12-12-0' has been removed.
linux-uibj:~ # zypper in systemtap
Warning: No repositories defined. Operating only with the installed resolvables. Nothing can be installed.

You can also add a repository by URL with zypper ar (short for addrepo), giving the URL followed by an alias:

linux-uibj:~ # zypper ar http://xxx.net/mrepo/SLE-12-Server-x86_64/disc1/ SLES12-1
Adding repository 'SLES12-1' ........................................................................................................[done]
Repository 'SLES12-1' successfully added
Enabled: Yes
Autorefresh: No
GPG check: Yes
URI: http://xxx.net/mrepo/SLE-12-Server-x86_64/disc1/

linux-uibj:~ # zypper in systemtap
Building repository 'SLES12-1' cache ................................................................................................[done]
Loading repository data...
Reading installed packages...
Resolving package dependencies...

The following 3 NEW packages are going to be installed:
  libebl1 systemtap systemtap-runtime

3 new packages to install.
Overall download size: 1.4 MiB. Already cached: 0 B  After the operation, additional 5.7 MiB will be used.
Continue? [y/n/? shows all options] (y):

Many *-devel packages are on the SDK ISO, so you should also add the SDK ISO as a repository.

 

Use SystemTap to track forking processes

The SystemTap website provides a forktracker.stp script that tracks process creation and exec. The original script looked like this (P.S.: the script on the website has since been updated):

probe kprocess.create
{
  printf("%-25s: %s (%d) created %d\n",
         ctime(gettimeofday_s()), execname(), pid(), new_pid)
}

probe kprocess.exec
{
  printf("%-25s: %s (%d) is exec'ing %s\n",
         ctime(gettimeofday_s()), execname(), pid(), filename)
}

Running it produced output that confused me:

......
Thu Oct 22 05:09:42 2015 : virt-manager (8713) created 8713
Thu Oct 22 05:09:42 2015 : virt-manager (8713) created 8713
Thu Oct 22 05:09:42 2015 : virt-manager (8713) created 8713
Thu Oct 22 05:09:43 2015 : virt-manager (8713) created 8713
......

Why did the parent and the child have the same process ID, 8713? At first, I thought it was due to a peculiarity of fork: it is called once but returns twice. So I wrote a simple program to test whether fork was the cause:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t pid;

    pid = fork();
    if (pid < 0) {
        exit(1);
    } else if (pid > 0) {
        printf("Parent exits!\n");
        exit(0);
    }

    printf("hello world\n");
    return 0;
}   

This time, the script output the following:

......
Thu Oct 22 05:27:10 2015 : bash (3855) created 8955
Thu Oct 22 05:27:10 2015 : bash (8955) is exec'ing "./test"
Thu Oct 22 05:27:10 2015 : test (8955) created 8956
......

The parent and child had different process IDs, so the fork system call was not the cause.

I asked on the SystemTap mailing list, and Josh Stone gave me the answer: it is related to how Linux implements threads. In Linux, a thread is actually also a process (a task), so a multi-threaded program can be thought of as a thread group. The whole thread group shares one thread group ID (in SystemTap, the value of pid() and new_pid), and every thread has its own unique ID (in SystemTap, the value of tid() and new_tid). So the lines above were virt-manager creating new threads: each new thread joins the same thread group, so its new_pid is 8713 as well. Josh Stone also modified the script like this:

probe kprocess.create {
  printf("%-25s: %s (%d:%d) created %d:%d\n",
         ctime(gettimeofday_s()), execname(), pid(), tid(), new_pid, new_tid)
}

probe kprocess.exec {
  printf("%-25s: %s (%d) is exec'ing %s\n",
         ctime(gettimeofday_s()), execname(), pid(), filename)
}

To verify this, I wrote a multi-threaded program:

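/* Build, e.g.: gcc -pthread -o test test.c */
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>     /* sleep() */

/* Each thread prints its label every 10 seconds. */
void *thread_func(void *p_arg)
{
        while (1)
        {
                printf("%s\n", (char*)p_arg);
                sleep(10);
        }
        return NULL;    /* not reached */
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, thread_func, "Thread 1");
        pthread_create(&t2, NULL, thread_func, "Thread 2");

        sleep(1000);
        return 0;
}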

The script output was like this:

......
Sat Oct 24 10:56:35 2015 : bash (889) is exec'ing "./test"
Sat Oct 24 10:56:35 2015 : test (889:889) created 889:890
Sat Oct 24 10:56:35 2015 : test (889:889) created 889:891
......

From the output we can see that the main thread's tid() and pid() values are the same, 889. All threads share the pid 889, but each thread has a unique tid: 889, 890, and 891.

Reference:
How to understand the pid() and new_pid are same value in executing forktracker.stp?

 

Install SystemTap on Suse

The SUSE version used here is SLES (SUSE Linux Enterprise Server).

(1) Install C/C++ Compiler and Tools:

(2) Install SystemTap tools:

# zypper in 'systemtap*'
......

(3) Install kernel debug info packages:

/mnt/suse/x86_64 # ls | grep kernel
kernel-default-base-debuginfo-3.12.49-3.1.x86_64.rpm
kernel-default-debuginfo-3.12.49-3.1.x86_64.rpm
kernel-default-debugsource-3.12.49-3.1.x86_64.rpm
kernel-xen-base-debuginfo-3.12.49-3.1.x86_64.rpm
kernel-xen-debuginfo-3.12.49-3.1.x86_64.rpm
kernel-xen-debugsource-3.12.49-3.1.x86_64.rpm
kernelshark-debuginfo-2.0.4-3.95.x86_64.rpm
nfs-kernel-server-debuginfo-1.3.0-13.1.x86_64.rpm
/mnt/suse/x86_64 # rpm -ivh kernel*
Preparing...                          ################################# [100%]
......

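If a repository that provides debuginfo packages is configured, you can instead run zypper in 'kernel-*-debug*'. Either way, the kernel debuginfo version must match the running kernel (uname -r).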

(4) Test:

/mnt/suse/x86_64 # stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
Pass 1: parsed user script and 102 library script(s) using 78240virt/28440res/2708shr/26436data kb, in 160usr/20sys/184real ms.
Pass 2: analyzed script: 1 probe(s), 1 function(s), 3 embed(s), 0 global(s) using 175768virt/126996res/3688shr/123964data kb, in 1650usr/250sys/1902real ms.
Pass 3: using cached /root/.systemtap/cache/38/stap_38af4dc0b3509fcb42d451417e95bbab_1375.c
Pass 4: using cached /root/.systemtap/cache/38/stap_38af4dc0b3509fcb42d451417e95bbab_1375.ko
Pass 5: starting run.
read performed
Pass 5: run completed in 20usr/290sys/638real ms.

All is OK!