SystemTap Notes (6): Printing User-Space Stack Backtraces

To print the call stack of a user-space program, SystemTap needs enough symbol and unwind information. Two options supply it, -d and --ldd:

-d MODULE
          Add symbol/unwind information for the given module into the kernel object module. This may enable symbolic tracebacks
          from those modules/programs, even if they do not have an explicit probe placed into them.

--ldd  Add symbol/unwind information for all shared libraries suspected by ldd to be necessary for user-space binaries being
          probed or listed with the -d option. Caution: this can make the probe modules considerably larger.

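The -d option loads symbol information for a module or executable, while --ldd loads symbol information for the shared libraries needed by the modules given with -d or by the probed binaries. For example: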

 # stap -d /usr/lib/systemd/systemd-udevd --ldd -e 'probe kprocess.create {print_ubacktrace()}'
<no user backtrace at kernel.function("copy_process@../kernel/fork.c:1146").return>
 0x7fec1d14f011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f6feb135011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
WARNING: Missing unwind data for module, rerun with 'stap -d /usr/lib64/libglib-2.0.so.0.3800.2'
 0x7f22c3026011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f22c2ff7ed4 : __fork+0xb4/0x320 [/lib64/libc-2.19.so]
 0x7f22c3a01c35 [/usr/lib64/libglib-2.0.so.0.3800.2+0x8cc35/0x302000]
 0x7f20966a5011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f22c3026011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f20966a5011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
WARNING: Missing unwind data for module, rerun with 'stap -d /usr/lib/systemd/systemd'
 0x7f4e59945ed4 : __fork+0xb4/0x320 [/lib64/libc-2.19.so]
 0x4364f3 [/usr/lib/systemd/systemd+0x364f3/0x113000]
 0x7f22c2ff7ed4 : __fork+0xb4/0x320 [/lib64/libc-2.19.so]
 0x7f22c3a01c35 [/usr/lib64/libglib-2.0.so.0.3800.2+0x8cc35/0x302000]
 0x7fb1bdfb6011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f22c3026011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7fb1bdfb6011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f3bb6e94011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f3bb6e94011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
 0x7f783f704ed4 : __fork+0xb4/0x320 [/lib64/libc-2.19.so]
 0x7f783fd2169b [/usr/lib64/libpython2.7.so.1.0+0x10f69b/0x3a0000]
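The WARNING lines in the output above name the modules that still lack unwind data. As a minimal sketch (the helper names `missing_modules` and `extra_d_options` are hypothetical and not part of SystemTap), the following Python collects those modules so they can be passed as additional -d options on the next run:

```python
import re

# Matches the hint SystemTap prints when unwind data is missing for a module.
WARNING_RE = re.compile(r"rerun with 'stap -d ([^']+)'")

def missing_modules(output):
    """Return the modules named in the warnings, in first-seen order, without duplicates."""
    seen = []
    for line in output.splitlines():
        match = WARNING_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

def extra_d_options(output):
    """Build the extra -d arguments for the next stap invocation."""
    args = []
    for module in missing_modules(output):
        args += ["-d", module]
    return args

# A few lines copied from the output above.
sample = """\
 0x7f6feb135011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
WARNING: Missing unwind data for module, rerun with 'stap -d /usr/lib64/libglib-2.0.so.0.3800.2'
 0x7f22c3026011 : clone+0x31/0x90 [/lib64/libc-2.19.so]
WARNING: Missing unwind data for module, rerun with 'stap -d /usr/lib/systemd/systemd'
"""

print(" ".join(extra_d_options(sample)))
```

For the sample lines this yields -d options for libglib and systemd, which can be appended to the original command line.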

References:
Is there any better method to pass “-d OBJECT” options in command line?
User-Space Stack Backtraces

 

 

SystemTap Notes (5): Target Variables (1)

An explanation of target variables:

The probe events that map to actual locations in the code (for example kernel.function("function") and kernel.statement("statement")) allow the use of target variables to obtain the value of variables visible at that location in the code. You can use the -L option to list the target variable available at a probe point.

In fact, the name "context variable" is now preferred over "target variable" (see this email). Using target variables requires kernel debuginfo. Consider the following example:

# stap -L 'kernel.function("vfs_read")'
kernel.function("vfs_read@../fs/read_write.c:381") $file:struct file* $buf:char* $count:size_t $pos:loff_t*

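Each target variable is prefixed with $ and followed by : and its type. For example, the type of the file variable is struct file*. Compare this with the definition of vfs_read: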

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
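To make the `$name:type` layout concrete, here is a small Python sketch that splits one line of `stap -L` output into the probe point and a name-to-type mapping. The function name `parse_target_vars` is hypothetical, and the parser assumes the probe point itself contains no spaces, as in the example above:

```python
import re

# One "$name:type" pair; the type runs until the next " $" or the end of the line,
# so types containing spaces (such as "struct file*") stay intact.
VAR_RE = re.compile(r"\$(\w+):(.*?)(?=\s+\$|$)")

def parse_target_vars(line):
    """Split a `stap -L` output line into (probe_point, {variable: type})."""
    probe_point, _, rest = line.strip().partition(" ")
    return probe_point, dict(VAR_RE.findall(rest))

line = ('kernel.function("vfs_read@../fs/read_write.c:381") '
        '$file:struct file* $buf:char* $count:size_t $pos:loff_t*')

probe_point, variables = parse_target_vars(line)
for name, ctype in variables.items():
    print(f"${name} has type {ctype}")
```

The four variables it reports line up with the four parameters in the C prototype of vfs_read.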

In addition, a target variable that is not local to the current probe point can be accessed with @var("varname@src/file.c"):

When a target variable is not local to the probe point, like a global external variable or a file local static variable defined in another file, then it can be referenced through "@var("varname@src/file.c")".

Consider this example:

# stap -e 'probe kernel.function("vfs_read") {
           printf ("current files_stat max_files: %d\n",
                   @var("files_stat@fs/file_table.c")->max_files);
           exit(); }'
current files_stat max_files: 82002

Data of basic types can also be read through a pointer with the following functions:

kernel_char(address)
Obtain the character at address from kernel memory.
kernel_short(address)
Obtain the short at address from kernel memory.
kernel_int(address)
Obtain the int at address from kernel memory.
kernel_long(address)
Obtain the long at address from kernel memory.
kernel_string(address)
Obtain the string at address from kernel memory.
kernel_string_n(address, n)
Obtain the string at address from kernel memory, limited to n bytes.

 

SystemTap Notes (4): Timer Events

A timer event runs its handler periodically. For example:

# stap -e 'probe timer.s(1) { printf("Hello world!\n");}'
Hello world!
Hello world!
Hello world!
Hello world!

The script above prints Hello world! once every second.

Timer events are defined as follows:

timer.s(seconds)
timer.ms(milliseconds)
timer.us(microseconds)
timer.ns(nanoseconds)
timer.hz(hertz)
timer.jiffies(jiffies)

There is also a randomized form (taken from here):

timer.jiffies(N).randomize(M)

The probe handler is run every N jiffies (a kernel-defined unit of time, typically between 1 and 60 ms). If the "randomize" component is given, a linearly distributed random value in the range [-M..+M] is added to N every time the handler is run. N is restricted to a reasonable range (1 to around a million), and M is restricted to be smaller than N.

Alternatively, intervals may be specified in units of time. There are two probe point variants similar to the jiffies timer:

timer.ms(N)

timer.ms(N).randomize(M)

Here, N and M are specified in milliseconds, but the full options for units are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for hertz timers.
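The effect of randomize can be pictured with a short simulation. This is only a sketch of the behaviour described in the quoted text, not SystemTap's implementation: it assumes "linearly distributed" means uniform over the integers in [-M..+M], and the function name `randomized_intervals` is hypothetical.

```python
import random

def randomized_intervals(n, m, count, seed=0):
    """Simulate the intervals of timer.ms(n).randomize(m): each one is n plus a value in [-m, +m]."""
    if not 0 <= m < n:
        raise ValueError("M must be smaller than N")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [n + rng.randint(-m, m) for _ in range(count)]

intervals = randomized_intervals(100, 50, 1000)
print(min(intervals), max(intervals), sum(intervals) / len(intervals))
```

Every simulated interval stays within [N-M, N+M], and the average stays close to N, so the handler still runs about once per N units over a long period.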

Finally, here is an example of how timer events can be used (taken from here):

global count_jiffies, count_ms
probe timer.jiffies(100) { count_jiffies ++ }
probe timer.ms(100) { count_ms ++ }
probe timer.ms(12345)
{
  hz=(1000*count_jiffies) / count_ms
  printf ("jiffies:ms ratio %d:%d => CONFIG_HZ=%d\n",
    count_jiffies, count_ms, hz)
  exit ()
}

First, note that HZ jiffies occur every second.

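Second, count_jiffies is incremented once every 100 jiffies, so by the time the script exits, 100 * count_jiffies jiffies have occurred in total. Likewise, count_ms is incremented once every 100 ms, so count_ms / 10 seconds have elapsed.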

Finally, CONFIG_HZ = (100 * count_jiffies) / (count_ms / 10) = (1000 * count_jiffies) / count_ms.
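The arithmetic can be checked with a small Python model of the script. It is an idealized sketch: it assumes the timers fire exactly on schedule and, like the script, uses integer division throughout. The function name `estimate_hz` and the parameter `true_hz` are hypothetical.

```python
def estimate_hz(true_hz, run_ms=12345):
    """Model the script: count 100-jiffy and 100-ms ticks over run_ms, then derive HZ."""
    total_jiffies = run_ms * true_hz // 1000   # jiffies elapsed during the run
    count_jiffies = total_jiffies // 100       # times timer.jiffies(100) fired
    count_ms = run_ms // 100                   # times timer.ms(100) fired
    return (1000 * count_jiffies) // count_ms  # same formula as the script

for hz in (100, 250, 1000):
    print(hz, "=>", estimate_hz(hz))
```

With a 12345 ms run this model gives 97, 243, and 1000 for true HZ values of 100, 250, and 1000. The estimate is close to CONFIG_HZ but not always exact, because both counters only advance in whole steps of 100.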

 

SystemTap Notes (2): Function Probes

The syntax of a function probe:

{kernel|module("module-pattern")}.function("function-pattern")[.{call|return[.maxactive(VALUE)]|inline}]

kernel refers to the kernel image file (vmlinux), while module refers to the modules under /lib/modules/$(uname -r), that is, the .ko files.

An explanation of call, return, maxactive(VALUE), and inline:

.call is used to attach to the entry point of a non-inlined function, while .inline is used to attach to the first instruction of an inlined function;

maxactive specifies how many instances of the specified function can be probed simultaneously. You can leave off .maxactive in most cases, as the default (KRETACTIVE) should be sufficient. However, if you notice an excessive number of skipped probes, try setting .maxactive to incrementally higher values to see if the number of skipped probes decreases.

.return is used for return points of non-inlined functions;

An empty suffix is treated as a combination of the .call and .inline suffixes.

The definition of function-pattern:

function-name[@source-path[{:line-number|:first-line-last-line|+relative-line-number}]]

stap -l 'kernel.function("*")' lists all function probes currently available in the kernel:

linux: # stap -l 'kernel.function("*")'
kernel.function("AUDIT_MODE@../security/apparmor/include/policy.h:401")
kernel.function("BLEND_OP@../crypto/sha256_generic.c:48")
kernel.function("C_SYSC_epoll_pwait@../fs/eventpoll.c:2051")
kernel.function("C_SYSC_fanotify_mark@../fs/notify/fanotify/fanotify_user.c:912")
kernel.function("C_SYSC_ftruncate@../fs/open.c:205")
kernel.function("C_SYSC_futex@../kernel/futex_compat.c:174")
kernel.function("C_SYSC_get_robust_list@../kernel/futex_compat.c:135")
kernel.function("C_SYSC_getitimer@../kernel/compat.c:293")
......

stap -l 'module("ahci").function("*")' lists all function probes currently available in the ahci module:

linux: # stap -l 'module("ahci").function("*")'
module("ahci").function("__ahci_port_base@../drivers/ata/ahci.h:372")
module("ahci").function("ahci_broken_online@../drivers/ata/ahci.c:1024")
module("ahci").function("ahci_broken_suspend@../drivers/ata/ahci.c:940")
module("ahci").function("ahci_broken_system_poweroff@../drivers/ata/ahci.c:905")
module("ahci").function("ahci_configure_dma_masks@../drivers/ata/ahci.c:700")
module("ahci").function("ahci_gtf_filter_workaround@../drivers/ata/ahci.c:1075")
module("ahci").function("ahci_host_activate@../drivers/ata/ahci.c:1164")
module("ahci").function("ahci_init_interrupts@../drivers/ata/ahci.c:1122")
module("ahci").function("ahci_init_one@../drivers/ata/ahci.c:1211")
module("ahci").function("ahci_nr_ports@../drivers/ata/ahci.h:386")
module("ahci").function("ahci_p5wdh_hardreset@../drivers/ata/ahci.c:604")
module("ahci").function("ahci_p5wdh_workaround@../drivers/ata/ahci.c:775")
module("ahci").function("ahci_pci_device_resume@../drivers/ata/ahci.c:677")
module("ahci").function("ahci_pci_device_suspend@../drivers/ata/ahci.c:649")
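Each line in these listings follows the function-pattern layout function-name@source-path:line-number. As an illustration only (the name `parse_probe` is hypothetical, and the pattern covers just the form shown in the listings above), a line can be taken apart in Python like this:

```python
import re

# Matches kernel.function("name@path:line") and module("mod").function("name@path:line").
PROBE_RE = re.compile(
    r'^(?:kernel|module\("(?P<module>[^"]+)"\))'
    r'\.function\("(?P<function>[^@"]+)@(?P<path>[^"]+):(?P<line>\d+)"\)$'
)

def parse_probe(text):
    """Return the parts of one `stap -l` line, or None if it has a different form."""
    match = PROBE_RE.match(text.strip())
    if match is None:
        return None
    return {
        "module": match.group("module"),  # None for kernel.function(...)
        "function": match.group("function"),
        "path": match.group("path"),
        "line": int(match.group("line")),
    }

print(parse_probe('kernel.function("C_SYSC_futex@../kernel/futex_compat.c:174")'))
print(parse_probe('module("ahci").function("ahci_init_one@../drivers/ata/ahci.c:1211")'))
```

Splitting the listing this way makes it easy to group probes by source file or to pick out a single function by name.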

 

SystemTap Notes (1): Probe Definitions

A SystemTap probe is defined as follows:

probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }

A probe can define multiple PROBEPOINTs (also called events), which share a single handler. PROBEPOINTs fall into two kinds:

a) Synchronous:

A synchronous event occurs when any process executes an instruction at a particular location in kernel code. This gives other events a reference point from which more contextual data may be available.

syscall.system_call and kernel.function("function") are both synchronous PROBEPOINTs.

b) Asynchronous:

Asynchronous events are not tied to a particular instruction or location in code. This family of probe points consists mainly of counters, timers, and similar constructs.

begin, end, timer, and similar events are all asynchronous PROBEPOINTs.

References:
SystemTap Scripts

 

*NIX & Hacking: Issue 9

Making a magazine about the things that interest me. It's that simple!

Assembler

Assembler relaxation

GDB

GDB dashboard

Go

Best practices for a new Go developer
On Go, Portability, and System Interfaces

Kernel

A Tour of Bootloading
GRUB 2 bootloader – Full tutorial
How I ended up writing new real-time kernel
Kernel bypass
Linux Kernel Crash Book

Network

TCP in 30 instructions

RMS

Interviews: RMS Answers Your Questions

Rust

Why Rust?

Tracing

Dynamic Tracing with DTrace & SystemTap