SystemTap Notes (10): "@defined" and "@choose_defined"

As code evolves, some target variables may no longer exist in newer versions. @defined checks whether a target variable exists. For example:

probe vm.pagefault = kernel.function("__handle_mm_fault@mm/memory.c") ?,
                     kernel.function("handle_mm_fault@mm/memory.c") ?
{
        write_access = (@defined($flags) ? $flags & FAULT_FLAG_WRITE : $write_access)
}

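The code above assigns write_access a value that depends on whether $flags exists: if it does, the value is $flags & FAULT_FLAG_WRITE; otherwise it falls back to $write_access.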

There is also @choose_defined: @choose_defined($a, $b) is equivalent to @defined($a) ? $a : $b. For example:

probe vm.pagefault = kernel.function("handle_mm_fault@mm/memory.c")
{
        write_access = @choose_defined($write_access, 0)
}

 

References:
Checking Target Variable Availability

Arguments

 

The loneliest person? The happiest person?

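Last week I read a report about a Russian meteorologist who works and lives alone near the polar region. Afterwards I tracked down the original English article and read it closely. Some say he is the loneliest person in the world, but to my mind he is also the happiest. He can leave the mundane world behind, devote himself every day to the work he wants to do, and live an almost carefree life aloof from worldly strife, enjoying his own "Peach Blossom Spring" to the full. In today's world, how many people are as "happy" as he is? I truly envy him...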

Crash Tool Notes (3): Using crash in a Xen environment

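For the past two weeks I have been discussing on the crash mailing list how to use crash to debug the Dom0 kernel on SuSE Xen. The thread went back and forth many times (see here), and in the end a bug was even found. Skipping the details, here is a summary of the outcome: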

(1) Because the SuSE kernel is built with CONFIG_STRICT_DEVMEM enabled by default, crash cannot fully access /dev/mem; /proc/kcore can be used instead.

(2) SuSE ships a crash.ko driver (located at "/lib/modules/`uname -r`/updates/crash.ko"), but it is not loaded by default. Load it manually with insmod, after which crash works:

# crash

crash 7.1.3
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.

crash: /boot/xen-4.5.gz: original filename unknown
       Use "-f /boot/xen-4.5.gz" on command line to prevent this message.

WARNING: machine type mismatch:
         crash utility: X86_64
         /var/tmp/xen-4.5.gz_ud3IRy: X86

crash: /boot/symtypes-3.12.49-6-default.gz: original filename unknown
       Use "-f /boot/symtypes-3.12.49-6-default.gz" on command line to prevent this message.

crash: /boot/symvers-3.12.49-6-default.gz: original filename unknown
       Use "-f /boot/symvers-3.12.49-6-default.gz" on command line to prevent this message.

GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /boot/vmlinux-3.12.49-6-xen.gz
   DEBUGINFO: /usr/lib/debug/boot/vmlinux-3.12.49-6-xen.debug
    DUMPFILE: /dev/crash
        CPUS: 128
        DATE: Fri Nov 20 06:55:06 2015
      UPTIME: 18:51:36
LOAD AVERAGE: 1.76, 1.48, 1.21
       TASKS: 1230
    NODENAME: dl980-5
     RELEASE: 3.12.49-6-xen
     VERSION: #1 SMP Mon Oct 26 16:05:37 UTC 2015 (11560c3)
     MACHINE: x86_64  (1995 Mhz)
      MEMORY: 125.9 GB
         PID: 6618
     COMMAND: "crash"
        TASK: ffff881ea93b2140  [THREAD_INFO: ffff881e869f2000]
         CPU: 112
       STATE: TASK_RUNNING (ACTIVE)

 

Linux Kernel Notes (49): ERESTARTSYS and EINTR

LDD3 explains how driver code should choose between returning ERESTARTSYS and EINTR:

Note the check on the return value of down_interruptible; if it returns nonzero, the operation was interrupted. The usual thing to do in this situation is to return -ERESTARTSYS. Upon seeing this return code, the higher layers of the kernel will either restart the call from the beginning or return the error to the user. If you return -ERESTARTSYS, you must first undo any user-visible changes that might have been made, so that the right thing happens when the system call is retried. If you cannot undo things in this manner, you should return -EINTR instead.

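In other words, if the device state visible to the user can be rolled back completely to what it was before the driver code ran, return ERESTARTSYS; otherwise return EINTR. EINTR makes the system call fail and hands the error code EINTR to the application, whereas ERESTARTSYS may cause the kernel to restart the operation without the application noticing. See this post for reference.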
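Seen from the application, the difference is whether a system call fails with errno set to EINTR. The following is a minimal userspace sketch of how a caller might cope with that case. The helper name read_retry is hypothetical and does not come from LDD3; it simply reissues read() for as long as it fails with EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not from LDD3): call read() again whenever it
 * fails with EINTR. Returns whatever read() returns once the call is
 * no longer interrupted; any other error is passed through unchanged. */
ssize_t read_retry(int fd, void *buf, size_t count)
{
    ssize_t n;

    assert(buf != NULL);
    do {
        n = read(fd, buf, count);
    } while (n == -1 && errno == EINTR);
    return n;
}
```

When the driver returns -ERESTARTSYS and the kernel restarts the call, a loop like this never sees the interruption at all.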

 

Linux Kernel Notes (48): CONFIG_STRICT_DEVMEM and /dev/crash

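The CONFIG_STRICT_DEVMEM option controls access to /dev/mem: once it is set to yes, only a specific region can be accessed. On x86, for example, only the first 1M of memory is accessible: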

# dd if=/dev/mem of=/dev/null
dd: error reading ‘/dev/mem’: Operation not permitted
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB) copied, 0.0349979 s, 30.0 MB/s

Red Hat developed a driver, /dev/crash, that can be used in place of /dev/mem so that debuggers such as crash can access physical memory.

References:
/dev/crash Driver
Tools:Memory Imaging

 

Linux Kernel Notes (47): Functions for operating on semaphores

The functions for operating on semaphores are:

#include <linux/semaphore.h>
void down(struct semaphore *sem);
int down_interruptible(struct semaphore *sem);
int down_killable(struct semaphore *sem);
int down_trylock(struct semaphore *sem); 
int down_timeout(struct semaphore *sem, long jiffies);
void up(struct semaphore *sem);

down sleeps uninterruptibly and is no longer recommended; prefer down_interruptible or down_killable.

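down_interruptible can be interrupted by a signal, so its return value must be checked: only a return value of 0 means the semaphore was acquired. An example of using down_interruptible: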

if (down_interruptible(&sem)) return -ERESTARTSYS;

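down_killable can be interrupted only by fatal signals. Such signals are normally used to terminate a process, so down_killable ensures that a user process can still be killed; otherwise, once a process is deadlocked, the only way out is to reboot the system.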

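down_trylock is the non-blocking version of down, and its return value must also be checked (0 means the semaphore was acquired). For example: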

if (file->f_flags & O_NONBLOCK) {
    if (down_trylock(&iosem)) return -EAGAIN;
} else {
    if (down_interruptible(&iosem)) return -ERESTARTSYS;
}

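down_timeout waits for at most the given number of jiffies and returns -ETIME if the semaphore is not acquired in time. This wait cannot be interrupted by signals either.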

up releases the semaphore. It never sleeps, so no interruptible variant is needed.

References:
Mutex, semaphore and the proc file system

 

Linux Kernel Notes (46): Configuring the crashkernel parameter

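crashkernel configures the size and location of the memory reserved for the second kernel booted by Kexec (the crash kernel), that is, the kernel that captures the crash dump of the first kernel. The crashkernel parameter takes four forms: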

(1)

crashkernel=size[@offset]  

Reserves the memory region [offset, offset + size). If @offset is omitted, a suitable offset is chosen automatically.

(2)

crashkernel=range1:size1[,range2:size2,...][@offset]
range=start-[end] (start is inclusive, end is exclusive)

For example:

crashkernel=512M-2G:64M,2G-:128M

This means:
a) if memory is less than 512M, no memory is reserved;
b) if memory is between 512M and 2G, 64M is reserved;
c) if memory is 2G or more, 128M is reserved.
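
The range rule can be sketched in userspace C. This is not the kernel's own parser: the function names parse_size and crashkernel_reserve are hypothetical, and the sketch assumes sizes use only the K, M and G suffixes and ignores any trailing @offset. It returns the size reserved for a given amount of system memory, treating start as inclusive and end as exclusive.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse a size such as "512M" or "2G"; *end is left just past the
 * number and its optional K/M/G suffix. */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 10);

    switch (**end) {
    case 'G': v <<= 10; /* fall through */
    case 'M': v <<= 10; /* fall through */
    case 'K': v <<= 10; (*end)++; break;
    default: break;
    }
    return v;
}

/* Hypothetical helper: for a spec like "512M-2G:64M,2G-:128M", return
 * the number of bytes reserved when the machine has `ram` bytes of
 * memory. Returns 0 if no range matches or the spec is malformed. */
unsigned long long crashkernel_reserve(const char *spec,
                                       unsigned long long ram)
{
    const char *p = spec;
    char *end;

    assert(spec != NULL);
    while (*p) {
        unsigned long long start, stop = ~0ULL, size;

        start = parse_size(p, &end);
        if (end == p || *end != '-')
            return 0;           /* expected "start-" */
        p = end + 1;
        if (*p != ':') {        /* an explicit end was given */
            stop = parse_size(p, &end);
            if (end == p)
                return 0;
            p = end;
        }
        if (*p != ':')
            return 0;           /* expected ":size" */
        p++;
        size = parse_size(p, &end);
        if (end == p)
            return 0;
        p = end;
        if (ram >= start && ram < stop)
            return size;        /* start inclusive, end exclusive */
        if (*p != ',')
            break;              /* end of list (or an @offset) */
        p++;
    }
    return 0;
}
```

With the spec from the example, a machine with exactly 2G of memory falls into the second range, because the end of the first range is exclusive.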

(3)

crashkernel=size,high

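x86_64 only. When the system has more than 4G of memory, this allows the kernel to allocate the region from the top, that is, from addresses above 4G. With less than 4G of memory, the region is naturally allocated below 4G. If crashkernel=size is specified, this option is ignored.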

(4)

crashkernel=size,low

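x86_64 only. When crashkernel=size,high is specified, a block of low memory, that is, memory below 4G, must also be allocated. By default the system tries to allocate at least 256M automatically.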

References:
Kernel Parameters

 

Crash Tool Notes (2): Printing debugging information when running the "crash" command

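The -d number option prints debugging information while the crash command runs. The larger number is, the more output is produced. Currently -d8 prints all debugging information. For example: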

# crash -d8

crash 7.0.2-6.el7
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.


find_booted_kernel: search for [Linux version 3.10.0-123.el7.x86_64.debug (mockbuild@x86-017.build.eng.bos.redhat.com) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Mon May 5 11:24:18 EDT 2014]
mount_points[0]: / (167c600)
mount_points[1]: /proc (167c620)
mount_points[2]: /sys (167c640)
mount_points[3]: /dev (167c660)
mount_points[4]: /sys/kernel/security (167c680)
mount_points[5]: /dev/shm (167c6b0)
mount_points[6]: /dev/pts (167c6d0)
mount_points[7]: /run (167c6f0)
mount_points[8]: /sys/fs/cgroup (167c710)
mount_points[9]: /sys/fs/cgroup/systemd (167c740)
mount_points[10]: /sys/fs/pstore (167c780)
mount_points[11]: /sys/fs/cgroup/cpuset (167c7b0)
mount_points[12]: /sys/fs/cgroup/cpu,cpuacct (167c7f0)
mount_points[13]: /sys/fs/cgroup/memory (167c830)
......

 

Xen Notes (2): The "xen", "xen-syms" and "xen-*-dbg" files

The following is excerpted from the Xen mailing list:

Hi all,

Since I am a newbie of Xen, this may be a dumb question. On my SuSE Xen, I find the following 4 “Xen” files:

-rw-r--r-- 1 root root   881967 Oct 27 05:46 xen-4.5.110-1.gz
-rw-r--r-- 1 root root 16673080 Oct 27 05:46 xen-syms-4.5.110-1
-rw-r--r-- 1 root root   893428 Oct 27 05:45 xen-dbg-4.5.110-1.gz
-rw-r--r-- 1 root root 16124144 Oct 27 05:45 xen-syms-dbg-4.5.110-1

Could anyone give a detailed explanation of the functions and differences about the 4 files?

TL;DR: Use xen-04.5.1_1-1.gz unless someone asks you to do otherwise or you are tracking down a bug.

xen.gz is the main/regular/normal release build of the Xen binary (e.g. the thing which you can boot). This is the thing you would boot on a regular production Xen system.

xen-syms.gz is the unstripped version of xen.gz, i.e. with all the ELF debug information present. You can e.g. run gdb on it to disassemble things if you are tracking down a bug. This image is not bootable.

The -dbg variants are not something which upstream produces, but I would assume that they are build with the debug=y, which means they will contain extra ASSERT statements and other things which aid debugging possibly at the expense of performance. Release builds (xen.gz et al) are build with debug=n. Like the release builds the xen-*-dbg ones come in the bootable (xen-dbg.gz) and xen-syms-dbg.gz variants, with the same distinction (the former is bootable, the latter is for running gdb on)

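In summary: the xen file is bootable and is the executable used to boot a Xen system. xen-syms is the version of xen with all debugging information retained; it is not bootable and is only for debugging. The xen-*-dbg files are the counterparts of xen and xen-syms built with a lot of extra debugging code.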

 

Introduction to kmod

kmod provides a set of tools for working with Linux kernel modules. It is built on top of the libkmod library (which ships with the kmod source). Source: http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/

On SuSE Linux, run the following command:

/sbin # ls -alt | grep kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 depmod -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 insmod -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 lsmod -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 modinfo -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 modprobe -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 rmmod -> /usr/bin/kmod

As the listing shows, commonly used commands such as insmod and modprobe are all symlinks that ultimately invoke the kmod command.
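
A single binary can serve several command names by looking at the name it was invoked under. The sketch below illustrates that general technique only; it is not kmod's source code, and the function name tool_for is hypothetical. It takes argv[0], strips the directory part, and matches the result against the tool names from the symlink listing.

```c
#include <assert.h>
#include <string.h>

/* Tool names taken from the symlink listing. */
static const char *const tools[] = {
    "depmod", "insmod", "lsmod", "modinfo", "modprobe", "rmmod",
};

/* Hypothetical helper: map argv[0] (e.g. "/sbin/lsmod") to the tool
 * it selects, or NULL when the program was invoked under any other
 * name (e.g. plain "kmod"). */
const char *tool_for(const char *argv0)
{
    const char *base;
    size_t i;

    assert(argv0 != NULL);
    base = strrchr(argv0, '/');
    base = base ? base + 1 : argv0;   /* drop any directory prefix */

    for (i = 0; i < sizeof(tools) / sizeof(tools[0]); i++)
        if (strcmp(base, tools[i]) == 0)
            return tools[i];
    return NULL;
}
```

A real main() would call such a helper with argv[0] and then jump to the matching tool's entry point.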