SystemTap notes (2): function probes

Syntax of a function probe:

{kernel|module("module-pattern")}.function("function-pattern")[.{call|return[.maxactive(VALUE)]|inline}]

kernel refers to the kernel image file (vmlinux), while module refers to the modules under /lib/modules/$(uname -r), i.e. the .ko files.

Explanation of call, return, maxactive(VALUE), and inline:

.call attaches to the entry point of a non-inlined function, while .inline attaches to the first instruction of an inlined function;

maxactive specifies how many instances of the specified function can be probed simultaneously. You can leave off .maxactive in most cases, as the default (KRETACTIVE) should be sufficient. However, if you notice an excessive number of skipped probes, try setting .maxactive to incrementally higher values to see if the number of skipped probes decreases.

.return is used for return points of non-inlined functions;

An empty suffix is treated as a combination of the .call and .inline suffixes.

Definition of function-pattern:

function-name[@source-path[{:line-number|:first-line-last-line|+relative-line-number}]]

stap -l 'kernel.function("*")' lists all kernel function probes currently available:

linux: # stap -l 'kernel.function("*")'
kernel.function("AUDIT_MODE@../security/apparmor/include/policy.h:401")
kernel.function("BLEND_OP@../crypto/sha256_generic.c:48")
kernel.function("C_SYSC_epoll_pwait@../fs/eventpoll.c:2051")
kernel.function("C_SYSC_fanotify_mark@../fs/notify/fanotify/fanotify_user.c:912")
kernel.function("C_SYSC_ftruncate@../fs/open.c:205")
kernel.function("C_SYSC_futex@../kernel/futex_compat.c:174")
kernel.function("C_SYSC_get_robust_list@../kernel/futex_compat.c:135")
kernel.function("C_SYSC_getitimer@../kernel/compat.c:293")
......

stap -l 'module("ahci").function("*")' lists all function probes currently available in the ahci module:

linux: # stap -l 'module("ahci").function("*")'
module("ahci").function("__ahci_port_base@../drivers/ata/ahci.h:372")
module("ahci").function("ahci_broken_online@../drivers/ata/ahci.c:1024")
module("ahci").function("ahci_broken_suspend@../drivers/ata/ahci.c:940")
module("ahci").function("ahci_broken_system_poweroff@../drivers/ata/ahci.c:905")
module("ahci").function("ahci_configure_dma_masks@../drivers/ata/ahci.c:700")
module("ahci").function("ahci_gtf_filter_workaround@../drivers/ata/ahci.c:1075")
module("ahci").function("ahci_host_activate@../drivers/ata/ahci.c:1164")
module("ahci").function("ahci_init_interrupts@../drivers/ata/ahci.c:1122")
module("ahci").function("ahci_init_one@../drivers/ata/ahci.c:1211")
module("ahci").function("ahci_nr_ports@../drivers/ata/ahci.h:386")
module("ahci").function("ahci_p5wdh_hardreset@../drivers/ata/ahci.c:604")
module("ahci").function("ahci_p5wdh_workaround@../drivers/ata/ahci.c:775")
module("ahci").function("ahci_pci_device_resume@../drivers/ata/ahci.c:677")
module("ahci").function("ahci_pci_device_suspend@../drivers/ata/ahci.c:649")
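Each line printed by stap -l follows the function-pattern form function-name@source-path:line-number. As a minimal sketch in Python (parse_probe_point is a hypothetical helper name, not part of SystemTap, and it assumes only the single-line-number form shown in the listings above), one such line can be split into its parts:

```python
import re

# Hypothetical helper, not part of SystemTap. It handles only the
# kernel.function("name@path:line") and module("m").function("name@path:line")
# forms shown in the `stap -l` listings.
PROBE_RE = re.compile(
    r'^(?:kernel|module\("(?P<module>[^"]+)"\))'
    r'\.function\("(?P<function>[^@"]+)@(?P<path>[^:"]+):(?P<line>\d+)"\)$'
)


def parse_probe_point(text):
    """Split one line of `stap -l` output into module, function, path, line."""
    match = PROBE_RE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized probe point: %r" % text)
    return {
        "module": match.group("module"),      # None for kernel.function(...)
        "function": match.group("function"),
        "path": match.group("path"),
        "line": int(match.group("line")),
    }
```

This can be handy for grouping the output of stap -l by source file before deciding which functions to probe.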

 

SystemTap notes (1): probe definition

SystemTap probe definition:

probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }

A probe can specify multiple PROBEPOINTs (also called events), which share one handler. PROBEPOINTs fall into two kinds:

a) Synchronous:

A synchronous event occurs when any process executes an instruction at a particular location in kernel code. This gives other events a reference point from which more contextual data may be available.

syscall.system_call and kernel.function("function") are both synchronous PROBEPOINTs.

b) Asynchronous:

Asynchronous events are not tied to a particular instruction or location in code. This family of probe points consists mainly of counters, timers, and similar constructs.

begin, end, timer, and the like are asynchronous PROBEPOINTs.

References:
SystemTap Scripts

 

Xen notes (1): why is xen a 32-bit executable?

Today I built Xen and found that the resulting xen is a 32-bit executable, while xen-syms is 64-bit:

Linux:~/Downloads/xen-4.6.0/xen # file xen
xen: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
Linux:~/Downloads/xen-4.6.0/xen # file xen-syms
xen-syms: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped

Roger gave the answer on the mailing list:

The Xen entry point is in 32bits (because that’s what the multiboot specification requires). Xen then jumps into long mode (64bits) by itself, so there’s only a very small amount of 32bit code that’s used as a trampoline.

So the 32-bit format is there to support multiboot; once it is actually running, xen is still a 64-bit program.
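The 32-bit versus 64-bit distinction reported by file comes from the ELF header: the byte at offset 4 (EI_CLASS) is 1 for a 32-bit object and 2 for a 64-bit one. A minimal sketch in Python, where elf_class is a hypothetical helper name and the headers used to try it are synthetic, not read from the real xen binaries:

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the class byte inside e_ident


def elf_class(header):
    """Return "ELF 32-bit" or "ELF 64-bit" from the first bytes of a file."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF header")
    classes = {1: "ELF 32-bit", 2: "ELF 64-bit"}
    value = header[EI_CLASS]
    if value not in classes:
        raise ValueError("unknown EI_CLASS value: %d" % value)
    return classes[value]
```

Reading the first 16 bytes of xen and xen-syms in binary mode and passing them to elf_class should agree with the file output shown above.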

References:
[Xen-users] Why the built xen file is 32-bit on 64-bit OS?

 

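Linux kernel notes (37): "System.map" and "/proc/kallsyms"

System.map contains the symbol table of the kernel image. /proc/kallsyms contains the symbol tables of the kernel image and of all dynamically loaded modules. If a function was inlined or optimized away by the compiler, it may not appear in /proc/kallsyms.

In addition, when /proc/kallsyms is read by a non-root user, all addresses are shown as 0: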

$ cat /proc/kallsyms | more
0000000000000000 A irq_stack_union
0000000000000000 A __per_cpu_start
0000000000000000 A cpu_debug_store
0000000000000000 A cpu_tss_rw
......

$ sudo cat /proc/kallsyms | more
[sudo] password for xiaonan:
0000000000000000 A irq_stack_union
0000000000000000 A __per_cpu_start
0000000000004000 A cpu_debug_store
0000000000005000 A cpu_tss_rw
0000000000008000 A gdt_page
0000000000009000 A exception_stacks
......
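Each line of /proc/kallsyms is an address in hexadecimal, a one-letter symbol type, a symbol name, and, for module symbols, the module name in brackets. As a rough sketch in Python (parse_kallsyms and addresses_hidden are hypothetical helper names), the two listings above can be told apart by checking whether every address is zero:

```python
def parse_kallsyms(text):
    """Parse /proc/kallsyms-style text into (address, type, name, module)."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines or elisions such as "......"
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # not a symbol line, e.g. a sudo password prompt
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.append((address, fields[1], fields[2], module))
    return symbols


def addresses_hidden(symbols):
    """True when there are symbols and every address reads as zero."""
    return bool(symbols) and all(addr == 0 for addr, _, _, _ in symbols)
```

On a live system the text would come from reading /proc/kallsyms; a True result from addresses_hidden suggests the reader lacks the privileges needed to see real addresses.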

It appears that kallsyms_lookup_name requires CONFIG_KALLSYMS_ALL to be set to Y (see CONFIG_KALLSYMS_ALL).

Module.symvers contains all symbols exported by the kernel (see What is the purpose of "Module.symvers" in Linux?). This link discusses /proc/kallsyms, which can be compared with Module.symvers.

References:

Reading kallsyms in user-mode
Does kallsyms have all the symbol of kernel functions?
System.map file and /proc/kallsyms
system.map

 

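Linux kernel notes (36): a brief introduction to "procfs"

Procfs is a RAM-based virtual filesystem mounted at /proc. /proc is like a mirror of the system: it exposes a great deal of information about the running system. The files in /proc are generated dynamically by the kernel when they are accessed.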

dl980-5:/proc # ls
1     1166  1234  1452  170   198   243   287   308   3334  372   416   46    503  550   5989  640   756   911  960        execdomains
10    1167  1236  1453  1702  199   244   2872  3087  334   373   4163  460   504  551   599   641   7591  912  961        fb
100   1168  1237  146   1703  2     245   2873  309   3342  374   417   461   505  552   6     642   76    913  962        filesystems
1003  1169  124   147   1704  20    246   2876  3093  3344  375   418   462   506  553   60    643   764   914  963        fs
1007  117   1240  1478  1706  200   247   2877  31    335   376   419   4627  507  554   600   645   77    915  964        interrupts
1008  1170  1241  1479  171   201   248   288   310   3352  377   42    463   508  555   601   646   78    916  965        iomem
1009  1171  1242  148   172   2018  249   2881  311   3358  378   420   464   509  556   602   647   7818  917  966        ioports
101   1172  1243  149   1721  202   25    2884  312   336   379   421   4646  51   557   603   648   8     918  967        ipmi
1010  1173  1244  15    1729  203   250   289   313   3360  38    422   465   510  558   604   649   80    919  968        irq
1011  1174  1245  150   173   204   251   2890  3137  3361  380   423   466   511  56    605   65    81    92   969        kallsyms
1012  1175  1246  151   1736  205   252   29    3139  337   381   424   467   512  560   6052  650   82    920  97         kcore
1013  1176  1247  152   1737  206   253   290   314   338   382   425   468   513  561   6058  651   83    921  970        key-users
1015  1177  1248  1520  174   2060  254   2908  315   3389  3827  426   469   514  562   606   652   84    922  971        kmsg
1016  1178  1249  1522  175   207   255   291   3155  339   383   427   47    515  563   607   653   8461  923  972        kpagecount
1017  1179  125   1529  1751  208   256   2911  3156  34    384   428   470   516  564   608   654   85    924  973        kpageflags
1018  118   1250  153   1755  209   257   2912  3158  340   385   429   471   517  565   6086  655   86    925  974        latency_stats
1019  1180  1251  154   176   21    258   2915  316   341   386   43    472   518  566   609   656   87    926  975        loadavg
102   1181  1252  155   177   210   259   292   3161  342   387   430   473   519  567   61    657   874   927  976        locks
1025  1182  1253  156   178   211   26    2921  3163  3426  388   431   475   52   568   610   658   876   928  977        meminfo
103   1183  1254  157   1787  212   260   2927  3166  343   389   432   476   520  57    611   659   877   929  978        misc
1042  1186  1255  158   1788  213   261   293   317   344   390   433   477   521  570   612   66    879   93   98         modules
1044  1187  1259  16    179   215   262   2932  318   345   391   434   4777  522  5708  613   660   88    930  980        mounts
1045  1188  126   160   1790  216   263   2939  3183  346   392   435   478   523  571   614   661   880   931  981        mtrr
....

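Each number is a process ID and is also a directory; /proc/<pid> gives information about that process. Other entries such as kmsg and meminfo provide other system information.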
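Since process directories are exactly the entries whose names are all digits, they are easy to separate from the rest of a /proc listing. A small sketch in Python (pid_entries is a hypothetical helper name):

```python
def pid_entries(names):
    """Return the numeric entries (process IDs) of a /proc listing, sorted."""
    return sorted(int(name) for name in names if name.isdigit())
```

Passing it the result of os.listdir("/proc") on a Linux machine gives the PIDs of the running processes, while entries such as kmsg and meminfo are filtered out.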

References:
EXPLORING LINUX PROCFS VIA SHELL SCRIPTS

 

Linux kernel notes (35): the "linux/version.h" file

<linux/version.h> is generated by the Makefile in the top-level directory:

......
define filechk_version.h
    (echo \#define LINUX_VERSION_CODE $(shell                         \
    expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 0$(SUBLEVEL)); \
    echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';)
endef

$(version_h): $(srctree)/Makefile FORCE
    $(call filechk,version.h)
    $(Q)rm -f $(old_version_h)
......

It contains the two macro definitions LINUX_VERSION_CODE and KERNEL_VERSION. Take the following version as an example:

......
#define LINUX_VERSION_CODE 199680
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
......

199680 corresponds to version 3.12.0, since 3 * 65536 + 12 * 256 + 0 = 199680.
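The encoding can be checked with a short Python sketch that mirrors the KERNEL_VERSION macro and reverses it. The function names are hypothetical, and the decoder assumes the patch level and sublevel each fit in 8 bits:

```python
def kernel_version(a, b, c):
    """Mirror of the KERNEL_VERSION(a,b,c) macro."""
    return (a << 16) + (b << 8) + c


def decode_version_code(code):
    """Split a LINUX_VERSION_CODE into (version, patchlevel, sublevel).

    Assumes patchlevel and sublevel are each below 256.
    """
    return (code >> 16, (code >> 8) & 0xFF, code & 0xFF)
```

In kernel code the same comparison is normally written with the macros themselves, for example by testing LINUX_VERSION_CODE against KERNEL_VERSION(3,12,0) in a preprocessor conditional.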

References:
How the “<linux/version.h>” file is generated?

 

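2015 trip to Nanjing and the Linux kernel developers conference

From October 15 to 19, I visited Nanjing for the first time to attend the tenth Linux kernel developers conference, and took the chance to tour this ancient capital of six dynasties.

I set out on the afternoon of October 15 and arrived in Nanjing in the evening; the Nanjing metro did not seem very crowded at night. After some trouble I found the hotel I had booked online, but a choking smell of smoke hit me as soon as I opened the door, so I quickly opened the window to air the room out. The room was otherwise fine, except that the bathroom had no door, which took some getting used to. After settling in, I went out for a quick bite and went to bed early.

On the morning of the 16th I had soup dumplings (tangbao) at the nearby "Shiziqiao" pedestrian street; I had seen Nanjing tangbao in a film, so I had to try them this time. I spent the rest of the day visiting the Presidential Palace, the Six Dynasties museum, and the Sun Yat-sen Mausoleum. I am interested in history, so I focused on places with a strong historical atmosphere. The overall impression was good, with a strong Republican-era feel.

October 17 and 18 were devoted entirely to the Linux kernel developers conference. It was held at Nanjing University; not only was admission free, but a free lunch was provided in the university canteen. Compared with today's technical conferences, whose tickets often cost hundreds or even thousands of yuan, this was truly a conference run in good conscience. Granted, many conferences have to pay for a venue and for lunch, while Nanjing University has its own venue and canteen, which is an advantage. Still, some technical conferences these days have indeed lost their way. Overall, I got something out of the two days.

On the morning of the 19th I checked out. Carrying luggage was inconvenient, so I just took a brief walk around Yuhuatai and then set off back to Beijing. That concluded the Nanjing trip.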

Linux kernel notes (34): module parameters

module_param and module_param_named are defined in <linux/moduleparam.h>:

/**
 * module_param - typesafe helper for a module/cmdline parameter
 * @value: the variable to alter, and exposed parameter name.
 * @type: the type of the parameter
 * @perm: visibility in sysfs.
 *
 * @value becomes the module parameter, or (prefixed by KBUILD_MODNAME and a
 * ".") the kernel commandline parameter.  Note that - is changed to _, so
 * the user can use "foo-bar=1" even for variable "foo_bar".
 *
 * @perm is 0 if the the variable is not to appear in sysfs, or 0444
 * for world-readable, 0644 for root-writable, etc.  Note that if it
 * is writable, you may need to use kparam_block_sysfs_write() around
 * accesses (esp. charp, which can be kfreed when it changes).
 *
 * The @type is simply pasted to refer to a param_ops_##type and a
 * param_check_##type: for convenience many standard types are provided but
 * you can create your own by defining those variables.
 *
 * Standard types are:
 *  byte, short, ushort, int, uint, long, ulong
 *  charp: a character pointer
 *  bool: a bool, values 0/1, y/n, Y/N.
 *  invbool: the above, only sense-reversed (N = true).
 */
#define module_param(name, type, perm)              \
    module_param_named(name, name, type, perm)

/**
 * module_param_named - typesafe helper for a renamed module/cmdline parameter
 * @name: a valid C identifier which is the parameter name.
 * @value: the actual lvalue to alter.
 * @type: the type of the parameter
 * @perm: visibility in sysfs.
 *
 * Usually it's a good idea to have variable names and user-exposed names the
 * same, but that's harder if the variable must be non-static or is inside a
 * structure.  This allows exposure under a different name.
 */
#define module_param_named(name, value, type, perm)            \
    param_check_##type(name, &(value));                \
    module_param_cb(name, &param_ops_##type, &value, perm);        \
    __MODULE_PARM_TYPE(name, #type)

module_param defines a module parameter: type specifies its type (int, bool, and so on), and perm specifies the access permissions, with the following values (<linux/stat.h>):

#define S_IRWXU 00700
#define S_IRUSR 00400
#define S_IWUSR 00200
#define S_IXUSR 00100

#define S_IRWXG 00070
#define S_IRGRP 00040
#define S_IWGRP 00020
#define S_IXGRP 00010

#define S_IRWXO 00007
#define S_IROTH 00004
#define S_IWOTH 00002
#define S_IXOTH 00001

#define S_IRWXUGO   (S_IRWXU|S_IRWXG|S_IRWXO)
#define S_IALLUGO   (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
#define S_IRUGO     (S_IRUSR|S_IRGRP|S_IROTH)
#define S_IWUGO     (S_IWUSR|S_IWGRP|S_IWOTH)
#define S_IXUGO     (S_IXUSR|S_IXGRP|S_IXOTH)

module_param_named, in turn, exposes the variable under a more readable name.

Taking the ktap source as an example:

int kp_max_loop_count = 100000;
module_param_named(max_loop_count, kp_max_loop_count, int, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(max_loop_count, "max loop execution count");

Load the ktapvm module and read the value of kp_max_loop_count:

[root@Linux ~]# cat /sys/module/ktapvm/parameters/max_loop_count
100000
[root@Linux ~]# ls -lt /sys/module/ktapvm/parameters/max_loop_count
-rw-r--r--. 1 root root 4096 Oct 22 22:51 /sys/module/ktapvm/parameters/max_loop_count

As shown, the kp_max_loop_count variable appears under /sys/module/ktapvm/parameters as max_loop_count, its value is 100000, and only root has write permission. Writing to this file changes the kp_max_loop_count variable:

[root@Linux ~]# echo 200000 > /sys/module/ktapvm/parameters/max_loop_count
[root@Linux ~]# cat /sys/module/ktapvm/parameters/max_loop_count
200000
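The permission S_IRUGO | S_IWUSR passed to module_param_named works out to octal 0644, which is why ls shows -rw-r--r-- for the sysfs file. A quick check in Python, as a sketch: Python's stat module has no S_IRUGO, so it is spelled out here from the <linux/stat.h> definition quoted earlier, and mode_string is a hypothetical helper name:

```python
import stat

# S_IRUGO is kernel-only; expand it as (S_IRUSR|S_IRGRP|S_IROTH).
S_IRUGO = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
PERM = S_IRUGO | stat.S_IWUSR


def mode_string(perm):
    """Render permission bits the way ls -l does for a regular file."""
    return stat.filemode(stat.S_IFREG | perm)
```

Here PERM equals 0o644 and mode_string(PERM) gives the same permission string as the ls output above, apart from the trailing SELinux dot.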

MODULE_PARM_DESC defines the parameter's description, which can be viewed with the modinfo command:

[root@Linux ~]# modinfo ktapvm.ko
.....
parm:           max_loop_count:max loop execution count (int)

References:
Everything You Wanted to Know About Module Parameters

virt-manager/virsh debugging tips (continuously updated)

(1) Running virt-manager --trace-libvirt --debug prints virt-manager's debugging output.

Linux:~ # virt-manager --trace-libvirt --debug
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (cli:246) Launched with command line: /usr/share/virt-manager/virt-manager --trace-libvirt --debug
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (virt-manager:153) virt-manager version: 1.2.1
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (virt-manager:154) virtManager import: <module 'virtManager' from '/usr/share/virt-manager/virtManager/__init__.pyc'>
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (virt-manager:157) Libvirt tracing requested
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _dispatchEventHandleCallback at 0x7ff4391fc050> _dispatchEventHandleCallback
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _dispatchEventTimeoutCallback at 0x7ff4391fc0c8> _dispatchEventTimeoutCallback
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _eventInvokeHandleCallback at 0x7ff439251ed8> _eventInvokeHandleCallback
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _eventInvokeTimeoutCallback at 0x7ff439251f50> _eventInvokeTimeoutCallback
......

The output can also be redirected to a file:

Linux:~ # virt-manager --trace-libvirt --debug > log.txt 2>&1

(2) Print the guest OS log through virsh:

Linux:~ # virsh
Welcome to virsh, the virtualization interactive terminal.

Type:  'help' for help with commands
       'quit' to quit

virsh # list
 Id    Name                           State
----------------------------------------------------
 0     Domain-0                       running
 3     sles11sp4-i686                 running

virsh # console 3
Connected to domain sles11sp4-i686
Escape character is ^]
[    0.000000] Reserving virtual address space above 0xf5800000
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.0.101-63-xen (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c)
......