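SystemTap provides a tapset library (usually installed in the /usr/share/systemtap/tapset directory). Much like libc, the C function library, the tapset provides functions, global variables, and so on for SystemTap scripts to use.
For details, see the tapset reference manual.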
The syntax of a function probe is defined as:
{kernel|module("module-pattern")}.function("function-pattern")[.{call|return[.maxactive(VALUE)]|inline}]
kernel refers to the kernel image file (vmlinux), while module refers to the modules under /lib/modules/`uname -r`, that is, the .ko files.
The call, return, maxactive(VALUE), and inline suffixes are explained as follows:
call is used to attach entry point non-inlined function, while .inline is used to attach first instruction of inlined function;
maxactive specifies how many instances of the specified function can be probed simultaneously. You can leave off .maxactive in most cases, as the default (KRETACTIVE) should be sufficient. However, if you notice an excessive number of skipped probes, try setting .maxactive to incrementally higher values to see if the number of skipped probes decreases.
.return is used for return points of non-inlined functions;
empty suffix is treated as combination of .call and .inline suffixes.
function-pattern is defined as:
function-name[@source-path[{:line-number|:first-line-last-line|+relative-line-number}]]
stap -l 'kernel.function("*")' lists all function probes currently available in the kernel:
linux: # stap -l 'kernel.function("*")'
kernel.function("AUDIT_MODE@../security/apparmor/include/policy.h:401")
kernel.function("BLEND_OP@../crypto/sha256_generic.c:48")
kernel.function("C_SYSC_epoll_pwait@../fs/eventpoll.c:2051")
kernel.function("C_SYSC_fanotify_mark@../fs/notify/fanotify/fanotify_user.c:912")
kernel.function("C_SYSC_ftruncate@../fs/open.c:205")
kernel.function("C_SYSC_futex@../kernel/futex_compat.c:174")
kernel.function("C_SYSC_get_robust_list@../kernel/futex_compat.c:135")
kernel.function("C_SYSC_getitimer@../kernel/compat.c:293")
......
stap -l 'module("ahci").function("*")' lists all function probes currently available in the ahci module:
linux: # stap -l 'module("ahci").function("*")'
module("ahci").function("__ahci_port_base@../drivers/ata/ahci.h:372")
module("ahci").function("ahci_broken_online@../drivers/ata/ahci.c:1024")
module("ahci").function("ahci_broken_suspend@../drivers/ata/ahci.c:940")
module("ahci").function("ahci_broken_system_poweroff@../drivers/ata/ahci.c:905")
module("ahci").function("ahci_configure_dma_masks@../drivers/ata/ahci.c:700")
module("ahci").function("ahci_gtf_filter_workaround@../drivers/ata/ahci.c:1075")
module("ahci").function("ahci_host_activate@../drivers/ata/ahci.c:1164")
module("ahci").function("ahci_init_interrupts@../drivers/ata/ahci.c:1122")
module("ahci").function("ahci_init_one@../drivers/ata/ahci.c:1211")
module("ahci").function("ahci_nr_ports@../drivers/ata/ahci.h:386")
module("ahci").function("ahci_p5wdh_hardreset@../drivers/ata/ahci.c:604")
module("ahci").function("ahci_p5wdh_workaround@../drivers/ata/ahci.c:775")
module("ahci").function("ahci_pci_device_resume@../drivers/ata/ahci.c:677")
module("ahci").function("ahci_pci_device_suspend@../drivers/ata/ahci.c:649")
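Each line in the listings above follows the function-name@source-path:line-number form of function-pattern, so it can be split back into its parts. The following is a minimal sketch in Python, assuming the output format shown above; PROBE_RE and parse_probe are hypothetical names introduced for illustration, and other SystemTap versions may print the lines differently.

```python
import re

# Matches one line of `stap -l` output in the form shown above, e.g.
#   kernel.function("C_SYSC_futex@../kernel/futex_compat.c:174")
#   module("ahci").function("ahci_init_one@../drivers/ata/ahci.c:1211")
PROBE_RE = re.compile(
    r'^(?:kernel|module\("(?P<module>[^"]+)"\))'
    r'\.function\("(?P<func>[^@"]+)@(?P<path>[^:"]+):(?P<line>\d+)"\)$'
)


def parse_probe(line):
    """Split a listed probe point into module, function, path and line.

    Returns None for lines that do not match (for example the "......"
    elision marker).
    """
    m = PROBE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "module": m.group("module"),  # None for kernel probes
        "function": m.group("func"),
        "path": m.group("path"),
        "line": int(m.group("line")),
    }


if __name__ == "__main__":
    print(parse_probe('kernel.function("C_SYSC_futex@../kernel/futex_compat.c:174")'))
```

Parsing the listing this way makes it easy to group probes by source file before deciding which ones to attach to.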
A SystemTap probe is defined as:
probe PROBEPOINT [, PROBEPOINT] { [STMT ...] }
A probe can define several PROBEPOINTs (also called events), which share a single handler. PROBEPOINTs fall into two kinds:
a) Synchronous:
A synchronous event occurs when any process executes an instruction at a particular location in kernel code. This gives other events a reference point from which more contextual data may be available.
Probe points such as syscall.system_call and kernel.function("function") are synchronous.
b) Asynchronous:
Asynchronous events are not tied to a particular instruction or location in code. This family of probe points consists mainly of counters, timers, and similar constructs.
begin, end, timer, and the like are asynchronous probe points.
References:
SystemTap Scripts.
I built Xen today and found that the resulting xen file is a 32-bit executable, whereas xen-syms is 64-bit:
Linux:~/Downloads/xen-4.6.0/xen # file xen
xen: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
Linux:~/Downloads/xen-4.6.0/xen # file xen-syms
xen-syms: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
Roger gave the answer on the mailing list:
The Xen entry point is in 32bits (because that’s what the multiboot specification requires). Xen then jumps into long mode (64bits) by itself, so there’s only a very small amount of 32bit code that’s used as a trampoline.
So the 32-bit format is there to support multiboot; once xen is actually running, it is still a 64-bit program.
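The 32-bit versus 64-bit distinction reported by file comes from the ELF header: the byte at offset 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects. Here is a minimal Python sketch of that check; elf_class and elf_class_of_file are hypothetical helper names, and the sketch inspects only the class byte, not the machine type.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the class byte inside e_ident
ELF_CLASSES = {1: "32-bit", 2: "64-bit"}  # ELFCLASS32, ELFCLASS64


def elf_class(header):
    """Return "32-bit" or "64-bit" for an ELF header, or None otherwise."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(header[EI_CLASS])


def elf_class_of_file(path):
    """Read the first bytes of a file and classify it."""
    with open(path, "rb") as f:
        return elf_class(f.read(16))


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        print(name, elf_class_of_file(name))
```

Run against the xen and xen-syms files from the build tree, this should agree with the file output shown above.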
References:
[Xen-users] Why the built xen file is 32-bit on 64-bit OS?.
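System.map contains the symbol table of the kernel image. /proc/kallsyms contains the symbol tables of both the kernel image and all dynamically loaded modules. If a function has been inlined or optimized away by the compiler, it may not appear in /proc/kallsyms.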
In addition, for a non-root user every address shown in /proc/kallsyms is 0:
$ cat /proc/kallsyms | more
0000000000000000 A irq_stack_union
0000000000000000 A __per_cpu_start
0000000000000000 A cpu_debug_store
0000000000000000 A cpu_tss_rw
......
$ sudo cat /proc/kallsyms | more
[sudo] password for xiaonan:
0000000000000000 A irq_stack_union
0000000000000000 A __per_cpu_start
0000000000004000 A cpu_debug_store
0000000000005000 A cpu_tss_rw
0000000000008000 A gdt_page
0000000000009000 A exception_stacks
......
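The difference between the two listings above can be checked programmatically: each line is an address, a type letter, and a symbol name (module symbols carry a trailing [module] field). The sketch below, in Python, parses that format and reports whether every address is zero; parse_kallsyms and addresses_hidden are hypothetical names introduced here.

```python
def parse_kallsyms(text):
    """Parse /proc/kallsyms-style text into (address, type, name, module) tuples."""
    symbols = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank lines and elision markers
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # first field is not a hexadecimal address
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.append((address, fields[1], fields[2], module))
    return symbols


def addresses_hidden(symbols):
    """True when there is at least one symbol and every address is zero."""
    return bool(symbols) and all(addr == 0 for addr, _, _, _ in symbols)


if __name__ == "__main__":
    with open("/proc/kallsyms") as f:
        syms = parse_kallsyms(f.read())
    print(len(syms), "symbols; addresses hidden:", addresses_hidden(syms))
```

Note that a few symbols legitimately sit at address 0 even for root (irq_stack_union and __per_cpu_start in the output above), so the check looks at all symbols rather than only the first one.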
It appears that kallsyms_lookup_name requires CONFIG_KALLSYMS_ALL to be set to Y (see CONFIG_KALLSYMS_ALL).
Module.symvers contains all the symbols exported by the kernel (see What is the purpose of “Module.symvers” in Linux?). This link discusses /proc/kallsyms, which can be compared with Module.symvers.
References:
Reading kallsyms in user-mode;
Does kallsyms have all the symbol of kernel functions?;
System.map file and /proc/kallsyms;
system.map.
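Procfs is a RAM-based virtual file system mounted at the /proc directory. /proc is like a mirror of the system: through it you can obtain a great deal of information about the running system. The files in /proc are generated dynamically by the kernel when they are accessed.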
dl980-5:/proc # ls
1 1166 1234 1452 170 198 243 287 308 3334 372 416 46 503 550 5989 640 756 911 960 execdomains
10 1167 1236 1453 1702 199 244 2872 3087 334 373 4163 460 504 551 599 641 7591 912 961 fb
100 1168 1237 146 1703 2 245 2873 309 3342 374 417 461 505 552 6 642 76 913 962 filesystems
1003 1169 124 147 1704 20 246 2876 3093 3344 375 418 462 506 553 60 643 764 914 963 fs
1007 117 1240 1478 1706 200 247 2877 31 335 376 419 4627 507 554 600 645 77 915 964 interrupts
1008 1170 1241 1479 171 201 248 288 310 3352 377 42 463 508 555 601 646 78 916 965 iomem
1009 1171 1242 148 172 2018 249 2881 311 3358 378 420 464 509 556 602 647 7818 917 966 ioports
101 1172 1243 149 1721 202 25 2884 312 336 379 421 4646 51 557 603 648 8 918 967 ipmi
1010 1173 1244 15 1729 203 250 289 313 3360 38 422 465 510 558 604 649 80 919 968 irq
1011 1174 1245 150 173 204 251 2890 3137 3361 380 423 466 511 56 605 65 81 92 969 kallsyms
1012 1175 1246 151 1736 205 252 29 3139 337 381 424 467 512 560 6052 650 82 920 97 kcore
1013 1176 1247 152 1737 206 253 290 314 338 382 425 468 513 561 6058 651 83 921 970 key-users
1015 1177 1248 1520 174 2060 254 2908 315 3389 3827 426 469 514 562 606 652 84 922 971 kmsg
1016 1178 1249 1522 175 207 255 291 3155 339 383 427 47 515 563 607 653 8461 923 972 kpagecount
1017 1179 125 1529 1751 208 256 2911 3156 34 384 428 470 516 564 608 654 85 924 973 kpageflags
1018 118 1250 153 1755 209 257 2912 3158 340 385 429 471 517 565 6086 655 86 925 974 latency_stats
1019 1180 1251 154 176 21 258 2915 316 341 386 43 472 518 566 609 656 87 926 975 loadavg
102 1181 1252 155 177 210 259 292 3161 342 387 430 473 519 567 61 657 874 927 976 locks
1025 1182 1253 156 178 211 26 2921 3163 3426 388 431 475 52 568 610 658 876 928 977 meminfo
103 1183 1254 157 1787 212 260 2927 3166 343 389 432 476 520 57 611 659 877 929 978 misc
1042 1186 1255 158 1788 213 261 293 317 344 390 433 477 521 570 612 66 879 93 98 modules
1044 1187 1259 16 179 215 262 2932 318 345 391 434 4777 522 5708 613 660 88 930 980 mounts
1045 1188 126 160 1790 216 263 2939 3183 346 392 435 478 523 571 614 661 880 931 981 mtrr
....
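Each number is a process ID and is also a directory; /proc/pid gives information about that process. Other entries such as kmsg and meminfo provide other information about the system.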
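Since the per-process directories are simply the entries with numeric names, the running PIDs can be recovered by filtering a directory listing. A minimal Python sketch follows; pids_from_entries and list_pids are hypothetical helper names.

```python
import os


def pids_from_entries(entries):
    """Keep only the all-numeric names and return them as sorted integers."""
    return sorted(int(name) for name in entries if name.isdecimal())


def list_pids(proc_root="/proc"):
    """List the process IDs currently visible under proc_root."""
    return pids_from_entries(os.listdir(proc_root))


if __name__ == "__main__":
    print(list_pids())
```

The result is only a snapshot: processes may start or exit between the listing and any later read of /proc/pid.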
<linux/version.h> is generated by the Makefile in the top-level directory:
......
define filechk_version.h
(echo \#define LINUX_VERSION_CODE $(shell \
expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 0$(SUBLEVEL)); \
echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';)
endef
$(version_h): $(srctree)/Makefile FORCE
$(call filechk,version.h)
$(Q)rm -f $(old_version_h)
......
It contains two macro definitions, LINUX_VERSION_CODE and KERNEL_VERSION. Take the following version as an example:
......
#define LINUX_VERSION_CODE 199680
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
......
199680 corresponds to version 3.12.0, since (3 << 16) + (12 << 8) + 0 = 196608 + 3072 = 199680.
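The encoding can be reproduced outside the kernel tree. The Python sketch below mirrors the KERNEL_VERSION macro shown above and adds a decoder; decode_version_code is a hypothetical helper, and it assumes the patch level and sublevel each fit in 8 bits.

```python
def kernel_version(a, b, c):
    """Same arithmetic as the KERNEL_VERSION(a,b,c) macro."""
    return (a << 16) + (b << 8) + c


def decode_version_code(code):
    """Split a LINUX_VERSION_CODE value back into (version, patchlevel, sublevel)."""
    return (code >> 16, (code >> 8) & 0xFF, code & 0xFF)


if __name__ == "__main__":
    print(kernel_version(3, 12, 0))     # 199680
    print(decode_version_code(199680))  # (3, 12, 0)
```

In kernel code the two macros are typically compared directly, for example LINUX_VERSION_CODE >= KERNEL_VERSION(3,12,0), to select version-specific code paths.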
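From October 15 to 19 I visited Nanjing for the first time to attend the 10th Linux kernel developers conference, and took the chance to tour this ancient capital of six dynasties.
I set off on the afternoon of October 15, and it was already evening when I arrived; the Nanjing subway did not seem very crowded at night. After some trouble I found the hotel I had booked online, but the moment I opened the door a choking smell of smoke hit me, so I quickly opened the window to air the room. The rest of the room was fine, except that the bathroom had no door, which took some getting used to. Once settled in, I went out for a quick bite and turned in early.
On the morning of the 16th I had soup dumplings (tangbao) on the nearby "Shiziqiao" pedestrian street; I had seen Nanjing tangbao in a film, so I had to try them this time. I spent the rest of the day at the Presidential Palace, the Six Dynasties museum, and the Sun Yat-sen Mausoleum. Since I am interested in history, I concentrated on places with a strong historical atmosphere. My overall impression was good, and the Republic of China era feel is quite strong.
October 17 and 18 were spent entirely at the Linux kernel developers conference. It was held at Nanjing University; admission was free, and a free lunch was provided in the university canteen. Compared with technical conferences whose tickets now cost hundreds or even thousands of yuan, this was a conference run in good conscience. Of course, many conferences have to pay for a venue and for lunch, whereas Nanjing University has both on hand, which is an advantage. Still, some technical conferences these days really have lost their way. All in all, I took something away from the two days.
On the morning of the 19th I checked out. Carrying luggage made sightseeing awkward, so I just took a short walk around Yuhuatai and then set off back to Beijing. That was the end of the Nanjing trip.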
module_param and module_param_named are defined in <linux/moduleparam.h>:
/**
* module_param - typesafe helper for a module/cmdline parameter
* @value: the variable to alter, and exposed parameter name.
* @type: the type of the parameter
* @perm: visibility in sysfs.
*
* @value becomes the module parameter, or (prefixed by KBUILD_MODNAME and a
* ".") the kernel commandline parameter. Note that - is changed to _, so
* the user can use "foo-bar=1" even for variable "foo_bar".
*
* @perm is 0 if the the variable is not to appear in sysfs, or 0444
* for world-readable, 0644 for root-writable, etc. Note that if it
* is writable, you may need to use kparam_block_sysfs_write() around
* accesses (esp. charp, which can be kfreed when it changes).
*
* The @type is simply pasted to refer to a param_ops_##type and a
* param_check_##type: for convenience many standard types are provided but
* you can create your own by defining those variables.
*
* Standard types are:
* byte, short, ushort, int, uint, long, ulong
* charp: a character pointer
* bool: a bool, values 0/1, y/n, Y/N.
* invbool: the above, only sense-reversed (N = true).
*/
#define module_param(name, type, perm) \
module_param_named(name, name, type, perm)
/**
* module_param_named - typesafe helper for a renamed module/cmdline parameter
* @name: a valid C identifier which is the parameter name.
* @value: the actual lvalue to alter.
* @type: the type of the parameter
* @perm: visibility in sysfs.
*
* Usually it's a good idea to have variable names and user-exposed names the
* same, but that's harder if the variable must be non-static or is inside a
* structure. This allows exposure under a different name.
*/
#define module_param_named(name, value, type, perm) \
param_check_##type(name, &(value)); \
module_param_cb(name, &param_ops_##type, &value, perm); \
__MODULE_PARM_TYPE(name, #type)
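module_param defines a module parameter: type specifies its type (int, bool, and so on), and perm specifies the access permissions, which take the following values (from <linux/stat.h>):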
#define S_IRWXU 00700
#define S_IRUSR 00400
#define S_IWUSR 00200
#define S_IXUSR 00100
#define S_IRWXG 00070
#define S_IRGRP 00040
#define S_IWGRP 00020
#define S_IXGRP 00010
#define S_IRWXO 00007
#define S_IROTH 00004
#define S_IWOTH 00002
#define S_IXOTH 00001
#define S_IRWXUGO (S_IRWXU|S_IRWXG|S_IRWXO)
#define S_IALLUGO (S_ISUID|S_ISGID|S_ISVTX|S_IRWXUGO)
#define S_IRUGO (S_IRUSR|S_IRGRP|S_IROTH)
#define S_IWUGO (S_IWUSR|S_IWGRP|S_IWOTH)
#define S_IXUGO (S_IXUSR|S_IXGRP|S_IXOTH)
module_param_named, in turn, exposes the variable under a more readable name.
Take the ktap source code as an example:
int kp_max_loop_count = 100000;
module_param_named(max_loop_count, kp_max_loop_count, int, S_IRUGO | S_IWUSR);
MODULE_PARM_DESC(max_loop_count, "max loop execution count");
Load the ktapvm module and read the value of kp_max_loop_count:
[root@Linux ~]# cat /sys/module/ktapvm/parameters/max_loop_count
100000
[root@Linux ~]# ls -lt /sys/module/ktapvm/parameters/max_loop_count
-rw-r--r--. 1 root root 4096 Oct 22 22:51 /sys/module/ktapvm/parameters/max_loop_count
As shown, the kp_max_loop_count variable appears under /sys/module/ktapvm/parameters as max_loop_count, its value is 100000, and only root has write permission. Writing to this file changes the kp_max_loop_count variable:
[root@Linux ~]# echo 200000 > /sys/module/ktapvm/parameters/max_loop_count
[root@Linux ~]# cat /sys/module/ktapvm/parameters/max_loop_count
200000
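The perm argument in the ktap example, S_IRUGO | S_IWUSR, works out to 0644, which matches the -rw-r--r-- mode that ls reported for the sysfs file. The Python sketch below checks that arithmetic; Python's stat module has no S_IRUGO, so it is rebuilt here from the <linux/stat.h> definition quoted earlier, and describe_perm is a hypothetical helper.

```python
import stat

# Rebuilt from the <linux/stat.h> definition: S_IRUSR|S_IRGRP|S_IROTH
S_IRUGO = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH


def describe_perm(perm):
    """Return the octal form and the ls-style string for a permission value."""
    # S_IFREG marks a regular file so that filemode() prints a leading "-"
    return "%04o" % perm, stat.filemode(stat.S_IFREG | perm)


if __name__ == "__main__":
    print(describe_perm(S_IRUGO | stat.S_IWUSR))  # ('0644', '-rw-r--r--')
```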
MODULE_PARM_DESC defines the description of a parameter, which can be viewed with the modinfo command:
[root@Linux ~]# modinfo ktapvm.ko
.....
parm: max_loop_count:max loop execution count (int)
References:
Everything You Wanted to Know About Module Parameters.
(1) Running virt-manager --trace-libvirt --debug prints virt-manager's debugging information.
Linux:~ # virt-manager --trace-libvirt --debug
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (cli:246) Launched with command line: /usr/share/virt-manager/virt-manager --trace-libvirt --debug
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (virt-manager:153) virt-manager version: 1.2.1
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (virt-manager:154) virtManager import: <module 'virtManager' from '/usr/share/virt-manager/virtManager/__init__.pyc'>
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (virt-manager:157) Libvirt tracing requested
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _dispatchEventHandleCallback at 0x7ff4391fc050> _dispatchEventHandleCallback
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _dispatchEventTimeoutCallback at 0x7ff4391fc0c8> _dispatchEventTimeoutCallback
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _eventInvokeHandleCallback at 0x7ff439251ed8> _eventInvokeHandleCallback
[Thu, 22 Oct 2015 13:54:08 virt-manager 6124] DEBUG (module_trace:66) wrapfunc <function _eventInvokeTimeoutCallback at 0x7ff439251f50> _eventInvokeTimeoutCallback
......
The output can also be redirected to a file:
Linux:~ # virt-manager --trace-libvirt --debug > log.txt 2>&1
(2) Print the guest OS log through virsh:
Linux:~ # virsh
Welcome to virsh, the virtualization interactive terminal.
Type: 'help' for help with commands
'quit' to quit
virsh # list
Id Name State
----------------------------------------------------
0 Domain-0 running
3 sles11sp4-i686 running
virsh # console 3
Connected to domain sles11sp4-i686
Escape character is ^]
[ 0.000000] Reserving virtual address space above 0xf5800000
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.0.101-63-xen (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP Tue Jun 23 16:02:31 UTC 2015 (4b89d0c)
......