My Site | Scribbles of a systems software engineer

SystemTap Notes (5): target variable (1)

The documentation explains target variables as follows:

The probe events that map to actual locations in the code (for example kernel.function(“function”) and kernel.statement(“statement”)) allow the use of target variables to obtain the value of variables visible at that location in the code. You can use the -L option to list the target variable available at a probe point.

In practice, the name "context variable" is now preferred over "target variable" (see this email). Using target variables requires the kernel debuginfo. Consider the following example:

# stap -L 'kernel.function("vfs_read")'
kernel.function("vfs_read@../fs/read_write.c:381") $file:struct file* $buf:char* $count:size_t $pos:loff_t*

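Each target variable is prefixed with $, and its type follows the colon. For example, the type of the file variable is struct file*. Compare this with the definition of vfs_read: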

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
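
The listing format described above is regular enough to pick apart mechanically. The following is a minimal sketch, assuming the output line looks exactly like the one shown earlier; the helper name parse_stap_listing is hypothetical and not part of SystemTap.

```python
def parse_stap_listing(line):
    """Split one line of `stap -L` output into the probe point and its variables."""
    # Every variable starts with " $"; the probe point is whatever precedes the first one.
    probe_point, *raw_vars = line.split(" $")
    variables = []
    for raw in raw_vars:
        # "name:type" -> ("name", "type"); split on the first colon only.
        name, ctype = raw.split(":", 1)
        variables.append((name, ctype.strip()))
    return probe_point, variables


# The sample line is copied from the stap -L output shown above.
listing = ('kernel.function("vfs_read@../fs/read_write.c:381") '
           '$file:struct file* $buf:char* $count:size_t $pos:loff_t*')
probe_point, variables = parse_stap_listing(listing)
for name, ctype in variables:
    print("$%s has type %s" % (name, ctype))
```

The four name/type pairs line up one-to-one with the four parameters in the vfs_read prototype.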

In addition, a target variable that is not local to the current probe point can be accessed with @var("varname@src/file.c"):

When a target variable is not local to the probe point, like a global external variable or a file local static variable defined in another file then it can be referenced through “@var(“varname@src/file.c”)”.

Here is an example:

# stap -e 'probe kernel.function("vfs_read") {
           printf ("current files_stat max_files: %d\n",
                   @var("files_stat@fs/file_table.c")->max_files);
           exit(); }'
current files_stat max_files: 82002

Data of basic types can also be read through a pointer, using the following functions:

kernel_char(address)
Obtain the character at address from kernel memory.
kernel_short(address)
Obtain the short at address from kernel memory.
kernel_int(address)
Obtain the int at address from kernel memory.
kernel_long(address)
Obtain the long at address from kernel memory.
kernel_string(address)
Obtain the string at address from kernel memory.
kernel_string_n(address, n)
Obtain the string at address from the kernel memory and limits the string to n bytes.

Linux kernel Notes (41): the "i_rdev" member of the "inode" structure

The inode structure (defined in <linux/fs.h>) has an i_rdev member:

struct inode {
    ......
    dev_t           i_rdev;
    ......
}

If the inode represents a device, i_rdev holds its device number. To keep code portable, obtain the major and minor numbers of an inode through the imajor and iminor functions:

static inline unsigned iminor(const struct inode *inode)
{
    return MINOR(inode->i_rdev);
}

static inline unsigned imajor(const struct inode *inode)
{
    return MAJOR(inode->i_rdev);
}
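
To make the MAJOR/MINOR split concrete, here is a small Python model of the encoding. It is only a sketch: it assumes the in-kernel dev_t layout with a 20-bit minor number (MINORBITS in <linux/kdev_t.h>), and the device number used (major 8, minor 1) is just an illustrative value.

```python
# Assumption: in-kernel dev_t layout with a 20-bit minor number,
# mirroring MINORBITS / MINORMASK from <linux/kdev_t.h>.
MINORBITS = 20
MINORMASK = (1 << MINORBITS) - 1


def MKDEV(major, minor):
    """Pack a major and a minor number into one device number."""
    return (major << MINORBITS) | minor


def MAJOR(dev):
    """Extract the major number (the high bits)."""
    return dev >> MINORBITS


def MINOR(dev):
    """Extract the minor number (the low MINORBITS bits)."""
    return dev & MINORMASK


# Illustrative device number: major 8, minor 1.
i_rdev = MKDEV(8, 1)
print("i_rdev=%d major=%d minor=%d" % (i_rdev, MAJOR(i_rdev), MINOR(i_rdev)))
```

Going through imajor and iminor rather than shifting bits by hand keeps driver code independent of this layout, which is the portability point made above.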

Linux kernel Notes (40): comparing the "file" and "inode" structures

LDD describes the file structure as follows:

struct file, defined in <linux/fs.h>, is the second most important data structure used in device drivers. Note that a file has nothing to do with the FILE pointers of user-space programs. A FILE is defined in the C library and never appears in kernel code. A struct file, on the other hand, is a kernel structure that never appears in user programs.

The file structure represents an open file. (It is not specific to device drivers; every open file in the system has an associated struct file in kernel space.) It is created by the kernel on open and is passed to any function that operates on the file, until the last close. After all instances of the file are closed, the kernel releases the data structure.

In the kernel sources, a pointer to struct file is usually called either file or filp (“file pointer”). We’ll consistently call the pointer filp to prevent ambiguities with the structure itself. Thus, file refers to the structure and filp to a pointer to the structure.

And the inode structure:

The inode structure is used by the kernel internally to represent files. Therefore, it is different from the file structure that represents an open file descriptor. There can be numerous file structures representing multiple open descriptors on a single file, but they all point to a single inode structure.

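To summarize: in the kernel, every file is represented by a single inode structure, whereas a file structure is associated with an open file descriptor. If a file is opened several times and therefore has several file descriptors, there are correspondingly several file structures associated with it, but there is still only one inode.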
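
The distinction can be observed from user space. The sketch below opens the same temporary file twice: the two descriptors report the same inode number, yet each keeps its own file offset, which is consistent with each open having its own struct file. The temporary file and its contents are placeholders chosen for the demonstration.

```python
import os
import tempfile

# Create a scratch file with a few bytes in it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    path = tmp.name

# Open the same file twice: two descriptors, two open-file objects.
fd1 = os.open(path, os.O_RDONLY)
fd2 = os.open(path, os.O_RDONLY)
try:
    ino1 = os.fstat(fd1).st_ino
    ino2 = os.fstat(fd2).st_ino

    # Reading through fd1 advances only fd1's offset.
    os.read(fd1, 3)
    pos1 = os.lseek(fd1, 0, os.SEEK_CUR)
    pos2 = os.lseek(fd2, 0, os.SEEK_CUR)

    print("descriptors differ:", fd1 != fd2)
    print("same inode:", ino1 == ino2)
    print("offsets:", pos1, pos2)
finally:
    os.close(fd1)
    os.close(fd2)
    os.unlink(path)
```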

libvirt Notes (3): getting the capabilities of the virtualization host

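The getCapabilities method returns an XML string that describes the capabilities of the virtualization host and the kinds of guest OS it can create. Consider the following code: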

#!/usr/bin/python

from __future__ import print_function
import sys
import libvirt

# Connect to the local Xen hypervisor.
conn = libvirt.open('xen:///')
if conn is None:
    print('Failed to open connection to xen:///', file=sys.stderr)
    sys.exit(1)

caps = conn.getCapabilities()  # caps is a string of XML
print('Capabilities:\n' + caps)

conn.close()
sys.exit(0)

Running it produces:

Capabilities:
<capabilities>

  <host>
    <cpu>
      <arch>x86_64</arch>
      <features>
        <pae/>
      </features>
    </cpu>
    <power_management/>
    <migration_features>
      <live/>
    </migration_features>
    <topology>
      <cells num='1'>
        <cell id='0'>
          <memory unit='KiB'>1048512</memory>
          <cpus num='0'>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>

  <guest>
    <os_type>xen</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <emulator>/usr/lib/xen/bin/qemu-system-i386</emulator>
      <machine>xenpv</machine>
      <domain type='xen'/>
    </arch>
  </guest>

  <guest>
    <os_type>xen</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <emulator>/usr/lib/xen/bin/qemu-system-i386</emulator>
      <machine>xenpv</machine>
      <domain type='xen'/>
    </arch>
    <features>
      <pae/>
    </features>
  </guest>

</capabilities>
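
Since the capabilities come back as XML, they can be inspected with the standard library. This is a minimal sketch: CAPS_XML is a trimmed copy of the output above standing in for the string returned by getCapabilities(), and summarize_capabilities is a hypothetical helper, not a libvirt API.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the output above; in a real script this would be
# the string returned by conn.getCapabilities().
CAPS_XML = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
    </cpu>
  </host>
  <guest>
    <os_type>xen</os_type>
    <arch name='x86_64'>
      <wordsize>64</wordsize>
      <domain type='xen'/>
    </arch>
  </guest>
  <guest>
    <os_type>xen</os_type>
    <arch name='i686'>
      <wordsize>32</wordsize>
      <domain type='xen'/>
    </arch>
  </guest>
</capabilities>
"""


def summarize_capabilities(caps_xml):
    """Return the host architecture and a (os_type, arch, wordsize) tuple per guest."""
    root = ET.fromstring(caps_xml)
    host_arch = root.findtext("host/cpu/arch")
    guests = []
    for guest in root.findall("guest"):
        arch = guest.find("arch")
        guests.append((guest.findtext("os_type"),
                       arch.get("name"),
                       int(arch.findtext("wordsize"))))
    return host_arch, guests


host_arch, guests = summarize_capabilities(CAPS_XML)
print("host arch:", host_arch)
for os_type, arch_name, wordsize in guests:
    print("guest: os_type=%s arch=%s wordsize=%d" % (os_type, arch_name, wordsize))
```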

References:
Capability information.

SystemTap Notes (4): timer event

A timer event runs its handler periodically. For example:

# stap -e 'probe timer.s(1) { printf("Hello world!\n");}'
Hello world!
Hello world!
Hello world!
Hello world!

The script above prints Hello world! once every second.

Besides timer.s(seconds) used above, the following timer events are defined:

timer.ms(milliseconds)
timer.us(microseconds)
timer.ns(nanoseconds)
timer.hz(hertz)
timer.jiffies(jiffies)

There is also a randomized form (taken from the linked reference):

timer.jiffies(N).randomize(M)

The probe handler is run every N jiffies (a kernel-defined unit of time, typically between 1 and 60 ms). If the “randomize” component is given, a linearly distributed random value in the range [-M..+M] is added to N every time the handler is run. N is restricted to a reasonable range (1 to around a million), and M is restricted to be smaller than N.

Alternatively, intervals may be specified in units of time. There are two probe point variants similar to the jiffies timer:

timer.ms(N)

timer.ms(N).randomize(M)

Here, N and M are specified in milliseconds, but the full options for units are seconds (s/sec), milliseconds (ms/msec), microseconds (us/usec), nanoseconds (ns/nsec), and hertz (hz). Randomization is not supported for hertz timers.
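
To get a feel for what .randomize(M) does to the schedule, the following sketch draws a few successive intervals. It is only a model of the description quoted above: it assumes the offset is an integer drawn uniformly from [-M, +M], and randomized_intervals is a hypothetical helper, not SystemTap code.

```python
import random


def randomized_intervals(n, m, count, seed=0):
    """Model timer.ms(n).randomize(m): each interval is n plus an offset in [-m, +m]."""
    # The quoted documentation requires M to be smaller than N.
    if not 0 <= m < n:
        raise ValueError("m must satisfy 0 <= m < n")
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return [n + rng.randint(-m, m) for _ in range(count)]


# Five intervals for the equivalent of timer.ms(100).randomize(20).
intervals = randomized_intervals(100, 20, 5)
print(intervals)
```

Every interval falls between N - M and N + M, so the handler still fires roughly every N units on average while avoiding lock-step with other periodic activity.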

Finally, here is an example of using timer events (taken from the linked reference):

global count_jiffies, count_ms
probe timer.jiffies(100) { count_jiffies ++ }
probe timer.ms(100) { count_ms ++ }
probe timer.ms(12345)
{
  hz=(1000*count_jiffies) / count_ms
  printf ("jiffies:ms ratio %d:%d => CONFIG_HZ=%d\n",
    count_jiffies, count_ms, hz)
  exit ()
}

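First, note that HZ jiffies occur every second.

Next, count_jiffies is incremented once every 100 jiffies, so by the time the script exits a total of 100 * count_jiffies jiffies have elapsed. Likewise, count_ms is incremented once every 100 ms, so the elapsed time is count_ms / 10 seconds.

Finally, CONFIG_HZ is the number of jiffies divided by the number of seconds: (100 * count_jiffies) / (count_ms / 10) = (1000 * count_jiffies) / count_ms.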
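
The arithmetic can be checked with a small simulation. This is a sketch under simplifying assumptions: both counters are computed with integer division for an exact 12345 ms run, which ignores timer jitter, and the function names are hypothetical.

```python
def estimate_hz(count_jiffies, count_ms):
    """Same formula as the script: (1000 * count_jiffies) / count_ms, in integers."""
    return (1000 * count_jiffies) // count_ms


def simulate_counters(true_hz, run_ms):
    """Idealized counter values after run_ms milliseconds on a kernel with true_hz."""
    elapsed_jiffies = true_hz * run_ms // 1000
    count_jiffies = elapsed_jiffies // 100   # timer.jiffies(100) fires every 100 jiffies
    count_ms = run_ms // 100                 # timer.ms(100) fires every 100 ms
    return count_jiffies, count_ms


for true_hz in (100, 250, 1000):
    count_jiffies, count_ms = simulate_counters(true_hz, 12345)
    print("HZ=%d: count_jiffies=%d count_ms=%d estimate=%d"
          % (true_hz, count_jiffies, count_ms,
             estimate_hz(count_jiffies, count_ms)))
```

Because both counters only advance in steps of 100, the estimate is approximate for small HZ values; a longer run narrows the error.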

Linux kernel Notes (39): "THIS_MODULE"

THIS_MODULE is a macro defined in <linux/module.h>:

#ifdef MODULE
#define MODULE_GENERIC_TABLE(gtype,name)            \
extern const struct gtype##_id __mod_##gtype##_table        \
  __attribute__ ((unused, alias(__stringify(name))))

extern struct module __this_module;
#define THIS_MODULE (&__this_module)
#else  /* !MODULE */
#define MODULE_GENERIC_TABLE(gtype,name)
#define THIS_MODULE ((struct module *)0)
#endif

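THIS_MODULE is simply the address of the __this_module variable. __this_module lives in the module's own address space and is the struct module variable defined for that module. When the code is not built as a module (MODULE is undefined), THIS_MODULE is a null pointer.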

The first member of the file_operations structure, defined in <linux/fs.h>, is a pointer to struct module:

struct file_operations {
    struct module *owner;
    ......
}

LDD explains it as follows:

struct module *owner

The first file_operations field is not an operation at all; it is a pointer to the module that “owns” the structure. This field is used to prevent the module from being unloaded while its operations are in use. Almost all the time, it is simply initialized to THIS_MODULE, a macro defined in <linux/module.h>.

owner points to the module that the file_operations structure belongs to. In most cases it is enough to assign THIS_MODULE to it.

References:
Where is the memory allocation of “_thismodule” variable?;
深入淺出 insmod, #1.

Linux kernel Notes (38): the "__user" annotation

In kernel code, some parameters in function declarations carry the __user annotation:

ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);

LDD gives this explanation:

This annotation is a form of documentation, noting that a pointer is a user-space address that cannot be directly dereferenced. For normal compilation, __user has no effect, but it can be used by external checking software to find misuse of user-space addresses.

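__user marks the parameter as a user-space pointer that kernel code must not dereference directly. It also lets external checking tools find misuse of such pointers.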

A tool for debugging Makefiles: remake

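Over the past couple of days I was debugging a Makefile in an open-source project and came across remake (project home page: http://bashdb.sourceforge.net/remake/), which turned out to be really useful. Any reasonably large open-source project has complex Makefiles, and tracking down an error in them can be maddening. The output of remake lays out the whole build process step by step. As an example, take a simple Makefile for building a Linux kernel module: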

ifneq ($(KERNELRELEASE),)
        obj-m := hello.o
else
        KDIR ?= /lib/modules/`uname -r`/build
default:
        $(MAKE) -C $(KDIR) M=$$PWD
endif

Run remake -x:

# remake -x
Reading makefiles...
Updating goal targets....
 File 'default' does not exist.
Must remake target 'default'.
Makefile:8: target 'default' does not exist
##>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
remake -C /lib/modules/`uname -r`/build M=$PWD
##<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Reading makefiles...
Updating goal targets....
 File 'all' does not exist.
Must remake target 'all'.
remake[1]: Entering directory '/usr/src/linux-3.12.49-6-obj/x86_64/default'
Makefile:26: target 'all' does not exist
##>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
remake -C ../../../linux-3.12.49-6 O=/usr/src/linux-3.12.49-6-obj/x86_64/default/.
##<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Reading makefiles...
Updating goal targets....
 File '_all' does not exist.
   File 'sub-make' does not exist.
     File 'FORCE' does not exist.
    Must remake target 'FORCE'.
    Successfully remade target file 'FORCE'.
  Must remake target 'sub-make'.
Makefile:195: update target 'sub-make' due to: FORCE
##>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
echo "make[1]: Entering directory \`/usr/src/linux-3.12.49-6-obj/x86_64/default'"
##<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
make[1]: Entering directory `/usr/src/linux-3.12.49-6-obj/x86_64/default'
##>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
remake -C /usr/src/linux-3.12.49-6-obj/x86_64/default \
KBUILD_SRC=/usr/src/linux-3.12.49-6 \
KBUILD_EXTMOD="/root/Documents/test" -f /usr/src/linux-3.12.49-6/Makefile \

##<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
......

As shown, remake prints a very detailed log, which makes debugging much easier.

libvirt Notes (2): Hypervisor connections

The hypervisor connection is a core concept in libvirt (the following is excerpted from the linked source):

A connection is the primary or top level object in the libvirt API and Python libvirt module. An instance of this object is required before attempting to use almost any of the classes or methods. A connection is associated with a particular hypervisor, which may be running locally on the same machine as the libvirt client application, or on a remote machine over the network. In all cases, the connection is represented by an instance of the virConnect class and identified by a URI. The URI scheme and path defines the hypervisor to connect to, while the host part of the URI determines where it is located.

An application is permitted to open multiple connections at the same time, even when using more than one type of hypervisor on a single machine. For example, a host may provide both KVM full machine virtualization and LXC container virtualization. A connection object may be used concurrently across multiple threads. Once a connection has been established, it is possible to obtain handles to other managed objects or create new managed objects.
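
The roles of the URI parts described above can be illustrated with the standard library alone. In this sketch, xen:/// is the URI used in these notes, while qemu+ssh://root@example.com/system is an assumed example of a remote URI added for contrast; describe_uri is a hypothetical helper and does not call libvirt.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a connection URI into the parts the quoted text refers to."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,   # with the path, selects the hypervisor driver
        "host": parts.hostname,   # None means the hypervisor is on the local machine
        "path": parts.path,
    }


for uri in ("xen:///", "qemu+ssh://root@example.com/system"):
    print(uri, "->", describe_uri(uri))
```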

The following code tests a Xen connection:

#!/usr/bin/python
from __future__ import print_function
import sys
import libvirt

# Try to connect to the local Xen hypervisor and report the result.
conn = libvirt.open('xen:///')
if conn is None:
    print('Failed to open connection to xen:///', file=sys.stderr)
    sys.exit(1)
else:
    print('Open connection success', file=sys.stdout)
    conn.close()
    sys.exit(0)

libvirt Notes (1): terminology

The following is excerpted from libvirt terminology and goals:

a node is a single physical machine

an hypervisor is a layer of software allowing to virtualize a node in a set of virtual machines with possibly different configurations than the node itself

a domain is an instance of an operating system (or subsystem in the case of container virtualization) running on a virtualized machine provided by the hypervisor

[Figure: Hypervisor and domains running on a node]

Now we can define the goal of libvirt: to provide a common and stable layer sufficient to securely manage domains on a node, possibly remote.

In libvirt, a node is a physical machine, and a domain can be thought of as a virtual machine.
