Linux kernel notes (47): functions for operating on semaphores

The functions for operating on semaphores are:

#include <linux/semaphore.h>
void down(struct semaphore *sem);
int down_interruptible(struct semaphore *sem);
int down_killable(struct semaphore *sem);
int down_trylock(struct semaphore *sem); 
int down_timeout(struct semaphore *sem, long jiffies);
void up(struct semaphore *sem);

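down() is no longer recommended. It sleeps uninterruptibly, so a waiting task cannot be woken by any signal. Use down_interruptible() or down_killable() instead.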

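down_interruptible() can be interrupted by a signal, so its return value must be checked. Only a return value of 0 means the semaphore was acquired; it returns -EINTR if a signal arrived first. An example of using down_interruptible():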

if (down_interruptible(&sem)) return -ERESTARTSYS;

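down_killable() can only be interrupted by a fatal signal, which is the kind of signal normally used to terminate a process. As a result, down_killable() guarantees that a user process waiting on the semaphore can still be killed. Without this, once a process deadlocks, the only remedy is to reboot the system.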

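down_trylock() is the non-blocking version of down(), and its return value must be checked too. It returns 0 if the semaphore was acquired and 1 if it was not, which is the inverse of spin_trylock() and mutex_trylock(). For example: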

if (file->f_flags & O_NONBLOCK) {
    if (down_trylock(&iosem)) return -EAGAIN;
} else {
    if (down_interruptible(&iosem)) return -ERESTARTSYS;
}

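down_timeout() waits for at most the given number of jiffies. Like down(), it cannot be interrupted by signals while waiting. It returns 0 on success or -ETIME if the timeout expires.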

up() releases the semaphore. No interruptible version is needed, because releasing a semaphore never sleeps.
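The kernel API cannot be exercised outside the kernel. Its conventions do, however, map fairly directly onto POSIX semaphores in user space. The sketch below is such an analog for illustration only, not kernel code. sem_wait, sem_trywait, sem_timedwait and sem_post stand in for down_interruptible, down_trylock, down_timeout and up, and the `*_like` wrapper names are made up. The sketch keeps the kernel's return conventions: 0 on success, 1 from the trylock variant on failure, and -ETIME on timeout. down_killable has no direct user-space counterpart and is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Like down_interruptible(): 0 on success, negative errno otherwise
 * (-EINTR when a signal handler interrupted the wait). */
static int down_interruptible_like(sem_t *sem) {
    assert(sem != NULL);
    if (sem_wait(sem) == 0)
        return 0;
    return -errno;
}

/* Like down_trylock(): 0 if acquired, 1 if not (note the inverted sense). */
static int down_trylock_like(sem_t *sem) {
    return sem_trywait(sem) == 0 ? 0 : 1;
}

/*
 * Like down_timeout(), but the timeout is in milliseconds instead of jiffies.
 * Signals do not end the wait early: on EINTR it simply retries.
 * Returns 0 on success, -ETIME on timeout (the kernel's convention).
 */
static int down_timeout_like(sem_t *sem, long ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* sem_timedwait uses CLOCK_REALTIME */
    deadline.tv_sec += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    while (sem_timedwait(sem, &deadline) != 0) {
        if (errno == ETIMEDOUT)
            return -ETIME;
        if (errno != EINTR)
            return -errno;
    }
    return 0;
}

/* up() maps directly onto sem_post(), which never blocks. */
static int up_like(sem_t *sem) {
    return sem_post(sem);
}
```

A driver-style caller would translate a nonzero result into -ERESTARTSYS or -EAGAIN, as in the O_NONBLOCK example.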

References:
Mutex, semaphore and the proc file system

 

Linux kernel notes (46): configuring the crashkernel parameter

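crashkernel configures the size and location of the memory reserved for the second kernel that kexec boots (the crash kernel), that is, the kernel that captures the crash dump of the first kernel. The crashkernel parameter can be given in four forms: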

(1)

crashkernel=size[@offset]  

Reserves the memory region [offset, offset + size]. If @offset is omitted, a suitable offset is chosen automatically.

(2)

crashkernel=range1:size1[,range2:size2,...][@offset]
where each range is start-[end] (`start` is inclusive and `end` is exclusive; omitting end means there is no upper limit).

For example:

crashkernel=512M-2G:64M,2G-:128M

This means:
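a) if the system has less than 512M of memory, nothing is reserved;
b) if it has from 512M up to (but not including) 2G, 64M is reserved;
c) if it has 2G or more, 128M is reserved.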
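The range selection in form (2) can be modeled as a table lookup. The sketch below is a hypothetical illustration: the names `ck_range` and `crash_size` are made up, and the table hardcodes the example rather than parsing the string. It is not the kernel's own parser. It shows how the inclusive start and exclusive end pick exactly one entry.

```c
#include <assert.h>
#include <stddef.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)
#define NO_END 0ULL  /* "start-" with no end: open-ended range */

struct ck_range {
    unsigned long long start; /* inclusive */
    unsigned long long end;   /* exclusive, or NO_END */
    unsigned long long size;  /* amount to reserve */
};

/* crashkernel=512M-2G:64M,2G-:128M written out as a table. */
static const struct ck_range example[] = {
    { 512 * MiB, 2 * GiB, 64 * MiB },
    { 2 * GiB, NO_END, 128 * MiB },
};

/* Bytes to reserve on a machine with `ram` bytes; 0 if no range matches. */
static unsigned long long crash_size(const struct ck_range *r, size_t n,
                                     unsigned long long ram) {
    assert(r != NULL);
    for (size_t i = 0; i < n; i++)
        if (ram >= r[i].start && (r[i].end == NO_END || ram < r[i].end))
            return r[i].size;
    return 0;
}
```

At the boundaries, exactly 512M falls into the first range and exactly 2G falls into the second.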

(3)

crashkernel=size,high

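X86_64 only. If the system has more than 4G of memory, the kernel may allocate the region from the top, that is, from addresses above 4G. With less than 4G of memory, the region is naturally allocated below 4G. This option is ignored if crashkernel=size is specified.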

(4)

crashkernel=size,low

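X86_64 only. When crashkernel=size,high is specified, a region must also be reserved in low memory (below 4G), because the crash kernel still needs some low memory, for example for swiotlb and for the DMA buffers of 32-bit devices. By default the kernel tries to allocate at least 256M there automatically.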

References:
Kernel Parameters

 

Crash tool notes (2): printing debug information while running the "crash" command

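Use -d number to print debug information while crash runs. The larger the number, the more output crash produces; currently -d8 prints all of the debug information. For example: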

# crash -d8

crash 7.0.2-6.el7
Copyright (C) 2002-2013  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.


find_booted_kernel: search for [Linux version 3.10.0-123.el7.x86_64.debug (mockbuild@x86-017.build.eng.bos.redhat.com) (gcc version 4.8.2 20140120 (Red Hat 4.8.2-16) (GCC) ) #1 SMP Mon May 5 11:24:18 EDT 2014]
mount_points[0]: / (167c600)
mount_points[1]: /proc (167c620)
mount_points[2]: /sys (167c640)
mount_points[3]: /dev (167c660)
mount_points[4]: /sys/kernel/security (167c680)
mount_points[5]: /dev/shm (167c6b0)
mount_points[6]: /dev/pts (167c6d0)
mount_points[7]: /run (167c6f0)
mount_points[8]: /sys/fs/cgroup (167c710)
mount_points[9]: /sys/fs/cgroup/systemd (167c740)
mount_points[10]: /sys/fs/pstore (167c780)
mount_points[11]: /sys/fs/cgroup/cpuset (167c7b0)
mount_points[12]: /sys/fs/cgroup/cpu,cpuacct (167c7f0)
mount_points[13]: /sys/fs/cgroup/memory (167c830)
......

 

Xen notes (2): the "xen", "xen-syms" and "xen-*-dbg" files

The following is excerpted from the Xen mailing list:

Hi all,

Since I am a newbie of Xen, this may be a dumb question. On my SuSE Xen, I find the following 4 “Xen” files:

-rw-r--r-- 1 root root   881967 Oct 27 05:46 xen-4.5.110-1.gz
-rw-r--r-- 1 root root 16673080 Oct 27 05:46 xen-syms-4.5.110-1
-rw-r--r-- 1 root root   893428 Oct 27 05:45 xen-dbg-4.5.110-1.gz
-rw-r--r-- 1 root root 16124144 Oct 27 05:45 xen-syms-dbg-4.5.110-1

Could anyone give a detailed explanation of the functions and differences about the 4 files?

TL;DR: Use xen-04.5.1_1-1.gz unless someone asks you to do otherwise or you are tracking down a bug.

xen.gz is the main/regular/normal release build of the Xen binary (e.g. the thing which you can boot). This is the thing you would boot on a regular production Xen system.

xen-syms.gz is the unstripped version of xen.gz, i.e. with all the ELF debug information present. You can e.g. run gdb on it to disassemble things if you are tracking down a bug. This image is not bootable.

The -dbg variants are not something which upstream produces, but I would assume that they are build with the debug=y, which means they will contain extra ASSERT statements and other things which aid debugging possibly at the expense of performance. Release builds (xen.gz et al) are build with debug=n. Like the release builds the xen-*-dbg ones come in the bootable (xen-dbg.gz) and xen-syms-dbg.gz variants, with the same distinction (the former is bootable, the latter is for running gdb on)

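In short, xen is the bootable file: the executable used to boot a Xen system. xen-syms is the version of xen that keeps all of its debug information; it is not bootable and is used only for debugging. The xen-*-dbg files are the counterparts of xen and xen-syms, built with a lot of extra debugging code (debug=y).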

 

An introduction to kmod

kmod provides a set of tools for working with Linux kernel modules. It is built on top of the libkmod library, which ships with the kmod source. Code: http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/

Run the following command on SuSE Linux:

/sbin # ls -alt | grep kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 depmod -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 insmod -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 lsmod -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 modinfo -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 modprobe -> /usr/bin/kmod
lrwxrwxrwx 1 root root        13 Nov  5 21:17 rmmod -> /usr/bin/kmod

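As the listing shows, everyday commands such as insmod and modprobe are all symbolic links to /usr/bin/kmod, so running any of them actually runs the kmod binary.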
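A single binary can serve several commands because it sees the name it was invoked under in argv[0]. The sketch below shows this multi-call pattern in general terms. It is a hypothetical illustration, not kmod's source. The `applet` table, `run_stub`, `find_applet` and `dispatch` are made-up names, and the stub stands in for the real tools' option parsing and libkmod calls. The case where the binary is invoked as plain kmod is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct applet {
    const char *name;
    int (*run)(int argc, char *argv[]);
};

/* Hypothetical stub: a real tool would parse argv and call into libkmod. */
static int run_stub(int argc, char *argv[]) {
    printf("%s called with %d argument(s)\n", argv[0], argc - 1);
    return 0;
}

static const struct applet applets[] = {
    { "depmod", run_stub },  { "insmod", run_stub },   { "lsmod", run_stub },
    { "modinfo", run_stub }, { "modprobe", run_stub }, { "rmmod", run_stub },
};

/* Match the basename of argv[0] (e.g. "/sbin/lsmod" -> "lsmod") to an applet. */
static const struct applet *find_applet(const char *argv0) {
    const char *slash = strrchr(argv0, '/');
    const char *name = slash ? slash + 1 : argv0;
    for (size_t i = 0; i < sizeof applets / sizeof applets[0]; i++)
        if (strcmp(name, applets[i].name) == 0)
            return &applets[i];
    return NULL;
}

/* What main() of a multi-call binary could do with its arguments. */
static int dispatch(int argc, char *argv[]) {
    assert(argc >= 1);
    const struct applet *a = find_applet(argv[0]);
    if (a == NULL) {
        fprintf(stderr, "%s: unknown command\n", argv[0]);
        return 1;
    }
    return a->run(argc, argv);
}
```

With this design, adding a command only requires a new table entry plus a symlink pointing at the same binary.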

 

Linux kernel notes (45): f_pos

f_pos is a member of struct file (defined in <linux/fs.h>) and holds the file's current read/write position:

struct file {
    ......
    loff_t          f_pos;
    ......
};

LDD3 describes f_pos as follows:

loff_t f_pos;

The current reading or writing position. loff_t is a 64-bit value on all platforms ( long long in gcc terminology). The driver can read this value if it needs to know the current position in the file but should not normally change it; read and write should update a position using the pointer they receive as the last argument instead of acting on filp->f_pos directly. The one exception to this rule is in the llseek method, the purpose of which is to change the file position.

In other words, a driver's read and write methods should not update filp->f_pos directly. For the reasons behind this, see this note.
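Below is a user-space sketch of that rule, built on some assumptions. `pos_t` stands in for loff_t, memcpy stands in for copy_to_user(), and the device buffer and the `dev_read` name are hypothetical. The function only reads and advances the position it is handed, and the caller decides where that position lives. This matters in the real kernel because pread(2), for example, passes the read method a pointer to a separate offset, which leaves filp->f_pos untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

typedef long long pos_t;  /* stands in for the kernel's 64-bit loff_t */

/* Hypothetical device contents. */
static const char dev_data[] = "hello, f_pos";

/*
 * Shaped like a driver's read method: it reads at *ppos and advances *ppos,
 * but never touches a file structure itself. memcpy replaces copy_to_user().
 * Returns the number of bytes copied, or 0 at end of file.
 */
static ssize_t dev_read(char *buf, size_t count, pos_t *ppos) {
    size_t size = sizeof dev_data - 1;  /* exclude the trailing NUL */
    assert(ppos != NULL && *ppos >= 0);
    if ((size_t)*ppos >= size)
        return 0;
    if (count > size - (size_t)*ppos)
        count = size - (size_t)*ppos;
    memcpy(buf, dev_data + *ppos, count);
    *ppos += (pos_t)count;
    return (ssize_t)count;
}
```

Because the position arrives as a pointer, the same function serves both a shared file position and a private, pread-style offset.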

 

The "/run" and "/var/run" directories on Linux

The following is excerpted from Wikipedia:

Modern Linux distributions include a /run directory as a temporary filesystem (tmpfs) which stores volatile runtime data, following the FHS version 3.0. According to the FHS version 2.3, such data were stored in /var/run but this was a problem in some cases because this directory isn’t always available at early boot. As a result, these programs have had to resort to trickery, such as using /dev/.udev, /dev/.mdadm, /dev/.systemd or /dev/.mount directories, even though the device directory isn’t intended for such data.[19] Among other advantages, this makes the system easier to use normally with the root filesystem mounted read-only.

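/run is a temporary file system that holds information about the system since it last booted. Files under this directory should be removed or truncated when the system reboots. If your system still has a /var/run directory, it should be a symbolic link to /run. Here is how SuSE 12 does it: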

# df -h
Filesystem      Size  Used Avail Use% Mounted on
......
tmpfs           431M  7.1M  424M   2% /run
......

# ls -lt /var/run
lrwxrwxrwx 1 root root 4 Nov  5 21:14 /var/run -> /run

 

Second anniversary of this blog

In the blink of an eye, I have kept this blog going for two whole years. Today I am writing a short post to look back.

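Over the past year, most of what I have published here has been notes. The reason is that I have learned a lot over the last few years, but I took no notes at the time. So I completely forgot much of it after a while and had to learn it again from scratch. Keeping notes lets me pick this knowledge back up quickly when I need it. If they happen to help others too, even better.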

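I have also restarted my English blog, which had been on hiatus for a while. English is, after all, today's "world language", and writing in it lets me communicate better with friends around the world.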

That's it for now. I hope to keep this blog going for a long time. Keep moving!

 

SystemTap notes (9): "?" and "!"

A probe point is sometimes followed by a ? or a ! character:

kernel.function("no_such_function") ?
module("awol").function("no_such_function") !

The man page explains this as follows:

However, a probe point may be followed by a “?” character, to indicate that it is optional, and that no error should result if it fails to resolve. Optionalness passes down through all levels of alias/wildcard expansion. Alternately, a probe point may be followed by a “!” character, to indicate that it is both optional and sufficient. (Think vaguely of the prolog cut operator.) If it does resolve, then no further probe points in the same comma-separated list will be resolved. Therefore, the “!” sufficiency mark only makes sense in a list of probe point alternatives.

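? marks the probe point as optional: if it cannot be resolved, the command does not fail but goes on to resolve the other probe points. ! marks a probe point as both optional and sufficient: once it resolves, no later probe points in the same comma-separated list are resolved. Hence ! only makes sense in a list of alternative probe points.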
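The resolution rules can be captured in a small model. The sketch below is a toy model of one comma-separated list and is not SystemTap's implementation. `probe_point`, the `exists` flag and `resolve_list` are hypothetical, and alias and wildcard expansion are ignored. An unmarked point that fails is a hard error, a marked one is skipped, and a resolved `!` point ends the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Suffix after a probe point: none, '?' (optional), '!' (optional + sufficient). */
enum mark { MARK_NONE, MARK_OPTIONAL, MARK_SUFFICIENT };

struct probe_point {
    const char *name;
    enum mark mark;
    bool exists;  /* hypothetical: whether this point resolves on the target */
};

/*
 * Resolve one comma-separated list of alternatives.
 * Returns the number of probe points registered, or -1 for a hard error
 * (an unmarked point that fails to resolve).
 */
static int resolve_list(const struct probe_point *pts, size_t n) {
    int registered = 0;
    for (size_t i = 0; i < n; i++) {
        if (!pts[i].exists) {
            if (pts[i].mark == MARK_NONE)
                return -1;  /* unmarked and unresolved: error */
            continue;       /* '?' or '!': skip silently */
        }
        registered++;
        if (pts[i].mark == MARK_SUFFICIENT)
            break;          /* '!' resolved: stop resolving this list */
    }
    return registered;
}
```

In this model, a list such as `a !, b !, c` registers only the first alternative that actually exists, which is the fallback behavior the man page describes.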

 

Crash tool notes (1): "current context"

Once a crash session has started, one task is designated as the current context. Some commands are context-sensitive, meaning their behavior depends on the current context, so it is important to know which task that is.

The current context is chosen as follows:
a) For a dump file:

The task that was running when die() was called.
The task that was running when panic() was called.
The task that was running when an ALT-SYSRQ-c keyboard interrupt was received.
The task that was running when the character "c" was echoed to /proc/sysrq-trigger. 

b) On a live system:

The `crash` process itself.

Run the set command to show the current context:

crash> set
    PID: 2366
COMMAND: "crash"
   TASK: ffff88001ae60000  [THREAD_INFO: ffff88001c1f0000]
    CPU: 0
  STATE: TASK_RUNNING (ACTIVE)

The set command can also change the current context:

crash> set 1
    PID: 1
COMMAND: "systemd"
   TASK: ffff88001dfd8000  [THREAD_INFO: ffff88001dfe0000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE