Linux kernel 笔记 (45)——f_pos

f_pos定义在file结构体(定义在<linux/fs.h>),表示文件当前的读写位置:

struct file {
    ......
    loff_t          f_pos;
    ......
}

LDD3中关于f_pos的描述:

loff_t f_pos;

The current reading or writing position. loff_t is a 64-bit value on all platforms ( long long in gcc terminology). The driver can read this value if it needs to know the current position in the file but should not normally change it; read and write should update a position using the pointer they receive as the last argument instead of acting on filp->f_pos directly. The one exception to this rule is in the llseek method, the purpose of which is to change the file position.

驱动的读写操作不需要直接更新filp->f_pos。关于其中原因,可参考这篇笔记

 

Linux系统上“run”和“/var/run”目录

以下摘自wikipedia

Modern Linux distributions include a /run directory as a temporary filesystem (tmpfs) which stores volatile runtime data, following the FHS version 3.0. According to the FHS version 2.3, such data were stored in /var/run but this was a problem in some cases because this directory isn’t always available at early boot. As a result, these programs have had to resort to trickery, such as using /dev/.udev, /dev/.mdadm, /dev/.systemd or /dev/.mount directories, even though the device directory isn’t intended for such data.[19] Among other advantages, this makes the system easier to use normally with the root filesystem mounted read-only.

/run是一个临时文件系统,存储系统启动以来的信息。当系统重启时,这个目录下的文件应该被删掉或清除。如果你的系统上有/var/run目录,应该让它指向run。参看SuSE 12的实现:

# df -h
Filesystem      Size  Used Avail Use% Mounted on
......
tmpfs           431M  7.1M  424M   2% /run
......

# ls -lt /var/run
lrwxrwxrwx 1 root root 4 Nov  5 21:14 /var/run -> /run

 

开博两周年纪念

一眨眼,坚持写博客整整两年了。今天特发小文,总结一下。

这一年来博客发表的笔记偏多。原因是我这几年学习过很多知识,但是由于当时没有什么笔记,所以很多内容过一段时间就完全忘了,又要重头开始学。记一些笔记可以方便自己需要的时候能很快地把这些知识捡起来。如果又能恰巧帮助别人,则更好。

此外,一度中断的英文博客也重新开张了。毕竟英文是目前的“世界语言”,写一些英文文章可以更好地和世界朋友们进行交流。

就这样吧,期待博客可以一直写下去。 Keep moving!

 

SystemTap 笔记 (9)—— “?”和“!”

probe后面有时会跟?或是!字符:

kernel.function("no_such_function") ?
module("awol").function("no_such_function") !

对此,man手册的解释如下:

However, a probe point may be followed by a “?” character, to indicate that it is optional, and that no error should result if it fails to resolve. Optionalness passes down through all levels of alias/wildcard expansion. Alternately, a probe point may be followed by a “!” character, to indicate that it is both optional and sufficient. (Think vaguely of the prolog cut operator.) If it does resolve, then no further probe points in the same comma-separated list will be resolved. Therefore, the “!” sufficiency mark only makes sense in a list of probe point alternatives.

?表明probe是可选的,即使不存在相应的probe,也不会导致命令出错,而是继续解析其它的probe!表明probe一旦解析成功,则不会继续解析后面的probe。因此!只在存在probe列表的情况下才有效。

 

Crash工具笔记 (1)—— “current context”

成功启动crash会话后,会有一个task被指定为current context。因为有一些命令是context-sensitive,也即这些命令的运行会依赖于current context,所以知道当前的current context就很重要。

选择current context的标准:
a)coredump文件:

The task that was running when die() was called.
The task that was running when panic() was called.
The task that was running when an ALT-SYSRQ-c keyboard interrupt was received.
The task that was running when the character "c" was echoed to /proc/sysrq-trigger. 

b)当前运行的系统:

`crash`命令本身.

执行set命令显示当前current context

crash> set
    PID: 2366
COMMAND: "crash"
   TASK: ffff88001ae60000  [THREAD_INFO: ffff88001c1f0000]
    CPU: 0
  STATE: TASK_RUNNING (ACTIVE)

也可利用set命令改变当前current context

crash> set 1
    PID: 1
COMMAND: "systemd"
   TASK: ffff88001dfd8000  [THREAD_INFO: ffff88001dfe0000]
    CPU: 0
  STATE: TASK_INTERRUPTIBLE

 

/dev/mem,/dev/kmem和/dev/port

/dev/mem/dev/kmem/dev/port这三个文件分别代表物理内存,kernel虚拟内存和I/O端口。参考下面:

/dev/mem is a character device file that is an image of the main memory of the computer. It may be used, for example, to examine (and even patch) the system. Byte addresses in /dev/mem are interpreted as physical memory addresses. References to nonexistent locations cause errors to be returned.

The file /dev/kmem is the same as /dev/mem, except that the kernel virtual memory rather than physical memory is accessed.

/dev/port is similar to /dev/mem, but the I/O ports are accessed.

 

SystemTap 笔记 (8)—— typecasting

当指针是一个void *类型,或是保存为整数后,可以使用cast运算符指定指针的数据类型:

@cast(p, "type_name"[, "module"])->member

可选的module参数用来指定从哪里得到type_name(The optional module tells the translator where to look for information about that type. Multiple modules may be specified as a list with : separators. If the module is not specified, it will default either to the probe module for dwarf probes, or to “kernel” for functions and all other probes types.)。

另外,translator可以从头文件中创建type信息:

The translator can create its own module with type information from a header surrounded by angle brackets, in case normal debuginfo is not available. For kernel headers, prefix it with “kernel” to use the appropriate build system. All other headers are build with default GCC parameters into a user module. Multiple headers may be specified in sequence to resolve a codependency.

@cast(tv, “timeval”, “<sys/time.h>”)->tvsec
@cast(task, “task
struct”, “kernel<linux/sched.h>”)->tgid
@cast(task, “taskstruct”, “kernel<linux/sched.h><linux/fsstruct.h>”)->fs->umask

参考例子:

# stap -e 'probe kernel.function("do_dentry_open") {printf("%d\n", $f->f_flags); exit(); }'
32768

使用cast运算符:

# stap -e 'probe kernel.function("do_dentry_open") {printf("%d\n", @cast($f, "file", "kernel<linux/fs.h>" )->f_flags); exit(); }'
32768

SystemTap 笔记 (7)—— target variable (2)

SystemTap可以为target variable生成一系列可打印的字符串:

$$vars

Expands to a character string that is equivalent to sprintf(“parm1=%x … parmN=%x var1=%x … varN=%x”, parm1, …, parmN, var1, …, varN) for each variable in scope at the probe point. Some values may be printed as “=?” if their run-time location cannot be found.

$$locals

Expands to a subset of $$vars containing only the local variables.

$$parms

Expands to a subset of $$vars containing only the function parameters.

$$return

Is available in return probes only. It expands to a string that is equivalent to sprintf(“return=%x”, $return) if the probed function has a return value, or else an empty string.

参考下面例子:

# stap -e 'probe kernel.function("do_dentry_open") {printf("%s\n", $$vars); exit(); }'
f=0xffff880022ec6080 open=0x0 cred=0xffff880030d483c0 empty_fops={...} inode=? error=?
# stap -e 'probe kernel.function("do_dentry_open") {printf("%s\n", $$parms); exit(); }'
f=0xffff880030d453c0 open=0x0 cred=0xffff880030d483c0
# stap -e 'probe kernel.function("do_dentry_open") {printf("%s\n", $$locals); exit(); }'
empty_fops={...} inode=? error=?

在上述变量后加上$$$可以打印更详细的结构体信息。参考下例:

# stap -e 'probe kernel.function("do_dentry_open") {printf("%s\n", $$parms$); exit(); }'
f={.f_u={...}, .f_path={...}, .f_inode=0x0, .f_op=0x0, .f_lock={...}, .f_count={...}, .f_flags=32768, .f_mode=0, .f_pos=0, .f_owner={...}, .f_cred=0xffff880030d483c0, .f_ra={...}, .f_version=0, .f_security=0xffff880012982b80, .private_data=0x0, .f_ep_links={...}, .f_tfile_llink={...}, .f_mapping=0x0} open=<function>:0x0 cred={.usage={...}, .uid={...}, .gid={...}, .suid={...}, .sgid={...}, .euid={...}, .egid={...}, .fsuid={...}, .fsgid={...}, .securebits=0, .cap_inheritable={...}, .cap_permitted={...}, .cap
# stap -e 'probe kernel.function("do_dentry_open") {printf("%s\n", $$parms$$); exit(); }'
f={.f_u={.fu_llist={.next=0x0}, .fu_rcuhead={.next=0x0, .func=0x0}}, .f_path={.mnt=0xffff8800337510e0, .dentry=0xffff8800318b82d8}, .f_inode=0x0, .f_op=0x0, .f_lock={<union>={.rlock={.raw_lock={.head_tail=0, <class>={.tickets={.head=0, .tail=0}, .owner=0}}}}}, .f_count={.counter=1}, .f_flags=32768, .f_mode=0, .f_pos=0, .f_owner={.lock={.raw_lock={.lock=1048576, .write=1048576}}, .pid=0x0, .pid_type=0, .uid={.val=0}, .euid={.val=0}, .signum=0}, .f_cred=0xffff880030d89300, .f_ra={.start=0, .size=0, .async_si

 

Linux kernel 笔记 (44)——使用字符设备

Linux kernel 使用 cdev结构体代表字符设备(char device),定义在<linux/cdev.h>

#include <linux/kobject.h>
#include <linux/kdev_t.h>
#include <linux/list.h>

struct file_operations;
struct inode;
struct module;

struct cdev {
    struct kobject kobj;
    struct module *owner;
    const struct file_operations *ops;
    struct list_head list;
    dev_t dev;
    unsigned int count;
};

void cdev_init(struct cdev *, const struct file_operations *);

struct cdev *cdev_alloc(void);

void cdev_put(struct cdev *p);

int cdev_add(struct cdev *, dev_t, unsigned);

void cdev_del(struct cdev *);

void cd_forget(struct inode *);

分配和初始化cdev结构体的两种方式:

(1)

struct cdev *my_cdev = cdev_alloc();
my_cdev->ops = &my_fops;
my_cdev->owner = THIS_MODULE;

(2)另外一种是cdev嵌入到代表设备的结构体中:

struct scull_dev {
    ......
    struct cdev cdev; /* Char device structure */
    ......
};

static void scull_setup_cdev(struct scull_dev *dev, int index)
{
    int err, devno = MKDEV(scull_major, scull_minor + index);
    cdev_init(&dev->cdev, &scull_fops);
    dev->cdev.owner = THIS_MODULE;
    dev->cdev.ops = &scull_fops;
    ......
}

两种方式都要注意把owner赋值为THIS_MODULE

初始化cdev结构体以后,要使用cdev_add把设备加入系统:

/**
 * cdev_add() - add a char device to the system
 * @p: the cdev structure for the device
 * @dev: the first device number for which this device is responsible
 * @count: the number of consecutive minor numbers corresponding to this
 *         device
 *
 * cdev_add() adds the device represented by @p to the system, making it
 * live immediately.  A negative error code is returned on failure.
 */
int cdev_add(struct cdev *p, dev_t dev, unsigned count)
{
    int error;

    p->dev = dev;
    p->count = count;

    error = kobj_map(cdev_map, dev, count, NULL,
             exact_match, exact_lock, p);
    if (error)
        return error;

    kobject_get(p->kobj.parent);

    return 0;
}

要注意count指定的是连续的minor number数。

删除设备使用cdev_del函数:

/**
 * cdev_del() - remove a cdev from the system
 * @p: the cdev structure to be removed
 *
 * cdev_del() removes @p from the system, possibly freeing the structure
 * itself.
 */
void cdev_del(struct cdev *p)
{
    cdev_unmap(p->dev, p->count);
    kobject_put(&p->kobj);
}

2.6版本之前的注册和删除设备的register_chrdevunregister_chrdev函数已经过时,不再使用。

 

Linux kernel 笔记 (43)——do_sys_open

以下是do_sys_openkernel 3.12版本的代码:

long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
{
    struct open_flags op;
    int fd = build_open_flags(flags, mode, &op);
    struct filename *tmp;

    if (fd)
        return fd;

    tmp = getname(filename);
    if (IS_ERR(tmp))
        return PTR_ERR(tmp);

    fd = get_unused_fd_flags(flags);
    if (fd >= 0) {
        struct file *f = do_filp_open(dfd, tmp, &op);
        if (IS_ERR(f)) {
            put_unused_fd(fd);
            fd = PTR_ERR(f);
        } else {
            fsnotify_open(f);
            fd_install(fd, f);
        }
    }
    putname(tmp);
    return fd;
}

核心部分如下:

a)get_unused_fd_flags得到一个文件描述符;
b)do_filp_open得到一个struct file结构;
c)fd_install把文件描述符和struct file结构关联起来。

struct file包含f_op成员:

struct file {
    ......
    const struct file_operations    *f_op;
    ......
    void            *private_data;
    ......
}

struct file_operations又包含open成员:

struct file_operations {
    ......
    int (*open) (struct inode *, struct file *);
    ......
}

open成员的两个参数:实际文件的inode节点和struct file结构。

open系统调用执行驱动中open方法之前(struct file_operations中的open成员),会将private_data置成NULL,用户可以根据自己的需要设置private_data的值(参考do_dentry_open函数)。