SystemTap 笔记 (12)—— Associate array和foreach

SystemTap也提供associate array,并且必须是global变量:

global array_name[index_expression]


device[pid(),execname(),uid(),ppid(),"W"] = devname


if([index_expression] in array_name) statement


# cat test.stp
global reads

  reads[execname()] ++

probe timer.s(3)
  foreach (count in reads+)
    printf("%s : %d \n", count, reads[count])
  if(["stapio"] in reads) {
    printf("stapio read detected, exiting\n")

# ./test.stp
systemd-udevd : 4
gmain : 5
avahi-daemon : 11
stapio : 17
stapio read detected, exiting


delete removes an element.

The following statement removes from ARRAY the element specified by the index tuple. The value will no longer be available, and subsequent iterations will not report the element. It is not an error to delete an element that does not exist.

delete ARRAY[INDEX1, INDEX2, …]

The following syntax removes all elements from ARRAY:
delete ARRAY

foreach是输出associate array的一个重要方法:

General syntax:
foreach (VAR in ARRAY) STMT
The foreach statement loops over each element of a named global array, assigning the current key to VAR. The array must not be modified within the statement. If you add a single plus (+) or minus (-) operator after the VAR or the ARRAY identifier, the iteration order will be sorted by the ascending or descending index or value.

The following statement behaves the same as the first example, except it is used when an array is indexed with a tuple of keys. Use a sorting suffix on at most one VAR or ARRAY identifier.

foreach ([VAR1, VAR2, …] in ARRAY) STMT

You can combine the first and second syntax to capture both the full tuple and the keys at the same time as follows.

foreach (VALUE = [VAR1, VAR2, …] in ARRAY) STMT

The following statement is the same as the first example, except that the limit keyword limits the number of loop iterations to EXP times. EXP is evaluated once at the beginning of the loop.

foreach (VAR in ARRAY limit EXP) STMT

下面这个例子很好地解释了如何用foreach操作associate array

# cat test.stp
global reads 

    reads[pid(), execname()]++ 

probe timer.s(3) 
    foreach ([pid, name] in reads) 
        printf("%d %s: %d\n", pid, name, reads[pid, name]) 
    printf("======Another Total=====\n") 
    foreach (val = [pid, name] in reads) 
        printf("%d %s: %d\n", pid, name, val) 
    printf("======PID Ascending=====\n") 
    foreach ([pid+, name] in reads limit 3) 
        printf("%d %s: %d\n", pid, name, reads[pid, name]) 
    printf("======Count Descending=====\n") 
        foreach ([pid, name] in reads- limit 3) 
    printf("%d %s: %d\n", pid, name, reads[pid, name]) 


# ./test.stp
14367 stapio: 17
976 gmain: 5
791 avahi-daemon: 11
806 irqbalance: 16
14368 systemd-udevd: 1
432 systemd-udevd: 3
======Another Total=====
14367 stapio: 17
976 gmain: 5
791 avahi-daemon: 11
806 irqbalance: 16
14368 systemd-udevd: 1
432 systemd-udevd: 3
======PID Ascending=====
432 systemd-udevd: 3
791 avahi-daemon: 11
806 irqbalance: 16
======Count Descending=====
14367 stapio: 17
806 irqbalance: 16
791 avahi-daemon: 11

首先输出associate array中的所有内容(使用两种不同方式),接着分别按PID升序和associate array中的value降序输出前三个。


Haskell笔记 (2)—— if语句


The difference between Haskell’s if statement and if statements in imperative languages is that the else part is mandatory in Haskell.


doubleSmallNumber x = if x > 100
                      then x
                      else X * 2


parse error in if statement: missing required else clause


Another thing about the if statement in Haskell is that it is an expression. An expression is basically a piece of code that returns a value. 5 is an expression because it returns 5, 4 + 8 is an expression, x + y is an expression because it returns the sum of x and y. Because the else is mandatory, an if statement will always return something and that’s why it’s an expression.


Haskell笔记 (1)—— 函数


In Haskell, functions are called by writing the function name, a space and then the parameters, separated by spaces.


> min 9 10


If a function takes two parameters, we can also call it as an infix function by surrounding it with backticks.


> 9 `min` 10
> 9 min 10

    Non type-variable argument
      in the constraint: Num ((a -> a -> a) -> a -> t)
    (Use FlexibleContexts to permit this)
    When checking that ‘it’ has the inferred type
      it :: forall a a1 t.
            (Num a1, Num ((a -> a -> a) -> a1 -> t), Ord a) =>


Functions in Haskell don’t have to be in any particular order.


Side effects are essentially invisible inputs to, or outputs from, functions. In Haskell, the default is for functions to not have side effects: the result of a function depends only on the inputs that we explicitly provide. We call these functions pure; functions with side effects are impure.
If a function has side effects, we can tell by reading its type signature—the type of the function’s result will begin with IO:
ghci> :type readFile
readFile :: FilePath -> IO String
Haskell’s type system prevents us from accidentally mixing pure and impure code.


Haskell doesn’t have a return keyword, because a function is a single expression, not a sequence of statements. The value of the expression is the result of the function.







(3)SLES12 Xen Stack

• Xen 4.4.1  
• kernel 3.12.x  
• libvirt 1.2.5  
• virt-install 1.1.x, vm-install 1.x.x  
• virt-manager 1.1.x  



“Memory order”分析笔记

以下图片摘自Memory Reordering Caught in the Act,它描述了memory reorder问题:




为什么会发生memory reorder?一言以蔽之,因为性能。

在支持memory reorder的系统上,有以下3order需要考虑:

Program order: the order in which the memory operations are specified in the code running on a given CPU.

Execution order: the order in which the individual memory-reference instructions are executed on a given CPU. The execution order can differ from program order due to both compiler and CPU-implementation optimizations.

Perceived order: the order in which a given CPU perceives its and other CPUs’ memory operations. The perceived order can differ from the execution order due to caching, interconnect and memory-system optimizations. Different CPUs might well perceive the same memory operations as occurring in different orders.

Program order是代码里访问内存的顺序。Execution order是代码在CPU上实际执行的顺序,由于编译器优化和CPU的实现,实际指令执行的顺序有可能和代码顺序不一样。Perceived orderCPU用来“感知”自己或者其它CPU对内存操作,由于cachinginterconnect等原因,这个顺序有可能与代码实际的execution order不同。


关于memory order的总结:

A given CPU always perceives its own memory operations as occurring in program order. That is, memory-reordering issues arise only when a CPU is observing other CPUs’ memory operations.

An operation is reordered with a store only if the operation accesses a different location than does the store.

Aligned simple loads and stores are atomic.

Linux-kernel synchronization primitives contain any needed memory barriers, which is a good reason to use these primitives.

Memory Reordering Caught in the Act
Memory Ordering in Modern Microprocessors, Part I

Linux kernel 笔记 (53)——为什么“interrupt handler”不能被抢占?

Interrupt handler会复用当前被中断taskkernel stack,它并不是一个真正的task,也不拥有task_struct。因此一旦被调度出去,就无法再被调度回来继续执行。所以interrupt handler不允许被抢占。

Why can’t you sleep in an interrupt handler in the Linux kernel? Is this true of all OS kernels?
Why kernel code/thread executing in interrupt context cannot sleep?;
Are there any difference between “kernel preemption” and “interrupt”?;
Why can not processes switch in atomic context?


Linux kernel 笔记 (52)——使用“spinlock”的进程不能被抢占


A process cannot be preempted nor sleep while holding a spinlock due spinlocks behavior. If a process grabs a spinlock and goes to sleep before releasing it. A second process (or an interrupt handler) that to grab the spinlock will busy wait. On an uniprocessor machine the second process will lock the CPU not allowing the first process to wake up and release the spinlock so the second process can continue, it is basically a deadlock.

This happens since grabbing an spinlocks also disables interrupts and this is required to synchronize threads with interrupt handlers.

当一个task获得spinlock以后,它不能被抢占(比如调用sleep)。因为如果这时有另外一个task也想获得这个spinlock,在UP系统上,这个task就会一直占据CPU,并且不停地尝试获得锁。而第一个task没有机会重新执行来释放锁,这就造成“死锁”。Interrupt handler也是同样道理。


Linux kernel 笔记 (51)——”atomic context”


Kernel code generally runs in one of two fundamental contexts. Process context reigns when the kernel is running directly on behalf of a (usually) user-space process; the code which implements system calls is one example. When the kernel is running in process context, it is allowed to go to sleep if necessary. But when the kernel is running in atomic context, things like sleeping are not allowed. Code which handles hardware and software interrupts is one obvious example of atomic context.

There is more to it than that, though: any kernel function moves into atomic context the moment it acquires a spinlock. Given the way spinlocks are implemented, going to sleep while holding one would be a fatal error; if some other kernel function tried to acquire the same lock, the system would almost certainly deadlock forever.

“Deadlocking forever” tends not to appear on users’ wishlists for the kernel, so the kernel developers go out of their way to avoid that situation. To that end, code which is running in atomic context carefully follows a number of rules, including (1) no access to user space, and, crucially, (2) no sleeping. Problems can result, though, when a particular kernel function does not know which context it might be invoked in. The classic example is kmalloc() and friends, which take an explicit argument (GFPKERNEL or GFPATOMIC) specifying whether sleeping is possible or not.

处理中断代码属于atomic context,必须遵守下面的原则:
a)不能访问user space


Linux kernel 笔记 (50)——”context switch”和”mode switch”


At a high level, there are two separate mechanisms to understand. The first is the kernel entry/exit mechanism: this switches a single running thread from running usermode code to running kernel code in the context of that thread, and back again. The second is the context switch mechanism itself, which switches in kernel mode from running in the context of one thread to another.

So, when Thread A calls sched_yield() and is replaced by Thread B, what happens is:

Thread A enters the kernel, changing from user mode to kernel mode;
Thread A in the kernel context-switches to Thread B in the kernel;
Thread B exits the kernel, changing from kernel mode back to user mode.

Each user thread has both a user-mode stack and a kernel-mode stack. When a thread enters the kernel, the current value of the user-mode stack (SS:ESP) and instruction pointer (CS:EIP) are saved to the thread’s kernel-mode stack, and the CPU switches to the kernel-mode stack – with the int $80 syscall mechanism, this is done by the CPU itself. The remaining register values and flags are then also saved to the kernel stack.

When a thread returns from the kernel to user-mode, the register values and flags are popped from the kernel-mode stack, then the user-mode stack and instruction pointer values are restored from the saved values on the kernel-mode stack.

When a thread context-switches, it calls into the scheduler (the scheduler does not run as a separate thread – it always runs in the context of the current thread). The scheduler code selects a process to run next, and calls the switchto() function. This function essentially just switches the kernel stacks – it saves the current value of the stack pointer into the TCB for the current thread (called struct taskstruct in Linux), and loads a previously-saved stack pointer from the TCB for the next thread. At this point it also saves and restores some other thread state that isn’t usually used by the kernel – things like floating point/SSE registers.

So you can see that the core user-mode state of a thread isn’t saved and restored at context-switch time – it’s saved and restored to the thread’s kernel stack when you enter and leave the kernel. The context-switch code doesn’t have to worry about clobbering the user-mode register values – those are already safely saved away in the kernel stack by that point.

mode switch”是一个运行的taskuser-mode切换到kernel-mode,或者切换回来。而“context switch”一定发生在kernel mode,进行task的切换。

每个user task有一个user-mode stack和一个kernel-mode stack,当从user-mode切换到kernel-mode时,寄存器的值要保存到kernel-mode stack,反之,从kernel-mode切换回user-mode时,要把寄存器的值恢复出来。

进行“context switch”时,scheduler将当前kernel-mode stack中的值保存在task_struct中,并把下一个将要运行tasktask_struct值恢复到kernel-mode stack中。这样,从kernel-mode返回到user-mode,就会运行另外一个task


SystemTap 笔记 (11)—— 命令行参数


# cat test.stp

probe begin
        printf("arg1 is %d, arg2 is %s\n", $1, @2)


 # ./test.stp 100 "test"
arg1 is 100, arg2 is test

Command-Line Arguments