FreeBSD kernel notes (11): condition variables

Besides mutexes, threads can also be synchronized with condition variables (the following is excerpted from FreeBSD Device Drivers):

Condition variables synchronize the execution of two or more threads based upon the value of an object. In contrast, locks synchronize threads by controlling their access to objects.

Condition variables are used in conjunction with locks to “block” threads until a condition is true. It works like this: A thread first acquires the foo lock. Then it examines the condition. If the condition is false, it sleeps on the bar condition variable. While asleep on bar, threads relinquish foo. A thread that causes the condition to be true wakes up the threads sleeping on bar. Threads woken up in this manner reacquire foo before proceeding.

In addition, using condition variables always involves a lock. The rules for the lock are as follows (excerpted from the FreeBSD Kernel Developer's Manual):

The lock argument is a pointer to either mutex(9), rwlock(9), or sx(9) lock. A mutex(9) argument must be initialized with MTX_DEF and not MTX_SPIN. A thread must hold lock before calling cv_wait(), cv_wait_sig(), cv_wait_unlock(), cv_timedwait(), or cv_timedwait_sig(). When a thread waits on a condition, lock is atomically released before the thread is blocked, then reacquired before the function call returns. In addition, the thread will fully drop the Giant mutex (even if recursed) while it is suspended and will reacquire the Giant mutex before the function returns. The cv_wait_unlock() function does not reacquire the lock before returning. Note that the Giant mutex may be specified as lock. However, Giant may not be used as lock for the cv_wait_unlock() function. All waiters must pass the same lock in conjunction with cvp.

In short, a thread must already hold the lock when it calls one of the cv_wait() family of functions to check whether the condition has become true. Inside cv_wait() the thread first releases the lock and blocks until the condition becomes true; by the time cv_wait() returns, the lock has been reacquired. Note that cv_wait_unlock() does not reacquire the lock before returning.
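
Below is a minimal sketch of this pattern in FreeBSD kernel C, assuming a hypothetical driver; the names foo_mtx, foo_cv and foo_ready are made up for illustration, while the mtx(9) and condvar(9) calls are the ones described above:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/condvar.h>

static struct mtx foo_mtx;
static struct cv foo_cv;
static int foo_ready;

static void
foo_init(void)
{
    /* The lock passed to cv_wait() must be MTX_DEF, not MTX_SPIN. */
    mtx_init(&foo_mtx, "foo lock", NULL, MTX_DEF);
    cv_init(&foo_cv, "foo cv");
}

/* Waiter: acquire the lock, then sleep until the condition becomes true. */
static void
foo_wait(void)
{
    mtx_lock(&foo_mtx);
    while (foo_ready == 0)
        cv_wait(&foo_cv, &foo_mtx); /* drops foo_mtx while asleep, reacquires it before returning */
    /* The condition is true here and foo_mtx is held again. */
    mtx_unlock(&foo_mtx);
}

/* Producer: make the condition true and wake the threads sleeping on foo_cv. */
static void
foo_wakeup(void)
{
    mtx_lock(&foo_mtx);
    foo_ready = 1;
    cv_broadcast(&foo_cv);
    mtx_unlock(&foo_mtx);
}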

A brief introduction to the go-events package

go-events implements a mechanism for handling events. Its core concept is the Sink, defined in event.go:

// Event marks items that can be sent as events.
type Event interface{}

// Sink accepts and sends events.
type Sink interface {
    // Write an event to the Sink. If no error is returned, the caller will
    // assume that all events have been committed to the sink. If an error is
    // received, the caller may retry sending the event.
    Write(event Event) error

    // Close the sink, possibly waiting for pending events to flush.
    Close() error
}

You can think of a Sink as a "pool" that provides two methods: Write sends an event into the pool, and Close shuts the pool down when it is no longer needed. The other files in the package all build on Sink to provide various features. For example:

package main

import (
    "fmt"
    "github.com/docker/go-events"
    "time"
)

type eventRecv struct {
    name string
}

func (e *eventRecv) Write(event events.Event) error {
    fmt.Printf("%s receives %d\n", e.name, event.(int))
    return nil
}

func (e *eventRecv) Close() error {
    return nil
}

func createEventRecv(name string) *eventRecv {
    return &eventRecv{name}
}

func main() {
    e1 := createEventRecv("Foo")
    e2 := createEventRecv("Bar")

    bc := events.NewBroadcaster(e1, e2)
    bc.Write(1)
    bc.Write(2)
    time.Sleep(time.Second)
}

The output is as follows:

Foo receives 1
Bar receives 1
Foo receives 2
Bar receives 2

NewBroadcaster dispatches each event written to it to multiple Sinks.

Here is another example, this time using NewQueue:

package main

import (
    "fmt"
    "github.com/docker/go-events"
    "time"
)

type eventRecv struct {
    name string
}

func (e *eventRecv) Write(event events.Event) error {
    fmt.Printf("%s receives %d\n", e.name, event)
    return nil
}

func (e *eventRecv) Close() error {
    return nil
}

func createEventRecv(name string) *eventRecv {
    return &eventRecv{name}
}

func main() {
    q := events.NewQueue(createEventRecv("Foo"))
    q.Write(1)
    q.Write(2)
    time.Sleep(time.Second)
}

The output is as follows:

Foo receives 1
Foo receives 2

FreeBSD kernel notes (10): mutexes

The FreeBSD kernel provides two kinds of mutexes: spin mutexes and sleep mutexes (the following is excerpted from FreeBSD Device Drivers):

Spin Mutexes
Spin mutexes are simple spin locks. If a thread attempts to acquire a spin lock that is being held by another thread, it will “spin” and wait for the lock to be released. Spin, in this case, means to loop infinitely on the CPU. This spinning can result in deadlock if a thread that is holding a spin lock is interrupted or if it context switches, and all subsequent threads attempt to acquire that lock. Consequently, while holding a spin mutex all interrupts are blocked on the local processor and a context switch cannot be performed.

Spin mutexes should be held only for short periods of time and should be used only to protect objects related to nonpreemptive interrupts and low-level scheduling code (McKusick and Neville-Neil, 2005). Ordinarily, you’ll never use spin mutexes.

Sleep Mutexes
Sleep mutexes are the most commonly used lock. If a thread attempts to acquire a sleep mutex that is being held by another thread, it will context switch (that is, sleep) and wait for the mutex to be released. Because of this behavior, sleep mutexes are not susceptible to the deadlock described above.

Sleep mutexes support priority propagation. When a thread sleeps on a sleep mutex and its priority is higher than the sleep mutex’s current owner, the current owner will inherit the priority of this thread (Baldwin, 2002). This characteristic prevents a lower priority thread from blocking a higher priority thread.

NOTE Sleeping (for example, calling a *sleep function) while holding a mutex is never safe and must be avoided; otherwise, there are numerous assertions that will fail and the kernel will panic.

While a spin mutex is held, interrupts are blocked on the local CPU and no context switch may occur, in order to prevent the deadlock described above. Under normal circumstances you should use sleep mutexes. Also note that a thread holding a mutex must not sleep, otherwise the kernel will panic.
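
As a minimal sketch of the usual sleep-mutex calls (the softc type and field names here are hypothetical):

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>

struct foo_softc {
    struct mtx mtx;
    int count;
};

static void
foo_attach(struct foo_softc *sc)
{
    /* MTX_DEF requests a sleep mutex; MTX_SPIN would create a spin mutex. */
    mtx_init(&sc->mtx, "foo mutex", NULL, MTX_DEF);
}

static void
foo_increment(struct foo_softc *sc)
{
    mtx_lock(&sc->mtx);
    sc->count++;    /* never call a *sleep function while the mutex is held */
    mtx_unlock(&sc->mtx);
}

static void
foo_detach(struct foo_softc *sc)
{
    mtx_destroy(&sc->mtx);
}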

In addition, there are shared/exclusive locks:

Shared/exclusive locks (sx locks) are locks that threads can hold while asleep. As the name implies, multiple threads can have a shared hold on an sx lock, but only one thread can have an exclusive hold on an sx lock. When a thread has an exclusive hold on an sx lock, other threads cannot have a shared hold on that lock.

sx locks do not support priority propagation and are inefficient compared to mutexes. The main reason for using sx locks is that threads can sleep while holding one.

And reader/writer locks:

Reader/writer locks (rw locks) are basically mutexes with sx lock semantics. Like sx locks, threads can hold rw locks as a reader, which is identical to a shared hold, or as a writer, which is identical to an exclusive hold. Like mutexes, rw locks support priority propagation and threads cannot hold them while sleeping (or the kernel will panic).

rw locks are used when you need to protect an object that is mostly going to be read from instead of written to.

Shared/exclusive locks and reader/writer locks have similar semantics, with the following differences: a thread holding a shared/exclusive lock may sleep, but sx locks do not support priority propagation; a thread holding a reader/writer lock must not sleep, but rw locks do support priority propagation.
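
A minimal sketch of the corresponding sx(9) and rw(9) calls follows; the lock names and the protected foo_data variable are hypothetical:

#include <sys/param.h>
#include <sys/lock.h>
#include <sys/sx.h>
#include <sys/rwlock.h>

static struct sx foo_sx;
static struct rwlock foo_rw;
static int foo_data;

static void
foo_locks_init(void)
{
    sx_init(&foo_sx, "foo sx");
    rw_init(&foo_rw, "foo rw");
}

static int
foo_read(void)
{
    int v;

    rw_rlock(&foo_rw);      /* shared (reader) hold; sleeping here would panic */
    v = foo_data;
    rw_runlock(&foo_rw);
    return (v);
}

static void
foo_update(int v)
{
    sx_xlock(&foo_sx);      /* exclusive hold; the thread may sleep while holding it */
    foo_data = v;
    sx_xunlock(&foo_sx);
}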

Swarmkit notes (13): swarmctl sends commands to the swarm cluster through controlClient

swarmctl essentially sends commands to the swarm cluster through a controlClient, which is defined in api/control.pb.go:

// Client API for Control service

type ControlClient interface {
    GetNode(ctx context.Context, in *GetNodeRequest, opts ...grpc.CallOption) (*GetNodeResponse, error)
    ListNodes(ctx context.Context, in *ListNodesRequest, opts ...grpc.CallOption) (*ListNodesResponse, error)
    ......
}

type controlClient struct {
    cc *grpc.ClientConn
}

func NewControlClient(cc *grpc.ClientConn) ControlClient {
    return &controlClient{cc}
}

func (c *controlClient) GetNode(ctx context.Context, in *GetNodeRequest, opts ...grpc.CallOption) (*GetNodeResponse, error) {
    out := new(GetNodeResponse)
    err := grpc.Invoke(ctx, "/docker.swarmkit.v1.Control/GetNode", in, out, c.cc, opts...)
    if err != nil {
        return nil, err
    }
    return out, nil
}

......


Docker notes (17): adding labels to images, containers, and the Docker daemon

You can store metadata on an image, a container, or the Docker daemon by adding labels in key=value format, for example license, vendor, and so on:

(1) To label an image, use the LABEL instruction in the Dockerfile (try to put all labels into a single LABEL instruction, because each LABEL instruction adds a layer to the image):

LABEL [<namespace>.]<key>=<value> ...

(2) To label a container:

docker run \
   -d \
   --label com.example.group="webservers" \
   --label com.example.environment="production" \
   busybox \
   top

(3) To label the Docker daemon:

docker daemon \
  --dns 8.8.8.8 \
  --dns 8.8.4.4 \
  -H unix:///var/run/docker.sock \
  --label com.example.environment="production" \
  --label com.example.storage="ssd"

References:
Apply custom metadata


FreeBSD kernel notes (9): the modeventtype_t definition

modeventtype_t is defined as follows:

typedef enum modeventtype {
    MOD_LOAD,
    MOD_UNLOAD,
    MOD_SHUTDOWN,
    MOD_QUIESCE
} modeventtype_t;
typedef int (*modeventhand_t)(module_t, int /* modeventtype_t */, void *);

MOD_LOAD, MOD_UNLOAD, and MOD_SHUTDOWN are easy to understand: they are the values passed to the module's event handler when the module is loaded, when it is unloaded, and at system shutdown, respectively. For MOD_QUIESCE, see FreeBSD Device Drivers:

When one issues the kldunload(8) command, MOD_QUIESCE is run before MOD_UNLOAD . If MOD_QUIESCE returns an error, MOD_UNLOAD does not get executed. In other words, MOD_QUIESCE verifies that it is safe to unload your module.

NOTE The kldunload -f command ignores every error returned by MOD_QUIESCE . So you can always unload a module, but it may not be the best idea.

In addition, the difference between MOD_QUIESCE and MOD_UNLOAD is also explained in the FreeBSD Kernel Developer's Manual:

The difference between MOD_QUIESCE and MOD_UNLOAD is that the module should fail MOD_QUIESCE if it is currently in use, whereas MOD_UNLOAD should only fail if it is impossible to unload the module, for instance because there are memory references to the module which cannot be revoked.
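
For illustration, here is a minimal sketch of a module event handler that uses MOD_QUIESCE to veto unloading while the module is busy; the module name "foo" and the foo_busy flag are hypothetical:

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
#include <sys/module.h>

static int foo_busy;

static int
foo_modevent(module_t mod, int event, void *arg)
{
    int error = 0;

    switch (event) {
    case MOD_LOAD:
        uprintf("foo loaded\n");
        break;
    case MOD_QUIESCE:
        /* Run by kldunload(8) before MOD_UNLOAD; returning an error vetoes the unload. */
        if (foo_busy)
            error = EBUSY;
        break;
    case MOD_UNLOAD:
        uprintf("foo unloaded\n");
        break;
    case MOD_SHUTDOWN:
        break;
    default:
        error = EOPNOTSUPP;
        break;
    }
    return (error);
}

static moduledata_t foo_mod = {
    "foo",
    foo_modevent,
    NULL
};

DECLARE_MODULE(foo, foo_mod, SI_SUB_DRIVERS, SI_ORDER_MIDDLE);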

FreeBSD kernel notes (8): doubly-linked lists

The FreeBSD kernel provides support for doubly-linked lists (defined in sys/sys/queue.h):

/*
 * List declarations.
 */
#define LIST_HEAD(name, type)                       \
struct name {                               \
    struct type *lh_first;  /* first element */         \
}

#define LIST_CLASS_HEAD(name, type)                 \
struct name {                               \
    class type *lh_first;   /* first element */         \
}

#define LIST_HEAD_INITIALIZER(head)                 \
    { NULL }

#define LIST_ENTRY(type)                        \
struct {                                \
    struct type *le_next;   /* next element */          \
    struct type **le_prev;  /* address of previous next element */  \
}

#define LIST_CLASS_ENTRY(type)                      \
struct {                                \
    class type *le_next;    /* next element */          \
    class type **le_prev;   /* address of previous next element */  \
}

#define LIST_EMPTY(head)    ((head)->lh_first == NULL)

#define LIST_FIRST(head)    ((head)->lh_first)

#define LIST_FOREACH(var, head, field)                  \
    for ((var) = LIST_FIRST((head));                \
        (var);                          \
        (var) = LIST_NEXT((var), field))

#define LIST_NEXT(elm, field)   ((elm)->field.le_next)

#define LIST_INSERT_HEAD(head, elm, field) do {             \
    QMD_LIST_CHECK_HEAD((head), field);             \
    if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \
        LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field);\
    LIST_FIRST((head)) = (elm);                 \
    (elm)->field.le_prev = &LIST_FIRST((head));         \
} while (0)

......

FreeBSD Device Drivers代码为例:

(1) The definition of the race_softc structure:

struct race_softc {
    LIST_ENTRY(race_softc) list;
    int unit;
};

After macro expansion it becomes the following code:

struct race_softc {
    struct { \
        struct race_softc *le_next; /* next element */          \
        struct race_softc **le_prev;    /* address of previous next element */  \
    } list;
    int unit;
};

(2) The definition of the list head:

static LIST_HEAD(, race_softc) race_list = LIST_HEAD_INITIALIZER(&race_list);

After expansion it becomes the following code:

static struct { struct race_softc *lh_first; } race_list = { NULL };

(3) Inserting an element:

sc = (struct race_softc *)malloc(sizeof(struct race_softc), M_RACE, M_WAITOK | M_ZERO);
sc->unit = unit;    
LIST_INSERT_HEAD(&race_list, sc, list);

After one round of expansion it becomes the following code:

sc = (struct race_softc *)malloc(sizeof(struct race_softc), M_RACE, M_WAITOK | M_ZERO);
sc->unit = unit;
do {                \
    QMD_LIST_CHECK_HEAD((&race_list), list);             \
    if ((LIST_NEXT((sc), list) = LIST_FIRST((&race_list))) != NULL)  \
        LIST_FIRST((&race_list))->list.le_prev = &LIST_NEXT((sc), list);\
    LIST_FIRST((&race_list)) = (sc);                 \
    (sc)->list.le_prev = &LIST_FIRST((&race_list));          \
} while (0)

After the debugging macro QMD_LIST_CHECK_HEAD() drops out (it expands to nothing unless queue-macro debugging is enabled), this expands further into:

do { 
    if (((((sc))->list.le_next) = (((&race_list))->lh_first)) != ((void *)0)) (((&race_list))->lh_first)->list.le_prev = &(((sc))->list.le_next); 
    (((&race_list))->lh_first) = (sc); 
    (sc)->list.le_prev = &(((&race_list))->lh_first); 
} while (0);

That is, the element is inserted at the head of the list. Because sc is now the first element, its list.le_prev holds the address of race_list.lh_first, i.e. the address of the pointer that now points back to sc.
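
To round out the example, here is a minimal sketch of traversing and emptying race_list with the other LIST_* macros from sys/queue.h; it assumes the race_softc definitions above, and the printf() is only for illustration:

struct race_softc *sc;

/* Walk the list from head to tail. */
LIST_FOREACH(sc, &race_list, list)
    printf("race%d\n", sc->unit);

/* Remove and free every element until the list is empty. */
while (!LIST_EMPTY(&race_list)) {
    sc = LIST_FIRST(&race_list);
    LIST_REMOVE(sc, list);
    free(sc, M_RACE);
}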

FreeBSD kernel notes (7): unsupported operations in the cdevsw structure

The following is excerpted from FreeBSD Device Drivers:

If a d_foo function is undefined the corresponding operation is unsupported. However, d_open and d_close are unique; when they're undefined the kernel will automatically define them as follows:
int
nullop(void)
{
    return (0);
}
This ensures that every registered character device can be opened and closed.

That is, d_open and d_close in the cdevsw structure are never left NULL. For reference, cdevsw is defined as follows:

/*
 * Character device switch table
 */
struct cdevsw {
    int         d_version;
    u_int           d_flags;
    const char      *d_name;
    d_open_t        *d_open;
    d_fdopen_t      *d_fdopen;
    d_close_t       *d_close;
    d_read_t        *d_read;
    d_write_t       *d_write;
    d_ioctl_t       *d_ioctl;
    d_poll_t        *d_poll;
    d_mmap_t        *d_mmap;
    d_strategy_t        *d_strategy;
    dumper_t        *d_dump;
    d_kqfilter_t        *d_kqfilter;
    d_purge_t       *d_purge;
    d_mmap_single_t     *d_mmap_single;

    int32_t         d_spare0[3];
    void            *d_spare1[3];

    /* These fields should not be messed with by drivers */
    LIST_HEAD(, cdev)   d_devs;
    int         d_spare2;
    union {
        struct cdevsw       *gianttrick;
        SLIST_ENTRY(cdevsw) postfree_list;
    } __d_giant;
};
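
As a sketch, a driver normally fills in only the entry points it supports; the "echo" handler names below are hypothetical and would be defined elsewhere in the driver. Every d_foo member left unset simply means that operation is unsupported, while d_open and d_close fall back to nullop():

#include <sys/param.h>
#include <sys/conf.h>

static d_open_t     echo_open;
static d_close_t    echo_close;
static d_read_t     echo_read;
static d_write_t    echo_write;

static struct cdevsw echo_cdevsw = {
    .d_version =    D_VERSION,
    .d_open =       echo_open,
    .d_close =      echo_close,
    .d_read =       echo_read,
    .d_write =      echo_write,
    .d_name =       "echo"
    /* d_ioctl, d_poll, d_mmap, etc. are left NULL: those operations are unsupported. */
};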

Swarmkit notes (12): specifying resource limits when creating a service with swarmctl

When creating a service with swarmctl, you can specify CPU and memory resource limits:

# swarmctl service create --help
Create a service

Usage:
  swarmctl service create [flags]

Flags:
......
  --cpu-limit string            CPU cores limit (e.g. 0.5)
  --cpu-reservation string      number of CPU cores reserved (e.g. 0.5)
......
  --memory-limit string         memory limit (e.g. 512m)
  --memory-reservation string   amount of reserved memory (e.g. 512m)
    ......

The *-reservation flags allocate and hold the corresponding resources for the container, so those resources are guaranteed to be available to it; the *-limit flags cap the resources the container's processes may consume. The code that parses these resource flags lives in cmd/swarmctl/service/flagparser/resource.go.

References:
Docker service Limits and Reservations

Docker notes (16): specifying CPU resources for a container

The --cpuset-cpus option of docker run pins a container to specific CPU cores. For example:

# docker run -ti --rm --cpuset-cpus=1,6 redis

There is also a --cpu-shares option, which is a relative weight with a default value of 1024. If two running containers both have --cpu-shares set to 1024, they receive equal shares of CPU time when the CPU is contended.