
Swarmkit Notes (15): Storage-related code for cluster nodes

raft.Node has a memoryStore member (defined in manager/state/raft/raft.go):

// Node represents the Raft Node useful
// configuration.
type Node struct {
    ......
    memoryStore         *store.MemoryStore
    ......
}

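It matters a great deal, because the store member of the Server on the manager leader (the component that answers swarmctl commands in the cluster) actually points to the memoryStore inside the manager's Node struct: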

// Server is the Cluster API gRPC server.
type Server struct {
    store  *store.MemoryStore
    raft   *raft.Node
    rootCA *ca.RootCA
}

store.MemoryStore is defined in manager/state/store/memory.go:

// MemoryStore is a concurrency-safe, in-memory implementation of the Store
// interface.
type MemoryStore struct {
    // updateLock must be held during an update transaction.
    updateLock sync.Mutex

    memDB *memdb.MemDB
    queue *watch.Queue

    proposer state.Proposer
}

The in-memory database that actually holds the data comes from the go-memdb project. A store.MemoryStore is initialized with the NewMemoryStore() function:

// NewMemoryStore returns an in-memory store. The argument is an optional
// Proposer which will be used to propagate changes to other members in a
// cluster.
func NewMemoryStore(proposer state.Proposer) *MemoryStore {
    memDB, err := memdb.NewMemDB(schema)
    if err != nil {
        // This shouldn't fail
        panic(err)
    }

    return &MemoryStore{
        memDB:    memDB,
        queue:    watch.NewQueue(0),
        proposer: proposer,
    }
}

Here schema is a variable of type *memdb.DBSchema:

schema = &memdb.DBSchema{
    Tables: map[string]*memdb.TableSchema{},
}

Entries are added to schema through the register function (defined in manager/state/store/memory.go):

func register(os ObjectStoreConfig) {
    objectStorers = append(objectStorers, os)
    schema.Tables[os.Name] = os.Table
}

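The register function is called from the init() functions of the files in the store package (cluster.go, networks.go, nodes.go, services.go and tasks.go, which map exactly onto the five swarmctl subcommands) to register how each kind of object is handled.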
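The registration pattern described above can be sketched with the standard library alone. This is a minimal illustration, not swarmkit code: tableConfig and registry are hypothetical stand-ins for ObjectStoreConfig and schema.Tables, and the two init() functions imitate what each file of the store package does on its own.

```go
package main

import "fmt"

// tableConfig is a hypothetical, reduced stand-in for ObjectStoreConfig.
type tableConfig struct {
    Name    string
    Indexes []string
}

// registry plays the role of schema.Tables.
var registry = map[string]tableConfig{}

// register records how one object type is stored.
func register(c tableConfig) {
    registry[c.Name] = c
}

// In swarmkit every file has its own init(). Go also allows several
// init() functions in a single file, which is used here to imitate that.
func init() { register(tableConfig{Name: "service", Indexes: []string{"id", "name"}}) }
func init() { register(tableConfig{Name: "task", Indexes: []string{"id"}}) }

func main() {
    // By the time main runs, every init() has already registered its table.
    fmt.Println(len(registry), "tables registered")
}
```

Package-level variables are initialized before any init() runs, so registry is ready when the first register call happens.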

ObjectStoreConfig is defined in manager/state/store/object.go:

// ObjectStoreConfig provides the necessary methods to store a particular object
// type inside MemoryStore.
type ObjectStoreConfig struct {
    Name             string
    Table            *memdb.TableSchema
    Save             func(ReadTx, *api.StoreSnapshot) error
    Restore          func(Tx, *api.StoreSnapshot) error
    ApplyStoreAction func(Tx, *api.StoreAction) error
    NewStoreAction   func(state.Event) (api.StoreAction, error)
}

It defines how one type of object is stored.

Take services.go as an example:

const tableService = "service"

func init() {
    register(ObjectStoreConfig{
        Name: tableService,
        Table: &memdb.TableSchema{
            Name: tableService,
            Indexes: map[string]*memdb.IndexSchema{
                indexID: {
                    Name:    indexID,
                    Unique:  true,
                    Indexer: serviceIndexerByID{},
                },
                indexName: {
                    Name:    indexName,
                    Unique:  true,
                    Indexer: serviceIndexerByName{},
                },
            },
        },
        Save: func(tx ReadTx, snapshot *api.StoreSnapshot) error {
            var err error
            snapshot.Services, err = FindServices(tx, All)
            return err
        },
        Restore: func(tx Tx, snapshot *api.StoreSnapshot) error {
            services, err := FindServices(tx, All)
            if err != nil {
                return err
            }
            for _, s := range services {
                if err := DeleteService(tx, s.ID); err != nil {
                    return err
                }
            }
            for _, s := range snapshot.Services {
                if err := CreateService(tx, s); err != nil {
                    return err
                }
            }
            return nil
        },
        ApplyStoreAction: func(tx Tx, sa *api.StoreAction) error {
            switch v := sa.Target.(type) {
            case *api.StoreAction_Service:
                obj := v.Service
                switch sa.Action {
                case api.StoreActionKindCreate:
                    return CreateService(tx, obj)
                case api.StoreActionKindUpdate:
                    return UpdateService(tx, obj)
                case api.StoreActionKindRemove:
                    return DeleteService(tx, obj.ID)
                }
            }
            return errUnknownStoreAction
        },
        NewStoreAction: func(c state.Event) (api.StoreAction, error) {
            var sa api.StoreAction
            switch v := c.(type) {
            case state.EventCreateService:
                sa.Action = api.StoreActionKindCreate
                sa.Target = &api.StoreAction_Service{
                    Service: v.Service,
                }
            case state.EventUpdateService:
                sa.Action = api.StoreActionKindUpdate
                sa.Target = &api.StoreAction_Service{
                    Service: v.Service,
                }
            case state.EventDeleteService:
                sa.Action = api.StoreActionKindRemove
                sa.Target = &api.StoreAction_Service{
                    Service: v.Service,
                }
            default:
                return api.StoreAction{}, errUnknownStoreAction
            }
            return sa, nil
        },
    })
}

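NewStoreAction builds the api.StoreAction for the service table from an event; ApplyStoreAction performs the matching operation (create, update or delete) for a given action; Save reads all services from the database and stores them in a snapshot; Restore replaces the services in the database with those in the snapshot, deleting the existing ones first and then recreating the ones from the snapshot.

Now look at the function the manager leader uses to create a service (manager/controlapi/service.go):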

// CreateService creates and return a Service based on the provided ServiceSpec.
// - Returns `InvalidArgument` if the ServiceSpec is malformed.
// - Returns `Unimplemented` if the ServiceSpec references unimplemented features.
// - Returns `AlreadyExists` if the ServiceID conflicts.
// - Returns an error if the creation fails.
func (s *Server) CreateService(ctx context.Context, request *api.CreateServiceRequest) (*api.CreateServiceResponse, error) {
    ......
    err := s.store.Update(func(tx store.Tx) error {
        return store.CreateService(tx, service)
    })
    if err != nil {
        return nil, err
    }
    ......
}

The s.store.Update() function is the core part (manager/state/store/memory.go):

// Update executes a read/write transaction.
func (s *MemoryStore) Update(cb func(Tx) error) error {
    return s.update(s.proposer, cb)
}

Next, the MemoryStore.update() function (manager/state/store/memory.go):

func (s *MemoryStore) update(proposer state.Proposer, cb func(Tx) error) error {
    s.updateLock.Lock()
    memDBTx := s.memDB.Txn(true)

    var curVersion *api.Version

    if proposer != nil {
        curVersion = proposer.GetVersion()
    }

    var tx tx
    tx.init(memDBTx, curVersion)

    err := cb(&tx)

    if err == nil {
        if proposer == nil {
            memDBTx.Commit()
        } else {
            var sa []*api.StoreAction
            sa, err = tx.changelistStoreActions()

            if err == nil {
                if sa != nil {
                    err = proposer.ProposeValue(context.Background(), sa, func() {
                        memDBTx.Commit()
                    })
                } else {
                    memDBTx.Commit()
                }
            }
        }
    }

    if err == nil {
        for _, c := range tx.changelist {
            s.queue.Publish(c)
        }
        if len(tx.changelist) != 0 {
            s.queue.Publish(state.EventCommit{})
        }
    } else {
        memDBTx.Abort()
    }
    s.updateLock.Unlock()
    return err

}

Let's walk through the function above:
(1)

memDBTx := s.memDB.Txn(true)

This is go-memdb usage; true means a write transaction is created.

(2)

if proposer != nil {
    curVersion = proposer.GetVersion()
}

proposer is the manager's raft.Node member; its job is to propagate changes to the other (follower) managers in the cluster:

// ProposeValue calls Propose on the raft and waits
// on the commit log action before returning a result
func (n *Node) ProposeValue(ctx context.Context, storeAction []*api.StoreAction, cb func()) error {
    _, err := n.processInternalRaftRequest(ctx, &api.InternalRaftRequest{Action: storeAction}, cb)
    if err != nil {
        return err
    }
    return nil
}

// GetVersion returns the sequence information for the current raft round.
func (n *Node) GetVersion() *api.Version {
    n.stopMu.RLock()
    defer n.stopMu.RUnlock()

    if !n.IsMember() {
        return nil
    }

    status := n.Node.Status()
    return &api.Version{Index: status.Commit}
}

(3)

var tx tx
tx.init(memDBTx, curVersion)

err := cb(&tx)

Here tx is defined as follows:

// Tx is a read/write transaction. Note that transaction does not imply
// any internal batching. The purpose of this transaction is to give the
// user a guarantee that its changes won't be visible to other transactions
// until the transaction is over.
type Tx interface {
    ReadTx
    create(table string, o Object) error
    update(table string, o Object) error
    delete(table, id string) error
}

type tx struct {
    readTx
    curVersion *api.Version
    changelist []state.Event
}

tx implements a read/write transaction.

tx.init() is just a field-by-field assignment:

func (tx *tx) init(memDBTx *memdb.Txn, curVersion *api.Version) {
    tx.memDBTx = memDBTx
    tx.curVersion = curVersion
    tx.changelist = nil
}

cb is:

func(tx store.Tx) error {
        return store.CreateService(tx, service)
}

The store.CreateService() function:

// CreateService adds a new service to the store.
// Returns ErrExist if the ID is already taken.
func CreateService(tx Tx, s *api.Service) error {
    // Ensure the name is not already in use.
    if tx.lookup(tableService, indexName, strings.ToLower(s.Spec.Annotations.Name)) != nil {
        return ErrNameConflict
    }

    return tx.create(tableService, serviceEntry{s})
}

The code above first makes sure the service name is not already in use, then creates the service:

// create adds a new object to the store.
// Returns ErrExist if the ID is already taken.
func (tx *tx) create(table string, o Object) error {
    if tx.lookup(table, indexID, o.ID()) != nil {
        return ErrExist
    }

    copy := o.Copy()
    meta := copy.Meta()
    if err := touchMeta(&meta, tx.curVersion); err != nil {
        return err
    }
    copy.SetMeta(meta)

    err := tx.memDBTx.Insert(table, copy)
    if err == nil {
        tx.changelist = append(tx.changelist, copy.EventCreate())
        o.SetMeta(meta)
    }
    return err
}

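The function above stores a copy of the Object (here a serviceEntry struct) in the database and appends a state.EventCreateService to tx.changelist.

In functions like these that take a callback as a parameter, the callback is what does the real work; the rest of the function only supplies common plumbing, such as obtaining the transaction and committing it.

(4)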

    if err == nil {
        if proposer == nil {
            memDBTx.Commit()
        } else {
            var sa []*api.StoreAction
            sa, err = tx.changelistStoreActions()

            if err == nil {
                if sa != nil {
                    err = proposer.ProposeValue(context.Background(), sa, func() {
                        memDBTx.Commit()
                    })
                } else {
                    memDBTx.Commit()
                }
            }
        }
    }

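This commits the data to the database. If there is no proposer, or the transaction produced no store actions, memDBTx.Commit() is called directly; otherwise it is handed to proposer.ProposeValue() as a callback.

(5)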

    if err == nil {
        for _, c := range tx.changelist {
            s.queue.Publish(c)
        }
        if len(tx.changelist) != 0 {
            s.queue.Publish(state.EventCommit{})
        }
    } else {
        memDBTx.Abort()
    }

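s.queue.Publish() notifies other goroutines (for example m.globalOrchestrator.Run()) that a service has been created; those goroutines then carry out the actual work of creating the service.

In addition, MemoryStore provides the View function for read transactions: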
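The five steps above can be condensed into a small standard-library sketch. It is only an illustration of the control flow, under these assumptions: store, tx, proposer and rejectAll are hypothetical types invented here, a plain map stands in for go-memdb, and a string slice stands in for the watch queue.

```go
package main

import (
    "errors"
    "fmt"
    "sync"
)

// proposer is a hypothetical stand-in for state.Proposer: it invokes
// commit only if the proposed changes are accepted.
type proposer interface {
    Propose(changes []string, commit func()) error
}

// store is a toy key/value store guarded by an update lock.
type store struct {
    mu       sync.Mutex
    data     map[string]string
    proposer proposer
    events   []string // stands in for the watch queue
}

// tx buffers writes and records a change list until commit.
type tx struct {
    pending map[string]string
    changes []string
}

func (t *tx) create(k, v string) {
    t.pending[k] = v
    t.changes = append(t.changes, "create "+k)
}

// Update runs cb inside a write transaction. The callback does the real
// work; Update only supplies locking, commit/abort and event publishing.
func (s *store) Update(cb func(*tx) error) error {
    s.mu.Lock()
    defer s.mu.Unlock()

    t := &tx{pending: map[string]string{}}
    commit := func() {
        for k, v := range t.pending {
            s.data[k] = v
        }
    }

    err := cb(t)
    if err == nil {
        if s.proposer != nil && len(t.changes) > 0 {
            err = s.proposer.Propose(t.changes, commit)
        } else {
            commit()
        }
    }
    if err != nil {
        return err // abort: the buffered writes are simply dropped
    }
    s.events = append(s.events, t.changes...) // publish after a successful commit
    return nil
}

// rejectAll is a proposer that never accepts a proposal.
type rejectAll struct{}

func (rejectAll) Propose([]string, func()) error { return errors.New("no quorum") }

func main() {
    s := &store{data: map[string]string{}}
    err := s.Update(func(t *tx) error { t.create("svc1", "nginx"); return nil })
    fmt.Println(err, s.data, s.events)
}
```

With rejectAll plugged in as the proposer, Update returns an error and neither the data nor the event list changes, which mirrors the Abort branch of the original function.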

// ReadTx is a read transaction. Note that transaction does not imply
// any internal batching. It only means that the transaction presents a
// consistent view of the data that cannot be affected by other
// transactions.
type ReadTx interface {
    lookup(table, index, id string) Object
    get(table, id string) Object
    find(table string, by By, checkType func(By) error, appendResult func(Object)) error
}

type readTx struct {
    memDBTx *memdb.Txn
}

// View executes a read transaction.
func (s *MemoryStore) View(cb func(ReadTx)) {
    memDBTx := s.memDB.Txn(false)

    readTx := readTx{
        memDBTx: memDBTx,
    }
    cb(readTx)
    memDBTx.Commit()
}

A Brief Introduction to RESTful Web Services

These are my notes on RESTful Web Services: A Tutorial:

REST stands for Representational State Transfer, which is an architectural style for networked hypermedia applications, it is primarily used to build Web services that are lightweight, maintainable, and scalable. A service based on REST is called a RESTful service. REST is not dependent on any protocol, but almost every RESTful service uses HTTP as its underlying protocol.

Although REST itself does not depend on any particular protocol, in practice RESTful services almost always use HTTP.

Properties of a RESTful service:

Every system uses resources. These resources can be pictures, video files, Web pages, business information, or anything that can be represented in a computer-based system. The purpose of a service is to provide a window to its clients so that they can access these resources. Service architects and developers want this service to be easy to implement, maintainable, extensible, and scalable. A RESTful design promises that and more. In general, RESTful services should have following properties and features, which I’ll describe in detail:

Representations
Messages
URIs
Uniform interface
Stateless
Links between resources
Caching

Specifically:
Representations: use JSON or XML;
Messages, URIs and Uniform interface: use the HTTP protocol;
Stateless: very important; requests must not depend on one another.
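The points above can be made concrete with a small Go sketch built on net/http and net/http/httptest. The book resource, the /books/ path and the fetch helper are assumptions made up for this illustration; they do not come from the tutorial.

```go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "net/http/httptest"
    "strings"
)

// book is a hypothetical resource; its JSON form is the representation.
type book struct {
    ID    string `json:"id"`
    Title string `json:"title"`
}

var books = map[string]book{"1": {ID: "1", Title: "RESTful Web Services"}}

// getBook answers GET /books/{id}. Everything it needs arrives in the
// request itself (method and URI), so no per-client session is kept.
func getBook(w http.ResponseWriter, r *http.Request) {
    if r.Method != http.MethodGet {
        w.WriteHeader(http.StatusMethodNotAllowed)
        return
    }
    b, ok := books[strings.TrimPrefix(r.URL.Path, "/books/")]
    if !ok {
        http.NotFound(w, r)
        return
    }
    w.Header().Set("Content-Type", "application/json")
    json.NewEncoder(w).Encode(b)
}

// fetch sends one in-memory request to the handler and returns the
// status code and body.
func fetch(method, path string) (int, string) {
    rec := httptest.NewRecorder()
    getBook(rec, httptest.NewRequest(method, path, nil))
    return rec.Code, rec.Body.String()
}

func main() {
    fmt.Println(fetch(http.MethodGet, "/books/1"))
    fmt.Println(fetch(http.MethodGet, "/books/2"))
}
```

Because the handler is stateless, the two requests in main can be issued in any order, or against different server instances, and still produce the same answers.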

Swarmkit Notes (14): How a manager switches roles

The Manager struct (defined in manager/manager.go) contains a *raft.Node member:

// Manager is the cluster manager for Swarm.
// This is the high-level object holding and initializing all the manager
// subsystems.
type Manager struct {
    ......
    RaftNode   *raft.Node
    ......
}

raft.Node (defined in manager/state/raft/raft.go) in turn contains an *events.Broadcaster member, which is used to deliver messages about changes in the manager's role (becoming leader or follower):

// Node represents the Raft Node useful
// configuration.
type Node struct {
    ......
    leadershipBroadcast *events.Broadcaster
    ......
}

The code that announces a role change for the current manager is in manager/state/raft/raft.go:

// Run is the main loop for a Raft node, it goes along the state machine,
// acting on the messages received from other Raft nodes in the cluster.
//
// Before running the main loop, it first starts the raft node based on saved
// cluster state. If no saved state exists, it starts a single-node cluster.
func (n *Node) Run(ctx context.Context) error {
    ......
            // If we cease to be the leader, we must cancel
            // any proposals that are currently waiting for
            // a quorum to acknowledge them. It is still
            // possible for these to become committed, but
            // if that happens we will apply them as any
            // follower would.
            if rd.SoftState != nil {
                if wasLeader && rd.SoftState.RaftState != raft.StateLeader {
                    wasLeader = false
                    n.wait.cancelAll()
                    if atomic.LoadUint32(&n.signalledLeadership) == 1 {
                        atomic.StoreUint32(&n.signalledLeadership, 0)
                        n.leadershipBroadcast.Write(IsFollower)
                    }
                } else if !wasLeader && rd.SoftState.RaftState == raft.StateLeader {
                    wasLeader = true
                }
            }

            if wasLeader && atomic.LoadUint32(&n.signalledLeadership) != 1 {
                // If all the entries in the log have become
                // committed, broadcast our leadership status.
                if n.caughtUp() {
                    atomic.StoreUint32(&n.signalledLeadership, 1)
                    n.leadershipBroadcast.Write(IsLeader)
                }
            }
    ......
}

The code that receives the message is in the Manager.Run() function (manager/manager.go):

// Run starts all manager sub-systems and the gRPC server at the configured
// address.
// The call never returns unless an error occurs or `Stop()` is called.
func (m *Manager) Run(parent context.Context) error {
    ......
    leadershipCh, cancel := m.RaftNode.SubscribeLeadership()
    defer cancel()

    go m.handleLeadershipEvents(ctx, leadershipCh)

    ......

    go func() {
        err := m.RaftNode.Run(ctx)
        if err != nil {
            log.G(ctx).Error(err)
            m.Stop(ctx)
        }
    }()

    ......
}

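The code of Node.SubscribeLeadership() and Manager.handleLeadershipEvents() is fairly simple, so I will not go through it here.

FreeBSD kernel Notes (11): condition variables

Besides mutexes, threads can also be synchronized with condition variables (the following is excerpted from FreeBSD Device Drivers):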
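As a rough illustration of the signalling logic in Node.Run(), the sketch below broadcasts each role transition exactly once to all subscribers. It is a simplification under stated assumptions: broadcaster, node and observe are hypothetical names, a mutex-protected slice of channels replaces events.Broadcaster, a plain bool replaces the atomic signalledLeadership flag, and the caughtUp() check is left out.

```go
package main

import (
    "fmt"
    "sync"
)

type leadershipState int

const (
    isFollower leadershipState = iota
    isLeader
)

// broadcaster fans a leadership change out to every subscriber.
type broadcaster struct {
    mu   sync.Mutex
    subs []chan leadershipState
}

// subscribe returns a buffered channel that receives future changes.
func (b *broadcaster) subscribe() <-chan leadershipState {
    b.mu.Lock()
    defer b.mu.Unlock()
    ch := make(chan leadershipState, 8)
    b.subs = append(b.subs, ch)
    return ch
}

// write delivers s to all subscribers (it blocks if a buffer is full).
func (b *broadcaster) write(s leadershipState) {
    b.mu.Lock()
    defer b.mu.Unlock()
    for _, ch := range b.subs {
        ch <- s
    }
}

// node remembers whether leadership has already been signalled, so that
// each transition is broadcast only once.
type node struct {
    signalled bool
    bc        broadcaster
}

// observe is called with the current raft role on every loop iteration.
func (n *node) observe(leader bool) {
    if leader && !n.signalled {
        n.signalled = true
        n.bc.write(isLeader)
    } else if !leader && n.signalled {
        n.signalled = false
        n.bc.write(isFollower)
    }
}

func main() {
    n := &node{}
    ch := n.bc.subscribe()
    n.observe(true)
    n.observe(true) // still leader: nothing is broadcast
    n.observe(false)
    fmt.Println(<-ch, <-ch)
}
```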

Condition variables synchronize the execution of two or more threads based upon the value of an object. In contrast, locks synchronize threads by controlling their access to objects.

Condition variables are used in conjunction with locks to “block” threads until a condition is true. It works like this: A thread first acquires the foo lock. Then it examines the condition. If the condition is false, it sleeps on the bar condition variable. While asleep on bar , threads relinquish foo . A thread that causes the condition to be true wakes up the threads sleeping on bar . Threads woken up in this manner reacquire foo before proceeding.

Using condition variables always involves a lock as well. These are the rules for that lock (excerpted from the FreeBSD Kernel Developer's Manual):

The lock argument is a pointer to either mutex(9), rwlock(9), or sx(9) lock. A mutex(9) argument must be initialized with MTX_DEF and not MTX_SPIN. A thread must hold lock before calling cv_wait(), cv_wait_sig(), cv_wait_unlock(), cv_timedwait(), or cv_timedwait_sig(). When a thread waits on a condition, lock is atomically released before the thread is blocked, then reacquired before the function call returns. In addition, the thread will fully drop the Giant mutex (even if recursed) while the it is suspended and will reacquire the Giant mutex before the function returns. The cv_wait_unlock() function does not reacquire the lock before returning. Note that the Giant mutex may be specified as lock. However, Giant may not be used as lock for the cv_wait_unlock() function. All waiters must pass the same lock in conjunction with cvp.

In short, a thread must already hold the lock when it calls cv_wait() or one of its sibling functions to wait for the condition to become true. Inside cv_wait() the thread first releases the lock and then blocks until the condition becomes true; before cv_wait() returns, the lock is reacquired. Note that cv_wait_unlock() does not reacquire the lock when it returns.
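The same lock-then-wait protocol exists in user space. The sketch below uses Go's sync.Cond purely as an analogy for the kernel API: it is not FreeBSD code, and the queue type and its methods are hypothetical names chosen for the example.

```go
package main

import (
    "fmt"
    "sync"
)

// queue blocks consumers until an item is available.
type queue struct {
    mu    sync.Mutex
    ready *sync.Cond // plays the role of the condition variable
    items []int
}

func newQueue() *queue {
    q := &queue{}
    q.ready = sync.NewCond(&q.mu)
    return q
}

// put makes the condition true and wakes one waiter.
func (q *queue) put(v int) {
    q.mu.Lock()
    q.items = append(q.items, v)
    q.mu.Unlock()
    q.ready.Signal()
}

// get acquires the lock, checks the condition, and sleeps while it is false.
func (q *queue) get() int {
    q.mu.Lock()
    defer q.mu.Unlock()
    for len(q.items) == 0 {
        // Wait releases q.mu while asleep and reacquires it before returning.
        q.ready.Wait()
    }
    v := q.items[0]
    q.items = q.items[1:]
    return v
}

func main() {
    q := newQueue()
    done := make(chan int)
    go func() { done <- q.get() }()
    q.put(42)
    fmt.Println(<-done)
}
```

The condition is checked in a loop rather than with a single if, so the waiter re-examines it every time it wakes up with the lock held.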

A Brief Introduction to the go-events package

go-events implements a mechanism for handling events. Its core concept is the Sink (defined in event.go):

// Event marks items that can be sent as events.
type Event interface{}

// Sink accepts and sends events.
type Sink interface {
    // Write an event to the Sink. If no error is returned, the caller will
    // assume that all events have been committed to the sink. If an error is
    // received, the caller may retry sending the event.
    Write(event Event) error

    // Close the sink, possibly waiting for pending events to flush.
    Close() error
}

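You can think of a Sink as a "pool" with two methods: Write puts a message into the pool, and Close shuts the pool down when it is no longer needed. The other files in the package all build on Sink to provide various features. For example: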

package main

import (
    "fmt"
    "github.com/docker/go-events"
    "time"
)

// eventRecv is a trivial Sink that prints every event it receives.
type eventRecv struct {
    name string
}

// Write prints the event; the type assertion expects an int payload.
func (e *eventRecv) Write(event events.Event) error {
    fmt.Printf("%s receives %d\n", e.name, event.(int))
    return nil
}

// Close has nothing to release.
func (e *eventRecv) Close() error {
    return nil
}

func createEventRecv(name string) *eventRecv {
    return &eventRecv{name}
}

func main() {
    e1 := createEventRecv("Foo")
    e2 := createEventRecv("Bar")

    bc := events.NewBroadcaster(e1, e2)
    bc.Write(1)
    bc.Write(2)
    // Give the broadcaster time to deliver the events before exiting.
    time.Sleep(time.Second)
}

The output is:

Foo receives 1
Bar receives 1
Foo receives 2
Bar receives 2

NewBroadcaster sends each event to multiple Sinks.

Here is another example, this time using NewQueue:

package main

import (
    "fmt"
    "github.com/docker/go-events"
    "time"
)

// eventRecv is a trivial Sink that prints every event it receives.
type eventRecv struct {
    name string
}

func (e *eventRecv) Write(event events.Event) error {
    fmt.Printf("%s receives %d\n", e.name, event)
    return nil
}

func (e *eventRecv) Close() error {
    return nil
}

func createEventRecv(name string) *eventRecv {
    return &eventRecv{name}
}

func main() {
    q := events.NewQueue(createEventRecv("Foo"))
    q.Write(1)
    q.Write(2)
    // Give the queue time to deliver the events before exiting.
    time.Sleep(time.Second)
}

The output is:

Foo receives 1
Foo receives 2
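To show how a new Sink can be layered on top of another one, here is a standard-library sketch of a queue-like sink. It is not the go-events implementation: sink, recorder and asyncQueue are hypothetical types, and the fixed channel buffer of 16 is a simplification chosen for this example.

```go
package main

import (
    "fmt"
    "sync"
)

// sink mirrors the shape of the Sink interface quoted earlier.
type sink interface {
    Write(event interface{}) error
    Close() error
}

// recorder is a sink that remembers what it received.
type recorder struct {
    mu   sync.Mutex
    seen []interface{}
}

func (r *recorder) Write(e interface{}) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.seen = append(r.seen, e)
    return nil
}

func (r *recorder) Close() error { return nil }

// asyncQueue accepts events without waiting for dst to process them and
// forwards them, in order, from a separate goroutine.
type asyncQueue struct {
    ch   chan interface{}
    done chan struct{}
    dst  sink
}

func newAsyncQueue(dst sink) *asyncQueue {
    q := &asyncQueue{ch: make(chan interface{}, 16), done: make(chan struct{}), dst: dst}
    go func() {
        defer close(q.done)
        for e := range q.ch {
            dst.Write(e)
        }
    }()
    return q
}

// Write enqueues the event (it blocks only when the buffer is full).
func (q *asyncQueue) Write(e interface{}) error {
    q.ch <- e
    return nil
}

// Close flushes pending events, then closes the destination sink.
func (q *asyncQueue) Close() error {
    close(q.ch)
    <-q.done
    return q.dst.Close()
}

func main() {
    r := &recorder{}
    q := newAsyncQueue(r)
    q.Write(1)
    q.Write(2)
    q.Close() // waits for delivery, so no time.Sleep is needed
    fmt.Println(r.seen)
}
```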

FreeBSD kernel Notes (10): mutex

The FreeBSD kernel provides two kinds of mutex, spin mutexes and sleep mutexes (the following is excerpted from FreeBSD Device Drivers):

Spin Mutexes
Spin mutexes are simple spin locks. If a thread attempts to acquire a spin lock that is being held by another thread, it will “spin” and wait for the lock to be released. Spin, in this case, means to loop infinitely on the CPU. This spinning can result in deadlock if a thread that is holding a spin lock is interrupted or if it context switches, and all subsequent threads attempt to acquire that lock. Consequently, while holding a spin mutex all interrupts are blocked on the local processor and a context switch cannot be performed.

Spin mutexes should be held only for short periods of time and should be used only to protect objects related to nonpreemptive interrupts and low-level scheduling code (McKusick and Neville-Neil, 2005). Ordinarily, you’ll never use spin mutexes.

Sleep Mutexes
Sleep mutexes are the most commonly used lock. If a thread attempts to acquire a sleep mutex that is being held by another thread, it will context switch (that is, sleep) and wait for the mutex to be released. Because of this behavior, sleep mutexes are not susceptible to the deadlock described above.

Sleep mutexes support priority propagation. When a thread sleeps on a sleep mutex and its priority is higher than the sleep mutex’s current owner, the current owner will inherit the priority of this thread (Baldwin, 2002). This characteristic prevents a lower priority thread from blocking a higher priority thread.

NOTE Sleeping (for example, calling a *sleep function) while holding a mutex is never safe and must be avoided; otherwise, there are numerous assertions that will fail and the kernel will panic.

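While a spin mutex is held, interrupts on the local CPU are disabled and no context switch may take place, in order to prevent deadlock. Normally you should use sleep mutexes. Also note that a thread holding a mutex must not sleep; otherwise the kernel will panic.

In addition, there are shared/exclusive locks: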

Shared/exclusive locks (sx locks) are locks that threads can hold while asleep. As the name implies, multiple threads can have a shared hold on an sx lock, but only one thread can have an exclusive hold on an sx lock. When a thread has an exclusive hold on an sx lock, other threads cannot have a shared hold on that lock.

sx locks do not support priority propagation and are inefficient compared to mutexes. The main reason for using sx locks is that threads can sleep while holding one.

and reader/writer locks:

Reader/writer locks (rw locks) are basically mutexes with sx lock semantics. Like sx locks, threads can hold rw locks as a reader, which is identical to a shared hold, or as a writer, which is identical to an exclusive hold. Like mutexes, rw locks support priority propagation and threads cannot hold them while sleeping (or the kernel will panic).

rw locks are used when you need to protect an object that is mostly going to be read from instead of written to.

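Shared/exclusive locks and reader/writer locks have similar semantics, with these differences: a thread holding a shared/exclusive lock may sleep, but sx locks do not support priority propagation; a thread holding a reader/writer lock may not sleep, but rw locks do support priority propagation.

Swarmkit Notes (13): swarmctl sends commands to the swarm cluster through controlClient

swarmctl essentially sends its commands to the swarm cluster through controlClient, which is defined in api/control.pb.go: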
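The shared versus exclusive hold can be illustrated in user space with Go's sync.RWMutex. This is only an analogy for the semantics: it says nothing about kernel-specific properties such as priority propagation or whether the holder may sleep, and the config type is a hypothetical example.

```go
package main

import (
    "fmt"
    "sync"
)

// config is read often and written rarely.
type config struct {
    mu   sync.RWMutex
    vals map[string]string
}

// get takes a shared hold: many readers may be inside at once.
func (c *config) get(k string) string {
    c.mu.RLock()
    defer c.mu.RUnlock()
    return c.vals[k]
}

// set takes an exclusive hold: it excludes readers and other writers.
func (c *config) set(k, v string) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.vals[k] = v
}

func main() {
    c := &config{vals: map[string]string{}}
    c.set("mode", "fast")

    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            _ = c.get("mode") // concurrent readers do not block each other
        }()
    }
    wg.Wait()
    fmt.Println(c.get("mode"))
}
```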

// Client API for Control service

type ControlClient interface {
    GetNode(ctx context.Context, in *GetNodeRequest, opts ...grpc.CallOption) (*GetNodeResponse, error)
    ListNodes(ctx context.Context, in *ListNodesRequest, opts ...grpc.CallOption) (*ListNodesResponse, error)
    ......
}

type controlClient struct {
    cc *grpc.ClientConn
}

func NewControlClient(cc *grpc.ClientConn) ControlClient {
    return &controlClient{cc}
}

func (c *controlClient) GetNode(ctx context.Context, in *GetNodeRequest, opts ...grpc.CallOption) (*GetNodeResponse, error) {
    out := new(GetNodeResponse)
    err := grpc.Invoke(ctx, "/docker.swarmkit.v1.Control/GetNode", in, out, c.cc, opts...)
    if err != nil {
        return nil, err
    }
    return out, nil
}

......

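docker Notes (17): Adding labels to images, containers, and the Docker daemon

Metadata such as license or vendor can be stored by adding labels (in key=value form) to images, containers, and the Docker daemon:

(1) To label an image, use the LABEL instruction in the Dockerfile (put all labels in a single LABEL instruction where possible, because every LABEL instruction adds a layer to the image):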

LABEL [<namespace>.]<key>=<value> ...

(2) To label a container:

docker run \
   -d \
   --label com.example.group="webservers" \
   --label com.example.environment="production" \
   busybox \
   top

(3) To label the Docker daemon:

docker daemon \
  --dns 8.8.8.8 \
  --dns 8.8.4.4 \
  -H unix:///var/run/docker.sock \
  --label com.example.environment="production" \
  --label com.example.storage="ssd"

References:
Apply custom metadata.

FreeBSD kernel Notes (9): the modeventtype_t definition

modeventtype_t is defined as follows:

typedef enum modeventtype {
    MOD_LOAD,
    MOD_UNLOAD,
    MOD_SHUTDOWN,
    MOD_QUIESCE
} modeventtype_t;
typedef int (*modeventhand_t)(module_t, int /* modeventtype_t */, void *);

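MOD_LOAD, MOD_UNLOAD and MOD_SHUTDOWN are easy to understand: they are the values passed to the module event handler when the module is loaded, when it is unloaded, and when the system shuts down. For MOD_QUIESCE, see FreeBSD Device Drivers: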

When one issues the kldunload(8) command, MOD_QUIESCE is run before MOD_UNLOAD . If MOD_QUIESCE returns an error, MOD_UNLOAD does not get executed. In other words, MOD_QUIESCE verifies that it is safe to unload your module.

NOTE The kldunload -f command ignores every error returned by MOD_QUIESCE . So you can always unload a module, but it may not be the best idea.

For the difference between MOD_QUIESCE and MOD_UNLOAD, see also the FreeBSD Kernel Developer's Manual:

The difference between MOD_QUIESCE and MOD_UNLOAD is that the module should fail MOD_QUIESCE if it is currently in use, whereas MOD_UNLOAD should only fail if it is impossible to unload the module, for instance because there are memory references to the module which cannot be revoked.
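The ordering described in these quotes can be modelled in a few lines of Go. This is a simulation of the rules only, not kernel code: module, handle, unload and the inUse counter are hypothetical names, and the force flag stands for kldunload -f.

```go
package main

import (
    "errors"
    "fmt"
)

type modEvent int

const (
    modLoad modEvent = iota
    modUnload
    modShutdown
    modQuiesce
)

var errBusy = errors.New("module busy")

// module models the state a handler would look at.
type module struct {
    inUse  int // number of current users
    loaded bool
}

// handle imitates a module event handler.
func (m *module) handle(ev modEvent) error {
    switch ev {
    case modLoad:
        m.loaded = true
    case modQuiesce:
        if m.inUse > 0 {
            return errBusy // in use: fail MOD_QUIESCE
        }
    case modUnload:
        m.loaded = false
    }
    return nil
}

// unload imitates kldunload: MOD_QUIESCE runs first and MOD_UNLOAD only
// follows if it succeeded, unless force is set.
func unload(m *module, force bool) error {
    if err := m.handle(modQuiesce); err != nil && !force {
        return err
    }
    return m.handle(modUnload)
}

func main() {
    m := &module{}
    m.handle(modLoad)
    m.inUse = 1
    fmt.Println(unload(m, false), m.loaded) // refused, still loaded
    fmt.Println(unload(m, true), m.loaded)  // forced, now unloaded
}
```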

FreeBSD kernel Notes (8): doubly linked lists

The FreeBSD kernel provides support for doubly linked lists (defined in sys/sys/queue.h):

/*
 * List declarations.
 */
#define LIST_HEAD(name, type)                       \
struct name {                               \
    struct type *lh_first;  /* first element */         \
}

#define LIST_CLASS_HEAD(name, type)                 \
struct name {                               \
    class type *lh_first;   /* first element */         \
}

#define LIST_HEAD_INITIALIZER(head)                 \
    { NULL }

#define LIST_ENTRY(type)                        \
struct {                                \
    struct type *le_next;   /* next element */          \
    struct type **le_prev;  /* address of previous next element */  \
}

#define LIST_CLASS_ENTRY(type)                      \
struct {                                \
    class type *le_next;    /* next element */          \
    class type **le_prev;   /* address of previous next element */  \
}

#define LIST_EMPTY(head)    ((head)->lh_first == NULL)

#define LIST_FIRST(head)    ((head)->lh_first)

#define LIST_FOREACH(var, head, field)                  \
    for ((var) = LIST_FIRST((head));                \
        (var);                          \
        (var) = LIST_NEXT((var), field))

#define LIST_NEXT(elm, field)   ((elm)->field.le_next)

#define LIST_INSERT_HEAD(head, elm, field) do {             \
    QMD_LIST_CHECK_HEAD((head), field);             \
    if ((LIST_NEXT((elm), field) = LIST_FIRST((head))) != NULL) \
        LIST_FIRST((head))->field.le_prev = &LIST_NEXT((elm), field);\
    LIST_FIRST((head)) = (elm);                 \
    (elm)->field.le_prev = &LIST_FIRST((head));         \
} while (0)

......

Take the code from FreeBSD Device Drivers as an example:

(1) The race_softc struct definition:

struct race_softc {
    LIST_ENTRY(race_softc) list;
    int unit;
};

After macro expansion it becomes:

struct race_softc {
    struct {
        struct race_softc *le_next;     /* next element */
        struct race_softc **le_prev;    /* address of previous next element */
    } list;
    int unit;
};

(2) The definition of the list head:

static LIST_HEAD(, race_softc) race_list = LIST_HEAD_INITIALIZER(&race_list);

After macro expansion it becomes:

static struct { struct race_softc *lh_first; } race_list = { NULL };

(3) Inserting an element:

sc = (struct race_softc *)malloc(sizeof(struct race_softc), M_RACE, M_WAITOK | M_ZERO);
sc->unit = unit;    
LIST_INSERT_HEAD(&race_list, sc, list);

Expanding LIST_INSERT_HEAD gives:

sc = (struct race_softc *)malloc(sizeof(struct race_softc), M_RACE, M_WAITOK | M_ZERO);
sc->unit = unit;
do {
    QMD_LIST_CHECK_HEAD((&race_list), list);
    if ((LIST_NEXT((sc), list) = LIST_FIRST((&race_list))) != NULL)
        LIST_FIRST((&race_list))->list.le_prev = &LIST_NEXT((sc), list);
    LIST_FIRST((&race_list)) = (sc);
    (sc)->list.le_prev = &LIST_FIRST((&race_list));
} while (0);

Expanding the remaining macros (QMD_LIST_CHECK_HEAD expands to nothing unless the kernel is built with INVARIANTS) gives:

do { 
    if (((((sc))->list.le_next) = (((&race_list))->lh_first)) != ((void *)0)) (((&race_list))->lh_first)->list.le_prev = &(((sc))->list.le_next); 
    (((&race_list))->lh_first) = (sc); 
    (sc)->list.le_prev = &(((&race_list))->lh_first); 
} while (0);

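That is, the element is inserted at the head of the list. Because sc is now the first element, its list.le_prev points to race_list.lh_first, the head pointer that in turn points to sc.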
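The "address of previous next element" idea is easier to see in a small runnable model. The Go sketch below mirrors the pointer layout with hypothetical elem and list types. insertHead follows the steps of LIST_INSERT_HEAD, and remove is modelled on LIST_REMOVE from the same header, a macro that is not quoted in this post.

```go
package main

import "fmt"

// elem mirrors LIST_ENTRY: next is le_next and prev is le_prev, that is,
// the address of whichever pointer currently points at this element.
type elem struct {
    unit int
    next *elem
    prev **elem
}

// list mirrors LIST_HEAD.
type list struct {
    first *elem
}

// insertHead follows the steps of LIST_INSERT_HEAD.
func (l *list) insertHead(e *elem) {
    e.next = l.first
    if e.next != nil {
        l.first.prev = &e.next
    }
    l.first = e
    e.prev = &l.first
}

// remove unlinks e without needing the list head, because prev already
// addresses the pointer that must be rewritten.
func remove(e *elem) {
    if e.next != nil {
        e.next.prev = e.prev
    }
    *e.prev = e.next
}

// units returns the unit numbers in list order.
func (l *list) units() []int {
    var out []int
    for e := l.first; e != nil; e = e.next {
        out = append(out, e.unit)
    }
    return out
}

func main() {
    var l list
    a, b := &elem{unit: 1}, &elem{unit: 2}
    l.insertHead(a)
    l.insertHead(b)
    fmt.Println(l.units(), b.prev == &l.first) // [2 1] true
    remove(b)
    fmt.Println(l.units(), a.prev == &l.first) // [1] true
}
```

Storing the address of the previous next pointer instead of a pointer to the previous element is what lets the first element be removed with exactly the same code as any other element.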
