Technology | My Site

“Emulation” vs. “Hardware virtualization”

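“Emulation” means that software imitates the hardware: on your current host machine (say, x86) you run a virtual machine for another platform, such as SPARC. The software has to translate the SPARC instructions into x86 instructions, so it is slow. (Example: qemu-system-x86_64 run without -enable-kvm.)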

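“Hardware virtualization” means that the hardware itself supports virtualization: the software uses the CPU, the chipset, and other hardware directly. Because no instruction translation is involved, it is fast. The condition is that the host machine and the virtual machine use the same instruction set. (Example: qemu-system-x86_64 -enable-kvm.)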

References:
qemu-kvm or qemu-system-x86_64?

Notes on git branch & merge

In fact, in Git, the act of creating a new branch is simply writing a file in the .git/refs/heads directory that has the SHA-1 of the last commit for that branch.

Creating a branch is nothing more than just writing 40 characters to a file.

Switching to that branch simply means having Git make your working directory look like the tree that SHA-1 points to and updating the HEAD file so each commit from that point on moves that branch pointer forward (in other words, it changes the 40 characters in .git/refs/heads/[current_branch_name] be the SHA-1 of your last commit).

As the quotes show, creating a branch in git amounts to writing a 40-character SHA-1 value into a file.
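To make the "40 characters in a file" point concrete, here is a minimal sketch that writes a branch ref by hand into a scratch directory. The helper names `create_branch` and `demo`, the branch name, and the SHA-1 value are all made up for illustration; in a real repository you would use `git branch` rather than touching `.git` directly.

```python
import os
import tempfile


def create_branch(git_dir: str, name: str, sha1: str) -> str:
    """Write refs/heads/<name> containing the given commit SHA-1."""
    if len(sha1) != 40 or any(c not in "0123456789abcdef" for c in sha1):
        raise ValueError("expected a 40-character hexadecimal SHA-1")
    ref_path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(ref_path), exist_ok=True)
    with open(ref_path, "w") as f:
        # The ref file holds the 40 hex characters followed by a newline.
        f.write(sha1 + "\n")
    return ref_path


def demo() -> str:
    """Create a hypothetical branch in a throwaway .git directory."""
    with tempfile.TemporaryDirectory() as tmp:
        git_dir = os.path.join(tmp, ".git")
        # Placeholder SHA-1, not a real commit.
        path = create_branch(git_dir, "topic",
                             "0123456789abcdef0123456789abcdef01234567")
        with open(path) as f:
            return f.read().strip()
```

Calling `demo()` returns the same 40 characters that were written, which is all the branch "is" on disk in this simplified picture.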

Remotes are basically pointers to branches in other peoples copies of the same repository, often on other computers. If you got your repository by cloning it, rather than initializing it, you should have a remote branch of where you copied it from automatically added as origin by default. Which means the tree that was checked out during your initial clone would be referenced as origin/master , which means “the master branch of the origin remote.”

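In other words, remotes are pointers to branches in other people's copies of the repository. If you obtained your repository by cloning another copy rather than by initializing it, git automatically creates a remote branch, origin/master, that points to the tree of the repository you copied from.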

References:
Git internals.

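Linux now supports multiple PCI domains. Each PCI domain manages up to 256 (2^8) buses, each bus can host up to 32 (2^5) devices, and each device supports at most 8 (2^3) functions. Devices on the same PCI bus share the address spaces for memory locations and I/O ports, but not the configuration registers: each PCI device function has its own 256 bytes of configuration registers.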

Run “lspci -D” to list the PCI devices on the current system. Each address has the form domain:bus:dev.func:

[root@localhost ~]# lspci -D
0000:00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
0000:00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
0000:00:01.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:02.0 VGA compatible controller: InnoTek Systemberatung GmbH VirtualBox Graphics Adapter
0000:00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)
0000:00:04.0 System peripheral: InnoTek Systemberatung GmbH VirtualBox Guest Service
0000:00:05.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
0000:00:06.0 USB controller: Apple Inc. KeyLargo/Intrepid USB
0000:00:07.0 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 08)
0000:00:0b.0 USB controller: Intel Corporation 82801FB/FBM/FR/FW/FRW (ICH6 Family) USB2 EHCI Controller
0000:00:0d.0 SATA controller: Intel Corporation 82801HM/HEM (ICH8M/ICH8M-E) SATA Controller [AHCI mode] (rev 02)
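The address at the start of each line can be split back into its four fields. The following is a small sketch, not part of lspci itself; `PciAddress` and `parse_pci_address` are names invented here, and the range checks simply apply the bus, device, and function limits stated above.

```python
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


def parse_pci_address(text: str) -> PciAddress:
    """Parse a 'domain:bus:dev.func' string as printed by `lspci -D`."""
    domain, bus, rest = text.split(":")
    device, function = rest.split(".")
    # All four fields are printed in hexadecimal.
    addr = PciAddress(int(domain, 16), int(bus, 16),
                      int(device, 16), int(function, 16))
    # 256 buses per domain, 32 devices per bus, 8 functions per device.
    if not (0 <= addr.bus < 256 and 0 <= addr.device < 32
            and 0 <= addr.function < 8):
        raise ValueError(f"field out of range in {text!r}")
    return addr
```

For example, `parse_pci_address("0000:00:0b.0")` yields domain 0, bus 0, device 11, function 0, which matches the EHCI controller line in the listing.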

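How to organize a tech salon

Since last year I have attended several tech salons. Some were very well organized; others left something to be desired. In this article, using a Go-themed sharing session as the example, I describe how I personally think a tech salon should be run.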

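(1) Organization

First, the question of how organizers and participants communicate. In today's highly connected Internet era, organizers have not only sites such as meetup that specialize in publishing event information, but also all kinds of IM groups where everyone can discuss and contribute ideas. Yet after trying this whole assortment of tools, I personally think the old-fashioned mailing list may be the best way to keep in touch, for two reasons:

a) The IM tool you use today may disappear someday, but email will not; it is still the most widely used communication tool in the world;
b) A mailing list preserves information and materials well, which makes them easy to find later.

Once the communication tool is chosen, the organizer can solicit talk topics and open the discussion. The organizer also needs to arrange a venue and sponsorship, for example some raffle prizes or souvenirs. The souvenirs need not be expensive; I think a mug, a T-shirt, or an ordinary notebook is enough. In addition, the companies or individuals who provide the venue and sponsorship can be given a moderate amount of publicity or advertising at the salon. After all, they helped, and this also makes sponsors more willing to help in the future.

Next, timing. Holding a salon takes a fair amount of time and energy, so once every 2 to 3 months is enough. Saturday afternoon is the best slot, since it suits people who like to sleep in on weekends. Another reason to choose Saturday is that the next day is Sunday, still a day off, so even if things run late nobody has to worry about getting up early for work.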

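(2) Choosing topics

A typical tech salon runs 3 to 4 hours, including time for audience questions and breaks, so I think 4 topics are enough.

The first topic serves as a warm-up and does not have to be technical. It can cover recent news from the Go community, some anecdotes, and so on; the goal is to spark everyone's interest and enthusiasm.

The second and third topics should be closely tied to Go itself: usage tips and debugging techniques, good open-source Go projects, companies' experience with Go, and so on.

The last topic can cover another technology. Go does not exist in isolation; it is tightly bound to the environment it runs in, so a talk on Unix system knowledge, operations experience, and the like works well.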

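(3) Wrapping up

After the salon, everyone should help clean up the venue, so that sponsors will be more willing to provide space and facilities in the future. The organizer should also send out the materials promptly for everyone's reference.

Those are my humble thoughts on how to organize a tech salon; I hope they are of some help.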

Notes on “Understanding Caching”

These are my notes on “Understanding Caching”:

(1) What is a cache line?

A cache line is the smallest unit of memory that can be transferred to or from a cache. The essential elements that quantify a cache are called the read and write line widths. These signify the minimum amount of data the cache must read or write from the memory or cache below it. Frequently, these quantities are the same, so caches often are quantified simply by the line width. Even if they differ, the longest width usually is called the line width.

(2) Inclusive and exclusive:

A multilevel cache can be either inclusive or exclusive. Exclusive means a particular cache line may be present in exactly one of the cache levels and no more than one. Inclusive means the line may be present simultaneously in more than one level of cache. Nothing prevents the line widths from being different in differing cache levels.

(3) Write through and write back:

Write through means the cache may store a copy of the data, but the write must be completed at the next level down before it can be signaled as complete to the layer above. Write back means a write may be considered complete as soon as the data is stored in the cache. For a write back cache, as long as the written data is not transmitted, the cache line is considered dirty, because it ultimately must be written out to the level below.
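The difference between the two write policies can be sketched with a toy model. This is an illustration under simplifying assumptions, not how real hardware is built: each "line" holds a single value, there is no eviction, and the `Memory` and `Cache` classes are invented for this example.

```python
class Memory:
    """The level below the cache."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class Cache:
    """One-value-per-line cache with a selectable write policy."""

    def __init__(self, lower, write_back):
        self.lower = lower
        self.write_back = write_back
        self.lines = {}     # addr -> cached value
        self.dirty = set()  # lines not yet written to the level below

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.lower.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            # Complete as soon as the cache holds the data; line is dirty.
            self.dirty.add(addr)
        else:
            # Write through: the level below must be updated first.
            self.lower.write(addr, value)

    def flush(self):
        """Write every dirty line out to the level below."""
        for addr in sorted(self.dirty):
            self.lower.write(addr, self.lines[addr])
        self.dirty.clear()
```

With `write_back=False`, a write is visible in `Memory` immediately. With `write_back=True`, memory keeps the old value until `flush()` runs, and the line stays in `dirty` until then.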

(4) Coherency:

A cache line is termed coherent when the data in the line is identical to the data stored in the main memory being cached. If this is not true, the cache line is termed incoherent.

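Two problems arise without coherency:
a) Stale data, which can happen with any kind of cache: main memory has been updated, but the cache does not hold the latest data, so reads return wrong values. See the figure below:

This is a temporary error, because the correct data is in main memory; the cache only needs to read it again.

b) This error occurs only in write-back caches: main memory and the cache have each been updated separately, and the two now have to be made consistent. Because the cache always transfers a whole cache line at a time, neither copy can simply replace the other without losing an update, so the data inevitably ends up inconsistent. See the figure below: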

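Libuv notes (1): the model

Libuv uses an asynchronous, non-blocking, event-driven programming model. At its core it provides an event loop, and callbacks are registered for the events of interest. In pseudocode:

while there are events to process:
    take the next event
    if a callback is registered for this event:
        call the callback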
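The pseudocode above can be turned into a small runnable sketch. This is not libuv's API: the `EventLoop` class and its `on`, `emit`, and `run` methods are invented here, and events come from an in-memory queue, whereas a real event loop waits on the operating system for I/O readiness and timers.

```python
from collections import deque


class EventLoop:
    def __init__(self):
        self.callbacks = {}     # event name -> callback
        self.pending = deque()  # queued (event name, payload) pairs

    def on(self, event_name, callback):
        """Register a callback for an event we are interested in."""
        self.callbacks[event_name] = callback

    def emit(self, event_name, payload=None):
        """Queue an event for later processing."""
        self.pending.append((event_name, payload))

    def run(self):
        # While there are events to process...
        while self.pending:
            # ...take the next event...
            event_name, payload = self.pending.popleft()
            # ...and call its callback if one is registered.
            callback = self.callbacks.get(event_name)
            if callback is not None:
                callback(payload)
```

Events without a registered callback are simply dropped, and `run()` returns once the queue is empty.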

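Linux kernel IOMMU code analysis notes (12): code definitions related to page-table entries

The address translation of a DMA request is shown in the figure below:

The page-table entry format is as follows:

Each page-table entry occupies 8 (2^3) bytes, so a 4 KiB (2^12-byte) page table holds 2^(12 - 3) = 2^9 = 512 entries, and each level in the translation diagram above needs only 9 bits of the address as its index.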
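The 9-bit arithmetic can be checked with a few lines of code. This is a sketch of the calculation only; the function name `level_index` and the convention that level 1 is the leaf level are assumptions made for this example.

```python
PAGE_SHIFT = 12                 # 4 KiB pages: 2^12 bytes
ENTRY_SIZE = 8                  # bytes per page-table entry
ENTRIES_PER_TABLE = (1 << PAGE_SHIFT) // ENTRY_SIZE
LEVEL_STRIDE = ENTRIES_PER_TABLE.bit_length() - 1  # bits needed per level


def level_index(address: int, level: int) -> int:
    """Index into the page table at `level` (1 is assumed to be the leaf)."""
    shift = PAGE_SHIFT + LEVEL_STRIDE * (level - 1)
    return (address >> shift) & (ENTRIES_PER_TABLE - 1)


# Build an address from known per-level indexes and take it apart again.
address = (3 << 30) | (5 << 21) | (7 << 12) | 0x123
indexes = [level_index(address, level) for level in (3, 2, 1)]
```

Here `ENTRIES_PER_TABLE` works out to 512 and `LEVEL_STRIDE` to 9, and `indexes` recovers the three values 3, 5, and 7 that were packed into the address.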

Definition of GAW:

Guest Address Width: Physical addressability limit within a partition (virtual machine)

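It can be understood as the physical address width as seen from the virtual machine. For example, if a virtual machine can access only 2 GiB of memory, its GAW is 31.

AGAW stands for Adjusted Guest Address Width. To keep the translation in steps of 9 bits, GAW is converted to AGAW as in the following pseudocode: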

R = (GAW - 12) MOD 9;
if (R == 0) {
    AGAW = GAW;
} else {
    AGAW = GAW + 9 - R;
}
if (AGAW > 64)
    AGAW = 64;

The corresponding function is guestwidth_to_adjustwidth:

static inline int guestwidth_to_adjustwidth(int gaw)
{
    int agaw;
    int r = (gaw - 12) % 9;

    if (r == 0)
        agaw = gaw;
    else
        agaw = gaw + 9 - r;
    if (agaw > 64)
        agaw = 64;
    return agaw;
}
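To see what the adjustment does to concrete widths, here is a line-for-line Python port of the function above, intended only as a worked example. It assumes gaw is at least 12, so that Python's `%` and C's `%` agree.

```python
def guestwidth_to_adjustwidth(gaw: int) -> int:
    """Round gaw up so that (agaw - 12) is a multiple of 9, capped at 64."""
    r = (gaw - 12) % 9
    agaw = gaw if r == 0 else gaw + 9 - r
    return min(agaw, 64)


# The 2 GiB guest from the text: GAW 31 is adjusted up to 39.
examples = {gaw: guestwidth_to_adjustwidth(gaw) for gaw in (30, 31, 39, 40, 48, 64)}
```

Widths that already sit on a 9-bit boundary above the 12-bit page offset (30, 39, 48) pass through unchanged, 31 and 40 are rounded up to 39 and 48, and 64 would round up to 66 but is capped at 64.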

The minimum AGAW is 30 bits; see the following definition from the specification (part of the context-entry format):

• 000b: 30-bit AGAW (2-level page table)
• 001b: 39-bit AGAW (3-level page table)
• 010b: 48-bit AGAW (4-level page table)
• 011b: 57-bit AGAW (5-level page table)
• 100b: 64-bit AGAW (6-level page table)

This is why some of the agaw conversion code in the kernel uses the numbers 30 and 2:

static inline int agaw_to_level(int agaw)
{
    return agaw + 2;
}

static inline int agaw_to_width(int agaw)
{
    return min_t(int, 30 + agaw * LEVEL_STRIDE, MAX_AGAW_WIDTH);
}

static inline int width_to_agaw(int width)
{
    return DIV_ROUND_UP(width - 30, LEVEL_STRIDE);
}
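A quick way to check these helpers against the context-entry table is to port them and tabulate the results. This is a sketch: the values `LEVEL_STRIDE = 9` and `MAX_AGAW_WIDTH = 64` are assumed here to match the kernel's definitions, and the ceiling division stands in for `DIV_ROUND_UP`.

```python
LEVEL_STRIDE = 9      # assumed value: 9 address bits per page-table level
MAX_AGAW_WIDTH = 64   # assumed value: widest supported address width


def agaw_to_level(agaw: int) -> int:
    return agaw + 2


def agaw_to_width(agaw: int) -> int:
    return min(30 + agaw * LEVEL_STRIDE, MAX_AGAW_WIDTH)


def width_to_agaw(width: int) -> int:
    # Ceiling division, the equivalent of DIV_ROUND_UP for width >= 30.
    return (width - 30 + LEVEL_STRIDE - 1) // LEVEL_STRIDE


# One (encoded agaw, width in bits, page-table levels) row per encoding.
table = [(agaw, agaw_to_width(agaw), agaw_to_level(agaw)) for agaw in range(5)]
```

Under those assumptions, `table` reproduces the five encodings listed above: 30-bit with 2 levels up to 64-bit with 6 levels, and `width_to_agaw` maps each width back to its encoding.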

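Notes on the Git data model

Git object data forms a directed acyclic graph: starting from any commit you can walk to any of its parents, but you will never encounter a cycle. Every commit points to a tree, and a tree points to one or more trees and/or blobs.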
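The tree-and-blob part of this graph can be imitated in a few lines. The sketch below is a simplification made for illustration: blobs are hashed the way git hashes them (a "type size" header, a NUL byte, then the content), but trees are stored as a plain text listing rather than git's binary layout, so the tree IDs will not match real git. The function names, file contents, and the third file name `other.rb` are hypothetical.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    # SHA-1 over a "<type> <size>\0" header followed by the content.
    header = f"{kind} {len(payload)}\0".encode()
    return hashlib.sha1(header + payload).hexdigest()


def store_node(node, objects: dict) -> tuple:
    """Store a file (bytes) or a directory (dict); return (kind, id)."""
    if isinstance(node, bytes):
        kind, payload = "blob", node
    else:
        # Simplified text listing of the children; not git's real format.
        entries = []
        for name in sorted(node):
            child_kind, child_id = store_node(node[name], objects)
            entries.append(f"{child_kind} {child_id} {name}")
        kind, payload = "tree", "\n".join(entries).encode()
    oid = object_id(kind, payload)
    objects[oid] = (kind, payload)  # same content always maps to the same key
    return kind, oid


objects = {}
before = {
    "init.rb": b"puts 1\n",
    "lib": {"base": {"base_include.rb": b"module Base; end\n"},
            "other.rb": b"# hypothetical third file\n"},
}
after = {
    "init.rb": b"puts 1\n",
    "lib": {"base": {"base_include.rb": b"module Base; X = 1; end\n"},
            "other.rb": b"# hypothetical third file\n"},
}
_, root_before = store_node(before, objects)
_, root_after = store_node(after, objects)
```

Storing the second snapshot adds only four objects: the new blob and the three trees on the path from it to the root. The blobs for the untouched files are reused as they are, which is why every commit ends up with a new root tree while most of the graph is shared.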

The Git data model is shown in the figure below:

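Take the following directory structure as an example:

The working directory contains two directories and three files. The initial git data model looks like this:

After lib/base/base_include.rb is modified and committed, a new blob is created along with the corresponding new trees. After the current commit is tagged, the git data model looks like this:

The git data model after the init.rb file in the root directory is modified and committed:

As you can see, every commit produces a new tree.
The final git data model is shown in the figure below; it contains 16 immutable objects: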

References:
Git internals.

Lua notes (22): error and assert

The error function in Lua is defined as:

error (message [, level])

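error raises an error with message as the error message; if nothing catches it, the interpreter prints the message and terminates the program. The meaning of level is easiest to understand from the following example (test.lua):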

function f0()
        error("Error!")
end

function f1()
        f0()
end

function f2()
        f1()
end

f2()

Running it gives:

lua: test.lua:2: Error!
stack traceback:
        [C]: in function 'error'
        test.lua:2: in function 'f0'
        test.lua:6: in function 'f1'
        test.lua:10: in function 'f2'
        test.lua:13: in main chunk
        [C]: in ?

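By default, level is 1, and “lua: test.lua:2: Error!” places the error at line 2 of the script, where error is called. Now change f0 to pass 2 as the level:

function f0()
        error("Error!", 2)
end

Run it again; the output is: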

lua: test.lua:6: Error!
stack traceback:
        [C]: in function 'error'
        test.lua:2: in function 'f0'
        test.lua:6: in function 'f1'
        test.lua:10: in function 'f2'
        test.lua:13: in main chunk
        [C]: in ?

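This time the error position points to line 6 of the script (“lua: test.lua:6: Error!”), which is where f1() calls f0. So level specifies which function in the call stack the error position is attributed to when an error occurs.

The assert function is defined as: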

assert (v [, message])

That is, when v is false (nil or false) it calls error; otherwise it returns all of its arguments. The default value of message is "assertion failed!". For example:

function f0()
        assert(nil, "Assert!")
end
f0()

The output is:

lua: test.lua:2: Assert!
stack traceback:
        [C]: in function 'assert'
        test.lua:2: in function 'f0'
        test.lua:13: in main chunk
        [C]: in ?

Lua notes (21): an equivalent form of “require module”

Learn Lua in 15 Minutes gives an equivalent form of requiring a module:

-- Another file can use mod.lua's functionality:
local mod = require('mod')  -- Run the file mod.lua.

-- require is the standard way to include modules.
-- require acts like:     (if not cached; see below)
local mod = (function ()
  <contents of mod.lua>
end)()
-- It's like mod.lua is a function body, so that
-- locals inside mod.lua are invisible outside it.

This explanation shows very clearly how requiring a module works, and it makes everything click.
