Linux kernel IOMMU code analysis notes (11): root_entry definitions

Definitions of root_entry in the 3.10 kernel:

/*
 * 0: Present
 * 1-11: Reserved
 * 12-63: Context Ptr (12 - (haw-1))
 * 64-127: Reserved
 */
struct root_entry {
    u64 val;
    u64 rsvd1;
};
#define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))
static inline bool root_present(struct root_entry *root)
{
    return (root->val & 1);
}
static inline void set_root_present(struct root_entry *root)
{
    root->val |= 1;
}
static inline void set_root_value(struct root_entry *root, unsigned long value)
{
    root->val |= value & VTD_PAGE_MASK;
}

Definitions of root_entry in the mainstream kernel:

/*
 * 0: Present
 * 1-11: Reserved
 * 12-63: Context Ptr (12 - (haw-1))
 * 64-127: Reserved
 */
struct root_entry {
    u64 lo;
    u64 hi;
};
#define ROOT_ENTRY_NR (VTD_PAGE_SIZE/sizeof(struct root_entry))

/*
 * Take a root_entry and return the Lower Context Table Pointer (LCTP)
 * if marked present.
 */
static phys_addr_t root_entry_lctp(struct root_entry *re)
{
    if (!(re->lo & 1))
        return 0;

    return re->lo & VTD_PAGE_MASK;
}

/*
 * Take a root_entry and return the Upper Context Table Pointer (UCTP)
 * if marked present.
 */
static phys_addr_t root_entry_uctp(struct root_entry *re)
{
    if (!(re->hi & 1))
        return 0;

    return re->hi & VTD_PAGE_MASK;
}

Definitions related to VTD_PAGE_MASK:

/*
 * VT-d hardware uses 4KiB page size regardless of host page size.
 */
#define VTD_PAGE_SHIFT      (12)
#define VTD_PAGE_SIZE       (1UL << VTD_PAGE_SHIFT)
#define VTD_PAGE_MASK       (((u64)-1) << VTD_PAGE_SHIFT)
#define VTD_PAGE_ALIGN(addr)    (((addr) + VTD_PAGE_SIZE - 1) & VTD_PAGE_MASK)

So root_entry_lctp returns the physical address of the context table, or 0 if the entry is not marked present.

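The root entry format is as follows:

[Figure 1: root entry format]

The extended root entry format is as follows:

[Figure 2: extended root entry format]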

A root entry and an extended root entry each occupy 16 bytes (16 * 8 = 128 bits). HAW stands for the platform's Host Address Width. A root table holds 256 root entries or extended root entries in total (4096 / 16 = 256).
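The bit layout and the 4096 / 16 = 256 arithmetic can be checked outside the kernel. The following is a minimal user-space sketch in Go, not kernel code: it mirrors the mainstream struct root_entry, VTD_PAGE_MASK and the two pointer helpers. The Go names (rootEntry, rootEntryLCTP, rootEntryUCTP) and the sample address 0x12345000 are made up for illustration.

```go
package main

import "fmt"

const (
	vtdPageShift  = 12
	vtdPageSize   = 1 << vtdPageShift        // 4096 bytes
	vtdPageMask   = ^uint64(vtdPageSize - 1) // clears the low 12 bits
	rootEntrySize = 16                       // two 64-bit words
	rootEntryNR   = vtdPageSize / rootEntrySize
)

// rootEntry mirrors the two 64-bit words of the mainstream struct root_entry.
type rootEntry struct {
	lo, hi uint64
}

// rootEntryLCTP returns the lower context table pointer,
// or 0 when the present bit (bit 0 of lo) is clear.
func rootEntryLCTP(re rootEntry) uint64 {
	if re.lo&1 == 0 {
		return 0
	}
	return re.lo & vtdPageMask
}

// rootEntryUCTP returns the upper context table pointer,
// or 0 when the present bit (bit 0 of hi) is clear.
func rootEntryUCTP(re rootEntry) uint64 {
	if re.hi&1 == 0 {
		return 0
	}
	return re.hi & vtdPageMask
}

func main() {
	// Made-up context table address with the present bit set.
	re := rootEntry{lo: 0x12345000 | 1}
	fmt.Println("entries per root table:", rootEntryNR)
	fmt.Printf("LCTP: %#x\n", rootEntryLCTP(re))
	fmt.Printf("UCTP: %#x\n", rootEntryUCTP(re))
}
```

Masking with vtdPageMask drops the present bit and the reserved bits 1-11 in one step, which is why the helpers can return the value directly as an address.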

References:
Intel® Virtualization Technology for Directed I/O

Linux kernel IOMMU code analysis notes (10): [PATCH] iommu/vt-d: Load old data structures only in kdump kernel

In intel-iommu.c of the mainstream kernel:

static int __init init_dmars(void)
{
    ......
    if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
        iommu_disable_translation(iommu);
        clear_translation_pre_enabled(iommu);
        pr_warn("Translation was enabled for %s but we are not in kdump mode\n",
            iommu->name);
    }
    ......
}

The translation_pre_enabled function is as follows:

static bool translation_pre_enabled(struct intel_iommu *iommu)
{
    return (iommu->flags & VTD_FLAG_TRANS_PRE_ENABLED);
}

VTD_FLAG_TRANS_PRE_ENABLED is set in the init_translation_status function:

static void init_translation_status(struct intel_iommu *iommu)
{
    u32 gsts;

    gsts = readl(iommu->reg + DMAR_GSTS_REG);
    if (gsts & DMA_GSTS_TES)
        iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
}

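The DMA_GSTS_TES (Translation Enable Status) bit of DMAR_GSTS_REG (Global Status Register) indicates whether DMA remapping is enabled.

So the check if (translation_pre_enabled(iommu) && !is_kdump_kernel()) means: if this IOMMU hardware unit already has DMA remapping enabled, but the running kernel is not a kdump kernel, the current state of the IOMMU hardware cannot be trusted. Both pieces of state are therefore reset: translation is disabled in hardware (iommu_disable_translation, which is reflected in DMAR_GSTS_REG), and the flag in iommu->flags is cleared (clear_translation_pre_enabled).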
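The decision can be modelled in a few lines of user-space Go. This is only a sketch of the control flow, not the kernel implementation: iommuModel and resetIfUntrusted are hypothetical names, the register is simulated by a plain field, and the value chosen for the flag constant is an assumption of this model.

```go
package main

import "fmt"

const (
	dmaGstsTES             uint32 = 1 << 31 // Translation Enable Status bit (simulated register)
	vtdFlagTransPreEnabled uint32 = 1 << 0  // flag value assumed for this model
)

// iommuModel is a hypothetical stand-in for struct intel_iommu.
type iommuModel struct {
	gsts  uint32 // simulated contents of DMAR_GSTS_REG
	flags uint32
}

// initTranslationStatus records whether translation was already on at boot.
func (m *iommuModel) initTranslationStatus() {
	if m.gsts&dmaGstsTES != 0 {
		m.flags |= vtdFlagTransPreEnabled
	}
}

func (m *iommuModel) translationPreEnabled() bool {
	return m.flags&vtdFlagTransPreEnabled != 0
}

// resetIfUntrusted mirrors the check in init_dmars: pre-enabled translation
// is only kept when running as a kdump kernel. It reports whether it reset.
func (m *iommuModel) resetIfUntrusted(isKdump bool) bool {
	if m.translationPreEnabled() && !isKdump {
		m.gsts &^= dmaGstsTES              // stands in for iommu_disable_translation
		m.flags &^= vtdFlagTransPreEnabled // stands in for clear_translation_pre_enabled
		return true
	}
	return false
}

func main() {
	for _, isKdump := range []bool{false, true} {
		m := &iommuModel{gsts: dmaGstsTES}
		m.initTranslationStatus()
		reset := m.resetIfUntrusted(isKdump)
		fmt.Printf("kdump=%v reset=%v still pre-enabled=%v\n",
			isKdump, reset, m.translationPreEnabled())
	}
}
```

Only the combination "pre-enabled and not kdump" triggers the reset; in a kdump kernel the old state is kept so that the tables of the crashed kernel can be loaded.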

Similarly, intel_irq_remapping.c contains comparable code:

static int intel_setup_irq_remapping(struct intel_iommu *iommu)
{
    ......
    if (ir_pre_enabled(iommu)) {
        if (iommu_load_old_irte(iommu))
            pr_err("Failed to copy IR table for %s from previous kernel\n",
                   iommu->name);
        else
            pr_info("Copied IR table for %s from previous kernel\n",
                iommu->name);
    }
    ......
}

static int iommu_load_old_irte(struct intel_iommu *iommu)
{
    ......
    if (!is_kdump_kernel()) {
        ......
    }
    ......
}

References:
[PATCH 04/17] iommu/vt-d: Load old data structures only in kdump kernel
Intel® Virtualization Technology for Directed I/O

Linux kernel notes (14): the is_kdump_kernel function

#ifdef CONFIG_CRASH_DUMP
/*
 * is_kdump_kernel() checks whether this kernel is booting after a panic of
 * previous kernel or not. This is determined by checking if previous kernel
 * has passed the elf core header address on command line.
 *
 * This is not just a test if CONFIG_CRASH_DUMP is enabled or not. It will
 * return 1 if CONFIG_CRASH_DUMP=y and if kernel is booting after a panic of
 * previous kernel.
 */

static inline int is_kdump_kernel(void)
{
    return (elfcorehdr_addr != ELFCORE_ADDR_MAX) ? 1 : 0;
}
#else /* !CONFIG_CRASH_DUMP */
static inline int is_kdump_kernel(void) { return 0; }
#endif /* CONFIG_CRASH_DUMP */

is_kdump_kernel checks whether the running kernel was booted because the previously running kernel panicked. If CONFIG_CRASH_DUMP is not configured, it always returns 0.
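According to the comment in the source, the decision depends on whether the previous kernel passed the ELF core header address on the command line (the elfcorehdr= parameter). A rough user-space analogue, assuming one only wants to inspect a command line string, might look like the Go sketch below. hasElfcorehdr and the two sample command lines are made up for illustration; the kernel itself compares elfcorehdr_addr with ELFCORE_ADDR_MAX rather than re-parsing text.

```go
package main

import (
	"fmt"
	"strings"
)

// hasElfcorehdr reports whether a kernel command line carries an
// elfcorehdr= parameter, which a capture (kdump) kernel receives.
func hasElfcorehdr(cmdline string) bool {
	for _, field := range strings.Fields(cmdline) {
		if strings.HasPrefix(field, "elfcorehdr=") {
			return true
		}
	}
	return false
}

func main() {
	// Both command lines are invented examples.
	normal := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro quiet"
	kdump := "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro elfcorehdr=0x2f000000 maxcpus=1"
	fmt.Println("normal boot:", hasElfcorehdr(normal))
	fmt.Println("kdump boot: ", hasElfcorehdr(kdump))
}
```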

Go practical tips (7): value receivers and pointer receivers

Value receiver:

func (u user) fun1() {
    // ...
}

Pointer receiver:

func (u *user) fun2() {
    // ...
}

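A value receiver operates on a copy of the value, while a pointer receiver operates on the actual value.

When a value-receiver method is called through a pointer, what actually happens is: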

(*p).fun1()

When a pointer-receiver method is called on a value (which must be addressable), what actually happens is:

(&v).fun2()
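A small runnable sketch makes the difference visible. The user type with a name field and the method names setNameCopy and setName are invented for this example.

```go
package main

import "fmt"

type user struct {
	name string
}

// setNameCopy has a value receiver: u is a copy, so the caller's value is unchanged.
func (u user) setNameCopy(name string) {
	u.name = name
}

// setName has a pointer receiver: it modifies the value the pointer refers to.
func (u *user) setName(name string) {
	u.name = name
}

func main() {
	v := user{name: "alice"}
	p := &v

	p.setNameCopy("bob") // compiled as (*p).setNameCopy("bob")
	fmt.Println(v.name)  // still "alice": only the copy was changed

	v.setName("carol")  // compiled as (&v).setName("carol")
	fmt.Println(v.name) // now "carol"
}
```

The second call works because v is an addressable variable; calling a pointer-receiver method directly on a non-addressable value, such as a map element, does not compile.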

References:
Go in Action

Go practical tips (6): choosing map keys

The map key can be a value from any built-in or struct type as long as the value can be used in an expression with the == operator. Slices, functions, and struct types that contain slices can't be used as a map key.
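As a short illustration, a struct whose fields are all comparable works as a key. The point type and the countPoints helper are made up for this sketch.

```go
package main

import "fmt"

// point has only comparable fields, so values of it can be compared with ==
// and therefore used as map keys.
type point struct {
	x, y int
}

// countPoints counts how often each point occurs in the input.
func countPoints(points []point) map[point]int {
	counts := make(map[point]int)
	for _, p := range points {
		counts[p]++
	}
	return counts
}

func main() {
	counts := countPoints([]point{{1, 2}, {3, 4}, {1, 2}})
	fmt.Println(counts[point{1, 2}]) // 2
	fmt.Println(counts[point{3, 4}]) // 1

	// A type such as map[[]int]string is rejected by the compiler,
	// because slices cannot be compared with ==.
}
```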

References:
Go in Action

Shark code analysis notes (3): shark_init.lua

Let's look at the file shark_init.lua (copyright notice omitted):

local uv = require("uv")
local ffi = require("ffi")

package.path = package.path .. ";./deps/?.lua"
package.cpath = package.cpath .. ";./deps/?.so"

-- microsecond precision
ffi.cdef[[
typedef long time_t;

typedef struct timeval {
    time_t tv_sec;
    time_t tv_usec;
} timeval;

int gettimeofday(struct timeval *t, void *tzp);
]]

local gettimeofday_struct = ffi.new("timeval")

shark.gettimeofday = function()
  ffi.C.gettimeofday(gettimeofday_struct, nil)
  return tonumber(gettimeofday_struct.tv_sec) * 1000000 +
         tonumber(gettimeofday_struct.tv_usec)
end

set_interval = function(callback, interval)
  local timer = uv.new_timer()
  local function ontimeout()
    callback(timer)
  end
  uv.timer_start(timer, interval, interval, ontimeout)
  return timer
end

set_timeout = function(callback, timeout)
  local timer = uv.new_timer()
  local function ontimeout()
    uv.timer_stop(timer)
    uv.close(timer)
    callback(timer)
  end
  uv.timer_start(timer, timeout, 0, ontimeout)
  return timer
end

local shark_end_notify_list = {}

shark.add_end_notify = function(callback)
  table.insert(shark_end_notify_list, callback)
end

shark.on_end = function(callback)
  local function call_end()
    --notify registered on_end function
    for _, cb in pairs(shark_end_notify_list) do
      cb()
    end

    callback()
    os.exit(0)
  end
  local sigint = uv.new_signal()
  uv.signal_start(sigint, "sigint", function()
    call_end()
  end)

  local sigterm = uv.new_signal()
  uv.signal_start(sigterm, "sigterm", function()
    call_end()
  end)
end


---------------------------------------------------------------

local function fill_line(n, max)
  for i = 1, max do
    if i < n then
      io.write("*")
    else
      io.write(" ")
    end
  end
end

-- standard histogram print function
-- all type keys and number value
local __print_hist = function(t, cmp_func, mode)
  local stdSum = 0
  local array = {}

  for k, v in pairs(t) do
    stdSum = stdSum + v
    if tostring(k) ~= "" then
      array[#array + 1] = {k = k, v = v}
    end
  end

  table.sort(array, function(v1, v2)
    if cmp_func ~= nil then
      return cmp_func(v1.v, v2.v)
    else
      if v1.v > v2.v then return true end
    end
  end)

  if mode == "default" then
    io.write("                          value  ---------- Distribution ----------  count\n")
  end

  for k, v in pairs(array) do
    if mode == "default" then
      io.write(string.format("%33s |", tostring(v.k)))
      fill_line(v.v * 34 / stdSum, 34)
      io.write(string.format("| %d\n", v.v))
    else
      io.write(string.format("%s\n%d\n", tostring(v.k), v.v))
    end
  end
end

function print_hist(t, cmp_func)
  __print_hist(t, cmp_func, "default")
end


function print_hist_raw(t, cmp_func)
  __print_hist(t, cmp_func, "raw")
end

shark.print_hist = print_hist
shark.print_hist_raw = print_hist_raw

(1)

local uv = require("uv")
local ffi = require("ffi")

package.path = package.path .. ";./deps/?.lua"
package.cpath = package.cpath .. ";./deps/?.so"

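require("uv") loads the luv module (the main function has already registered luaopen_luv in the package.preload table), and require("ffi") loads LuaJIT's ffi module.
package.path and package.cpath are then extended so that the Lua files and libraries shark depends on can be found.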

(2)

-- microsecond precision
ffi.cdef[[
typedef long time_t;

typedef struct timeval {
    time_t tv_sec;
    time_t tv_usec;
} timeval;

int gettimeofday(struct timeval *t, void *tzp);
]]

local gettimeofday_struct = ffi.new("timeval")

shark.gettimeofday = function()
  ffi.C.gettimeofday(gettimeofday_struct, nil)
  return tonumber(gettimeofday_struct.tv_sec) * 1000000 +
         tonumber(gettimeofday_struct.tv_usec)
end

ffi.new() returns a value of type cdata. shark.gettimeofday returns the current time in microseconds (us).
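The conversion is simply seconds * 1000000 + microseconds. A tiny Go sketch of the same arithmetic follows; toMicros is a made-up helper name, and the current time is taken from Go's time package rather than from gettimeofday.

```go
package main

import (
	"fmt"
	"time"
)

// toMicros combines whole seconds and the microsecond remainder into a
// single microsecond count, as shark.gettimeofday does.
func toMicros(sec, usec int64) int64 {
	return sec*1000000 + usec
}

func main() {
	now := time.Now()
	// Nanosecond() is the sub-second part; dividing by 1000 gives microseconds.
	fmt.Println(toMicros(now.Unix(), int64(now.Nanosecond()/1000)))
}
```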

(3)

set_interval = function(callback, interval)
  local timer = uv.new_timer()
  local function ontimeout()
    callback(timer)
  end
  uv.timer_start(timer, interval, interval, ontimeout)
  return timer
end

set_timeout = function(callback, timeout)
  local timer = uv.new_timer()
  local function ontimeout()
    uv.timer_stop(timer)
    uv.close(timer)
    callback(timer)
  end
  uv.timer_start(timer, timeout, 0, ontimeout)
  return timer
end

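set_interval and set_timeout use the timer functions of the luv module. set_interval makes callback run repeatedly, once every interval. set_timeout makes callback run once, after timeout has elapsed. Note that both functions take times in milliseconds (ms).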

(4)

local shark_end_notify_list = {}

shark.add_end_notify = function(callback)
  table.insert(shark_end_notify_list, callback)
end

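This creates a table named shark_end_notify_list and defines the function shark.add_end_notify, which appends a callback to shark_end_notify_list. The functions in this table are called from shark.on_end, that is, they perform the cleanup work when the script exits.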

(5)

shark.on_end = function(callback)
  local function call_end()
    --notify registered on_end function
    for _, cb in pairs(shark_end_notify_list) do
      cb()
    end

    callback()
    os.exit(0)
  end
  local sigint = uv.new_signal()
  uv.signal_start(sigint, "sigint", function()
    call_end()
  end)

  local sigterm = uv.new_signal()
  uv.signal_start(sigterm, "sigterm", function()
    call_end()
  end)
end

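shark.on_end takes a callback as its argument. Inside shark.on_end a local function, call_end, is defined. call_end first iterates over shark_end_notify_list and runs every function in it, then calls the callback that was passed to shark.on_end, and finally exits with os.exit(0). call_end is the handler that shark.on_end registers for the sigint and sigterm signals.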
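The pattern (a list of cleanup callbacks that run before a final callback when a termination signal arrives) can be sketched in Go as follows. This is an analogue built on os/signal, not a translation of the luv calls; endNotifyList, addEndNotify, callEnd and onEnd are names chosen for this example.

```go
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// endNotifyList plays the role of shark_end_notify_list.
var endNotifyList []func()

// addEndNotify registers a cleanup callback.
func addEndNotify(cb func()) {
	endNotifyList = append(endNotifyList, cb)
}

// callEnd runs the registered callbacks in registration order,
// then the final callback.
func callEnd(final func()) {
	for _, cb := range endNotifyList {
		cb()
	}
	final()
}

// onEnd arranges for callEnd to run, followed by exit, when SIGINT or
// SIGTERM arrives. Like shark.on_end it returns immediately.
func onEnd(final func()) {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		<-sigs
		callEnd(final)
		os.Exit(0)
	}()
}

func main() {
	addEndNotify(func() { fmt.Println("flush buffers") })
	addEndNotify(func() { fmt.Println("close files") })

	// A long-running tool would call onEnd(final) here and keep working
	// until a signal arrives. To let this demo terminate on its own,
	// the end sequence is triggered directly instead.
	callEnd(func() { fmt.Println("final callback") })
}
```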

(6)

local function fill_line(n, max)
  for i = 1, max do
    if i < n then
      io.write("*")
    else
      io.write(" ")
    end
  end
end

fill_line prints the * characters of a bar and is used by the __print_hist function that follows.

(7)

-- standard histogram print function
-- all type keys and number value
local __print_hist = function(t, cmp_func, mode)
  local stdSum = 0
  local array = {}

  for k, v in pairs(t) do
    stdSum = stdSum + v
    if tostring(k) ~= "" then
      array[#array + 1] = {k = k, v = v}
    end
  end

  table.sort(array, function(v1, v2)
    if cmp_func ~= nil then
      return cmp_func(v1.v, v2.v)
    else
      if v1.v > v2.v then return true end
    end
  end)

  if mode == "default" then
    io.write("                          value  ---------- Distribution ----------  count\n")
  end

  for k, v in pairs(array) do
    if mode == "default" then
      io.write(string.format("%33s |", tostring(v.k)))
      fill_line(v.v * 34 / stdSum, 34)
      io.write(string.format("| %d\n", v.v))
    else
      io.write(string.format("%s\n%d\n", tostring(v.k), v.v))
    end
  end
end

__print_hist prints a histogram.

  local stdSum = 0
  local array = {}

  for k, v in pairs(t) do
    stdSum = stdSum + v
    if tostring(k) ~= "" then
      array[#array + 1] = {k = k, v = v}
    end
  end

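The input parameter t is a table. The code above iterates over this table and builds a new array-style table named array. Each element of array is itself a table holding one key and value of the input table. Entries whose key converts to an empty string are left out of array, but their values still count towards stdSum.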

table.sort(array, function(v1, v2)
    if cmp_func ~= nil then
      return cmp_func(v1.v, v2.v)
    else
      if v1.v > v2.v then return true end
    end
  end)

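The code above sorts array. If no cmp_func argument is passed, the default comparison is used, which orders the entries by value from largest to smallest.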
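The "optional comparator with a descending default" idea can be sketched in Go with sort.Slice. The entry type and sortEntries are invented names, and the sample keys and counts are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// entry pairs a key with its count, like the {k = k, v = v} tables.
type entry struct {
	key   string
	value int
}

// sortEntries sorts entries in place. less compares two values; when it is
// nil the default order is descending by value.
func sortEntries(entries []entry, less func(a, b int) bool) {
	if less == nil {
		less = func(a, b int) bool { return a > b }
	}
	sort.Slice(entries, func(i, j int) bool {
		return less(entries[i].value, entries[j].value)
	})
}

func main() {
	entries := []entry{{"read", 3}, {"write", 10}, {"open", 7}}

	sortEntries(entries, nil) // default: largest count first
	fmt.Println(entries)

	sortEntries(entries, func(a, b int) bool { return a < b }) // custom: ascending
	fmt.Println(entries)
}
```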

if mode == "default" then
    io.write("                          value  ---------- Distribution ----------  count\n")
  end

  for k, v in pairs(array) do
    if mode == "default" then
      io.write(string.format("%33s |", tostring(v.k)))
      fill_line(v.v * 34 / stdSum, 34)
      io.write(string.format("| %d\n", v.v))
    else
      io.write(string.format("%s\n%d\n", tostring(v.k), v.v))
    end
  end

The code above prints the final histogram. The output looks like this:

                           value  ---------- Distribution ----------  count
  syscalls:sys_enter_gettimeofday |********                          | 25940
   syscalls:sys_exit_gettimeofday |********                          | 25940
    syscalls:sys_enter_epoll_wait |****                              | 12977
     syscalls:sys_exit_epoll_wait |****                              | 12977
          syscalls:sys_exit_alarm |*                                 | 3917

 

(8)

function print_hist(t, cmp_func)
  __print_hist(t, cmp_func, "default")
end


function print_hist_raw(t, cmp_func)
  __print_hist(t, cmp_func, "raw")
end

shark.print_hist = print_hist
shark.print_hist_raw = print_hist_raw

Finally, __print_hist is wrapped into two functions, shark.print_hist and shark.print_hist_raw, for other programs to call.

A brief introduction to the git remote command

git remote lists the names of all remote repositories; git remote -v shows the details.

git remote add <remote-name> <remote-path> adds a remote repository and gives it the name <remote-name>. For example:

git remote add mary ../marys-repo

git fetch <remote-name> downloads from the remote repository but does not merge anything.

git merge <remote-name>/<branch-name> merges the branch identified by <remote-name>/<branch-name> into the current branch. For example:

git merge mary/master

git push <remote-name> <branch-name> uploads a local branch to the remote repository identified by <remote-name>. For example:

git push mary dummy

git push <remote-name> <tag-name> uploads the specified tag.

git branch -r lists all remote-tracking branches.

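Linux kernel IOMMU code analysis notes (9): EIM, IR and QI

The Extended Capability Register of an IOMMU hardware unit has three related bits:

EIM (Extended Interrupt Mode): on x86_64 platforms, 0 means xAPIC is supported and 1 means x2APIC is supported; on Itanium platforms this bit has no meaning. The bit is only valid when the IR bit is set to 1.

IR (Interrupt Remapping support): 1 means interrupt remapping is supported, 0 means it is not. A hardware unit that supports interrupt remapping must also support QI.

QI (Queued Invalidation support): 1 means queued invalidation is supported, 0 means it is not.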
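The dependencies between the three bits can be expressed as a small decoder. The Go sketch below is illustrative only: the bit positions (QI at bit 1, IR at bit 3, EIM at bit 4) are stated from memory of the VT-d specification and should be checked against it, and ecapInfo, decodeEcap and the sample register values are invented for this example.

```go
package main

import "fmt"

// Bit positions assumed for this sketch; verify against the VT-d specification.
const (
	ecapQI  uint64 = 1 << 1
	ecapIR  uint64 = 1 << 3
	ecapEIM uint64 = 1 << 4
)

// ecapInfo holds the three decoded capability bits.
type ecapInfo struct {
	qi, ir, eim bool
}

func decodeEcap(ecap uint64) ecapInfo {
	return ecapInfo{
		qi:  ecap&ecapQI != 0,
		ir:  ecap&ecapIR != 0,
		eim: ecap&ecapEIM != 0,
	}
}

// effectiveEIM applies the rule that EIM is only valid when IR is set.
func (e ecapInfo) effectiveEIM() bool {
	return e.ir && e.eim
}

// consistent applies the rule that interrupt remapping requires
// queued invalidation.
func (e ecapInfo) consistent() bool {
	return !e.ir || e.qi
}

func main() {
	// Made-up register values.
	for _, ecap := range []uint64{0x1a, 0x10, 0x08} {
		info := decodeEcap(ecap)
		fmt.Printf("ecap=%#x qi=%v ir=%v eim(effective)=%v consistent=%v\n",
			ecap, info.qi, info.ir, info.effectiveEIM(), info.consistent())
	}
}
```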

References:
Intel® Virtualization Technology for Directed I/O