Linux kernel notes (60): scheduling domain

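On a NUMA system, the time a CPU needs to access local memory differs greatly from the time it needs to access remote memory, so a good scheduling algorithm matters a lot. To address this, the Linux kernel introduced the concept of a scheduling domain. See the following example: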

[root@localhost ~]# cd /proc/sys/kernel/sched_domain/
[root@localhost sched_domain]# ls
cpu0  cpu1  cpu2  cpu3  cpu4  cpu5  cpu6  cpu7
[root@localhost sched_domain]# ls -alt *
cpu0:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu1:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu2:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu3:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu4:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu5:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu6:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

cpu7:
total 0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain0
dr-xr-xr-x. 1 root root 0 Feb 26 20:06 domain1
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 .
dr-xr-xr-x. 1 root root 0 Feb 26 19:37 ..

Under /proc/sys/kernel/sched_domain/, each CPU has its own directory, and each CPU directory contains the domain information related to that CPU.
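The per-CPU layout shown in the listing can be walked with a few lines of shell. This is only a sketch: list_sched_domains is a hypothetical helper name, and it assumes that each domain directory contains a readable name file. That may not hold on every kernel build, and the whole directory may be missing on kernels that do not expose it.

```shell
# list_sched_domains [BASE]
# Print one "cpu domain name" line per scheduling domain directory.
# BASE defaults to /proc/sys/kernel/sched_domain (assumed to exist).
list_sched_domains() {
    lsd_base=${1:-/proc/sys/kernel/sched_domain}
    for lsd_dir in "$lsd_base"/cpu*/domain*; do
        # An unmatched glob stays literal, so skip anything that is not a directory.
        [ -d "$lsd_dir" ] || continue
        lsd_cpu=$(basename "$(dirname "$lsd_dir")")
        lsd_dom=$(basename "$lsd_dir")
        lsd_name="?"    # printed when the (assumed) name file is absent
        if [ -r "$lsd_dir/name" ]; then
            lsd_name=$(cat "$lsd_dir/name")
        fi
        printf '%s %s %s\n' "$lsd_cpu" "$lsd_dom" "$lsd_name"
    done
}
```

Calling list_sched_domains with no argument reads the live directory. Passing another directory makes it easy to try the function on a copy of the tree.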

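A multi-level system also has multi-level scheduling domains (struct sched_domain in the kernel). Each scheduling domain contains a set of CPUs that share properties and scheduling policies. Each scheduling domain also contains one or more CPU groups (struct sched_group in the kernel), and the scheduling domain treats each CPU group as a single unit.

The core scheduling domain code is in kernel/sched/core.c. The meaning of each file under /proc/sys/kernel/sched_domain/cpu$/domain$ can be found there.

On a NUMA system, if one node is very heavily utilized, say above 90%, while another node is only at 60%~70%, you can try disabling wakeup affinity.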

References:
Scheduling domains
sched-domains.txt
Does domain0 in /proc/sys/kernel/sched_domain/cpu$ refer top-level domain in the system?
How to understan /proc/sys/kernel/sched_domain/cpu$/domain$/flags?

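How to understand "load average"?

On *nix systems, running the top or uptime command shows the current load average of the system (the average load over the past 1, 5, and 15 minutes, respectively):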

# uptime
 14:43:37 up 22 days,  1:47,  5 users,  load average: 0.00, 0.01, 0.05

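Load is the number of processes that are using the CPU plus those waiting to use it (on Linux, processes in uninterruptible sleep are counted as well). So on a single-core system, a load average below 1.00 means the system still has idle capacity, 1.00 means it has reached 100% utilization, and anything above 1.00 deserves attention.

Moreover, what counts as 100% utilization depends on the number of processors: on a single-core system the value is 1.00, on a dual-core system it is 2.00, and so on. So on a multiprocessor system the load average can look high while the CPUs are actually still fairly idle.

P.S.: one way to get the number of CPUs: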

# grep 'model name' /proc/cpuinfo | wc -l
8
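Because the threshold scales with the CPU count, it can help to divide the load average by the number of CPUs before judging it. A minimal sketch follows. load_per_cpu is a hypothetical helper, and the commented usage line assumes a Linux system that provides /proc/loadavg and getconf _NPROCESSORS_ONLN.

```shell
# load_per_cpu LOAD NCPU
# Print LOAD divided by NCPU with two decimals.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# Possible usage on a live system (1-minute load, online CPU count):
# load_per_cpu "$(cut -d ' ' -f1 /proc/loadavg)" "$(getconf _NPROCESSORS_ONLN)"
```

A result around 1.00 then roughly corresponds to the "fully utilized" case described for a single-core system, regardless of how many CPUs the machine has.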

References:
WHAT ABOUT MULTI-PROCESSORS? MY LOAD SAYS 3.00, BUT THINGS ARE RUNNING FINE!

What’s the difference between load average and CPU load?

Examining Load Average

load average video

 

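An introduction to Bash quoting

Bash quoting turns off the special meaning of Bash meta characters:
a) Single quotes: the special meaning of all meta characters is turned off;
b) Double quotes: the special meaning of most meta characters is turned off, except for a few such as $;
c) Backslash (\): only the meta character immediately following the \ is turned off.
This explains why, when extracting multiple zip files, you should use "unzip '*.zip'" rather than "unzip *.zip". In the second form the shell first replaces *.zip with all matching file names, so unzip treats the first name as the archive and the remaining names as members to extract from it. In the first form the pattern reaches unzip unchanged, and unzip matches the archives itself.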
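The difference is easy to observe by counting the arguments a command actually receives. The sketch below uses two hypothetical helpers, count_args and quoting_demo. The demo creates two empty .zip files in a temporary directory, so no real archives are needed.

```shell
# count_args: print how many arguments the shell passed to the command.
count_args() {
    echo "$#"
}

# quoting_demo: run in a subshell so the caller's working directory is untouched.
quoting_demo() (
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    touch a.zip b.zip
    echo "unquoted: $(count_args *.zip)"          # glob expanded by the shell
    echo "single-quoted: $(count_args '*.zip')"   # pattern passed literally
    echo "escaped: $(count_args \*.zip)"          # backslash protects only the *
    cd / && rm -rf "$dir"
)
```

Running quoting_demo prints "unquoted: 2", "single-quoted: 1", and "escaped: 1": only the unquoted form lets the shell turn the pattern into file names.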

References:
Shell十三问 (Thirteen Questions about the Shell)
How do I unzip multiple / many files under Linux?

 

shmmax and shmall

The Linux kernel has two important settings for shared memory: shmmax and shmall.

shmmax defines the maximum size of a single shared memory segment, in bytes:

# cat /proc/sys/kernel/shmmax
18446744073692774399

shmall defines the total amount of shared memory that can be allocated system-wide, in pages:

Maximum "shared memory" = shmall (cat /proc/sys/kernel/shmall) * page size (getconf PAGE_SIZE)
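The formula above is a single multiplication, sketched here as a hypothetical helper named shm_total_bytes. It relies on shell integer arithmetic, so it is only meaningful when the product fits in the shell's integer range. A very large shmall value such as the default shown earlier would overflow.

```shell
# shm_total_bytes SHMALL PAGE_SIZE
# Print the total shared memory limit in bytes (pages * bytes per page).
shm_total_bytes() {
    echo $(( $1 * $2 ))
}

# Possible usage on a live system:
# shm_total_bytes "$(cat /proc/sys/kernel/shmall)" "$(getconf PAGE_SIZE)"
```

For example, 2097152 pages of 4096 bytes each give 8589934592 bytes, that is, 8 GiB.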

Taking shmmax as an example, here is how to change the value:

(1) The current value of shmmax on the system:

# sysctl -a | grep shmmax
kernel.shmmax = 18446744073692774399

(2) Change the value of shmmax:

# echo "536870912" > /proc/sys/kernel/shmmax
# sysctl -a | grep shmmax
kernel.shmmax = 536870912

The value has changed. However, after a reboot shmmax reverts to its previous value. To make the change permanent, use the following method:

# echo "kernel.shmmax = 536870912" >>  /etc/sysctl.conf
# sysctl -a | grep shmmax
kernel.shmmax = 18446744073692774399
# sysctl -p
kernel.shmmax = 536870912
# sysctl -a | grep shmmax
kernel.shmmax = 536870912
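One caveat with appending via >> is that running the command repeatedly leaves several assignments of the same key in the file. The sketch below keeps exactly one line per key. set_sysctl_conf is a hypothetical helper, not part of sysctl, and it assumes the simple "key = value" line format used above.

```shell
# set_sysctl_conf KEY VALUE FILE
# Ensure FILE contains exactly one "KEY = VALUE" line.
set_sysctl_conf() {
    ssc_tmp=$(mktemp) || return 1
    if [ -f "$3" ]; then
        # Copy every line except existing assignments of KEY.
        awk -v key="$1" '{
            name = $0
            sub(/=.*/, "", name)        # keep the part before "="
            gsub(/[ \t]/, "", name)     # ignore blanks around the key
            if (name != key) print
        }' "$3" > "$ssc_tmp"
    fi
    printf '%s = %s\n' "$1" "$2" >> "$ssc_tmp"
    cat "$ssc_tmp" > "$3"               # rewrite in place, keeping file permissions
    rm -f "$ssc_tmp"
}
```

As root, something like set_sysctl_conf kernel.shmmax 536870912 /etc/sysctl.conf followed by sysctl -p would then have the same effect as the transcript above, without accumulating duplicate lines.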

In addition, for how to choose the values of shmall and shmmax, you can also refer to this script.

References:
The Mysterious World of Shmmax and Shmall
Configuring SHMMAX and SHMALL for Oracle in Linux
What is shmmax, shmall, shmmni? Shared Memory Max

 

Using "subscription-manager" to register and activate a subscription on RHEL

On RHEL, registering a system and using a subscription are two separate steps:

NOTE: With Red Hat Subscription-Manager, registration and utilization of a subscription is actually a two-part process. First register a system, then apply a subscription.

The following command completes both steps at once:

# subscription-manager register --username <username> --password <password> --auto-attach

Running the command above on my RHEL 7.2 system:

# subscription-manager register --username=xxxx --password=xxxx --auto-attach
Registering to: subscription.rhn.redhat.com:443/subscription
The system has been registered with ID: 333486bb-xxxxxx

Installed Product Current Status:
Product Name: Red Hat Enterprise Linux Server
Status:       Subscribed

Then check the status:

# subscription-manager list

+-------------------------------------------+
    Installed Product Status
+-------------------------------------------+
Product Name:   Red Hat Enterprise Linux Server
Product ID:     69
Version:        7.2
Arch:           x86_64
Status:         Subscribed
Status Details:
Starts:         06/29/2015
Ends:           06/28/2016
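For scripting, the Status field can be pulled out of output shaped like the listing above. This is a sketch only: subscription_status is a hypothetical helper that parses text on standard input, and it assumes the "Status:" line format shown here rather than any documented, stable output format.

```shell
# subscription_status: read "subscription-manager list"-style text on stdin
# and print the value of each "Status:" line (not "Status Details:").
subscription_status() {
    awk -F ': *' '/^Status:/ { print $2 }'
}

# Possible usage:
# subscription-manager list | subscription_status
```

With the output shown above, this would print Subscribed.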

After that, you can install and update software with commands such as "yum install" and "yum update", which is very convenient.

References:
How to Register and Enable Red Hat Subscription, Repositories and Updates for RHEL 7.0 Server
How to register and subscribe a system to the Red Hat Customer Portal using Red Hat Subscription-Manager
RHEL : Register Subscription

 

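First experience with LXC

My OS is CentOS 7.1, and the lxc and lxc-templates packages need to be installed. After installation, the templates are in the /usr/share/lxc/templates directory: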

# ls
lxc-alpine    lxc-archlinux  lxc-centos  lxc-debian    lxc-fedora  lxc-openmandriva  lxc-oracle  lxc-sshd    lxc-ubuntu-cloud
lxc-altlinux  lxc-busybox    lxc-cirros  lxc-download  lxc-gentoo  lxc-opensuse      lxc-plamo   lxc-ubuntu

Next, create a container using the CentOS template:

# lxc-create -t centos --name cn-centos

The temporary root password is stored in /var/lib/lxc/cn-centos/tmp_root_pass:

# cat /var/lib/lxc/cn-centos/tmp_root_pass
Root-cn-centos-EXb6bB

Start the container:

# lxc-start -n cn-centos

Stop the container:

# lxc-stop -n cn-centos

References:
Setup Linux Containers Using LXC On Ubuntu 15.04

 

An introduction to LXC, cgroups, and namespaces

LXC is a userspace interface for the Linux kernel containment features. Through a powerful API and simple tools, it lets Linux users easily create and manage system or application containers.

The linux containers, lxc, aims to use these new functionalities to provide a userspace container object which provides full resource isolation and resource control for an application or a system.

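The goal of Linux container technology is to provide complete resource isolation and control for an application or a system. The LXC project provides a set of APIs and tools that let other programs use Linux container technology easily.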

The container technology is actively being pushed into the mainstream linux kernel. It provides the resource management through the control groups aka process containers and resource isolation through the namespaces.

Linux container technology is implemented with cgroups (control groups) and namespaces. Their roles are as follows:

cgroups = limits how much you can use;
namespaces = limits what you can see (and therefore use)

Cgroups limit the resources you can use, while namespaces limit the resources you can see.
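On Linux, the namespaces a process belongs to are visible as symbolic links under /proc/PID/ns, and its cgroup membership is listed in /proc/PID/cgroup. The sketch below prints the namespace links. list_namespaces is a hypothetical helper, and its optional second argument (an alternative /proc root) exists only to make it easy to try on a test directory.

```shell
# list_namespaces PID [PROC_ROOT]
# Print "type identifier" for each namespace link of the given process.
list_namespaces() {
    for ln_link in "${2:-/proc}/$1/ns"/*; do
        # Skip an unmatched glob or anything that is not a symlink.
        [ -L "$ln_link" ] || continue
        printf '%s %s\n' "$(basename "$ln_link")" "$(readlink "$ln_link")"
    done
}

# Possible usage: namespaces of the current shell
# list_namespaces $$
```

Two processes that show the same identifier for a given type share that namespace. Reading the links of other users' processes may require root.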

References:
LXC
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic

 

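First experience with Ubuntu

I tried Ubuntu today, and my first impression is that the way the root account is used differs from RHEL and SuSE:

(1) The Ubuntu installation has no step for setting a root password. You log in with the account you created and then set the root password with the sudo passwd command.

(2) It seems that you cannot ssh in remotely as root (update: the solution is here):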

$ ssh root@10.10.249.177

Use the account you created instead:

$ ssh nan@10.10.249.177

 

The "-exec COMMAND \;" action of the find command

The following find command lists the *.stp files under the current directory:

# find . -name '*.stp' -exec ls {} \;
./Documents/one.stp
./Documents/two.stp

About the "-exec COMMAND \;" action of find:

find

-exec COMMAND \;

Carries out COMMAND on each file that find matches. The command sequence terminates with ; (the “;” is escaped to make certain the shell passes it to find literally, without interpreting it as a special character).

If COMMAND contains {}, then find substitutes the full path name of the selected file for “{}”.

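The ; marks the end of the command, and writing it as \; makes the shell pass the ; to find unchanged. {} is replaced with the full path name of each file that find matches.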
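The same command can be wrapped in a small function to try it on any directory. list_stp is a hypothetical helper. The sort at the end is an addition that is not in the original command, used only to make the order of the output predictable.

```shell
# list_stp DIR
# List every *.stp file below DIR, one ls invocation per match.
list_stp() {
    # '*.stp' is quoted so that find, not the shell, matches the pattern;
    # {} becomes the path of each match and \; ends the -exec command.
    find "$1" -name '*.stp' -exec ls {} \; | sort
}
```

Calling list_stp . in the directory from the example above would print ./Documents/one.stp and ./Documents/two.stp.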

References:
16.2. Complex Commands

 

Linux kernel notes (59): "depends on" and "select" in Kconfig

In a Kconfig file:

config A
    depends on B
    select C

This means that whether CONFIG_A can be enabled depends on whether CONFIG_B is enabled. Once CONFIG_A is enabled, CONFIG_C is automatically enabled as well.

References:
“select” vs “depends” in kernel Kconfig