Zombie and orphan processes in Unix

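In Unix, when a child process exits but its parent has not called wait() to collect its exit status, the child's information still occupies an entry in the system process table; such a child is called a zombie process. If the parent exits before the child does, the child is called an orphan process, and the init process becomes its new parent. init periodically reaps zombie processes whose parent is init.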
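The following C sketch makes the zombie state visible. It assumes Linux, because it reads the process state letter from /proc/<pid>/stat (Z means zombie); the names zombie_demo and proc_state are hypothetical. The parent forks a child that exits at once, polls for up to about two seconds until the child shows up as a zombie, and then reaps it with waitpid().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Read the state letter of a process from /proc/<pid>/stat (Linux only).
 * Returns '?' if the file cannot be read. */
static char proc_state(pid_t pid)
{
    char path[64], buf[512];
    snprintf(path, sizeof(path), "/proc/%d/stat", (int)pid);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    size_t n = fread(buf, 1, sizeof(buf) - 1, fp);
    fclose(fp);
    buf[n] = '\0';
    /* The command name is in parentheses and may contain spaces,
     * so find the last ')' and take the field after it. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ')
        return '?';
    return p[2];
}

/* Fork a child that exits immediately, observe it while the parent has
 * not yet waited for it, then reap it.  Returns the state seen before
 * reaping ('Z' is expected on Linux). */
char zombie_demo(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return '?';
    if (pid == 0)
        _exit(7);                       /* child: exit right away */

    char state = '?';
    struct timespec ts = {0, 10 * 1000 * 1000};   /* 10 ms */
    for (int i = 0; i < 200 && state != 'Z'; i++) {
        state = proc_state(pid);        /* poll until the child is a zombie */
        if (state != 'Z')
            nanosleep(&ts, NULL);
    }

    waitpid(pid, NULL, 0);              /* reap: frees the process-table entry */
    return state;
}
```

Calling zombie_demo() should return 'Z' on Linux. After waitpid() returns, the child's /proc entry is gone.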

References:
Zombie process
Zombie process vs Orphan process

The sysctl functions in FreeBSD

The sysctl family of functions in FreeBSD is declared as follows:

#include <sys/types.h>
#include <sys/sysctl.h>

int
sysctl(const int *name, u_int namelen, void *oldp, size_t *oldlenp,
    const void *newp, size_t newlen);

int
sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
    const void *newp, size_t newlen);

int
sysctlnametomib(const char *name, int *mibp, size_t *sizep);

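Among the parameters of sysctl, name and namelen identify the kernel parameter (its ID, an array of integers called a MIB). oldp and oldlenp receive the current value of the parameter, and newp and newlen supply a new value to set. Pass NULL for whichever pair is not needed (with newlen set to 0).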
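Here is a minimal sketch of that call pattern. It assumes FreeBSD, and it should also build on macOS, which shares this interface. It reads kern.ostype through the numeric MIB {CTL_KERN, KERN_OSTYPE}. The first call passes oldp = NULL so that sysctl only reports the required size in *oldlenp. newp and newlen are NULL and 0 because nothing is being set. read_ostype is a hypothetical helper; on other systems it simply fails with ENOSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Read kern.ostype via its numeric MIB.  Returns a malloc'ed string
 * (caller frees) or NULL on failure / unsupported platform. */
char *read_ostype(void)
{
#if defined(__FreeBSD__) || defined(__APPLE__)
    int mib[2] = { CTL_KERN, KERN_OSTYPE };
    size_t len = 0;

    /* oldp == NULL: only ask for the size; newp == NULL: set nothing. */
    if (sysctl(mib, 2, NULL, &len, NULL, 0) < 0)
        return NULL;
    char *buf = malloc(len);
    if (buf == NULL)
        return NULL;
    if (sysctl(mib, 2, buf, &len, NULL, 0) < 0) {
        free(buf);
        return NULL;
    }
    return buf;
#else
    errno = ENOSYS;   /* sysctl(3) as described here is BSD-specific */
    return NULL;
#endif
}
```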
Let's look at the implementation of sysctlbyname:

int
sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
    const void *newp, size_t newlen)
{
    int real_oid[CTL_MAXNAME+2];
    size_t oidlen;

    oidlen = sizeof(real_oid) / sizeof(int);
    if (sysctlnametomib(name, real_oid, &oidlen) < 0)
        return (-1);
    return (sysctl(real_oid, oidlen, oldp, oldlenp, newp, newlen));
}

As you can see, sysctlbyname first calls sysctlnametomib to obtain the real ID (the MIB), and then calls sysctl to do the actual work.
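The same two steps can also be done by hand. This sketch is FreeBSD/macOS only, and cpu_count is a hypothetical name. It resolves "hw.ncpu" to a MIB with sysctlnametomib() once, caches it, and then calls sysctl() directly, which is what sysctlbyname() does on every call. Caching the MIB mainly pays off when a value is read repeatedly. The static cache in this sketch is not thread-safe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#if defined(__FreeBSD__) || defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns the number of CPUs from hw.ncpu, or -1 on failure or on an
 * unsupported platform. */
int cpu_count(void)
{
#if defined(__FreeBSD__) || defined(__APPLE__)
    static int mib[CTL_MAXNAME];
    static size_t miblen = 0;           /* 0 means "not resolved yet" */
    int ncpu;
    size_t len = sizeof(ncpu);

    if (miblen == 0) {                  /* name -> MIB, done only once */
        size_t n = sizeof(mib) / sizeof(mib[0]);
        if (sysctlnametomib("hw.ncpu", mib, &n) < 0)
            return -1;
        miblen = n;
    }
    if (sysctl(mib, (u_int)miblen, &ncpu, &len, NULL, 0) < 0)
        return -1;
    return ncpu;
#else
    errno = ENOSYS;
    return -1;
#endif
}
```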

References:
SYSCTL(3)
Grokking SYSCTL and the Art of Smashing Kernel Variables

A brief introduction to the uptime command

The uptime command shows how long the system has been running:

# uptime
 19:05:33 up  3:16,  2 users,  load average: 0.00, 0.01, 0.05

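19:05:33 is the current system time, and up 3:16 means the system has been up for 3 hours and 16 minutes. The rest of the line shows the number of logged-in users and the load averages over the last 1, 5 and 15 minutes. If you only care how long the system has been running, use: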

# uptime -p
up 3 hours, 16 minutes

uptime obtains the running time by reading the /proc/uptime file:

# cat /proc/uptime
11984.78 95454.77

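The first field is the number of seconds since the system booted. The second is the total idle time of all CPU cores, also in seconds and summed over the cores, which is why it can exceed the first field on a multi-core machine.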
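Here is a small sketch of reading /proc/uptime from C. It is Linux only, and parse_uptime, split_uptime and read_uptime are hypothetical helpers. It parses the two fields and splits the first into whole hours and minutes, roughly what uptime -p reports for uptimes shorter than a day.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the two fields of /proc/uptime: seconds since boot and idle time
 * summed over all CPU cores.  Returns 0 on success, -1 otherwise. */
int parse_uptime(const char *text, double *up, double *idle)
{
    return sscanf(text, "%lf %lf", up, idle) == 2 ? 0 : -1;
}

/* Split an uptime in seconds into whole hours and minutes. */
void split_uptime(double up, long *hours, long *minutes)
{
    long secs = (long)up;
    *hours = secs / 3600;
    *minutes = (secs % 3600) / 60;
}

/* Read the live values from /proc/uptime (Linux). */
int read_uptime(double *up, double *idle)
{
    char buf[128];
    FILE *fp = fopen("/proc/uptime", "r");
    if (fp == NULL)
        return -1;
    char *ok = fgets(buf, sizeof(buf), fp);
    fclose(fp);
    return ok != NULL ? parse_uptime(buf, up, idle) : -1;
}
```

For the sample line above, 11984.78 seconds comes out as 3 hours and 19 minutes.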


devfs, tmpfs and devtmpfs

The following is excerpted from Specfs, Devfs, Tmpfs, and Others:

specfs – specfs, or Special FileSystem, is a virtual filesystem used to access special device files. This filesystem is odd compared to other filesystems in general because this filesystem does not require a mount-point, yet the OS can still use specfs. However, specfs can be mounted by the user (mount -t specfs none /dev/streams). The device files for character devices in the /dev/ directory use specfs.

devfs – devfs is a device manager in the form of a filesystem. The Device FileSystem is largely the same as specfs except for some differences in the way they function and their uses. devfs is used for most of the device files in /dev/. Most Unix and Unix-like systems use devfs including Mac OS X, *BSD, and Solaris. Nearly all Unix and Unix-like systems that use devfs place it on the kernelspace. However, Linux uses a userspace-kernelspace hybrid approach. This means the devfs virtual filesystem is on the kernelspace and userspace.

tmpfs – The Temporary filesystem is a virtual filesystem for storing temporary files. This filesystem is really in the memory and/or in the swap space. Obviously, all data on this filesystem are lost when the system is shutdown. The mount point is /tmp/.

devtmpfs – This is an improved devfs. The purpose of devtmpfs is to boost boot-time. devtmpfs is more like tmpfs than devfs. The mount-point is /dev/. devtmpfs only creates device files for currently available hardware on the local system.

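To summarize:
devfs is a device manager in the form of a filesystem. tmpfs lives in memory and swap, so it can only hold temporary files. devtmpfs is an improved devfs; it also lives in memory, and its mount point is /dev/.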
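One way to check which of these filesystems is actually mounted on /dev or /tmp on a Linux machine is to look it up in /proc/mounts. The sketch below scans text in the /proc/mounts format and returns the type of the filesystem mounted at a given directory; fstype_of is a hypothetical helper. It does not decode the octal escapes (such as \040 for a space) that /proc/mounts uses in paths.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Given text in /proc/mounts format (one "device mountpoint fstype
 * options dump pass" entry per line), copy the filesystem type mounted
 * at `mountpoint` into `out`.  The last match wins, because a later
 * mount on the same directory hides earlier ones.  Returns 0 if found. */
int fstype_of(const char *mounts, const char *mountpoint,
              char *out, size_t outlen)
{
    int found = -1;
    const char *line = mounts;

    while (line != NULL && *line != '\0') {
        char dev[256], mnt[256], type[64];
        if (sscanf(line, "%255s %255s %63s", dev, mnt, type) == 3 &&
            strcmp(mnt, mountpoint) == 0) {
            snprintf(out, outlen, "%s", type);
            found = 0;
        }
        line = strchr(line, '\n');      /* advance to the next entry */
        if (line != NULL)
            line++;
    }
    return found;
}
```

In practice the text would come from reading /proc/mounts. On many current Linux systems, /dev reports devtmpfs.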


Monitoring CPU usage with vmstat

The vmstat command can be used to monitor CPU usage. For example:

# vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 5201924   1328 5578060    0    0     0     0 1582 6952  2  1 98  0  0
 1  0      0 5200984   1328 5577996    0    0     0     0 2020 20567  9  1 90  0  0
 0  0      0 5198668   1328 5577952    0    0     0     0 1568 7617  5  1 94  0  0
 0  0      0 5194844   1328 5578000    0    0     0   187 1249 7057  1  1 98  0  0
 0  0      0 5199956   1328 5578232    0    0     0     0 1496 7306  4  1 95  0  0

The command above prints the system state once per second, and the last 5 columns describe the CPU. The man page explains these 5 columns clearly:

CPU
       These are percentages of total CPU time.
       us: Time spent running non-kernel code.  (user time, including nice time)
       sy: Time spent running kernel code.  (system time)
       id: Time spent idle.  Prior to Linux 2.5.41, this includes IO-wait time.
       wa: Time spent waiting for IO.  Prior to Linux 2.5.41, included in idle.
       st: Time stolen from a virtual machine.  Prior to Linux 2.6.11, unknown.

vmstat actually gets the system state from the /proc/stat file:

# cat /proc/stat
cpu  381584 711 299364 1398303520 429839 0 251 0 0 0
cpu0 90740 58 44641 174627550 131209 0 120 0 0 0
cpu1 43141 26 22925 174746812 108219 0 10 0 0 0
cpu2 41308 35 25097 174831161 25877 0 40 0 0 0
cpu3 39301 70 27514 174836084 27792 0 4 0 0 0
cpu4 39187 78 46191 174750027 109013 0 0 0 0 0
......

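Note that these numbers are in jiffies (strictly, USER_HZ clock ticks, usually 1/100 second; sysconf(_SC_CLK_TCK) returns the exact value).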
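Here is a sketch of reading the aggregate cpu line from C. It is Linux only, and struct cpu_times, parse_cpu_line and ticks_to_seconds are hypothetical names. It takes the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal) and converts ticks to seconds with sysconf(_SC_CLK_TCK).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Counters from the aggregate "cpu" line of /proc/stat, in USER_HZ ticks. */
struct cpu_times {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu" line (not "cpu0", "cpu1", ...).
 * Returns 0 on success, -1 otherwise. */
int parse_cpu_line(const char *line, struct cpu_times *t)
{
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
                   &t->user, &t->nice, &t->system, &t->idle,
                   &t->iowait, &t->irq, &t->softirq, &t->steal);
    return n == 8 ? 0 : -1;
}

/* Convert a tick count to seconds using USER_HZ from sysconf(). */
double ticks_to_seconds(unsigned long long ticks)
{
    long hz = sysconf(_SC_CLK_TCK);
    return hz > 0 ? (double)ticks / (double)hz : -1.0;
}
```

Per-CPU lines such as cpu0 are rejected on purpose. Like vmstat, a monitor would work with the difference between two samples rather than with the raw totals.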

Also, vmstat computes the CPU time percentages by rounding half up (vmstat.c):

static void new_format(void){
    ......
    duse = *cpu_use + *cpu_nic;
    dsys = *cpu_sys + *cpu_xxx + *cpu_yyy;
    didl = *cpu_idl;
    diow = *cpu_iow;
    dstl = *cpu_zzz;
    Div = duse + dsys + didl + diow + dstl;
    if (!Div) Div = 1, didl = 1;
    divo2 = Div / 2UL;
    printf(w_option ? wide_format : format,
           running, blocked,
           unitConvert(kb_swap_used), unitConvert(kb_main_free),
           unitConvert(a_option?kb_inactive:kb_main_buffers),
           unitConvert(a_option?kb_active:kb_main_cached),
           (unsigned)( (unitConvert(*pswpin  * kb_per_page) * hz + divo2) / Div ),
           (unsigned)( (unitConvert(*pswpout * kb_per_page) * hz + divo2) / Div ),
           (unsigned)( (*pgpgin        * hz + divo2) / Div ),
           (unsigned)( (*pgpgout           * hz + divo2) / Div ),
           (unsigned)( (*intr          * hz + divo2) / Div ),
           (unsigned)( (*ctxt          * hz + divo2) / Div ),
           (unsigned)( (100*duse            + divo2) / Div ),
           (unsigned)( (100*dsys            + divo2) / Div ),
           (unsigned)( (100*didl            + divo2) / Div ),
           (unsigned)( (100*diow            + divo2) / Div ),
           (unsigned)( (100*dstl            + divo2) / Div )
    );
    ......
}

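Because each column is rounded independently, the CPU percentages can add up to more than 100. In the first line of the output above, 2 + 1 + 98 = 101.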
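The effect is easy to reproduce. The sketch below uses the same rounding expression as new_format(), (100*x + Div/2) / Div, on made-up tick deltas of 25, 15 and 960 for us, sy and id. 2.5% rounds up to 3, 1.5% rounds up to 2 and 96.0% stays 96, so the columns add up to 101. vmstat_pct and pct_sum are hypothetical names.

```c
#include <assert.h>

/* vmstat-style percentage: 100*x/div, rounded half up by adding div/2
 * before the integer division. */
unsigned vmstat_pct(unsigned long long x, unsigned long long div)
{
    return (unsigned)((100 * x + div / 2) / div);
}

/* Sum of the us, sy, id, wa, st columns for one set of tick deltas.
 * Each column is rounded on its own, so the sum need not be 100. */
unsigned pct_sum(const unsigned long long d[5])
{
    unsigned long long div = d[0] + d[1] + d[2] + d[3] + d[4];
    unsigned sum = 0;
    if (div == 0)
        return 0;
    for (int i = 0; i < 5; i++)
        sum += vmstat_pct(d[i], div);
    return sum;
}
```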

Also, on Linux the r column is the total number of tasks that are running or waiting to run.


References:
/proc/stat explained
procps


Test expressions in Bash

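In the Bash shell, every command returns a value that indicates its exit status: 0 means true (success) and a non-zero value means false (failure). The test command evaluates an expression and reports the result through its own exit status. Its syntax is: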

test expression
or:
[ expression ]

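test supports three kinds of operands: strings, integers (decimal points are not supported) and files. If expression evaluates to true, test returns 0 (true); otherwise it returns non-zero (false). For examples and explanations of test expressions, see How to understand if condition in bash?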
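The exit-status convention can also be observed from C. This is a POSIX sketch; exit_status is a hypothetical helper. It assumes a standalone test binary is on PATH, as coreutils provides. It runs a command and returns its exit status: test 1 -lt 2 should give 0 (true) and test abc = abd should give 1 (false).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a command (argv[0] is looked up in PATH) and return its exit
 * status, or -1 if it could not be started or did not exit normally. */
int exit_status(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                 /* exec failed: same code the shell uses */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```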

References:
Shell十三问


A brief introduction to GNU Parallel

GNU parallel runs shell commands in parallel. Here is a simple example:

# ls
a  b  c
# cat a
aaaa

# cat b
bbbb

# cat c
cccc

# parallel cat ::: *
aaaa

bbbb

cccc

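The example above prints the contents of the three files a, b and c. ::: tells GNU parallel to read its arguments from the command line instead of stdin, and the shell expands * into the names of the files in the current directory.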

Another example:

# parallel sleep {}\; echo {} ::: 2 1 4 3
1
2
3
4
# parallel -k sleep {}\; echo {} ::: 2 1 4 3
2
1
4
3

Normally, parallel prints a job's output as soon as that job finishes. The -k option keeps the output in the same order as the input. {} is replaced by the input line.

GNU parallel does not mix the outputs of different jobs together. Compare the output of the following two commands:

# traceroute foss.org.my & traceroute debian.org & traceroute freenetproject.org & wait
[1] 4920
[2] 4921
[3] 4922
traceroute to debian.org (149.20.20.20), 30 hops max, 60 byte packets
traceroute to freenetproject.org (80.68.94.117), 30 hops max, 60 byte packets
foss.org.my: Name or service not known
Cannot handle "host" cmdline arg `foss.org.my' on position 1 (argc 1)
[1]   Exit 2                  traceroute foss.org.my
 1  16.187.248.2 (16.187.248.2)  1.705 ms  1.709 ms  2.067 ms
 1  16.187.248.2 (16.187.248.2)  1.315 ms  1.314 ms  1.618 ms
......

# parallel traceroute ::: foss.org.my debian.org freenetproject.org
foss.org.my: Name or service not known
Cannot handle "host" cmdline arg `foss.org.my' on position 1 (argc 1)
traceroute to debian.org (140.211.15.34), 30 hops max, 60 byte packets
 1  16.187.248.2 (16.187.248.2)  1.871 ms  1.857 ms  2.151 ms
......
traceroute to freenetproject.org (80.68.94.117), 30 hops max, 60 byte packets
 1  16.187.248.2 (16.187.248.2)  2.175 ms  2.471 ms  2.471 ms
 2  16.160.221.81 (16.160.221.81)  0.456 ms  0.463 ms  0.463 ms
......
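Here is a rough C sketch of the grouping idea. It is not GNU parallel's actual implementation. Each command runs concurrently via sh -c with its stdout redirected into a temporary file, and a job's entire output is copied out only when that job finishes. That gives completion order (as without -k) and keeps outputs from interleaving. run_grouped and MAX_JOBS are hypothetical; error handling is minimal and stderr is not captured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_JOBS 16   /* hypothetical limit for this sketch */

/* Run each command concurrently with "sh -c", capturing its stdout in a
 * temporary file.  When a job exits, its whole captured output is copied
 * to `out` in one piece, so outputs of different jobs never interleave.
 * Returns the number of jobs run, or -1 on error. */
int run_grouped(const char *const cmds[], int n, FILE *out)
{
    pid_t pids[MAX_JOBS];
    FILE *bufs[MAX_JOBS];

    if (n < 0 || n > MAX_JOBS)
        return -1;
    for (int i = 0; i < n; i++) {
        bufs[i] = tmpfile();
        if (bufs[i] == NULL || (pids[i] = fork()) < 0)
            return -1;                  /* sketch: no cleanup on error */
        if (pids[i] == 0) {             /* child: stdout -> temp file */
            dup2(fileno(bufs[i]), STDOUT_FILENO);
            execl("/bin/sh", "sh", "-c", cmds[i], (char *)NULL);
            _exit(127);
        }
    }
    int done = 0;
    while (done < n) {
        pid_t pid = wait(NULL);         /* whichever job finishes first */
        if (pid < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (pids[i] != pid)
                continue;
            char chunk[4096];
            size_t len;
            rewind(bufs[i]);            /* the child moved the shared offset */
            while ((len = fread(chunk, 1, sizeof(chunk), bufs[i])) > 0)
                fwrite(chunk, 1, len, out);
            fclose(bufs[i]);
            done++;
            break;
        }
    }
    return n;
}
```

To emit output in input order, as -k does, the collection loop could wait for the jobs in array order with waitpid() instead of calling wait().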

References:
GNU Parallel: The Command-Line Power Tool
GNU Parallel manual


getopt and getopt_long

These notes are based on Using getopt. The usage line of a typical Unix program looks like this:

getopt [-dmp] [-s name] -f name file [file ...]

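a) d, m and p are optional options; listing them inside one [] means they can be combined;
b) [-s name] means s is an optional option that takes an argument;
c) -f name means f is a required option that takes an argument;
d) file [file ...] means the program also needs one or more command-line arguments.

The prototype of getopt is: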

#include <unistd.h>

int getopt(int argc, char * const argv[], const char *optstring);

extern char *optarg;
extern int optind, opterr, optopt;

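Note the following points:

a) After each call, if the option takes an argument, optarg points to that argument, and optind holds the index of the next element of argv to be processed. So once getopt has parsed all the options, if optind equals argc, there are no command-line arguments left.
b) The first two arguments of getopt come directly from main's parameters. The third, optstring, specifies how options are handled, e.g. "df:mps:"; a colon means that the preceding option takes an argument. If getopt encounters an option that is not in optstring, it returns '?'. Once all options have been parsed, it returns -1.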
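Here is a sketch of parsing the usage line above with getopt(); struct opts and parse_args are hypothetical names. It uses the optstring "df:mps:", checks that the required -f was given, and uses optind to find the first file operand. Resetting optind to 1 lets the function be called more than once. Note that glibc's getopt permutes argv by default, so options placed after file names are accepted too.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Parsed form of:  prog [-dmp] [-s name] -f name file [file ...] */
struct opts {
    int d, m, p;
    const char *s_name;   /* -s argument, optional */
    const char *f_name;   /* -f argument, required */
    int first_file;       /* index in argv of the first file operand */
};

/* Returns 0 on success, -1 on a usage error. */
int parse_args(int argc, char *argv[], struct opts *o)
{
    int c;
    memset(o, 0, sizeof(*o));
    optind = 1;                         /* start scanning at argv[1] */
    opterr = 0;                         /* report errors ourselves */
    while ((c = getopt(argc, argv, "df:mps:")) != -1) {
        switch (c) {
        case 'd': o->d = 1; break;
        case 'm': o->m = 1; break;
        case 'p': o->p = 1; break;
        case 's': o->s_name = optarg; break;   /* optarg points into argv */
        case 'f': o->f_name = optarg; break;
        default:                        /* '?': unknown option or missing arg */
            return -1;
        }
    }
    if (o->f_name == NULL || optind == argc)   /* need -f and >= 1 file */
        return -1;
    o->first_file = optind;
    return 0;
}
```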

Next, let's look at getopt_long and getopt_long_only (see getopt(3) – Linux man page):

#include <getopt.h>

int getopt_long(int argc, char * const argv[],
           const char *optstring,
           const struct option *longopts, int *longindex);
int getopt_long_only(int argc, char * const argv[],
        const char *optstring,
        const struct option *longopts, int *longindex);

In addition to short options, getopt_long can also handle long options (which start with --). struct option is defined as follows:

struct option {
    const char *name;
    int         has_arg;
    int        *flag;
    int         val;
};
The meanings of the different fields are:
name
is the name of the long option.

has_arg
is: no_argument (or 0) if the option does not take an argument; required_argument (or 1) if the option requires an argument; or optional_argument (or 2) if the option takes an optional argument.  

flag
specifies how results are returned for a long option. If flag is NULL, then getopt_long() returns val. (For example, the calling program may set val to the equivalent short option character.) Otherwise, getopt_long() returns 0, and flag points to a variable which is set to val if the option is found, but left unchanged if the option is not found.

val
is the value to return, or to load into the variable pointed to by flag.

If flag is NULL, getopt_long returns val, which is why flag is usually set to NULL and val to the corresponding short option. Otherwise, getopt_long returns 0 and stores val in the variable that flag points to.

The following code (taken from the size command in GNU binutils) helps to understand getopt_long better:

#define OPTION_FORMAT (200)
#define OPTION_RADIX (OPTION_FORMAT + 1)
#define OPTION_TARGET (OPTION_RADIX + 1)

static struct option long_options[] =
{
  {"common", no_argument, &show_common, 1},
  {"format", required_argument, 0, OPTION_FORMAT},
  {"radix", required_argument, 0, OPTION_RADIX},
  {"target", required_argument, 0, OPTION_TARGET},
  {"totals", no_argument, &show_totals, 1},
  {"version", no_argument, &show_version, 1},
  {"help", no_argument, &show_help, 1},
  {0, no_argument, 0, 0}
};


while ((c = getopt_long (argc, argv, "ABHhVvdfotx", long_options,
                         (int *) 0)) != EOF)
  switch (c)
    {
    case OPTION_FORMAT:
      switch (*optarg)
        {
        case 'B':
        case 'b':
          berkeley_format = 1;
          break;
        case 'S':
        case 's':
          berkeley_format = 0;
          break;
        default:
          non_fatal (_("invalid argument to --format: %s"), optarg);
          usage (stderr, 1);
        }
      break;

    ......

    case 0:
      break;
    ......
    }

In {"format", required_argument, 0, OPTION_FORMAT}, flag is NULL, so getopt_long returns OPTION_FORMAT, and the code then uses optarg to decide which format to use. In {"totals", no_argument, &show_totals, 1}, flag is not NULL, so getopt_long returns 0 and sets show_totals to 1.
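Here is a compilable sketch of the two cases. struct size_opts, parse_size_opts and the reduced option table are hypothetical, modeled on the size fragment above. --format has flag == NULL, so getopt_long() returns OPTION_FORMAT and optarg holds the argument. --totals has a flag pointer, so getopt_long() returns 0 and stores 1 in show_totals.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

#define OPTION_FORMAT 200     /* as in size: a value outside the char range */

static int show_totals;       /* written by getopt_long() via the flag pointer */

struct size_opts {
    const char *format;       /* argument of --format, or NULL */
    int totals;               /* 1 if --totals was given */
};

/* Returns 0 on success, -1 on an unknown option or missing argument. */
int parse_size_opts(int argc, char *argv[], struct size_opts *o)
{
    static const struct option longopts[] = {
        {"format", required_argument, NULL, OPTION_FORMAT}, /* flag NULL: return val */
        {"totals", no_argument, &show_totals, 1},            /* flag set: return 0 */
        {NULL, 0, NULL, 0}
    };
    int c;

    show_totals = 0;
    o->format = NULL;
    optind = 1;
    opterr = 0;
    while ((c = getopt_long(argc, argv, "", longopts, NULL)) != -1) {
        switch (c) {
        case OPTION_FORMAT:
            o->format = optarg;
            break;
        case 0:               /* show_totals has already been set */
            break;
        default:              /* '?' */
            return -1;
        }
    }
    o->totals = show_totals;
    return 0;
}
```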

getopt_longgetopt_long_only的区别:

getopt_long_only() is like getopt_long(), but '-' as well as "--" can indicate a long option. If an option that starts with '-' (not "--") doesn't match a long option, but does match a short option, it is parsed as a short option instead.
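Here is a sketch of the difference; count_long_short is a hypothetical name. With getopt_long_only(), both -verbose and --verbose match the long option "verbose". -x matches no long option but does match the short option x, so it is parsed as a short option.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Parse argv with getopt_long_only(): return how many times the long
 * option "verbose" was seen, and store the count of -x in *n_short. */
int count_long_short(int argc, char *argv[], int *n_short)
{
    static const struct option longopts[] = {
        {"verbose", no_argument, NULL, 'V'},
        {NULL, 0, NULL, 0}
    };
    int c, n_long = 0;

    *n_short = 0;
    optind = 1;
    opterr = 0;
    while ((c = getopt_long_only(argc, argv, "x", longopts, NULL)) != -1) {
        if (c == 'V')
            n_long++;            /* "-verbose" or "--verbose" */
        else if (c == 'x')
            (*n_short)++;        /* "-x": no long option matches, so short */
    }
    return n_long;
}
```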


The /etc/hosts file

/etc/hosts stores the mapping between IP addresses and hostnames. Each line has the format IP_address canonical_hostname [aliases...]. For example:

127.0.0.1       localhost
192.168.1.10    foo.mydomain.org       foo
192.168.1.13    bar.mydomain.org       bar
146.82.138.7    master.debian.org      master
209.237.226.90  www.opensource.org
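Here is a sketch of splitting one line of /etc/hosts into its fields. struct host_entry, parse_hosts_line and MAX_ALIASES are hypothetical. Following hosts(5), it treats text after # as a comment, and it modifies the line buffer in place with strtok_r(). Real programs normally resolve names through getaddrinfo() rather than parsing the file themselves.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>

#define MAX_ALIASES 8   /* hypothetical limit; extra aliases are ignored */

/* One parsed entry: pointers refer into the caller's line buffer. */
struct host_entry {
    char *addr;
    char *canonical;
    char *aliases[MAX_ALIASES];
    int naliases;
};

/* Parse one line in place.  Returns 0 for an entry, 1 for a blank or
 * comment-only line, -1 for an address without a hostname. */
int parse_hosts_line(char *line, struct host_entry *e)
{
    char *save = NULL, *tok;
    char *hash = strchr(line, '#');
    if (hash != NULL)
        *hash = '\0';                   /* drop the comment */

    e->naliases = 0;
    e->addr = strtok_r(line, " \t\r\n", &save);
    if (e->addr == NULL)
        return 1;
    e->canonical = strtok_r(NULL, " \t\r\n", &save);
    if (e->canonical == NULL)
        return -1;
    while (e->naliases < MAX_ALIASES &&
           (tok = strtok_r(NULL, " \t\r\n", &save)) != NULL)
        e->aliases[e->naliases++] = tok;
    return 0;
}
```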

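Although the host table has largely been replaced by DNS, /etc/hosts is still used during bootstrapping, in NIS environments and on isolated nodes.
Changes to /etc/hosts normally take effect immediately, but programs that cache lookups may need their cache cleared.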

References:
hosts