Technology | My Site

Docker Notes (4): How to enter a running docker container?

The "docker attach [container-id]" command sometimes hangs, and pressing Ctrl+C has no effect:

[root@localhost ~]# docker attach a972e69ab444
^C^C^C^C^C^C^C^C^C^C

Use the pstack command to inspect the call stack:

[root@localhost ~]# pstack 29744
Thread 5 (Thread 0x7f9079bd8700 (LWP 29745)):
#0  runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x0000000000417717 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:49
#2  0x0000000001161c58 in runtime.sched ()
#3  0x0000000000000000 in ?? ()
Thread 4 (Thread 0x7f90792d7700 (LWP 29746)):
#0  runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x0000000000417782 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:55
#2  0x00007f907b830f60 in ?? ()
#3  0x0000000000000000 in ?? ()
Thread 3 (Thread 0x7f9078ad6700 (LWP 29747)):
#0  runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x0000000000417717 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:49
#2  0x00000000011618a0 in text/template.zero ()
#3  0x0000000000000000 in ?? ()
Thread 2 (Thread 0x7f9073fff700 (LWP 29748)):
#0  runtime.futex () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:269
#1  0x0000000000417717 in runtime.futexsleep () at /usr/lib/golang/src/pkg/runtime/os_linux.c:49
#2  0x000000c2080952f0 in ?? ()
#3  0x0000000000000000 in ?? ()
Thread 1 (Thread 0x7f907b9e1800 (LWP 29744)):
#0  runtime.epollwait () at /usr/lib/golang/src/pkg/runtime/sys_linux_amd64.s:385
#1  0x00000000004175dd in runtime.netpoll () at /usr/lib/golang/src/pkg/runtime/netpoll_epoll.c:78
#2  0x00007fff00000004 in ?? ()
#3  0x00007fff58720fd0 in ?? ()
#4  0xffffffff00000080 in ?? ()
#5  0x0000000000000000 in ?? ()

Instead, you can use the "docker exec -it [container-id] bash" command to enter the running container:

[root@localhost ~]# docker exec -it a972e69ab444 bash
bash-4.1# ls
bin   dev  home  lib64  mnt  pam-1.1.1-17.el6.src.rpm  root      sbin     srv  tmp  var
boot  etc  lib   media  opt  proc                      rpmbuild  selinux  sys  usr
bash-4.1#

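docker attach effectively reuses the tty that the container is currently using, so running exit inside docker attach causes the running container to exit. docker exec, by contrast, creates a new tty, so running exit inside docker exec does not cause the container to exit.

References:
(1) Docker – Enter Running Container with new TTY.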

How to kill a detached screen session?

To kill a screen session that is already detached, use the following command:

screen -X -S [session # you want to kill] quit

For example:

[root@localhost ~]# screen -ls
There are screens on:
        9975.pts-0.localhost    (Detached)
        4588.pts-3.localhost    (Detached)
2 Sockets in /var/run/screen/S-root.

[root@localhost ~]# screen -X -S 4588 quit
[root@localhost ~]# screen -ls
There is a screen on:
        9975.pts-0.localhost    (Detached)
1 Socket in /var/run/screen/S-root.

As you can see, session 4588 is gone.

References:
(1) Kill detached screen session.

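Docker Notes (3): SELinux causes docker to misbehave

For the past few days I have been looking into backing up files with docker (the operating system is RHEL 7 and the docker version is 1.5.0). Following the docker documentation, I ran the following commands: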

[root@localhost data]# docker create -v /dbdata --name dbdata training/postgres /bin/true
[root@localhost data]# docker run -d --volumes-from dbdata --name db1 training/postgres
[root@localhost data]# docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
tar: /backup/backup.tar: Cannot open: Permission denied
tar: Error is not recoverable: exiting now

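Seeing the Permission denied message, the natural first suspicion is that the user lacks write permission. Check the permissions of the current directory: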

[root@localhost data]# ls -alt
total 4
drwxrwxrwx.  2 root root    6 May  7 21:33 .
drwxrwx-w-. 15 root root 4096 May  7 21:33 ..

That looks fine. After some discussion on stackoverflow, the suggestion was that SELinux might be the culprit. I checked the SELinux status:

[root@localhost root]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   enforcing
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

So I switched the mode to permissive:

[root@localhost data]# setenforce 0
[root@localhost data]# sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             targeted
Current mode:                   permissive
Mode from config file:          enforcing
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      28

It worked right away:

[root@localhost data]# docker run --volumes-from dbdata -v $(pwd):/backup ubuntu tar cvf /backup/backup.tar /dbdata
tar: Removing leading `/' from member names
/dbdata/

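Due to time constraints, I did not dig any deeper. In short, keep SELinux in mind when using docker; it can cause some very strange problems.

Update:

I ran into this problem again recently; see this summary.

References:
(1) Why does docker prompt "Permission denied" when backing up the data volume?;
(2) How to disable SELinux without restart?;
(3) Quick-Tip: Turning off or disabling SELinux.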

Are you using the right awk?

Today I was fiddling with awk on OmniOS and found that a very simple program failed to run:

root@localhost:/root# awk 'function print_name_and_age(name, age) { print name" is "age" old" } {print_name_and_age($1, $2)}'
awk: syntax error near line 1
awk: bailing out near line 1

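The same program runs fine under gawk. I asked on stackoverflow and came away with the following conclusions:

(1) OmniOS ships nawk, /usr/bin/awk (the default awk), and /usr/xpg4/bin/awk. nawk is the recommended one. /usr/bin/awk is the old version of awk and lacks many features, user-defined functions among them. (Solaris and the illumos-based operating systems derived from it may all need to watch out for this.)
(2) "bailing out" is a message printed by the old awk. So whenever it shows up in the output, consider whether the wrong awk version is being used.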
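One way to guard a script against the old awk is to probe the available implementations before use. The following is only a sketch: the helper name pick_awk and the candidate order are assumptions for illustration, not something taken from the stackoverflow answer.

```shell
#!/bin/sh
# Hypothetical helper: print the first awk that accepts user-defined functions.
# The candidate list and its order are assumptions; adjust them for your system.
pick_awk() {
    for candidate in nawk /usr/xpg4/bin/awk gawk mawk awk; do
        # The probe program defines and calls a function; the old awk
        # fails to parse it and exits with a non-zero status.
        if command -v "$candidate" >/dev/null 2>&1 &&
           "$candidate" 'function f(x) { return x } BEGIN { exit f(0) }' </dev/null >/dev/null 2>&1
        then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no suitable awk found" >&2; exit 1; }

# Same program as in the post, run with whichever awk passed the probe.
echo "Tom 30" | "$AWK" 'function print_name_and_age(name, age) { print name " is " age " old" }
{ print_name_and_age($1, $2) }'
```

The probe checks the capability rather than the binary's name, so it keeps working on systems where plain awk is already a modern implementation.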

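Handling non-standard SSH ports with git clone

When cloning a project with git clone, if the repository's SSH port is not the standard port 22 (for example, when going through an SSH tunnel), you can use the following command: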

git clone ssh://git@hostname:port/.../xxx.git

For example:

git clone ssh://git@10.137.20.113:2222/root/test.git

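Setting up a git server with the gitlab docker image

I spent the last couple of days fiddling with gitlab and ran into a few problems. I am recording them here for future reference.

(1) gitlab runs from a docker image (download: https://registry.hub.docker.com/u/genezys/gitlab/). This part went smoothly; just follow the README.

(2) Next came accessing the git server with the git client (version 1.8.3.1), which was quite painful. The conclusions are summarized here:

a) Because port 22 of the gitlab container is mapped to port 2222 on the host, the line "Port 2222" must be added to the "~/.ssh/config" file;

b) Clone the project with "git clone". The repository address can be found on the gitlab web page, for example "git@192.168.59.103:root/test.git", but note that the IP must be changed to the host's IP. For example, if the host IP is 10.137.20.113, change it to "git@10.137.20.113:root/test.git";

c) Finally, if git clone is not given a destination directory, the default destination is the xxx in xxx.git. For the command above, the destination directory is test. Make sure the destination directory either does not exist or is empty. Otherwise git reports "fatal: destination path 'xxx' already exists and is not an empty directory.".

P.S. For accessing a non-standard SSH port with git clone, see also my post on that topic.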
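As an alternative to editing "~/.ssh/config" in step a), the scp-style address from step b) can be rewritten into the ssh:// URL form with an explicit port. The sketch below assumes the address has the shape user@host:path; the helper name to_ssh_url is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn "user@host:path" plus a port into
# "ssh://user@host:port/path". Assumes the address contains a colon.
to_ssh_url() {
    addr=$1
    port=$2
    userhost=${addr%%:*}   # everything before the first colon
    path=${addr#*:}        # everything after the first colon
    printf 'ssh://%s:%s/%s\n' "$userhost" "$port" "${path#/}"
}

to_ssh_url "git@10.137.20.113:root/test.git" 2222
```

The printed URL can be passed straight to git clone, which avoids a per-host entry in the SSH configuration.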

References:
1) How can I use git client to access gitlab docker?;
2) How to get Git to clone into current directory.

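Chatting about CPUs (4)

CPU utilization is the percentage of a given time interval that the CPU spends doing "useful work". "Useful work" means that the CPU is not running the kernel IDLE thread, but is instead running user-level application threads, other kernel threads, or handling interrupts.

The time the CPU spends executing user-level application code is called user-time, and the time spent running kernel-level code is called kernel-time.

A computation-intensive program may spend almost all of its time executing user-level code, whereas an I/O-intensive program spends a considerable share of its time in system calls, which execute kernel code to perform I/O.

When a CPU's utilization reaches 100%, it is said to be saturated. In this situation, threads waiting for the CPU experience scheduler latency.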
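To make the definition concrete, utilization can be computed from two samples of cumulative CPU time counters. The sketch below assumes the layout of the aggregate "cpu" line in Linux's /proc/stat (user, nice, system, idle, iowait, irq, softirq, steal) and treats idle plus iowait as time that is not "useful work"; the helper name cpu_util and the sample numbers are made up.

```shell
#!/bin/sh
# Hypothetical helper: utilization (%) between two "cpu" lines in the
# /proc/stat layout. Fields 2..9 are summed as total time; fields 5 and 6
# (idle and iowait) are counted as idle. Both choices are assumptions.
cpu_util() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0
          for (i = 2; i <= NF && i <= 9; i++) total += $i
          idle = $5 + $6 }
        NR == 1 { t0 = total; i0 = idle }
        NR == 2 { dt = total - t0; di = idle - i0
                  if (dt > 0) printf "%.1f\n", 100 * (dt - di) / dt
                  else print "0.0" }'
}

# Made-up samples: 1000 ticks elapsed, 850 of them idle, so 15.0 is printed.
cpu_util "cpu 100 0 50 800 50 0 0 0" "cpu 200 0 100 1600 100 0 0 0"
```

On a Linux system the two arguments could come from reading the first line of /proc/stat twice, a second or so apart.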

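awk array notes

1. Arrays in awk are one-dimensional and do not need to be declared before use. The first time an array element is referenced, it is created automatically with an uninitialized value, which acts as the empty string "" in string contexts and as the number 0 in numeric contexts. See the following examples: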

Nan:~ nanxiao$ awk 'END {if (arr["A"] == "") print "Empty string"}'
Empty string
Nan:~ nanxiao$ awk 'END {if (arr["A"] == 0) print "Number 0"}'
Number 0

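2. Arrays in awk are associative arrays; the subscripts are strings.

3. A for loop can iterate over the array subscripts, in the form for (subscript in array).
The order in which the subscripts are visited depends on the implementation. In addition, if new elements are added during the traversal, the result of the program is undefined.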
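A small sketch of the for loop over subscripts: the function name tally is made up, and the output is piped through sort precisely because awk does not promise any traversal order.

```shell
#!/bin/sh
# Count how often each input line occurs, then list the subscripts.
# sort makes the output stable, since for-in order is implementation-defined.
tally() {
    awk '{ count[$0]++ } END { for (k in count) print k, count[k] }' | sort
}

printf 'a\nb\na\nc\na\n' | tally
```

For this input the sorted result is "a 3", "b 1", "c 1", one per line.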

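4. Use the expression subscript in array to test whether an array contains a given subscript. The expression evaluates to 1 if array[subscript] exists and to 0 otherwise.
Note: subscript in array does not create array[subscript], whereas if (array[subscript] != "") does create array[subscript] (if it did not already exist).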
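The difference between the two tests can be observed directly. In this sketch (the function name demo_in and the array name arr are arbitrary), the membership test is run before and after a plain reference to the element:

```shell
#!/bin/sh
# Show that "in" only tests membership, while referencing arr["A"] creates it.
demo_in() {
    awk 'BEGIN {
        if ("A" in arr) print "before: present"; else print "before: absent"
        if (arr["A"] != "") print "non-empty"   # this reference creates arr["A"]
        if ("A" in arr) print "after: present"; else print "after: absent"
    }'
}

demo_in
```

The first test reports the element as absent and the last one as present, even though nothing was ever assigned to it.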

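5. Delete an array element with delete array[subscript].

6. split(string, array, fs) splits the string string using fs as the field separator and stores the pieces in the array array. The first field is stored in array["1"], the second in array["2"], and so on. If fs is not given, the built-in variable FS is used as the separator.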
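A short sketch of split: it relies on the fact that split returns the number of fields it produced, which gives a convenient loop bound. The date string and the names demo_split and parts are arbitrary.

```shell
#!/bin/sh
# Split a date on "-" and print the field count followed by each field.
demo_split() {
    awk 'BEGIN {
        n = split("2025-06-01", parts, "-")
        print n
        for (i = 1; i <= n; i++) print i, parts[i]
    }'
}

demo_split
```

Looping from 1 to n, rather than using for-in, keeps the fields in their original order.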

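7. Multidimensional arrays. awk does not support multidimensional arrays directly, but they can be simulated with one-dimensional arrays.

8. An array element cannot itself be an array.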
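The usual way to simulate them in standard awk is to write comma-separated subscripts: awk joins them into a single string using the built-in variable SUBSEP. A minimal sketch, with arbitrary names demo_2d, m, and idx:

```shell
#!/bin/sh
# m[1, 2] is really m[1 SUBSEP 2]: one string subscript in a one-dimensional array.
demo_2d() {
    awk 'BEGIN {
        m[1, 2] = "x"
        if ((1, 2) in m) print "found"
        for (k in m) {
            split(k, idx, SUBSEP)    # recover the two original subscripts
            print idx[1], idx[2], m[k]
        }
    }'
}

demo_2d
```

Because the combined subscript is an ordinary string, the individual indices have to be recovered with split when iterating.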

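Regular expression notes

The following notes are taken from the "Regular Expressions" section of The AWK Programming Language.

1. The regular expression metacharacters:
\, ^, $, ., [, ], |, (, ), *, +, ?.

2. Basic regular expression forms:
a) An ordinary (non-metacharacter) character, such as A, matches itself;
b) Escape sequences: \b, \f, \n, \r, \t, \ddd, and \c (\c stands for any other character c taken literally, e.g. \" for ");
c) A quoted metacharacter, such as \*, matches the metacharacter literally, here just *;
d) ^: matches the beginning of a string;
e) $: matches the end of a string;
f) .: matches any single character;
g) Character class: [...]; complemented character class: [^...]. For example, [ABC] matches any one of A, B, or C, while [^0-9] matches any single character that is not a digit. Inside [...] and [^...], all characters lose their special meaning except three: \, a ^ immediately after the [, and a - between two characters. For example, [.] matches only a literal period.

3. Operators:
a) |: A|B matches A or B;
b) Concatenation: AB matches A immediately followed by B;
c) A*: matches zero or more A's;
d) A+: matches one or more A's;
e) A?: matches zero or one A;
f) (r): matches the same strings as r; the parentheses are used only for grouping.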
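A few of these forms can be tried out in one awk program. The sample input lines and the function name classify below are made up for illustration; each pattern is tested against every input line.

```shell
#!/bin/sh
# Three patterns: a complemented character class, a literal period inside
# a character class, and alternation with grouping and +.
classify() {
    awk '
        /^[^0-9]+$/ { print "no digits: " $0 }
        /a[.]c/     { print "literal dot: " $0 }
        /^(A|B)+$/  { print "A or B: " $0 }'
}

printf 'abc\n123\na.c\nAB\n' | classify
```

Note that abc does not match a[.]c, since inside brackets the period stands only for itself.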

A Brief Look at Bash Shell Processes (Continued)

Continuing from the previous post, let's look at the bash built-in command exec:

exec [-cl] [-a name] [command [arguments]]
              If command is specified, it replaces the shell.  No new process is created.  The arguments become the arguments to command.  If the -l option is supplied, the shell places a dash at the beginning of the zeroth argument passed to command.  This is what login(1) does.  The -c option causes command  to  be executed with an empty environment.  If -a is supplied, the shell passes name as the zeroth argument to the executed command.  If command cannot be executed for some reason,  a  non-interactive  shell  exits,  unless the shell option execfail is enabled, in which case it returns failure.  An interactive shell returns failure if the file cannot be executed.  If command is not specified,  any  redirections  take  effect in the current shell, and the return status is 0.  If there is a redirection error, the return status is 1.

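As you can see, when exec runs a command, no new process is created, and the command replaces the current bash shell process. Let's look at an example. Run the following commands in one terminal: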

[root@localhost ~]# echo $$
22330
[root@localhost ~]# exec sleep 60

Then run the following command in another terminal:

[root@localhost ~]# ps -ef | grep 22330
root     22330 22329  0 05:50 pts/0    00:00:00 sleep 60
root     22361 22345  0 05:52 pts/1    00:00:00 grep --color=auto 22330

You can see that process 22330 has become sleep 60 and is no longer a bash shell process. After 60 seconds the sleep 60 process ends, and the terminal exits with it:

[root@localhost ~]# ps -ef | grep 22330
root     22363 22345  0 05:56 pts/1    00:00:00 grep --color=auto 22330
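The same effect can be checked from a single terminal, and the man page excerpt above also mentions a second use of exec: without a command, its redirections apply to the current shell. The sketch below illustrates both; it uses sh for portability (bash behaves the same way here), and the variable names are arbitrary.

```shell
#!/bin/sh
# 1) exec keeps the PID: the shell prints its PID, then replaces itself with
#    a new shell, which prints its own PID. The two lines should be identical.
pids=$(sh -c 'echo "$$"; exec sh -c "echo \$\$"')
printf '%s\n' "$pids"

# 2) exec without a command: the redirection applies to the rest of the
#    (sub)shell, so the echo below is written to the temporary file.
log=$(mktemp)
(
    exec >"$log"
    echo "written via exec redirection"
)
content=$(cat "$log")
rm -f "$log"
echo "$content"
```

The parentheses in the second part confine the redirection to a subshell, so the script's own standard output is left untouched.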

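Finally, let's work through the example at the end of the chapter "What is the difference between exec and source?" in the classic "Shell十三问" (Thirteen Questions about Shell) to get a better understanding of bash shell processes:
1.sh: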

#!/bin/bash
A=B
echo "PID for 1.sh before exec/source/fork:$$"
export A
echo "1.sh: \$A is $A"
case $1 in
    exec)
        echo "using exec..."
        exec ./2.sh ;;
    source)
        echo "using source..."
        . ./2.sh ;;
    *)
        echo "using fork by default..."
        ./2.sh ;;
esac
echo "PID for 1.sh after exec/source/fork:$$"
echo "1.sh: \$A is $A"

2.sh:

#!/bin/bash
echo "PID for 2.sh: $$"
echo "2.sh get \$A=$A from 1.sh"
A=C
export A
echo "2.sh: \$A is $A"

(1) Run "./1.sh fork":

[root@localhost ~]# ./1.sh fork
PID for 1.sh before exec/source/fork:22390
1.sh: $A is B
using fork by default...
PID for 2.sh: 22391
2.sh get $A=B from 1.sh
2.sh: $A is C
PID for 1.sh after exec/source/fork:22390
1.sh: $A is B

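You can see that because the 1.sh script (process ID 22390) starts a new child shell process to execute 2.sh (process ID 22391), the change to A made in 2.sh does not affect the value of A in 1.sh.

(2) Run "./1.sh source":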

[root@localhost ~]# ./1.sh source  
PID for 1.sh before exec/source/fork:22393
1.sh: $A is B
using source...
PID for 2.sh: 22393
2.sh get $A=B from 1.sh
2.sh: $A is C
PID for 1.sh after exec/source/fork:22393
1.sh: $A is C

You can see that because 2.sh runs inside the 1.sh process (both print process ID 22393), the change to A made in 2.sh does affect the value of A in 1.sh.

(3) Run "./1.sh exec":

[root@localhost ~]# ./1.sh exec
PID for 1.sh before exec/source/fork:22396
1.sh: $A is B
using exec...
PID for 2.sh: 22396
2.sh get $A=B from 1.sh
2.sh: $A is C

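The 2.sh script runs in the process that was running 1.sh (both print process ID 22396), and the original 1.sh script stops running. So after 2.sh finishes, none of the remaining commands in 1.sh are executed.

References:
Shell十三问 (Thirteen Questions about Shell)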
