Tech | My Site

Thinking about awk from a functional-programming perspective

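Yesterday I read the article Awk, Unix, and functional programming, in which the author looks at awk from a functional-programming perspective and sums it up as a functional program: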

Awk(action) =
    for each file
      for each input line
        for each pattern
          if pattern matches input line
            do action(fields)

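In other words, the awk program is treated as a function that takes action as its parameter: for each input line that matches a pattern, action is called on that line's fields. This pseudocode helps me understand awk better.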
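The pseudocode maps naturally onto a higher-order function. Below is a minimal Haskell sketch under a few assumptions of mine: files are modelled as lists of lines, fields are split on whitespace (awk's default), each action returns exactly one output line, and the names awk, field and errorUsers are hypothetical. It models only the evaluation order, not real awk.

```haskell
import Data.List (isInfixOf)

-- A pattern inspects the whole input line.
type Pattern = String -> Bool

-- An action receives the fields ($1, $2, ...) and produces one output line.
type Action = [String] -> String

-- Toy model of Awk(action): each file is a list of lines, and each rule
-- pairs a pattern with an action. The nesting follows the pseudocode.
awk :: [(Pattern, Action)] -> [[String]] -> [String]
awk rules files =
  [ action (words line)     -- do action(fields)
  | file <- files           -- for each file
  , line <- file            -- for each input line
  , (pat, action) <- rules  -- for each pattern
  , pat line                -- if pattern matches input line
  ]

-- $n for n >= 1, with awk's convention that a missing field is "".
field :: Int -> [String] -> String
field n fs = case drop (n - 1) fs of
  (f : _) -> f
  []      -> ""

-- Roughly what  awk '/error/ { print $1 }'  does.
-- awk errorUsers [["alice error 42", "bob ok 7"], ["carol error 13"]]
--   == ["alice", "carol"]
errorUsers :: [(Pattern, Action)]
errorUsers = [(isInfixOf "error", field 1)]
```

Because the rules are ordinary values, the pseudocode's Awk(action) becomes literally a function applied to a list of (pattern, action) pairs.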

Haskell Notes (15): Type annotation

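When you need to specify the type of an expression explicitly, you can use a type annotation: append :: followed by the desired type to the expression. For example: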

> :type 1 :: Int
1 :: Int :: Int

Here 1 is fixed to type Int. Type annotations can also be used to get the bounds of types such as Int (the values below come from a 64-bit GHC):

> minBound :: Int
-9223372036854775808
> maxBound :: Int
9223372036854775807
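A few more annotations in the same spirit, as a small module that can be loaded into GHCi with :load. The binding names are illustrative only; the point is where :: appears and what it decides.

```haskell
-- `read` is polymorphic in its result type; nothing else here fixes it,
-- so the annotation chooses Int.
parsed = read "42" :: Int

-- An annotation can sit on a sub-expression; the outer one covers the
-- whole expression and makes the division happen in Double.
asDouble = fromIntegral (read "42" :: Int) / 8 :: Double

-- minBound/maxBound come from the Bounded class; the annotation picks
-- which instance is used.
charRange = (minBound :: Char, maxBound :: Char)

-- Int bounds converted to Integer so their difference cannot overflow.
intSpan = toInteger (maxBound :: Int) - toInteger (minBound :: Int)
```

After loading, evaluating intSpan should give 18446744073709551615 (2^64 - 1) on the same 64-bit GHC as above.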

References:
Lecture Notes on Haskell Programming.

Haskell Notes (14): List comprehension

In mathematics, a set comprehension builds one set from another:

{x² | x ∈ {1, ..., 5}}

In Haskell, a list comprehension builds one list from another:

> [x^2 | x <- [1 .. 10], even x]
[4,16,36,64,100]

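A list comprehension can contain two kinds of parts: x <- [1 .. 10] is a generator, which says where the values of x come from; even x is a guard, which restricts which values of x are used to build the new list. There can be several generators and several guards, separated by commas: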

> [x * y | x <- [1 .. 10], y <- [1 .. 10], even x, odd y]
[2,6,10,14,18,4,12,20,28,36,6,18,30,42,54,8,24,40,56,72,10,30,50,70,90]
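One way to see what generators and guards do is to write the same comprehension in do-notation for the list monad, where each generator becomes a bind and each guard becomes a call to guard from Control.Monad. The sketch below builds the same list as the comprehension above; the binding names are mine.

```haskell
import Control.Monad (guard)

-- The comprehension from the text.
products :: [Int]
products = [x * y | x <- [1 .. 10], y <- [1 .. 10], even x, odd y]

-- Same list in do-notation: `guard False` yields the empty list, which
-- discards the current (x, y) combination.
productsDo :: [Int]
productsDo = do
  x <- [1 .. 10]
  y <- [1 .. 10]
  guard (even x)
  guard (odd y)
  return (x * y)

-- Moving `even x` right after its generator gives the same list, but the
-- inner generator is never entered for odd x.
productsEarly :: [Int]
productsEarly = [x * y | x <- [1 .. 10], even x, y <- [1 .. 10], odd y]
```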

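Note that changing the order of the generators changes the order of the resulting list. Multiple generators behave like nested loops: later generators are the inner loops and earlier generators are the outer loops. For example: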

> [(x, y) | x <- [1, 2, 3], y <- [4, 5]]
[(1,4),(1,5),(2,4),(2,5),(3,4),(3,5)]
> [(x, y) | y <- [4, 5], x <- [1, 2, 3]]
[(1,4),(2,4),(3,4),(1,5),(2,5),(3,5)]
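The nested-loop reading can be made explicit with concatMap, close to how the Haskell Report translates generators: each generator becomes one concatMap, and the first generator is the outermost. A small sketch (names are illustrative):

```haskell
-- [(x, y) | x <- [1, 2, 3], y <- [4, 5]]: x is the outer loop.
xFirst :: [(Int, Int)]
xFirst = concatMap (\x -> concatMap (\y -> [(x, y)]) [4, 5]) [1, 2, 3]

-- [(x, y) | y <- [4, 5], x <- [1, 2, 3]]: y is now the outer loop.
yFirst :: [(Int, Int)]
yFirst = concatMap (\y -> concatMap (\x -> [(x, y)]) [1, 2, 3]) [4, 5]
```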

References:
List Comprehensions.

Installing software with pkgin on SmartOS

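My SmartOS is a virtual machine running on VMware, with its network in NAT mode. A fresh SmartOS install lacks many common tools, so you have to install them yourself: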

(1) My network goes through a proxy, so configure it first:

export http_proxy="http://web-proxy.xxxxxx.com:8080/"
export https_proxy="http://web-proxy.xxxxxx.com:8080/"

(2) Install pkgin (I am working as root):

# cd /
# curl -k http://pkgsrc.joyent.com/packages/SmartOS/bootstrap/bootstrap-2015Q4-x86_64.tar.gz | gzcat | tar -xf -
# pkg_admin rebuild
# pkgin -y up

(3) Now you can install software; gcc is the example here. First, search for the gcc package:

# pkgin se gcc
......
gcc49-4.9.3          The GNU Compiler Collection (GCC) - 4.9 Release Series
......

Install gcc:

# pkgin in gcc49-4.9.3

Once it is installed, you can use it:

# gcc
gcc: fatal error: no input files
compilation terminated.
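The search output puts the full package name, name-version (gcc49-4.9.3 above), in the first column and the description after it. If you want to post-process that output, here is a hedged Haskell sketch. It assumes the pkgsrc convention that the version follows the last '-' of the first token, and it does not handle any status markers pkgin may print for installed packages; parseSearchLine is a hypothetical name.

```haskell
-- Split one line of `pkgin se` output into (name, version, description).
-- Lines whose first token has no "name-version" shape give Nothing.
parseSearchLine :: String -> Maybe (String, String, String)
parseSearchLine line = case words line of
  [] -> Nothing
  (pkg : desc) ->
    -- Work on the reversed token so that `break` finds the last '-'.
    case break (== '-') (reverse pkg) of
      (revVersion@(_ : _), '-' : revName@(_ : _)) ->
        Just (reverse revName, reverse revVersion, unwords desc)
      _ -> Nothing
```

Note that unwords collapses the runs of spaces used for column alignment into single spaces.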

References:
Installing pkgin;
Working with packages.

A brief introduction to TPC-C

This post is excerpted from BENCHMARKING TRANSACTION DATABASES.

TPC-C is a database benchmark standard defined by the Transaction Processing Performance Council (TPC). It models a wholesale supplier (also called the company):

Each warehouse stocks 100,000 items;
Each warehouse serves 10 districts;
Each district serves 3,000 customers.

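TPC-C scales the benchmark by adding warehouses while keeping the other quantities constant. Its metric is tpmC, the number of New-Order transactions processed per minute. HammerDB is an open-source tool that can run TPC-C benchmarks.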
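The scaling rule is plain arithmetic. A minimal sketch using only the three per-warehouse figures above; the Scale record and the helper names are hypothetical, and the real schema has more tables than this. With w warehouses there are 100,000 * w stocked items, 10 * w districts and 30,000 * w customers.

```haskell
-- Per-warehouse constants from the description above.
itemsPerWarehouse, districtsPerWarehouse, customersPerDistrict :: Integer
itemsPerWarehouse = 100000
districtsPerWarehouse = 10
customersPerDistrict = 3000

-- Hypothetical summary of how large a TPC-C database is for w warehouses.
data Scale = Scale
  { warehouses   :: Integer
  , stockedItems :: Integer
  , districts    :: Integer
  , customers    :: Integer
  } deriving (Show, Eq)

-- Only the warehouse count varies; everything else is derived from it.
scaleFor :: Integer -> Scale
scaleFor w = Scale
  { warehouses   = w
  , stockedItems = itemsPerWarehouse * w
  , districts    = districtsPerWarehouse * w
  , customers    = customersPerDistrict * districtsPerWarehouse * w
  }

-- Simplified tpmC: New-Order transactions completed per minute of the run.
tpmC :: Integer -> Double -> Double
tpmC newOrders minutes = fromIntegral newOrders / minutes
```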

Sysdig Notes (2): sysdig's output

Running sysdig on the command line produces output like this:

# sysdig | more
8 11:04:39.920906090 2 <NA> (0) > switch next=4606(qemu-kvm) pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
9 11:04:39.920923972 2 qemu-kvm (4606) < ioctl res=0
10 11:04:39.920927878 2 qemu-kvm (4606) > ioctl fd=17(<X>) request=AE80 argument=0
11 11:04:39.920933865 2 qemu-kvm (4606) < ioctl res=0
12 11:04:39.920934920 2 qemu-kvm (4606) > ioctl fd=17(<X>) request=AE80 argument=0
21 11:04:39.920950032 2 qemu-kvm (4606) < ioctl res=0
22 11:04:39.920951238 2 qemu-kvm (4606) > ioctl fd=17(<X>) request=AE80 argument=0
24 11:04:39.920958802 2 qemu-kvm (4606) > switch next=0 pgft_maj=0 pgft_min=930 vm_size=1534620 vm_rss=1083932 vm_swap=0
500 11:04:39.923348311 1 <NA> (0) > switch next=17 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
501 11:04:39.923351955 1 <NA> (17) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
504 11:04:39.923380189 7 <NA> (0) > switch next=22 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0
507 11:04:39.923394983 7 <NA> (22) > switch next=0 pgft_maj=0 pgft_min=0 vm_size=0 vm_rss=0 vm_swap=0

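Column 1 is the event number. It increases but is not contiguous, because events generated by sysdig itself are not shown (use sysdig -D to include them);
Column 2 is the timestamp of the event;
Column 3 is the CPU ID;
Column 4 is the command (process name);
Column 5 is the thread ID;
Column 6 is the event direction, e.g. > when entering the ioctl function and < when leaving it;
Column 7 is the event name (e.g. ioctl);
Column 8 is the event arguments.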
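To make the column layout concrete, here is a Haskell sketch that parses one line of the default output into a record with the eight fields above. It assumes whitespace-separated columns and a command name without spaces; Event, parseEvent and numGaps are hypothetical names, not part of sysdig. numGaps lists the jumps in the event number, such as 12 to 21 in the sample.

```haskell
import Text.Read (readMaybe)

data Direction = Enter | Exit deriving (Show, Eq)

-- One line of sysdig's default output, column by column.
data Event = Event
  { evtNum   :: Integer             -- column 1: event number
  , evtTime  :: String              -- column 2: timestamp, kept as text
  , evtCpu   :: Int                 -- column 3: CPU ID
  , procName :: String              -- column 4: command
  , threadId :: Integer             -- column 5: thread ID, printed as "(4606)"
  , evtDir   :: Direction           -- column 6: '>' enter, '<' exit
  , evtType  :: String              -- column 7: event name, e.g. "ioctl"
  , evtArgs  :: [(String, String)]  -- column 8: key=value arguments
  } deriving (Show, Eq)

parseEvent :: String -> Maybe Event
parseEvent line =
  case words line of
    (num : time : cpu : cmd : tid : dir : typ : args) -> do
      n <- readMaybe num
      c <- readMaybe cpu
      t <- readMaybe (filter (`notElem` "()") tid)
      d <- parseDir dir
      Just (Event n time c cmd t d typ (map splitArg args))
    _ -> Nothing
  where
    parseDir ">" = Just Enter
    parseDir "<" = Just Exit
    parseDir _   = Nothing
    -- "fd=17(<X>)" becomes ("fd", "17(<X>)")
    splitArg a = let (k, v) = break (== '=') a in (k, drop 1 v)

-- Consecutive event numbers that are not adjacent.
numGaps :: [Event] -> [(Integer, Integer)]
numGaps es = [(a, b) | (a, b) <- zip ns (drop 1 ns), b > a + 1]
  where
    ns = map evtNum es
```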

References:
Interpreting Sysdig Output;
How to understand evt.num?

Sysdig Notes (1): building a debug version of sysdig

To build a debug version of sysdig, run the following from the top of the source tree:

mkdir -p build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make

Oracle Notes (1): the difference between a database and an instance

The difference between a database and an instance:

What is a Database?

We already know that a database is a collection of data. And this data is stored in form of tables at logical level, and in the datafiles at the physical level. There are some other files as well like Redo log files, Control files, Initialization files which stores important information about the database.

What is an Instance?

To view or update data stored in tables/datafiles, Oracle must start a set of background processes, and must allocate some memory to be used during database operation. The background processes and memory allocated by Oracle together make up an Instance.

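In short, a database is the set of files that store the actual data. To work with those files you need an instance: the processes and memory that access them.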

Oracle 12c For Dummies summarizes the relationship between a database and an instance as follows:

✓ An instance can exist without a database. Yes, it’s true. You can start an Oracle instance and not have it access any database files. Why would you do this?
• This is how you create a database. There’s no chicken-or-egg debate here. You first must start an Oracle instance; you create the database from within the instance.
• An Oracle feature called Automatic Storage Management uses an instance but isn’t associated with a database.
✓ A database can exist without an instance but would be useless. It’s just a bunch of magnetic blips on the hard drive.
✓ An instance can access only one database. When you start your instance, the next step is to mount that instance to a database. An instance can mount only one database at a time.
✓ You can set up multiple instances to access the same set of files or one database. Clustering is the basis for the Oracle Real Application Clusters feature. Many instances on several servers accessing one central database allows for scalability and high availability.

References:
Oracle 12c For Dummies;
database vs instances;
Difference between Oracle Instance & Database.

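Bash quoting turns off the special meaning of Bash metacharacters:
a) Single quotes: every metacharacter loses its special meaning;
b) Double quotes: most metacharacters lose their special meaning, except a few such as $, ` and \;
c) Backslash (\): only the metacharacter immediately after the \ loses its special meaning.
This explains why you extract multiple zip files with unzip '*.zip' rather than unzip *.zip. In the second form the shell first replaces *.zip with all matching file names, and unzip then treats the first name as the archive and the remaining names as members to extract from it. In the first form the shell passes the literal *.zip to unzip, which expands the wildcard itself and processes each matching archive.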
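The difference can be modelled in a few lines of Haskell. This is a toy model, not how Bash is implemented: only * is treated as a wildcard (hidden files, ? and [...] are ignored), the file list is assumed to be sorted already, and Quoting, globMatch, expandWord and unzipArgs are hypothetical names.

```haskell
import Data.List (tails)

data Quoting = Unquoted | SingleQuoted deriving (Show, Eq)

-- '*' matches any run of characters; every other character matches itself.
globMatch :: String -> String -> Bool
globMatch [] s = null s
globMatch ('*' : ps) s = any (globMatch ps) (tails s)
globMatch (p : ps) (c : cs) = p == c && globMatch ps cs
globMatch _ [] = False

-- The arguments a command receives for `word`, given the file names in
-- the current directory.
expandWord :: Quoting -> [FilePath] -> String -> [String]
expandWord SingleQuoted _ word = [word]   -- quoted: passed through literally
expandWord Unquoted files word =
  case filter (globMatch word) files of
    []      -> [word]                     -- Bash default: unmatched word kept
    matches -> matches

-- argv of `unzip *.zip` versus `unzip '*.zip'`.
unzipArgs :: Quoting -> [FilePath] -> [String]
unzipArgs q files = "unzip" : expandWord q files "*.zip"
```

For example, unzipArgs Unquoted ["a.zip", "b.zip", "notes.txt"] gives ["unzip","a.zip","b.zip"], while unzipArgs SingleQuoted on the same list gives ["unzip","*.zip"].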

References:
Shell十三问 (Thirteen Questions About the Shell);
How do I unzip multiple / many files under Linux?
