Technology | Nan Xiao's Blog

The c99 program on Void Linux

Today I found there is a c99 program in Void Linux:

$ c99
cc: fatal error: no input files
compilation terminated.
$ which c99
/usr/bin/c99

Check what it is:

$ file /usr/bin/c99
/usr/bin/c99: POSIX shell script, ASCII text executable
$ cat /usr/bin/c99
#!/bin/sh
exec /usr/bin/cc -std=c99 "$@"

Um, just a shell script which invokes /usr/bin/cc. So check cc program:

$ ll /usr/bin/cc
lrwxrwxrwx 1 root root 3 Jun  9 05:32 /usr/bin/cc -> gcc

Oh, a link to gcc.

AddressSanitizer’s no_sanitize_address attribute

AddressSanitizer has a no_sanitize_address attribute which can be used to turn off instrumentation. Check following code:

$ cat ../main.c
#include <stdlib.h>
#include <sanitizer/asan_interface.h>

#define ARRAY_SIZE 4

int *array;

void
foo()
{
    array[0] = 1;
}

int
main()
{
    array = malloc(sizeof(int) * ARRAY_SIZE);
    ASAN_POISON_MEMORY_REGION(array, sizeof(int) * ARRAY_SIZE);
    foo();
}

Build and run this program:

$ ./main
=================================================================
==1558==ERROR: AddressSanitizer: use-after-poison on address 0x602000000010 at pc 0x55839733c1b7 bp 0x7ffc102fb320 sp 0x7ffc102fb310
WRITE of size 4 at 0x602000000010 thread T0
    #0 0x55839733c1b6 in foo (/home/nan/code-for-my-blog/2020/05/asan_no_sanitize_address/build/main+0x11b6)
    #1 0x55839733c1f2 in main (/home/nan/code-for-my-blog/2020/05/asan_no_sanitize_address/build/main+0x11f2)
    #2 0x7fe4d79d9dea in __libc_start_main ../csu/libc-start.c:308
    #3 0x55839733c0b9 in _start (/home/nan/code-for-my-blog/2020/05/asan_no_sanitize_address/build/main+0x10b9)

0x602000000010 is located 0 bytes inside of 16-byte region [0x602000000010,0x602000000020)
allocated by thread T0 here:
    #0 0x7fe4d7c824c8 in __interceptor_malloc (/usr/lib/libasan.so.5+0x10c4c8)
    #1 0x55839733c1cd in main (/home/nan/code-for-my-blog/2020/05/asan_no_sanitize_address/build/main+0x11cd)
    #2 0x7fe4d79d9dea in __libc_start_main ../csu/libc-start.c:308

SUMMARY: AddressSanitizer: use-after-poison (/home/nan/code-for-my-blog/2020/05/asan_no_sanitize_address/build/main+0x11b6) in foo
Shadow bytes around the buggy address:
  0x0c047fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c047fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c047fff8000: fa fa[f7]f7 fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c047fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==1558==ABORTING

Unsurprisingly, AddressSanitizer reports error because the array is poisoned but foo() is trying to modify its first element. Add __attribute__((no_sanitize_address)) for foo():

__attribute__((no_sanitize_address))
void
foo()
{
    array[0] = 1;
}

Run the program again:

$ ./main
$

This time program exits normally.

P.S., the code can be downloaded here.

Does CPU’s run queue include current running task on illumos?

vmstat is a useful tool to check system performance on Unix Operating Systems. The following is its output from illumos:

# vmstat
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr ro s0 -- --   in   sy   cs us sy id
 1 0 0 1831316 973680 14 38  0  0  1  0 5314 3 3  0  0 1513  472  391 83  9  7

From the manual, the first column kthr:r’ meaning is:

kthr
                 Report the number of kernel threads in each of the three following states:

                 r
                      the number of kernel threads in run queue

So the question is how to define “run queue” here? After some googling, I find this good post:

The reported operating system run queue includes both processes waiting to be serviced and those currently being serviced.

But from another excerpt from Solaris 9 System Monitoring and Tuning:

The column labeled r under the kthr section is the run queue of processes waiting to get on the CPU(s).

Does illumos‘s run queue include both current running and wait-to-run tasks? I can’t get a clear answer (I know illumos is derived from Solaris, but not sure whether the description in Solaris 9 is still applicable for illumos).

I tried to get help from illumos discussion mailing list, and Joshua gave a detailed explanation. Though he didn’t tell me the final answer, I knew where I should delve into:

(1) The statistic of runque is actually from disp_nrunnable (the code is here):

        ......
        /*
        * First count the threads waiting on kpreempt queues in each
        * CPU partition.
        */
        uint_t cpupart_nrunnable = cpupart->cp_kp_queue.disp_nrunnable;
        cpupart->cp_updates++;
        nrunnable += cpupart_nrunnable;
        ......
        /* Now count the per-CPU statistics. */
        uint_t cpu_nrunnable = cp->cpu_disp->disp_nrunnable;
        nrunnable += cpu_nrunnable;
        ......
        if (nrunnable) {
            sysinfo.runque += nrunnable;
            sysinfo.runocc++;
        }

The definition of disp_nrunnable:

/*
 * Dispatch queue structure.
 */
typedef struct _disp {
......
volatile int    disp_nrunnable; /* runnable threads in cpu dispq */
......
} disp_t;

(2) Checked the code related to disp_nrunnable:

/*
 * disp() - find the highest priority thread for this processor to run, and
 * set it in TS_ONPROC state so that resume() can be called to run it.
 */
static kthread_t *
disp()
{
    ......
    dq = &dp->disp_q[pri];
    tp = dq->dq_first;
    ......
    /*
     * Found it so remove it from queue.
     */
    dp->disp_nrunnable--;
    ......
    thread_onproc(tp, cpup);        /* set t_state to TS_ONPROC */
    ......
}

Checked definition of TS_ONPROC:

/*
 * Values that t_state may assume. Note that t_state cannot have more
 * than one of these flags set at a time.
 */
......
#define TS_RUN      0x02    /* Runnable, but not yet on a processor */
#define TS_ONPROC   0x04    /* Thread is being run on a processor */
......

Umm, it became clear: when kernel picks one task to run, removes it from the dispatch queue, decreases disp_nrunnable by 1, and set task’s state as TS_ONPROC. Based on above analysis, illumos‘s run queue should include only wait-to-run tasks, not current running ones.

To verify my thought, I implemented following simple CPU-intensive program:

# cat foo.c
int main(){
    while(1);
}

My virtual machine has only 1 CPU. Use vmstat when it is in idle state:

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr ro s0 -- --   in   sy   cs us sy id
 1 0 0 1829088 970732 8  22  0  0  0  0 2885 1 2  0  0 1523  261  343 85 10  6
 0 0 0 1827484 968844 16 53  0  0  0  0  0  0  0  0  0 2173  330  285  0 10 90
 0 0 0 1827412 968808 0   1  0  0  0  0  0  0  0  0  0 2177  284  258  0 10 90
 0 0 0 1827412 968808 0   0  0  0  0  0  0  0  0  0  0 2167  296  301  0  9 91
 0 0 0 1827412 968808 0   0  0  0  0  0  0  0  0  0  0 2173  278  298  0  9 91
 0 0 0 1827412 968808 0   0  0  0  0  0  0  0  0  0  0 2173  280  283  0  9 91
 0 0 0 1827412 968808 0   0  0  0  0  0  0  0  0  0  0 2175  279  329  0 10 90
 ......;

kthr:r was 0, and cpu:id (the last column) was more than 90. Launched one instance of foo program:

# ./foo &
[1] 668

Checked vmstat again:

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr ro s0 -- --   in   sy   cs us sy id
 1 0 0 1829076 970720 8  21  0  0  0  0 2860 1 2  0  0 1528  260  343 84 10  6
 0 0 0 1826220 968100 16 53  0  0  0  0  0  0  0  0  0 1550  334  262 90 10  0
 0 0 0 1826148 968064 0   1  0  0  0  0  0  0  0  0  0 1399  288  264 91  9  0
 0 0 0 1826148 968064 0   0  0  0  0  0  0  0  0  0  0 1283  277  253 92  8  0
 0 0 0 1826148 968064 0   0  0  0  0  0  0  0  0  0  0 1367  281  247 91  9  0
 0 0 0 1826148 968064 0   0  0  0  0  0  0  0  0  0  0 1420  277  239 91  9  0
 0 0 0 1826148 968064 0   0  0  0  0  0  0  0  0  0  0 1371  281  239 91  9  0
 0 0 0 1826148 968064 0   0  0  0  0  0  0  0  0  0  0 1289  278  250 92  8  0
 ......

This time kthr:r was still 0, but cpu:id became 0. Launched another foo:

# ./foo &
[2] 675

Checked vmstat again:

# vmstat 1
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr ro s0 -- --   in   sy   cs us sy id
 1 0 0 1828912 970572 7  20  0  0  0  0 2672 1 1  0  0 1518  244  337 84  9  6
 1 0 0 1825748 967656 16 53  0  0  0  0  0  0  0  0  0 1554  335  284 90 10  0
 1 0 0 1825676 967620 0   1  0  0  0  0  0  0  0  0  0 1497  286  271 90 10  0
 1 0 0 1825676 967620 0   0  0  0  0  0  0  0  0  0  0 1387  309  288 92  8  0
 2 0 0 1825676 967620 0   0  0  0  0  0  0  0  0  0  0 1643  365  291 90 10  0
 1 0 0 1825676 967620 0   0  0  0  0  0  0  0  0  0  0 1446  276  273 91  9  0
 1 0 0 1825676 967620 0   0  0  0  0  0  0  0  0  0  0 1325  645  456 92  8  0
 1 0 0 1825676 967620 0   0  0  0  0  0  0  0  0  0  0 1375  296  300 91  9  0

kthr:r became 1 (yes, it was 2 once). There were 2 CPU-intensive foo program running, whereas kthr:r‘s value was only 1, it means kthr:r excludes the on-CPU task. I also run another foo process, and as expected, kthr:r became 2.

Finally, the test results proved the statement from Solaris 9 System Monitoring and Tuning is right:

The column labeled r under the kthr section is the run queue of processes waiting to get on the CPU(s).

A minimal manual of contributing to illumos

Compared to fire-and-forget mode of submitting patch to OpenBSD, i.e., you just need to copy-and-paste .diff file into mail and send itto the tech@openbsd.org, contributing to illumos seems a little complicated (This is the detailed document). How about I just found a typo and want to submit a patch? This post will guide you step by step.

prerequisites: illumos project uses git as the version control tool. Though its source code is also hosted in github, illumos project doesn’t accept Pull Request directly. You must follow below steps.

(1) Every commit should be associated with an issue, if you found an issue no one noticed before, just create an issue in bug-tracker and assign it to yourself (An issue example).

(2) Clone project, modify code, compile and do some tests. Once it’s done, upload the .diff file to Review Board (A Review Request example) and shoot a mail into developer@lists.illumos.org to invite other people to review it (A review mail example).

(3) Once reviewers think your code is ready, you can follow this guide to prepare patch and send RTI mail to advocates@lists.illumos.org (A RTI example). Depends on the situation, you may also need to further polish your code. Once the patch meets the quality requirements, congratulations, your commit will be merged, like this one.

Happy contributing to illumos!

Close pcap_t handle once pcap_dump_open succeeds

To create a pcap file, you need to get pcap_t* and pcap_dumper_t* pointers first:

    ......
    pcap_t *dst_handle = pcap_open_dead(DLT_EN10MB, 262144);
    pcap_dumper_t *dst_dump = pcap_dump_open(dst_handle, "output.pcap");
    ......

In fact, the pcap_t* is only used for pcap_dump_open(), i.e., create pcap file header; the following operations, such as pcap_dump(), won’t use it. It means if pcap_dump_open() succeeds, the pcap_t* can be closed right away:

    ......    
    pcap_t *dst_handle = pcap_open_dead(DLT_EN10MB, 262144);
    pcap_dumper_t *dst_dump = pcap_dump_open(dst_handle, "output.pcap");
    if (dst_dump == NULL) {
        printf("pcap_dump_open error: %s\n", pcap_geterr(dst_handle));
        pcap_close(dst_handle);
        return EXIT_FAILURE;
    }
    pcap_close(dst_handle);
    ......

The same applies for pcap_dump_fopen().