dmesg intro

Because dmesg saved me from an installing CUDA driver issue this month, I decide to write this short post to introduce it. According to Wikipedia:

dmesg (display message or driver message) is a command on most Unix-like operating systems that prints the message buffer of the kernel.[1] The output of this command typically contains the messages produced by the device drivers.

On Unix/Linux systems, kernel, kernel modules (e.g., device drivers), and even user-space processes may output logs in kernel buffer. So dmesg is a neat tool for debugging purpose (please refer Linux Performance Analysis in 60,000 Milliseconds). Compared to BSD families, dmesg on Linux provides more options. So this post will use dmesg on Linux as an example.

Firstly, how to know the underlying kernel buffer size? It depends on CONFIG_LOG_BUF_SHIFT. On my Arch Linux:

$ zgrep CONFIG_LOG_BUF_SHIFT /proc/config.gz
CONFIG_LOG_BUF_SHIFT=17

It means the buffer’s size is 2 ^ 17 = 128 KiB (please refer How to find out a linux kernel ring buffer size? and default kernel .config file).

Secondly, dmesg on Linux supports many handy options. E.g. -H is used to display in human-readable format:

$ dmesg -H
[Apr15 09:26] Linux version 5.0.5-arch1-1-ARCH (builduser@heftig-17705) (gcc version 8.2.1 20181127 (GCC)) #1 SMP PREE>
[  +0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=5bee8879-dca0-4b3a-8abd-1bdf96e17a1f rw quiet
[  +0.000000] KERNEL supported cpus:
[  +0.000000]   Intel GenuineIntel
[  +0.000000]   AMD AuthenticAMD
[  +0.000000]   Hygon HygonGenuine
[  +0.000000]   Centaur CentaurHauls
......

and -k tells dmesg only shows user-space messages:

$ dmesg -u -H
[Apr15 09:26] systemd[1]: systemd 241.67-1-arch running in system mode. (+PAM +AUDIT -SELINUX -IMA -APPARMOR +SMACK -S>
[  +0.021954] systemd[1]: Detected architecture x86-64.
[  +0.021551] systemd[1]: Set hostname to <tesla-p100>.
[  +0.098510] systemd[1]: Listening on Journal Socket (/dev/log).
[  +0.000203] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[  +0.000032] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[  +0.000047] systemd[1]: Listening on udev Control Socket.
[  +0.001751] systemd[1]: Listening on Process Core Dump Socket.
[  +0.000010] systemd[1]: Reached target System Time Synchronized.
[  +0.000050] systemd[1]: Listening on Journal Socket.
[  +0.273822] systemd-journald[1148]: Received request to flush runtime journal from PID 1
(END)

To know about other options, please refer manual.

Last but not least, dmesg‘s source code is a good place to learn how to develop command line applications on Linux.

Put dmesg into your toolkit, maybe it will save you one day too.

 

What I learn from Practical Binary Analysis as a non-reverse-engineering engineer?

I spent the past two weeks in reading Practical Binary Analysis. Since I am not a professional reverse engineer, I glossed over the “Part III: Advanced Binary Analysis”, so I only read half the book. Even though, I still get a big gain:

(1) Know better of ELF file. On *nix Operating system, ELF file is everywhere: executable file, object file, shared library and coredump. “Chapter 2: The ELF format” gives me a clear explanation of the composition of ELF. E.g., I know why some functions have “@plt” suffix when using gdb to debug it.

(2) Master many tricks about GNU Binutils. GNU Binutils is a toolbox which provides versatile command line programs to analyze ELF files. Literally it relies heavily on BFD library. I also get some sense about how to use BFD library to tweak ELF files.

(3) “Appendix A: A crash course on X86 assembly” is a good tutorial for refreshing X86 architecture and assembly language.

(4) Others: E.g., I understand how to use LD_PRELOAD environmental variable and dynamic linking functions to manipulate shared library.

All in all, if you are working on *nix (although this book is based on Linux, I think most knowledge are also applicable to other *nix), you should try to read this book. I promise it is not a waste of time and you can always learn something, believe me!

The display format of Bash’s built-in time command

Check the default output format of bash‘s built-in time command:

# time

real    0m0.000s
user    0m0.000s
sys     0m0.000s

You can use -p option to output in POSIX format:

# time -p
real 0.00
user 0.00
sys 0.00

From bash source code, we know the the definitions of these two formats:

#define POSIX_TIMEFORMAT "real %2R\nuser %2U\nsys %2S"
#define BASH_TIMEFORMAT  "\nreal\t%3lR\nuser\t%3lU\nsys\t%3lS"

To decipher the meanings of them, we need to refer bash manual:

TIMEFORMAT

The value of this parameter is used as a format string specifying how the timing information for pipelines prefixed with the time reserved word should be displayed. The ‘%’ character introduces an escape sequence that is expanded to a time value or other information. The escape sequences and their meanings are as follows; the braces denote optional portions.

%%
A literal ‘%’.

%[p][l]R
The elapsed time in seconds.

%[p][l]U
The number of CPU seconds spent in user mode.

%[p][l]S
The number of CPU seconds spent in system mode.

%P
The CPU percentage, computed as (%U + %S) / %R.

The optional p is a digit specifying the precision, the number of fractional digits after a decimal point. A value of 0 causes no decimal point or fraction to be output. At most three places after the decimal point may be specified; values of p greater than 3 are changed to 3. If p is not specified, the value 3 is used.

The optional l specifies a longer format, including minutes, of the form MMmSS.FFs. The value of p determines whether or not the fraction is included.
……

Take POSIX_TIMEFORMAT as an example: %2R denotes using second as time unit, and the precision is two digits after a decimal point; %2U and %2S are similar.

Now you can comprehend the output of time, correct? Try using BASH_TIMEFORMAT as a practice.

 

Conditional variable takeaways

Conditional variable is a common concept in both user-space and kernel-space programming. IMHO, it is a little complicated synchronization mechanism. Recently, I came across Measuring context switching and memory overheads for Linux threads, and this article provides an example which I think is a good tutorial about how to understand and use conditional variable.

The parent thread code is like following:

  /* parent thread */
  pthread_mutex_lock(&si.mutex);
  pthread_t childt;
  pthread_create(&childt, NULL, threadfunc, (void*)&si);

  // Each iteration of this loop will switch context from the parent to the
  // child and back - two context switches. The child signals first.
  ......
  for (int i = 0; i < NUM_ITERATIONS; ++i) {
    pthread_cond_wait(&si.cv, &si.mutex);
    pthread_cond_signal(&si.cv);
  }
  pthread_mutex_unlock(&si.mutex);

And this is the child thread code:

// The child thread signals first
  pthread_mutex_lock(&si->mutex);
  pthread_cond_signal(&si->cv);
  for (int i = 0; i < NUM_ITERATIONS; ++i) {
    pthread_cond_wait(&si->cv, &si->mutex);
    pthread_cond_signal(&si->cv);
  }
  pthread_mutex_unlock(&si->mutex);

(1) The first takeaway is pthread_cond_signal() must be called after pthread_cond_wait() in timing sequence; otherwise the signal won’t be received.

Check preceding code, before launching child thread:

    ......
    pthread_t childt;
    pthread_create(&childt, NULL, threadfunc, (void*)&si);
    ......

The parent thread must hold mutex first:

    ......
    pthread_mutex_lock(&si.mutex);
    ......

Then in the first iteration of loop, release the mutex and wait for notification:

    ......
    pthread_cond_wait(&si.cv, &si.mutex);
    ......

This can guarantee when child thread sends signal, the parent thread is already in the wait queue:

  ......
  pthread_mutex_lock(&si->mutex);
  pthread_cond_signal(&si->cv);
  ......

(2) The other thing we should remember is before and after calling pthread_cond_wait(), the current thread must hold the mutex. I.e., before callingpthread_cond_wait(), the current thread get the mutex, then in pthread_cond_wait(), put the current thread in the wait queue and release the mutexatomically. Once another thread signals current thread, it will reacquire mutex and return from pthread_cond_wait().

Forgetting “-pthread” option may give you a big surprise!

Today, I wrote a small pthread program to do some testing:

#include <pthread.h>

int main(void)
{
        pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

        pthread_mutex_lock(&mutex);
        pthread_cond_wait(&cv, &mutex);
        return 0;
} 

Build and test it on OpenBSD-current (version is 6.4):

# cc cv_test.c -o cv_test
# ./cv_test

The program will block there and it is my expected result. Switch to Arch Linux (kernel version is 4.18.9):

# cc cv_test.c -o cv_test
# ./cv_test
#

The program will exit immediately. I doubt it is “spurious awake” firstly, but can’t get a convincing explanation. Using ldd to check program. On OpenBSD:

# ldd cv_test
cv_test:
        Start            End              Type  Open Ref GrpRef Name
        000000d4c3a00000 000000d4c3c02000 exe   1    0   0      cv_test
        000000d6e6007000 000000d6e62f6000 rlib  0    1   0      /usr/lib/libc.so.92.5
        000000d6db100000 000000d6db100000 ld.so 0    1   0      /usr/libexec/ld.so

On Arch Linux:

# ldd cv_test
        linux-vdso.so.1 (0x00007ffde91c6000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f3e3169b000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f3e3187a000)

Nothing special. After seeking help on stackoverflow, the answer is I need adding -pthread option:

# cc -pthread cv_test.c -o cv_test
# ./cv_test

This time it worked perfectly. Checking linked library:

# ldd cv_test
        linux-vdso.so.1 (0x00007fff48be8000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fa46f84c000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fa46f688000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fa46f888000)

Why doesn’t Linux give me a link error which prompts I need link libpthread? It seems not make sense.