The byproducts of reading OpenBSD netcat code

When I took part in a training course last year, I heard about netcat for the first time. During that class, the tutor showed some netcat hacks and tricks which appealed to me and motivated me to learn its guts. Fortunately, in the past two months I was not so busy, so I could spend my spare time diving into OpenBSD’s netcat source code, and I got abundant byproducts during this process.

(1) Brushing up on socket programming. I wrote my first network application more than 10 years ago, and I have always thought the socket APIs are marvelous. Just ~10 functions (socket, bind, listen, accept…) plus some I/O multiplexing buddies (select, poll, epoll…) connect the whole world, wonderful! From that time on, I developed a habit: when touching a new programming language, network programming is an essential exercise. Even though I don’t write socket-related code now, reading netcat’s socket code indeed refreshed my knowledge and taught me new stuff.

(2) Writing a tutorial about netcat. I am a mediocre programmer and forget things when I haven’t used them for a long time, so I just take notes of what I think is useful. IMHO, this “tutorial” is not really meant to teach others something; it is just a journal I can refer to when I need it in the future.

(3) Submitting patches to netcat. While reading the code, I also found bugs and some possible enhancements. Though these are trivial contributions to OpenBSD, I am still happy and enjoyed it.

(4) Implementing a C++ encapsulation of libtls. OpenBSD’s netcat supports TLS/SSL connections, but it requires you to take full care of resource management (memory, sockets, etc.); otherwise a small mistake can lead to a resource leak, which is fatal for long-lived applications (in fact, both bugs I reported to OpenBSD were related to resource leaks). Therefore I developed a simple C++ library which wraps libtls, and I hope it can free developers from this troublesome problem so they can put more energy into the application logic.

Long story short, reading classic source code is a rewarding process, and you may consider trying it yourself.

Condition variable takeaways

The condition variable is a common concept in both user-space and kernel-space programming. IMHO, it is a slightly complicated synchronization mechanism. Recently, I came across Measuring context switching and memory overheads for Linux threads, and this article provides an example which I think is a good tutorial on how to understand and use condition variables.

The parent thread code is like the following:

  /* parent thread */
  pthread_mutex_lock(&si.mutex);
  pthread_t childt;
  pthread_create(&childt, NULL, threadfunc, (void*)&si);

  // Each iteration of this loop will switch context from the parent to the
  // child and back - two context switches. The child signals first.
  ......
  for (int i = 0; i < NUM_ITERATIONS; ++i) {
    pthread_cond_wait(&si.cv, &si.mutex);
    pthread_cond_signal(&si.cv);
  }
  pthread_mutex_unlock(&si.mutex);

And this is the child thread code:

  // The child thread signals first
  pthread_mutex_lock(&si->mutex);
  pthread_cond_signal(&si->cv);
  for (int i = 0; i < NUM_ITERATIONS; ++i) {
    pthread_cond_wait(&si->cv, &si->mutex);
    pthread_cond_signal(&si->cv);
  }
  pthread_mutex_unlock(&si->mutex);

(1) The first takeaway is that pthread_cond_signal() must be called after pthread_cond_wait() in the timing sequence; otherwise the signal is lost, because no thread is waiting to receive it.

Check the preceding code. The child thread is launched here:

    ......
    pthread_t childt;
    pthread_create(&childt, NULL, threadfunc, (void*)&si);
    ......

Before that, the parent thread must hold the mutex first:

    ......
    pthread_mutex_lock(&si.mutex);
    ......

Then, in the first iteration of the loop, it releases the mutex and waits for notification:

    ......
    pthread_cond_wait(&si.cv, &si.mutex);
    ......

This guarantees that when the child thread sends its signal, the parent thread is already in the wait queue:

  ......
  pthread_mutex_lock(&si->mutex);
  pthread_cond_signal(&si->cv);
  ......

(2) The other thing we should remember is that before and after calling pthread_cond_wait(), the current thread must hold the mutex. I.e., before calling pthread_cond_wait(), the current thread acquires the mutex; then, inside pthread_cond_wait(), the current thread is put in the wait queue and the mutex is released atomically. Once another thread signals the current thread, it reacquires the mutex and returns from pthread_cond_wait().
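
To tie the two takeaways together, below is a minimal, self-contained sketch of the same ping-pong pattern (the struct shared_info layout, the iteration count, and the final pthread_join() are my own glue for completeness, not copied from the article). Build it with cc -pthread; it finishes after NUM_ITERATIONS round trips:

#include <pthread.h>
#include <stdio.h>

#define NUM_ITERATIONS 5

struct shared_info {
  pthread_mutex_t mutex;
  pthread_cond_t cv;
};

static void* threadfunc(void* arg)
{
  struct shared_info* si = arg;

  // The child signals first. Because the parent locked the mutex before
  // creating the child, the parent is guaranteed to be in the wait queue
  // by the time this signal is sent, so the signal is not lost (takeaway 1).
  pthread_mutex_lock(&si->mutex);
  pthread_cond_signal(&si->cv);
  for (int i = 0; i < NUM_ITERATIONS; ++i) {
    pthread_cond_wait(&si->cv, &si->mutex);
    pthread_cond_signal(&si->cv);
  }
  pthread_mutex_unlock(&si->mutex);
  return NULL;
}

int main(void)
{
  struct shared_info si = {PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER};

  // Hold the mutex before creating the child: the child cannot acquire it
  // (and therefore cannot signal) until the parent has released it
  // atomically inside pthread_cond_wait() (takeaway 2).
  pthread_mutex_lock(&si.mutex);

  pthread_t childt;
  pthread_create(&childt, NULL, threadfunc, (void*)&si);

  for (int i = 0; i < NUM_ITERATIONS; ++i) {
    pthread_cond_wait(&si.cv, &si.mutex);
    pthread_cond_signal(&si.cv);
  }
  pthread_mutex_unlock(&si.mutex);

  pthread_join(childt, NULL);
  printf("done after %d round trips\n", NUM_ITERATIONS);
  return 0;
}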

The Dilemma brought by “outdated technology”

This week, I came across an interesting post: Do You Know Cobol? If So, There Might Be a Job for You. The general idea is that many big finance companies still use systems developed in Cobol, which is an ancient programming language. Currently, few people master Cobol, and even worse, most of them have already retired or will retire soon. IMHO, this article gives a good example of the dilemma that “outdated technology” brings to both companies and individual engineers.

Let me tell two stories first:
(1) I once worked at HPE for nearly two years, and HPE has its own Unix operating system: HP-UX. Actually, my team did HP-UX related work before I joined, but by that time all of the team’s work had already switched to Linux. Why? Because Linux dominated the server market by then. To earn money, more and more resources had to be invested in the Linux area, while for HP-UX, basic maintenance and development was enough. As far as I know, HP-UX is still the backbone of many critical services, such as banks, telecommunication operators, etc. But even for HPE itself, HP-UX’s priority is much lower now.

(2) There is a service which was launched in the mid-1990s. It is a 32-bit program, written in the C programming language, and stable enough to serve people all over the world every day. About ~20 years later, one engineer noticed the Year 2038 problem, because the program definitely uses time_t, which is 32 bits long there. He began to discuss with his leader transforming the program into a 64-bit one, but both of them knew it was not as simple as adding the -m64 compile option. After ~20 years, the code had become “mature”: the code repository was very large, about ~40 engineers had committed code over the years, and some modules had turned into total “black boxes”, meaning no one knew the logic behind them, yet they really worked like a charm! If the program were converted to 64-bit, the compilation might pass, but no one knew whether it would actually work. It would need careful code review and sufficient testing, which seemed not worth it for a problem that would only occur ~20 years later. Therefore, this task stays on the “Todo” list year after year. Everyone prays the service will be shut down before 2038.

These examples all point to one fact: most “outdated technologies”, such as Cobol, HP-UX or 32-bit programs, are not that “bad”, but for various reasons they are not mainstream now. For most companies, overhauling services built on these “outdated technologies” is not acceptable: besides the notable cost in time and people, one glitch can bring a catastrophic result, and may even sink the company. On the other side, the number of engineers who master these “outdated technologies” keeps getting smaller, so companies can mostly only tap the potential of their internal staff.

For engineers, there is also a dilemma. You can pick an “outdated technology” as a hobby, but there is a huge risk in adopting it as a full-time job. Maybe one day you will need to find a job again, and many biased companies will reject you just because they deem that you don’t know some “hyped technology” and can only tame dinosaurs. Ridiculous, isn’t it? But it is the reality we must adapt to.

Forgetting the “-pthread” option may give you a big surprise!

Today, I wrote a small pthread program to do some testing:

#include <pthread.h>

int main(void)
{
        pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
        pthread_cond_t cv = PTHREAD_COND_INITIALIZER;

        pthread_mutex_lock(&mutex);
        pthread_cond_wait(&cv, &mutex);
        return 0;
} 

Build and test it on OpenBSD-current (version 6.4):

# cc cv_test.c -o cv_test
# ./cv_test

The program blocks there, which is my expected result. Switch to Arch Linux (kernel version 4.18.9):

# cc cv_test.c -o cv_test
# ./cv_test
#

The program exits immediately. At first I suspected a “spurious wakeup”, but I couldn’t find a convincing explanation. Use ldd to check the program. On OpenBSD:

# ldd cv_test
cv_test:
        Start            End              Type  Open Ref GrpRef Name
        000000d4c3a00000 000000d4c3c02000 exe   1    0   0      cv_test
        000000d6e6007000 000000d6e62f6000 rlib  0    1   0      /usr/lib/libc.so.92.5
        000000d6db100000 000000d6db100000 ld.so 0    1   0      /usr/libexec/ld.so

On Arch Linux:

# ldd cv_test
        linux-vdso.so.1 (0x00007ffde91c6000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f3e3169b000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007f3e3187a000)

Nothing special. After seeking help on stackoverflow, the answer is that I need to add the -pthread option:

# cc -pthread cv_test.c -o cv_test
# ./cv_test

This time it works perfectly. Check the linked libraries:

# ldd cv_test
        linux-vdso.so.1 (0x00007fff48be8000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007fa46f84c000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007fa46f688000)
        /lib64/ld-linux-x86-64.so.2 => /usr/lib64/ld-linux-x86-64.so.2 (0x00007fa46f888000)

Why doesn’t Linux give me a link error prompting me to link libpthread? It doesn’t seem to make sense.
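
The likely explanation (as far as I understand it) is that on the glibc versions of that era, libc.so itself exports stub versions of many pthread functions, including pthread_cond_wait(). The link therefore succeeds without libpthread, but the stub does nothing and returns immediately, which is why the program exits at once instead of blocking. You can check that libc already exports the symbol (note that since glibc 2.34 libpthread has been merged into libc, so newer systems behave differently):

# nm -D /usr/lib/libc.so.6 | grep pthread_cond_wait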

First taste of MPI

Different from OpenMP, which focuses on multiple threads in one process, MPI defines how multiple processes can collaborate with each other. In this post, I use Open MPI on Arch Linux to do a simple test.

The “Hello World” program is from here; build and run it on one node, not a cluster containing many nodes.
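
For reference, it is essentially the classic MPI hello world; a minimal version (reconstructed from memory, so it may differ slightly from the linked code) looks like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    // Initialize the MPI environment
    MPI_Init(&argc, &argv);

    // Get the number of processes and the rank of this process
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor (i.e., the host)
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    // Finalize the MPI environment
    MPI_Finalize();
    return 0;
}

Build it with the mpicc wrapper, then run it with mpirun:

$ mpicc mpi_hello_world.c -o mpi_hello_world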

$ mpirun mpi_hello_world
Hello world from processor tesla-p100, rank 16 out of 52 processors
Hello world from processor tesla-p100, rank 34 out of 52 processors
Hello world from processor tesla-p100, rank 35 out of 52 processors
......

Check the CPU information:

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              104
On-line CPU(s) list: 0-103
Thread(s) per core:  2
Core(s) per socket:  26
Socket(s):           2
NUMA node(s):        2
......

Although there are 104 logical CPUs in the system (2 sockets × 26 cores × 2 threads), mpirun launches only 52 ranks: by default Open MPI starts one process per physical core, not one per hardware thread. Modify the program to output the process ID:

......
#include <unistd.h>
......
printf("Hello world from process %d, processor %s, rank %d out of %d processors\n",
         getpid(), processor_name, world_rank, world_size);
......

This time you can confirm that different processes are spawned:

$ mpirun mpi_hello_world
Hello world from process 52528, processor tesla-p100, rank 21 out of 52 processors
Hello world from process 52557, processor tesla-p100, rank 31 out of 52 processors
Hello world from process 52597, processor tesla-p100, rank 43 out of 52 processors
......
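
By default the number of ranks follows Open MPI’s slot count; if you want to launch a specific number of processes instead, you can pass -np explicitly (standard mpirun usage, shown here just as an illustration):

$ mpirun -np 4 mpi_hello_world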

P.S., if you run mpirun as root, please add the --allow-run-as-root option:

# mpirun --allow-run-as-root mpi_hello_world