OpenBSD saves me again! — Debug a memory corruption issue

Yesterday, I ran into an issue with a third-party library, which crashed while allocating memory:

......
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f594a5a9b6b in _int_malloc () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007f594a5a9b6b in _int_malloc () from /usr/lib/libc.so.6
#1  0x00007f594a5ab503 in malloc () from /usr/lib/libc.so.6
#2  0x00007f594b13f159 in operator new (sz=5767168) at /build/gcc/src/gcc/libstdc++-v3/libsupc++/new_op.cc:50
......

It is obvious that the heap’s bookkeeping data (the memory tags) is corrupted, but who is the murderer? Since the library involves a lot of mathematical computation, grasping the code quickly is not an easy task. So I needed to find another way:

(1) Turn on compiler warnings with -Wall. Nothing found.

(2) Use valgrind. Unfortunately, valgrind itself crashed:

......
valgrind: the 'impossible' happened:
   Killed by fatal signal

host stacktrace:
==43326==    at 0x58053139: get_bszB_as_is (m_mallocfree.c:303)
==43326==    by 0x58053139: get_bszB (m_mallocfree.c:315)
==43326==    by 0x58053139: vgPlain_arena_malloc (m_mallocfree.c:1799)
==43326==    by 0x5800BA84: vgMemCheck_new_block (mc_malloc_wrappers.c:372)
==43326==    by 0x5800BD39: vgMemCheck___builtin_vec_new (mc_malloc_wrappers.c:427)
==43326==    by 0x5809F785: do_client_request (scheduler.c:1866)
==43326==    by 0x5809F785: vgPlain_scheduler (scheduler.c:1433)
==43326==    by 0x580AED50: thread_wrapper (syswrap-linux.c:103)
==43326==    by 0x580AED50: run_a_thread_NORETURN (syswrap-linux.c:156)

sched status:
  running_tid=1
......

(3) Switch the compiler from gcc to clang, hoping it would give me some clues. Still no luck.

(4) Switch the operating system from Linux to OpenBSD. The program crashed again, but this time the crash pointed to where the memory corruption occurred:

......
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000014b07f01e52d in addMod (r=<error reading variable>, a=4693443247995522, b=28622907746665631,
......

I figured out the issue quickly, without having to understand the whole code base. OpenBSD saves me again, thanks!

 

Reflection on one-year usage of OpenBSD

I have used OpenBSD for more than one year, and it is time to give a summary of the experience:

(1) What do I get from OpenBSD?

a) A good UNIX tutorial. When I am curious about how a UNIX command is implemented, I refer to the OpenBSD source code, and I learn something every time. E.g., I refreshed my socket programming skills from nc, and learned how to process files efficiently from cat.

b) A better test bed. Although my work focuses on developing programs on Linux, I try to compile and run applications on OpenBSD whenever possible. One reason is that OpenBSD usually gives more helpful warnings. E.g., a hint like this:

......
warning: sprintf() is often misused, please use snprintf()
......

Or you can refer to this post which I wrote before. The other reason is that a program which runs well on Linux may crash on OpenBSD, so OpenBSD can help you find hidden bugs.

c) Some handy tools. E.g., I find tcpbench useful, so I ported it to Linux for my own usage (project is here).

(2) What do I give back to OpenBSD?

a) Patches. Although most of them are trivial modifications, they are still my contributions.

b) Blog posts that share my experience of using OpenBSD.

c) Programs developed for OpenBSD/*BSD: lscpu and free.

d) Ports of programs to OpenBSD. E.g., I find google/benchmark a nifty tool, but it lacked OpenBSD support, so I submitted a PR and it was accepted. You can use google/benchmark on OpenBSD now.

Generally speaking, the time invested in OpenBSD is rewarding. If you are still hesitating, why not give it a shot?

Use boost on OpenBSD

Installing boost on OpenBSD is simple:

$ pkg_add boost

Write a simple test program:

#include <boost/asio.hpp>
#include <iostream>

int main()
{
    boost::system::error_code ec{};
    auto server_addr{boost::asio::ip::make_address("127.0.0.1", ec)};
    if (ec.value())
    {
        std::cerr << "Failed to parse the IP address. Error code = "
                    << ec.value() << ". Message: " << ec.message();
        return ec.value();
    }

    boost::asio::ip::tcp::endpoint ep{server_addr, 3003};
    return 0;
}

Compile it:

$ c++ client.cpp
/tmp/client-238877.o: In function `boost::asio::error::get_system_category()':
client.cpp:(.text._ZN5boost4asio5error19get_system_categoryEv[_ZN5boost4asio5error19get_system_categoryEv]+0x5): undefined reference to `boost::system::system_category()'
/tmp/client-238877.o: In function `boost::system::error_code::error_code()':
client.cpp:(.text._ZN5boost6system10error_codeC2Ev[_ZN5boost6system10error_codeC2Ev]+0x1b): undefined reference to `boost::system::system_category()'
/tmp/client-238877.o: In function `boost::system::error_category::std_category::equivalent(int, std::__1::error_condition const&) const':
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE[_ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE]+0x129): undefined reference to `boost::system::generic_category()'
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE[_ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE]+0x16a): undefined reference to `boost::system::generic_category()'
/tmp/client-238877.o: In function `boost::system::error_category::std_category::equivalent(std::__1::error_code const&, int) const':
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi[_ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi]+0x137): undefined reference to `boost::system::generic_category()'
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi[_ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi]+0x178): undefined reference to `boost::system::generic_category()'
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi[_ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi]+0x2d2): undefined reference to `boost::system::generic_category()'
c++: error: linker command failed with exit code 1 (use -v to see invocation)

Whoops! It means the linker can’t find the related library. The boost libraries are installed in /usr/local/lib:

$ ls -lt /usr/local/lib/libboost*
-rw-r--r--  1 root  bin  2632774 Jul  2 05:58 /usr/local/lib/libboost_regex-mt.a
-rw-r--r--  1 root  bin  1398613 Jul  2 05:58 /usr/local/lib/libboost_regex-mt.so.8.0
-rw-r--r--  1 root  bin  2632774 Jul  2 05:58 /usr/local/lib/libboost_regex.a
-rw-r--r--  1 root  bin  1398613 Jul  2 05:58 /usr/local/lib/libboost_regex.so.8.0
-rw-r--r--  1 root  bin   994564 Jul  2 05:58 /usr/local/lib/libboost_serialization-mt.a
-rw-r--r--  1 root  bin   484918 Jul  2 05:58 /usr/local/lib/libboost_serialization-mt.so.8.0
-rw-r--r--  1 root  bin   994564 Jul  2 05:58 /usr/local/lib/libboost_serialization.a
-rw-r--r--  1 root  bin   484918 Jul  2 05:58 /usr/local/lib/libboost_serialization.so.8.0
-rw-r--r--  1 root  bin   260322 Jul  2 05:58 /usr/local/lib/libboost_signals-mt.a
-rw-r--r--  1 root  bin   154973 Jul  2 05:58 /usr/local/lib/libboost_signals-mt.so.8.0
-rw-r--r--  1 root  bin   260322 Jul  2 05:58 /usr/local/lib/libboost_signals.a
......

So I need to pass the library directory with -L and link against the boost_system library:

$ c++ -L/usr/local/lib client.cpp -lboost_system
$

This time it works!

configure script may not check pthread correctly on OpenBSD

At least twice I have come across a project that builds fine on Linux but can’t find the pthread-related definitions on OpenBSD, like this:

......
../../runtime/cilk-internal.h:39:6: error: unknown type name 'pthread_mutex_t'
     pthread_mutex_t posix;
     ^
../../runtime/cilk-internal.h:211:6: error: unknown type name 'pthread_t'
     pthread_t *tid;
     ^
../../runtime/cilk-internal.h:216:6: error: unknown type name 'pthread_cond_t'
     pthread_cond_t  waiting_workers_cond;
     ^
../../runtime/cilk-internal.h:217:6: error: unknown type name 'pthread_cond_t'
     pthread_cond_t  wakeup_first_worker_cond;
     ^
../../runtime/cilk-internal.h:218:6: error: unknown type name 'pthread_cond_t'
     pthread_cond_t  wakeup_other_workers_cond;
     ^
../../runtime/cilk-internal.h:219:6: error: unknown type name 'pthread_mutex_t'
     pthread_mutex_t workers_mutex;
     ^
../../runtime/cilk-internal.h:220:6: error: unknown type name 'pthread_cond_t'
     pthread_cond_t  workers_done_cond;
......

The source code looks like this:

......
#if HAVE_PTHREAD
#include <pthread.h>
#endif
......

But the generated config.h doesn’t define the HAVE_PTHREAD macro:

/* Define if you have POSIX threads libraries and header files. */
/* #undef HAVE_PTHREAD */

In fact, OpenBSD provides full pthread support; it is the configure script’s check that fails to detect it. So please be aware of this issue.

The anatomy of uptime & w commands on OpenBSD

On OpenBSD, uptime and w are actually the same program:

$ ls -lt /usr/bin/uptime /usr/bin/w
-r-xr-xr-x  2 root  bin  18136 May 30 12:53 /usr/bin/uptime
-r-xr-xr-x  2 root  bin  18136 May 30 12:53 /usr/bin/w

and the source code is usr.bin/w/w.c.

Compare the outputs of uptime and w:

$ uptime
10:59AM  up 7 days,  1:51, 1 user, load averages: 0.00, 0.00, 0.00
$ w
10:59AM  up 7 days,  1:51, 1 user, load averages: 0.00, 0.00, 0.00
USER    TTY FROM              LOGIN@  IDLE WHAT
root     p0 10.217.242.57     9:10AM     0 w

You can see that uptime just displays the first line of w, while w also shows information about the logged-in users.

w uses clock_gettime to get the system uptime:

if (clock_gettime(CLOCK_BOOTTIME, &boottime) != -1) {
    ......
} 

and getloadavg to retrieve the system load averages over the past 1, 5, and 15 minutes:

int
getloadavg(double loadavg[], int nelem)
{
    ......
    mib[0] = CTL_VM;
    mib[1] = VM_LOADAVG;
    size = sizeof(loadinfo);
    if (sysctl(mib, 2, &loadinfo, &size, NULL, 0) < 0)
        return (-1);
    ......
}

The login information of the current users is kept in /var/run/utmp, which consists of utmp structs:

struct utmp {
    char    ut_line[UT_LINESIZE];
    char    ut_name[UT_NAMESIZE];
    char    ut_host[UT_HOSTSIZE];
    time_t  ut_time;
};

utmp.ut_line is the login terminal (with the “tty” prefix removed); utmp.ut_name is the login user name; utmp.ut_host is the login address; and utmp.ut_time is the login time. These fill the first 4 columns of every line:

USER    TTY FROM              LOGIN@  IDLE WHAT
root     p0 10.217.242.57     9:10AM     0 w

The IDLE column displays how much time has passed since you last did anything on the terminal:

if ((ep->idle = now - stp->st_atime) < 0)
        ep->idle = 0;

and WHAT shows the current process.