The difference between loopback packets on Linux and OpenBSD

Capture packets on the loopback interface on Linux:

# tcpdump -i lo -w lo.pcap port 33333
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
......

Copy the file to Windows and analyze it with Wireshark:

[Screenshot 1: the Linux capture opened in Wireshark]

Every packet follows the standard Ethernet frame format.

Capture loopback packets on OpenBSD:

# tcpdump -i lo0 -w lo.pcap port 33333
tcpdump: listening on lo0, link-type LOOP
......

Again, copy the file to Windows and open it with Wireshark:

[Screenshot 2: the OpenBSD capture opened in Wireshark]

Wireshark only recognizes the packets as “Raw IP” and can’t decode any details.

After reading a discussion on the Wireshark mailing list, I learned that the problem lies in the link-layer header type stored in the pcap file header, where 0x0C stands for “Raw IP”:

[Screenshot 3: the link-layer header type 0x0C in the pcap file header]

I changed 0x0C to 0x6C, which stands for “OpenBSD loopback”:

[Screenshot 4: the header type changed to 0x6C]

Now the packets can be decoded successfully:

[Screenshot 5: Wireshark decoding the patched OpenBSD capture]

P.S. I also started a discussion about this issue on the mailing list.

Update: I wrote a script to do this conversion.
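The script itself isn't shown here. As a minimal sketch of the same conversion in C, assume a classic pcap file (not pcapng) whose 24-byte global header stores the link-layer type in the 4-byte "network" field at offset 20, in the byte order given by the magic number. The function name patch_linktype is hypothetical. It only rewrites the value 12 (0x0C) to 108 (0x6C, LINKTYPE_LOOP in the tcpdump.org link-type list) and leaves any other file untouched.

```c
#include <stdint.h>
#include <stdio.h>

#define PCAP_MAGIC          0xa1b2c3d4u /* classic pcap, microsecond stamps */
#define PCAP_MAGIC_SWAPPED  0xd4c3b2a1u /* same magic, opposite byte order */
#define OLD_LINKTYPE        12u         /* value in lo0 captures, shown as "Raw IP" */
#define LINKTYPE_LOOP       108u        /* 0x6C, "OpenBSD loopback" */

/* Decode 4 bytes as a big-endian (big != 0) or little-endian integer. */
static uint32_t
get32(const unsigned char *p, int big)
{
	if (big)
		return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
		    (uint32_t)p[2] << 8 | p[3];
	return (uint32_t)p[3] << 24 | (uint32_t)p[2] << 16 |
	    (uint32_t)p[1] << 8 | p[0];
}

/* Hypothetical helper: returns 0 if patched, -1 on error or unexpected header. */
int
patch_linktype(const char *path)
{
	unsigned char hdr[24];
	FILE *fp = fopen(path, "r+b");
	int big, i, rc = -1;
	uint32_t magic;

	if (fp == NULL)
		return -1;
	if (fread(hdr, 1, sizeof(hdr), fp) != sizeof(hdr))
		goto out;

	magic = get32(hdr, 0);          /* read the magic as little-endian */
	if (magic == PCAP_MAGIC)
		big = 0;                /* file was written little-endian */
	else if (magic == PCAP_MAGIC_SWAPPED)
		big = 1;                /* file was written big-endian */
	else
		goto out;               /* not a classic pcap file */

	if (get32(hdr + 20, big) != OLD_LINKTYPE)
		goto out;               /* leave other link types alone */

	/* Encode 108 in the file's own byte order. */
	for (i = 0; i < 4; i++)
		hdr[20 + i] = big ? (unsigned char)(LINKTYPE_LOOP >> (24 - 8 * i))
		    : (unsigned char)(LINKTYPE_LOOP >> (8 * i));

	/* fseek() is required between reading and writing an update stream. */
	if (fseek(fp, 20, SEEK_SET) == 0 && fwrite(hdr + 20, 1, 4, fp) == 4)
		rc = 0;
out:
	if (fclose(fp) != 0)
		rc = -1;
	return rc;
}
```

Run it on a copy of lo.pcap before opening the file in Wireshark. pcapng files and the nanosecond-resolution magic (0xa1b23c4d) are rejected by this sketch rather than handled.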

Learn socket programming tips from netcat

Since netcat is known as the “TCP/IP Swiss Army knife”, I read its source code in OpenBSD and summarized some socket programming tips:

(1) Client connects in non-blocking mode:

......
s = socket(res->ai_family, res->ai_socktype |
            SOCK_NONBLOCK, res->ai_protocol);

......
if ((ret = connect(s, name, namelen)) != 0 && errno == EINPROGRESS) {
        pfd.fd = s;
        pfd.events = POLLOUT;
        ret = poll(&pfd, 1, timeout);
}
......

Create the socket with the SOCK_NONBLOCK flag, then call connect(). If it returns 0, the connection was established immediately; if it fails with errno set to EINPROGRESS, the connection is still being set up, and poll() with POLLOUT lets us limit how long to wait via timeout; any other error means the connection can’t be established.

(2) The usage of poll():

......
/* stdin */
pfd[POLL_STDIN].fd = stdin_fd;
pfd[POLL_STDIN].events = POLLIN;

/* network out */
pfd[POLL_NETOUT].fd = net_fd;
pfd[POLL_NETOUT].events = 0;

/* network in */
pfd[POLL_NETIN].fd = net_fd;
pfd[POLL_NETIN].events = POLLIN;

/* stdout */
pfd[POLL_STDOUT].fd = stdout_fd;
pfd[POLL_STDOUT].events = 0;

......
/* poll */
num_fds = poll(pfd, 4, timeout);

/* treat poll errors */
if (num_fds == -1)
    err(1, "polling error");

/* timeout happened */
if (num_fds == 0)
    return;

/* treat socket error conditions */
for (n = 0; n < 4; n++) {
    if (pfd[n].revents & (POLLERR|POLLNVAL)) {
        pfd[n].fd = -1;
    }
}
/* reading is possible after HUP */
if (pfd[POLL_STDIN].events & POLLIN &&
    pfd[POLL_STDIN].revents & POLLHUP &&
    !(pfd[POLL_STDIN].revents & POLLIN))
    pfd[POLL_STDIN].fd = -1;

Usually we only need to monitor file descriptors for reading:

pfd[POLL_STDIN].fd = stdin_fd;
pfd[POLL_STDIN].events = POLLIN;

and there is no need to monitor file descriptors for writing at the start (netcat only asks for POLLOUT on them once it has buffered data to send):

/* network out */
pfd[POLL_NETOUT].fd = net_fd;
pfd[POLL_NETOUT].events = 0;

According to OpenBSD’s poll(2) manual, POLLIN is enough if you don’t need “high-priority” (for example, out-of-band) data; otherwise the events to monitor should be POLLIN|POLLPRI. The same applies to POLLOUT and POLLWRBAND.

Three values (POLLERR, POLLNVAL and POLLHUP) are only meaningful in the revents member of struct pollfd. If POLLERR or POLLNVAL is reported, there is no point in polling that file descriptor any further; setting its fd to -1 makes poll() ignore the entry:

if (pfd[n].revents & (POLLERR|POLLNVAL)) {
    pfd[n].fd = -1;
}

We should pay more attention to POLLHUP:
(a)

POLLHUP

The device or socket has been disconnected. This event and POLLOUT are mutually-exclusive; a descriptor can never be writable if a hangup has occurred. However, this event and POLLIN, POLLRDNORM, POLLRDBAND, or POLLPRI are not mutually-exclusive. This flag is only valid in the revents bitmask; it is ignored in the events member.

(b)

The second difference is that on EOF there is no guarantee that POLLIN will be set in revents, the caller must also check for POLLHUP.

So if both POLLHUP and POLLIN are set in revents, there is still something to read (data, or possibly just EOF); if only POLLHUP is set, there is nothing left to read, which is why netcat stops polling stdin in that case.
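The rules above can be collected into one decision function for a descriptor polled with POLLIN. This is a minimal sketch; the names classify_read_fd and enum poll_action are hypothetical, and the policy mirrors the netcat excerpt: drop on POLLERR or POLLNVAL, keep reading while POLLIN is set (even together with POLLHUP, since read() will eventually return 0 at EOF), and drop on POLLHUP alone.

```c
#include <poll.h>

/* Hypothetical action codes for a descriptor polled with POLLIN. */
enum poll_action { ACTION_NONE, ACTION_READ, ACTION_DROP };

enum poll_action
classify_read_fd(const struct pollfd *pfd)
{
	/* Error or invalid descriptor: stop polling it. */
	if (pfd->revents & (POLLERR | POLLNVAL))
		return ACTION_DROP;
	/* Data (or EOF) is readable, even if POLLHUP is also set. */
	if (pfd->revents & POLLIN)
		return ACTION_READ;
	/* Hangup without POLLIN: nothing left to read. */
	if (pfd->revents & POLLHUP)
		return ACTION_DROP;
	return ACTION_NONE;
}
```

A caller would set pfd.fd = -1 on ACTION_DROP so that later poll() calls skip the entry, as netcat does.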

 

OpenBSD gives a hint when you forget to unlock a mutex

Check the following simple C++ program:

#include <mutex>

int main(void)
{
    std::mutex m;
    m.lock();

    return 0;
}

The program forgets to unlock m before returning from main, i.e., this line is missing:

m.unlock();

First, test it on GNU/Linux, with Arch Linux as the testbed:

$ uname -a
Linux fujitsu-i 4.13.7-1-ARCH #1 SMP PREEMPT Sat Oct 14 20:13:26 CEST 2017 x86_64 GNU/Linux
$ clang++ -g -pthread -std=c++11 test_mutex.cpp
$ ./a.out
$

The process exited normally without printing anything. Now build and run it on OpenBSD 6.2:

# clang++ -g -pthread -std=c++11 test_mutex.cpp
# ./a.out
pthread_mutex_destroy on mutex with waiters!

OpenBSD prints “pthread_mutex_destroy on mutex with waiters!”. Interesting!
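In the C++ program, the idiomatic fix is to let a guard such as std::lock_guard unlock m automatically. Since the OpenBSD message names pthread_mutex_destroy(), here is a minimal C sketch of the same lifecycle at the POSIX level (the function name guarded_section_demo is hypothetical): every lock is paired with an unlock before the mutex is destroyed. POSIX leaves destroying a locked mutex undefined, so one system may stay silent, as Linux did here, while another complains, as OpenBSD did.

```c
#include <pthread.h>

/*
 * Hypothetical example of the lifecycle the C++ program should follow.
 * Returns 0 on success or a pthread error number.
 */
int
guarded_section_demo(void)
{
	pthread_mutex_t m;
	int rc;

	if ((rc = pthread_mutex_init(&m, NULL)) != 0)
		return rc;
	if ((rc = pthread_mutex_lock(&m)) != 0) {
		pthread_mutex_destroy(&m);
		return rc;
	}

	/* ... critical section ... */

	/* The step the C++ program skipped: release the lock. */
	if ((rc = pthread_mutex_unlock(&m)) != 0)
		return rc;

	/* Destroying an unlocked mutex is well defined; a locked one is not. */
	return pthread_mutex_destroy(&m);
}
```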

The first successful build of the OpenBSD base system

Although the documentation for building the OpenBSD base system is very simple, it still took me nearly half a week to get it done. I’m sharing the experience here in the hope that the lessons I learned can help other newbies like me:

(1) The most important thing is the /usr/obj directory: it must be clean, owned by build:wobj, and have mode 770. I ran into an annoying and weird “Permission denied” error because of it, and recorded the incident in an earlier post.

(2) I assumed that once /usr/obj was set up correctly, the compilation would go smoothly. Unfortunately, an even more bizarre linking error hit me:

cc -O2 -pipe  -DPIE_DEFAULT=1         -o gdb gdb.o libgdb.a ../bfd/libbfd.a -lreadline ../opcodes/libopcodes.a  -liberty -lncurses -lm     -liberty  -lkvm
libgdb.a(main.o): In function `captured_main':
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0x7e6): warning:
warning: strcpy() is almost always misused, please use strlcpy()
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0x7ff): warning:
warning: strcat() is almost always misused, please use strlcat()
libgdb.a(target.o): In function `normal_pid_to_str':
/usr/src/gnu/usr.bin/binutils/gdb/target.c:(.text+0x3047): warning:
warning: sprintf() is often misused, please use snprintf()
libgdb.a(main.o): In function `captured_main':
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0xa4): undefined reference to `bindtextdomain'
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0xac): undefined reference to `textdomain'
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0x4b7): undefined reference to `dcgettext'
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0x606): undefined reference to `dcgettext'
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0x78d): undefined reference to `dcgettext'
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0x9b9): undefined reference to `dcgettext'
/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0xa60): undefined reference to `dcgettext'
libgdb.a(main.o):/usr/src/gnu/usr.bin/binutils/gdb/main.c:(.text+0xb46): more undefined references to `dcgettext' follow
cc: error: linker command failed with exit code 1 (use -v to see invocation)
*** Error 1 in gnu/usr.bin/binutils/obj/gdb (Makefile:1176 'gdb')
*** Error 1 in gnu/usr.bin/binutils/obj (Makefile:21479 'all-gdb')
*** Error 1 in gnu/usr.bin/binutils (Makefile.bsd-wrapper:46 'all')
*** Error 1 in gnu/usr.bin (<bsd.subdir.mk>:48 'all')
*** Error 1 in gnu (<bsd.subdir.mk>:48 'all')
*** Error 1 in . (<bsd.subdir.mk>:48 'all')
*** Error 1 in . (Makefile:95 'do-build')
*** Error 1 in /usr/src (Makefile:74 'build')

I was completely stuck on this issue. I asked for help on both the mailing list and Facebook, and many veterans gave good suggestions, but since they couldn’t access my server, in the end I still had to fix it myself.

I began to suspect my upgrade to 6.2, because it was also the first time I had upgraded OpenBSD. Although I couldn’t explain how the upgrade could be related to this link error, I had no other ideas, so I followed the manual and did the upgrade again. After the OS upgrade, I also refreshed the -current source code:

# cvs -q up -Pd
? mcount.d
? mcount.po
? gnu/usr.bin/binutils/gdb/.gdbinit
? gnu/usr.bin/binutils/gdb/Makefile
? gnu/usr.bin/binutils/gdb/config.cache
? gnu/usr.bin/binutils/gdb/config.h
? gnu/usr.bin/binutils/gdb/config.log
? gnu/usr.bin/binutils/gdb/config.status
? gnu/usr.bin/binutils/gdb/stamp-h
? gnu/usr.bin/binutils/gdb/doc/Makefile
? gnu/usr.bin/binutils/gdb/doc/config.log
? gnu/usr.bin/binutils/gdb/doc/config.status
? gnu/usr.bin/binutils/gdb/testsuite/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/config.log
? gnu/usr.bin/binutils/gdb/testsuite/config.status
? gnu/usr.bin/binutils/gdb/testsuite/gdb.ada/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.ada/gnat_ada.gpr
? gnu/usr.bin/binutils/gdb/testsuite/gdb.arch/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.asm/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.base/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.cp/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.disasm/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.dwarf2/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.fortran/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.java/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.mi/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.objc/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.stabs/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.stabs/config.log
? gnu/usr.bin/binutils/gdb/testsuite/gdb.stabs/config.status
? gnu/usr.bin/binutils/gdb/testsuite/gdb.threads/Makefile
? gnu/usr.bin/binutils/gdb/testsuite/gdb.trace/Makefile
...... 

Man! Why are there files marked with “?”? (cvs uses “?” for files that are not under version control.) I see: I had done some experiments in /usr/src before and didn’t notice that they had polluted the source tree. That might be the root cause! As expected, after cleaning up /usr/src, the build finally succeeded!

Time to wrap up. My takeaways from building the OpenBSD base system are:
(1) Clean out /usr/obj and check its ownership and mode carefully!
(2) Keep /usr/src synchronized with upstream and make sure it’s not messed up!

Clean out the /usr/obj directory before compiling the OpenBSD base system

I tried to build the OpenBSD base system:

# cd /usr/src 
# make obj && make build

After a while, the following weird error was reported:

cc -O2 -pipe -g -Wimplicit -I/usr/src/lib/libc/include -I/usr/src/lib/libc/hidden -D__LIBC__  -Werror-implicit-function-declaration -include namespace.h -Werror=deprecated-declarations -DAPIWARN -DYP -I/usr/src/lib/libc/yp -I/usr/src/lib/libc -I/usr/src/lib/libc/gdtoa -I/usr/src/lib/libc/arch/amd64/gdtoa -DINFNAN_CHECK -DMULTIPLE_THREADS -DNO_FENV_H -DUSE_LOCALE -I/usr/src/lib/libc -I/usr/src/lib/libc/citrus -DRESOLVSORT -DFLOATING_POINT -DPRINTF_WIDE_CHAR -DSCANF_WIDE_CHAR -DFUTEX  -MD -MP  -c -fno-pie /usr/src/lib/libc/gmon/mcount.c -o mcount.po
error: error opening 'mcount.po.d': Permission denied
1 error generated.
*** Error 1 in lib/libc (gmon/Makefile.inc:12 'mcount.po': @cc -O2 -pipe -g -Wimplicit -I/usr/src/lib/libc/include -I/usr/src/lib/libc/hidde...)
*** Error 1 in lib (<bsd.subdir.mk>:48 'all')
*** Error 1 in . (Makefile:90 'do-build')
*** Error 1 in /usr/src (Makefile:74 'build')

I was working as root, so why didn’t I have permission?

I checked the manual again:

The build process will place the object files in a tree under /usr/obj. This directory must be owned by build:wobj with mode 770.

The first time, the /usr/obj directory must be cleaned out completely before proceeding to avoid permission issues. After a successful release build, this is no longer needed.

I checked my /usr/obj immediately:

# ls -alt
......
drwxrwx---  15 build  wobj    512 Sep  6 09:44 obj
......

The owner, group and mode are correct. But wait: today is October 14th, while the directory was last modified on September 6th, so it had not been cleaned out before this build. That should be the issue.

After cleaning out /usr/obj and rerunning “make obj && make build”, the above error disappeared.