A “std::bad_alloc” issue caused by typo

Last week, I fixed a bug which was caused by a typo. The simplified code is like this:

#include <vector>
using namespace std;

class A {
    int i;
public:
    A(int i): i(i) {}
};

class B {
    vector<A> v;
public:
    B(vector<A> v1): v(v) {}
};

int main() {
    vector<A> a(1, 0);
    B b(a);
    return 0;
}

Please note the constructor of B:

B(vector<A> v1): v(v) {}

It was supposed to use v1 to initialize v, while I misspelled: v(v). My compiler is gcc 6.3.1, compile and run it:

# g++ -g test.cpp
# ./a.out
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

Change B(vector<A> v1): v(v) to B(vector<A> v1): v(v1), then all is OK.

What is the effect of extern “C”?

I often see the following code in C header files:

#ifdef __cplusplus
extern "C" {
#endif

......

#ifdef __cplusplus
}
#endif

What does it mean? Since there is __cplusplus marco, it must be related to C++ compilation. Let’s see a simple program (print.c):

$ cat print.c
#include <stdio.h>

void print(void)
{
        printf("Hello world!\n");
}

Use gcc to generate object file:

$ gcc -c print.c
$ 

Then create a main.cpp file which calls print() in its main() function:

$ cat main.cpp
extern void print(void);

int main(void)
{
        print();
        return 0;
}

Compile main.cpp and link with print.o:

$ g++ main.cpp print.o
/tmp/cc60fu19.o: In function `main':
main.cpp:(.text+0x5): undefined reference to `print()'
collect2: error: ld returned 1 exit status

It is weird, right? the print() function must be defined in print.o, why can’t g++ find it? Let’s do a simple magic, add "C" in extern void print(void);:

$ cat main.cpp
extern "C" void print(void);

int main(void)
{
        print();
        return 0;
}

Try compile main.cpp again:

$ g++ main.cpp print.o
$ ./a.out
Hello world!

It is OK now! The root cause is related to name mangling. To be simplified, when compile C++ code, the names of functions, global variables, etc will be modified, not the same as original format. While compile C code, this won’t happen. The function of extern "C" is to tell C++ compiler search the original name, not the mangled ones. To get a sense of name mangling, you can check the print() name in object file:

$ readelf -s print.o | grep print
 1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS print.c
 9: 0000000000000000    17 FUNC    GLOBAL DEFAULT    1 print

Then use g++ to compile print.c, and check function name again:

$ g++ -c print.c
$ readelf -s print.o | grep print
 1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS print.c
 9: 0000000000000000    17 FUNC    GLOBAL DEFAULT    1 _Z5printv

You can see the print() function name is actually _Z5printv when use g++ to generate the object file.

References:
Why do we use extern “C”?;
In C++ source, what is the effect of extern “C”?.

The differences between using gcc/g++ to compile *.c/*.cpp files

I bump into Compiling a C++ program with gcc today, and think it is a very interesting topic. So I do the following tests:

(1) Create a canonical C source file:

$ cat main.c
#include <stdio.h>

void hello(void)
{
        printf("Hello World!\n");
}

int main(void)
{
        hello();
}

Use gcc to compile it and search hello in symbol table of executable file:

$ gcc main.c
$ readelf -s a.out | grep hello
53: 00000000004004f6    17 FUNC    GLOBAL DEFAULT   13 hello

Then use g++ to compile it and also search hello in a.out:

$ g++ main.c
$ readelf -s a.out | grep hello
54: 0000000000400526    17 FUNC    GLOBAL DEFAULT   13 _Z5hellov

Since hello is name mangled to _Z5hellov when using g++, we can make sure that g++ will compile *.c file as C++.

(2) Modify main.c to a standard C++ file:

$ cat main.cpp
#include <iostream>

void hello(void)
{
        std::cout << "Hello, world!" << std::endl;
}

int main(void)
{
        hello();
}

Use gcc to compile it:

$ gcc main.cpp
/tmp/cccAb8IH.o: In function `hello()':
main.cpp:(.text+0xa): undefined reference to `std::cout'
main.cpp:(.text+0xf): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)'
main.cpp:(.text+0x14): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)'
main.cpp:(.text+0x1c): undefined reference to `std::ostream::operator<<(std::ostream& (*)(std::ostream&))'
/tmp/cccAb8IH.o: In function `__static_initialization_and_destruction_0(int, int)':
main.cpp:(.text+0x56): undefined reference to `std::ios_base::Init::Init()'
main.cpp:(.text+0x65): undefined reference to `std::ios_base::Init::~Init()'
collect2: error: ld returned 1 exit status

We can see there are many link errors. The reason is when gcc compiles *.cpp file, it will employ C++ compiler, but it only use defaults C and gcc helpers libraries during linkage stage, and won’t use stdc++ library. Modify the command, then all is OK:

$ gcc main.cpp -lstdc++
$ readelf -s a.out | grep hello
41: 00000000004007f7    21 FUNC    LOCAL  DEFAULT   13 _GLOBAL__sub_I__Z5hellov
59: 0000000000400786    35 FUNC    GLOBAL DEFAULT   13 _Z5hellov

Certainly , since it is a regular C++ file, use g++ will undoubtedly be OK:

$ g++ main.cpp
$ readelf -s a.out | grep hello
41: 0000000000400817    21 FUNC    LOCAL  DEFAULT   13 _GLOBAL__sub_I__Z5hellov
59: 00000000004007a6    35 FUNC    GLOBAL DEFAULT   13 _Z5hellov

(3) Modify the above file name from main.cpp to main.c.

$ cat main.c
#include <iostream>

void hello(void)
{
        std::cout << "Hello, world!" << std::endl;
}

int main(void)
{
        hello();
}

Use g++ to compile it:

$ g++ main.c
$ readelf -s a.out | grep hello
41: 0000000000400817    21 FUNC    LOCAL  DEFAULT   13 _GLOBAL__sub_I__Z5hellov
59: 00000000004007a6    35 FUNC    GLOBAL DEFAULT   13 _Z5hellov

All is OK, and the stdc++ library will be used during linkage stage.

Use gcc instead:

$ gcc main.c
main.c:1:20: fatal error: iostream: No such file or directory

                    ^
compilation terminated.

This error identifies gcc always uses standard C compiler to *.c file, so it can’t find C++ related header files.

P.S. my gcc version:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc-multilib/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 6.3.1 20170109 (GCC)

 

The tips of debugging Mesos

In the past week, I was following this tutorial to build a “kubernetes on Mesos” testbed. All went well but the Mesos master always complains following words:

......
E1228 21:57:13.138357 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected
......

Firstly, I tried to get help from Mesos mailing list and stackoverflow, but after other friends can’t give correct answers directly, I knew I must depend on myself. Enduring a tough debugging process, I worked out the root cause. Since I am a newbie of Mesos and C++(Mesos is implemented in C++, and I last time touch C++ was 7 years ago), I think the experiences and tips may also be useful for other novices. So I summarize them as the following words:

(1) LOG VS VLOG

When you meet an issue, analyzing log should be the first step. Mesos utilizes the google-glog to generate the logs. And the log format explanation is here:

Log lines have this form:
    Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg...
where the fields are defined as follows:
    L                A single character, representing the log level (eg 'I' for INFO)
    mm               The month (zero padded; ie May is '05')
    dd               The day (zero padded)
    hh:mm:ss.uuuuuu  Time in hours, minutes and fractional seconds
    threadid         The space-padded thread ID as returned by GetTID()
    file             The file name
    line             The line number
    msg              The user-supplied message

So compared with the above words, you can easily understand this log:

E1228 21:57:13.138357 27257 process.cpp:1911] Failed to shutdown socket with fd 17: Transport endpoint is not connected

By default, the Mesos doesn’t output logs generated by VLOG function, and you need to set GLOG_v=m if you want to see the information from VLOG function (Refer this post):

$ sudo GLOG_v=3 ./bin/mesos-master.sh --ip=15.242.100.56 --work_dir=/var/lib/mesos
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1229 22:42:38.818521 11830 process.cpp:2426] Spawned process __gc__@15.242.100.56:5050
I1229 22:42:38.818613 11846 process.cpp:2436] Resuming __gc__@15.242.100.56:5050 at 2015-12-30 03:42:38.818540032+00:00
I1229 22:42:38.818749 11847 process.cpp:2436] Resuming __gc__@15.242.100.56:5050 at 2015-12-30 03:42:38.818712832+00:00
I1229 22:42:38.818802 11844 process.cpp:2436] Resuming help@15.242.100.56:5050 at 2015-12-30 03:42:38.818746112+00:00
......

You can also use LOG function to add logs yourself on suspected locations.

(2) Gdb

If logs can’t save you, it is time for debugger to be your hero. Gdb is no doubt a great tool of debugging C/C++ programs. To use gdb, you should enable --enable-debug configuration option before compiling Mesos:

nan@ubuntu:~/mesos-0.25.0/build$ ../configure --enable-debug

You can set breakpoint on class member function like this:

(gdb) b process::SocketManager::close(int)
Breakpoint 1 at 0x7fd07857c162: file ../../../3rdparty/libprocess/src/process.cpp, line 1849.

You can also make use of “auto-complete” feature of gdb. Input the uncompleted function name:

(gdb) b process::SocketManager::cl  

Then click tab:

(gdb) b process::SocketManager::close(int)
Breakpoint 1 at 0x7fd07857c162: file ../../../3rdparty/libprocess/src/process.cpp, line 1849.

Notice: If the matched symbols are too many, it may hang gdb. So try to reduce the scope as small as possible.

Additionally, since source file names are relative to the directory where the code was compiled (please refer breakpoints in GDB),you can reach the same effect through “b file:line“command :

(gdb) b ../../../3rdparty/libprocess/src/process.cpp:1279
Breakpoint 2 at 0x7fd07857973a: file ../../../3rdparty/libprocess/src/process.cpp, line 1279.
(gdb) c
Continuing.
......
[Switching to Thread 0x7fd06b9d4700 (LWP 16677)]

Breakpoint 2, process::SocketManager::link_connect (this=0xca1a30, future=..., socket=0x7fd0500026d0, to=...)
    at ../../../3rdparty/libprocess/src/process.cpp:1279
1279      if (future.isDiscarded() || future.isFailed()) {

You can see the breakpoint is set onprocess::SocketManager::link_connect(process::Future<Nothing> const&, process::network::Socket*, process::UPID const&)function.

P.S.:There are also handy out-of-box gdb scripts in build/bin directory:

# ls bin/gdb-mesos-*
bin/gdb-mesos-local.sh  bin/gdb-mesos-master.sh  bin/gdb-mesos-slave.sh  bin/gdb-mesos-tests.sh

(3) Tcpdump and wireshark

Network packet analyzing tools such as tcpdump and wireshark are essential to diagnose programs which interact with other hosts. E.g., you can use following command to see what come in and out of Mesos master:

sudo tcpdump -A -s 0 'tcp port 5050' -i em1 -w capture.pcap

BTW, my issue is finally fixed by analyzing the following packet:

1

(4) Pstack script

Personally, I think pstack script is useful when monitoring thread status, and please refer Use pstack to track threads on Linux for detail.

Enjoy debugging!

 

Install libstdc++ debug package on Ubuntu

When using pstack script to analyze the thread stacks of process on Ubuntu, there may be some unresolved symbols, like ??:

......
Thread 3 (Thread 0x7f71513f3700 (LWP 10049)):
#0  0x00007f716399cb13 in epoll_wait ()
#1  0x00007f71688569e5 in epoll_poll (
#2  0x00007f7168858c72 in ev_run (loop=0x7f71697e44e0 <default_loop_struct>,
#3  0x00007f716881cf16 in ev_loop ()
#4  0x00007f716881d680 in process::EventLoop::run ()
#5  0x00007f7168808141 in std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) () from /home/nan/mesos-0.26.0/build/src/.libs/libmesos-0.26.0.so
#6  0x00007f716880809b in std::_Bind_simple<void (*())()>::operator()() ()
#7  0x00007f7168808034 in std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() () from /home/nan/mesos-0.26.0/build/src/.libs/libmesos-0.26.0.so
#8  0x00007f7164234a40 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007f716448f182 in start_thread (arg=0x7f71513f3700)
#10 0x00007f716399c47d in clone ()
......

To resolve this issue, firstly, you need to know the libstdc++6 version:

$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.4-2ubuntu1~14.04' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

Then search the debug files of the 4.8.4 version:

~$ sudo apt-cache search libstdc++ | grep 4.8
......
libstdc++6-4.8-dbg - GNU Standard C++ Library v3 (debugging files)
......

Install libstdc++6-4.8-dbg:

$ sudo apt-get install libstdc++6-4.8-dbg
Reading package lists... Done
Building dependency tree
Reading state information... Done
......

The symbols can be resolved now:

......
Thread 3 (Thread 0x7f71513f3700 (LWP 10049)):
#0  0x00007f716399cb13 in epoll_wait ()
#1  0x00007f71688569e5 in epoll_poll (
#2  0x00007f7168858c72 in ev_run (loop=0x7f71697e44e0 <default_loop_struct>,
#3  0x00007f716881cf16 in ev_loop ()
#4  0x00007f716881d680 in process::EventLoop::run ()
#5  0x00007f7168808141 in std::_Bind_simple<void (*())()>::_M_invoke<>(std::_Index_tuple<>) () from /home/nan/mesos-0.26.0/build/src/.libs/libmesos-0.26.0.so
#6  0x00007f716880809b in std::_Bind_simple<void (*())()>::operator()() ()
#7  0x00007f7168808034 in std::thread::_Impl<std::_Bind_simple<void (*())()> >::_M_run() () from /home/nan/mesos-0.26.0/build/src/.libs/libmesos-0.26.0.so
#8  0x00007f7164234a40 in std::(anonymous namespace)::execute_native_thread_routine (__p=<optimized out>)
#9  0x00007f716448f182 in start_thread (arg=0x7f71513f3700)
#10 0x00007f716399c47d in clone ()
......