Fix a weird “undefined symbol” issue

Today I ran into a weird “undefined symbol” issue: I built a program successfully on one machine, but after copying it to another machine, it reported the following error at run time:

    ......
    ... symbol lookup error: xxxxxx: undefined symbol: LZ4F_compressFrameBound

According to the output of ldd, all the required libraries were present. After some debugging, I found the reason: the build machine and the running machine had different versions of the librdkafka library. After upgrading librdkafka on the build machine to the same version as on the running machine and rebuilding the program, the issue was fixed.
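Since ldd only tells you that a library file was found, not that it exports every symbol the program expects, a small probe can help on the target machine. The following is a minimal sketch, assuming a POSIX system with dlopen/dlsym; the helper name has_symbol is mine, not part of any library, and on older glibc versions you may need to link with -ldl.

```cpp
#include <cassert>
#include <dlfcn.h>

// Hypothetical helper: returns true if `symbol` can be resolved through
// `library`. Passing nullptr as `library` searches the running program and
// the shared objects it has already loaded.
bool has_symbol(const char* library, const char* symbol)
{
    void* handle = dlopen(library, RTLD_NOW);
    if (handle == nullptr) {
        return false;  // the library itself could not be loaded
    }

    dlerror();                      // clear any stale error state
    (void)dlsym(handle, symbol);    // the address itself is not needed here
    const bool found = (dlerror() == nullptr);

    dlclose(handle);
    return found;
}
```

For the case above, a call such as has_symbol("librdkafka.so.1", "LZ4F_compressFrameBound") would be the natural check; the exact library file name is an assumption and depends on what is installed on your machine.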

The pitfall of upgrading a 3rd-party library

Today I debugged a tricky bug related to a 3rd-party library. When I used gdb to inspect a structure’s values, I found that the last member was missing compared with the definition in the header file. I began to suspect the 3rd-party library, so I checked the upgrade log and found the root cause: the code was compiled against v1.1 of the library, but by the time I ran the program, someone else had upgraded the library to v1.2, which caused this mysterious bug. The solution is simple: rebuild the code. But the debugging process was exhausting.
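To see why such an upgrade bites, here is a minimal sketch of a struct that gains a member between two versions. The Options struct, its members, and the v1_1/v1_2 namespaces are hypothetical stand-ins, not the real library's definitions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout of the struct as the code saw it at compile time.
namespace v1_1 {
struct Options {
    int timeout;
    int retries;
};
}

// Hypothetical layout after the upgrade: one member appended at the end.
namespace v1_2 {
struct Options {
    int timeout;
    int retries;
    int flags;
};
}

// Returns true only if both versions agree on the size of the struct.
bool layouts_match()
{
    return sizeof(v1_1::Options) == sizeof(v1_2::Options);
}
```

Code built against one layout and a library built against the other disagree about how large the struct is, so one side reads or writes memory the other side never allocated for it. That is why rebuilding against the installed headers fixes the problem.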

Linking error involving _ntl_gbigint_body when using NTL

I use NTL on Arch Linux. Its headers declare a struct, _ntl_gbigint_body, that is never actually defined (refer to this post):

/*
 * This way of defining the bigint handle type is a bit non-standard,
 * but better for debugging.
 */

struct _ntl_gbigint_body;
typedef _ntl_gbigint_body *_ntl_gbigint;  

Pay attention to functions that depend on this struct, such as:

void _ntl_gcopy(_ntl_gbigint a, _ntl_gbigint *bb);

With the old NTL library, the symbol generated by the compiler for this function demangles to _ntl_gcopy(void*, void**):

$ readelf -sW libntl.so | c++filt | grep ntl_gcopy
2511: 000000000012b750   184 FUNC    GLOBAL DEFAULT   12 _ntl_gcopy(void*, void**)

While with the new one it is _ntl_gcopy(_ntl_gbigint_body*, _ntl_gbigint_body**):

$ readelf -sW libntl.so | c++filt | grep ntl_gcopy
615: 0000000000148500   202 FUNC    GLOBAL DEFAULT   11 _ntl_gcopy(_ntl_gbigint_body*, _ntl_gbigint_body**)

So if you meet a linking error like the following:

undefined reference to `_ntl_gcopy(void*, void**)'

it is most likely a mismatch between the NTL header files and the library: the header files are old, while the library is new.
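The reason the two symbols differ is that in C++ the parameter types are part of a function's signature, and therefore of its mangled name. A minimal sketch, where body, old_handle and new_handle are hypothetical stand-ins for the NTL types:

```cpp
#include <cassert>
#include <type_traits>

struct body;                  // declared but never defined, like _ntl_gbigint_body

using old_handle = void*;     // how the old headers appear to define the handle
using new_handle = body*;     // how the new headers define the handle

// The same prototype written against each handle type.
using old_signature = void(old_handle, old_handle*);
using new_signature = void(new_handle, new_handle*);

// Returns true only if the compiler treats both prototypes as the same type.
bool signatures_match()
{
    return std::is_same<old_signature, new_signature>::value;
}
```

Because the two prototypes are distinct types, code compiled with the old headers asks the linker for a symbol that the new library simply does not export.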

Notice the position of libraries on the link line on Ubuntu

This week I ported tcpbench from OpenBSD to Linux. The idiomatic OpenBSD way is to put the libraries before the output target and the source files:

cc -g -O2 -Wall -levent -o tcpbench tcpbench.c

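However, this doesn’t work on Ubuntu, since its linker is invoked with the --as-needed option by default: a library listed before the files that use it is considered unneeded and dropped. So I changed the Makefile to put the library at the end: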

cc -g -O2 -Wall -o tcpbench tcpbench.c -levent

Please refer to this discussion if you are interested.
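To make the ordering rule concrete, here is a deliberately simplified model of a left-to-right link pass with --as-needed behaviour. It is only a sketch of the idea, not how the real linker is implemented; the Input struct, the function name, and the symbol event_init used in the example below are illustrative assumptions.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// One item on the link line (hypothetical model).
struct Input {
    std::string name;
    bool is_shared_library;
    std::set<std::string> defines;  // symbols this input provides
    std::set<std::string> needs;    // symbols this input references
};

// Walks the inputs left to right and returns the symbols still unresolved.
// A shared library is kept only if it resolves a symbol that is already
// undefined when the library is reached; otherwise it is dropped.
std::set<std::string> unresolved_after_link(const std::vector<Input>& inputs)
{
    std::set<std::string> defined;
    std::set<std::string> undefined;

    for (const Input& in : inputs) {
        if (in.is_shared_library) {
            bool useful = false;
            for (const std::string& s : in.defines) {
                if (undefined.count(s) != 0) {
                    useful = true;
                }
            }
            if (!useful) {
                continue;  // nothing needs it yet, so it is discarded
            }
        }
        for (const std::string& s : in.defines) {
            defined.insert(s);
            undefined.erase(s);
        }
        for (const std::string& s : in.needs) {
            if (defined.count(s) == 0) {
                undefined.insert(s);
            }
        }
    }
    return undefined;
}
```

In this model, an object that needs event_init followed by a library that defines it leaves nothing unresolved, while the reverse order leaves event_init undefined, which mirrors the failure seen on Ubuntu.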

Be careful with the order of files when linking

Check the following A.h:

# cat A.h
#pragma once

#include <iostream>
#include <vector>

class A
{
public:
        std::vector<int> v;
        A()
        {
                v.push_back(1);
                std::cout << "Enter A's constructor...\n";
        }
        int getFirstElem()
        {
                v.push_back(2);
                std::cout << "Enter A's getFirstElem...\n";
                return v[0];
        }
        ~A()
        {
                std::cout << "Enter A's destructor...\n";
        }
};

int func();

And A.cpp:

# cat A.cpp
#include "A.h"

static A a;

int func()
{
        return a.getFirstElem();
}

A.cpp just defines a static instance of A, and func() returns the first element of a's internal vector.

Now check another file, which uses A.h and A.cpp:

# cat hello.cpp
#include <iostream>
#include "A.h"

static int gP = func();

int main()
{
    std::cout << gP << std::endl;
    return 0;
}

Compile them:

# clang++ -c hello.cpp
# clang++ -c A.cpp

Link hello.o first and execute the program:

# clang++ hello.o A.o
# ./a.out
Enter A's getFirstElem...
Enter A's constructor...
2
Enter A's destructor...

Then link A.o first and execute the program:

# clang++ A.o hello.o
# ./a.out
Enter A's constructor...
Enter A's getFirstElem...
1
Enter A's destructor...

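The results are different. In the first case, a's getFirstElem() function is called before its constructor has even run. The C++ standard does not specify the initialization order of static objects defined in different translation units, and here it follows the link order. Please pay attention to it!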
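One common way to avoid depending on link order is to wrap the static object in a function, so that it is constructed the first time it is used. The following is a minimal single-file sketch of that idea, with the printing removed for brevity; the instance() helper is my addition and is not part of the original A.cpp.

```cpp
#include <cassert>
#include <vector>

class A
{
public:
    std::vector<int> v;
    A()
    {
        v.push_back(1);
    }
    int getFirstElem()
    {
        v.push_back(2);
        return v[0];
    }
};

// Hypothetical replacement for "static A a;": the object is constructed
// on the first call, no matter which translation unit calls first.
A& instance()
{
    static A a;
    return a;
}

int func()
{
    return instance().getFirstElem();
}

// Same pattern as hello.cpp: a static initialized through func().
static int gP = func();
```

With this change gP no longer depends on whether hello.o or A.o comes first on the link line, because the constructor always runs before getFirstElem().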