The pitfall of upgrading 3rd-party library

Today, I debugged a tricky issue, a bug related to a 3rd-party library. When I used gdb to check a structure’s values, found the last member was missed compared to the definitions in header file. I began to suspect this might be caused by 3rd-party library. I checked the upgrade log, then found the root cause: when I compiled the code, the 3rd-party library’s version is v1.1, but when I run the program, the library was upgraded to v1.2 by others, which caused this mysterious bug. The solution is simple: rebuild the code. But the debugging process is exhausting.

Linking error of _ntl_gbigint_body in using NTL

I use NTL on ArchLinux, and there is a struct _ntl_gbigint_body which is actually not defined (refer this post):

/*
 * This way of defining the bigint handle type is a bit non-standard,
 * but better for debugging.
 */

struct _ntl_gbigint_body;
typedef _ntl_gbigint_body *_ntl_gbigint;  

You should pay attention to functions who depend on this struct, such as:

void _ntl_gcopy(_ntl_gbigint a, _ntl_gbigint *bb)   

Because for old NTL library, the function prototype generated by compiler is _ntl_gcopy(void*, void**):

$ readelf -sW libntl.so | c++filt | grep ntl_gcopy
2511: 000000000012b750   184 FUNC    GLOBAL DEFAULT   12 _ntl_gcopy(void*, void**)

While for new one it is _ntl_gcopy(_ntl_gbigint_body*, _ntl_gbigint_body**):

$ readelf -sW libntl.so | c++filt | grep ntl_gcopy
615: 0000000000148500   202 FUNC    GLOBAL DEFAULT   11 _ntl_gcopy(_ntl_gbigint_body*, _ntl_gbigint_body**)

So if you meet linking error as following:

undefined reference to `_ntl_gcopy(void*, void**)'

It should be NTL header files and library mismatch. The header files are old, while library is new.

Notice the linking library position on Ubuntu

This week, I ported tcpbench from OpenBSD to Linux. The idiomatic method of OpenBSD is putting the linking library in front of generating final target:

cc -g -O2 -Wall -levent -o tcpbench tcpbench.c

However this doesn’t work in Ubuntu since its the linker uses --as-needed option. So I change the Makefile to put the library at the end:

cc -g -O2 -Wall -o tcpbench tcpbench.c -levent

Please refer this discussion if you are interested.

Be careful of file sequence in linking process

Check following A.h:

# cat A.h
#pragma once

#include <iostream>
#include <vector>

class A
{
public:
        std::vector<int> v;
        A()
        {
                v.push_back(1);
                std::cout << "Enter A's constructor...\n";
        }
        int getFirstElem()
        {
                v.push_back(2);
                std::cout << "Enter A's getFirstElem...\n";
                return v[0];
        }
        ~A()
        {
                std::cout << "Enter A's destructor...\n";
        }
};

int func();

And A.cpp:

# cat A.cpp
#include "A.h"

static A a;

int func()
{
        return a.getFirstElem();
}

The A.cpp just define a A‘s static instance, and a func() returns first element in a‘s internal vector.

Check another file which utilizes A.h and A.cpp:

# cat hello.cpp
#include <iostream>
#include "A.h"

static int gP = func();

int main()
{
    std::cout << gP << std::endl;
    return 0;
}

Compile them:

# clang++ -c hello.cpp
# clang++ -c A.cpp

Link hello.o first and execute the program:

# clang++ hello.o A.o
# ./a.out
Enter A's getFirstElem...
Enter A's constructor...
2
Enter A's destructor...

Then link A.o first and execute the program:

# clang++ A.o hello.o
# ./a.out
Enter A's constructor...
Enter A's getFirstElem...
1
Enter A's destructor...

The results are different. In first case, when call a‘s getFirstElem() function, its constructor is not even called. Please pay attention to it!