What is the effect of extern “C”?

I often see the following code in C header files:

#ifdef __cplusplus
extern "C" {
#endif

......

#ifdef __cplusplus
}
#endif

What does it mean? Since there is __cplusplus marco, it must be related to C++ compilation. Let’s see a simple program (print.c):

$ cat print.c
#include <stdio.h>

void print(void)
{
        printf("Hello world!\n");
}

Use gcc to generate object file:

$ gcc -c print.c
$ 

Then create a main.cpp file which calls print() in its main() function:

$ cat main.cpp
extern void print(void);

int main(void)
{
        print();
        return 0;
}

Compile main.cpp and link with print.o:

$ g++ main.cpp print.o
/tmp/cc60fu19.o: In function `main':
main.cpp:(.text+0x5): undefined reference to `print()'
collect2: error: ld returned 1 exit status

It is weird, right? the print() function must be defined in print.o, why can’t g++ find it? Let’s do a simple magic, add "C" in extern void print(void);:

$ cat main.cpp
extern "C" void print(void);

int main(void)
{
        print();
        return 0;
}

Try compile main.cpp again:

$ g++ main.cpp print.o
$ ./a.out
Hello world!

It is OK now! The root cause is related to name mangling. To be simplified, when compile C++ code, the names of functions, global variables, etc will be modified, not the same as original format. While compile C code, this won’t happen. The function of extern "C" is to tell C++ compiler search the original name, not the mangled ones. To get a sense of name mangling, you can check the print() name in object file:

$ readelf -s print.o | grep print
 1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS print.c
 9: 0000000000000000    17 FUNC    GLOBAL DEFAULT    1 print

Then use g++ to compile print.c, and check function name again:

$ g++ -c print.c
$ readelf -s print.o | grep print
 1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS print.c
 9: 0000000000000000    17 FUNC    GLOBAL DEFAULT    1 _Z5printv

You can see the print() function name is actually _Z5printv when use g++ to generate the object file.

References:
Why do we use extern “C”?;
In C++ source, what is the effect of extern “C”?.

The differences between using gcc/g++ to compile *.c/*.cpp files

I bump into Compiling a C++ program with gcc today, and think it is a very interesting topic. So I do the following tests:

(1) Create a canonical C source file:

$ cat main.c
#include <stdio.h>

void hello(void)
{
        printf("Hello World!\n");
}

int main(void)
{
        hello();
}

Use gcc to compile it and search hello in symbol table of executable file:

$ gcc main.c
$ readelf -s a.out | grep hello
53: 00000000004004f6    17 FUNC    GLOBAL DEFAULT   13 hello

Then use g++ to compile it and also search hello in a.out:

$ g++ main.c
$ readelf -s a.out | grep hello
54: 0000000000400526    17 FUNC    GLOBAL DEFAULT   13 _Z5hellov

Since hello is name mangled to _Z5hellov when using g++, we can make sure that g++ will compile *.c file as C++.

(2) Modify main.c to a standard C++ file:

$ cat main.cpp
#include <iostream>

void hello(void)
{
        std::cout << "Hello, world!" << std::endl;
}

int main(void)
{
        hello();
}

Use gcc to compile it:

$ gcc main.cpp
/tmp/cccAb8IH.o: In function `hello()':
main.cpp:(.text+0xa): undefined reference to `std::cout'
main.cpp:(.text+0xf): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)'
main.cpp:(.text+0x14): undefined reference to `std::basic_ostream<char, std::char_traits<char> >& std::endl<char, std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&)'
main.cpp:(.text+0x1c): undefined reference to `std::ostream::operator<<(std::ostream& (*)(std::ostream&))'
/tmp/cccAb8IH.o: In function `__static_initialization_and_destruction_0(int, int)':
main.cpp:(.text+0x56): undefined reference to `std::ios_base::Init::Init()'
main.cpp:(.text+0x65): undefined reference to `std::ios_base::Init::~Init()'
collect2: error: ld returned 1 exit status

We can see there are many link errors. The reason is when gcc compiles *.cpp file, it will employ C++ compiler, but it only use defaults C and gcc helpers libraries during linkage stage, and won’t use stdc++ library. Modify the command, then all is OK:

$ gcc main.cpp -lstdc++
$ readelf -s a.out | grep hello
41: 00000000004007f7    21 FUNC    LOCAL  DEFAULT   13 _GLOBAL__sub_I__Z5hellov
59: 0000000000400786    35 FUNC    GLOBAL DEFAULT   13 _Z5hellov

Certainly , since it is a regular C++ file, use g++ will undoubtedly be OK:

$ g++ main.cpp
$ readelf -s a.out | grep hello
41: 0000000000400817    21 FUNC    LOCAL  DEFAULT   13 _GLOBAL__sub_I__Z5hellov
59: 00000000004007a6    35 FUNC    GLOBAL DEFAULT   13 _Z5hellov

(3) Modify the above file name from main.cpp to main.c.

$ cat main.c
#include <iostream>

void hello(void)
{
        std::cout << "Hello, world!" << std::endl;
}

int main(void)
{
        hello();
}

Use g++ to compile it:

$ g++ main.c
$ readelf -s a.out | grep hello
41: 0000000000400817    21 FUNC    LOCAL  DEFAULT   13 _GLOBAL__sub_I__Z5hellov
59: 00000000004007a6    35 FUNC    GLOBAL DEFAULT   13 _Z5hellov

All is OK, and the stdc++ library will be used during linkage stage.

Use gcc instead:

$ gcc main.c
main.c:1:20: fatal error: iostream: No such file or directory

                    ^
compilation terminated.

This error identifies gcc always uses standard C compiler to *.c file, so it can’t find C++ related header files.

P.S. my gcc version:

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc-multilib/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 6.3.1 20170109 (GCC)

 

An example of debugging parallel program

In the past 2 weeks, I was tortured by a nasty bug, yes, a not-always-reproducible, use-third-party-code, parallel-program issue. Thank Goodness! The root cause was found on this Thursday, and I think the whole progress is a perfect experience and tutorial of how to debug parallel program. So I record it here and hope it can give you some tips when facing similar situation.

Background: I am leveraging FHEW to implement some complex calculations and experiments. Now the FHEW only works in single-thread environment, since many functions share the same global variables, like this:

......
double *in;
fftw_complex *out;
fftw_plan plan_fft_forw, plan_fft_back;

void FFTsetup() {
  in = (double*) fftw_malloc(sizeof(double) * 2*N);
  out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (N + 2));
  plan_fft_forw = fftw_plan_dft_r2c_1d(2*N, in, out,  FFTW_PATIENT);
  plan_fft_back = fftw_plan_dft_c2r_1d(2*N, out, in,  FFTW_PATIENT);
}

void FFTforward(Ring_FFT res, const Ring_ModQ val) {
  for (int k = 0; k < N; ++k)   {
    in[k] = (double) (val[k]);
    in[k+N] = 0.0;          
  }
  fftw_execute(plan_fft_forw); 
  for (int k = 0; k < N2; ++k) 
    res[k] = (double complex) out[2*k+1];               
}
.....

So to make it can be used in parallel, I first need to modify it to be used in multi-thread program. This work involves of allocating resources of every thread, adding lock of necessary global variables, etc. The multi-thread FHEW works always OK until I employed it to implement a complicated function. At most times the result was correct except some rare occasions. So I began the tough debugging work.

(1) Log. Undoubtedly, the log is the most effective one in all debugging tools. I tried to add as many traces as I can. An important attention you must pay is now that multiple threads are running simultaneously, you must identify the logs are from which thread. For example, if you use OpenMP, you can use following code to differentiate thread IDs:

......
printf("Thread(%d/%d) ......", omp_get_thread_num(), omp_get_num_threads(), ......);
......

After analyzing the log, a function was spotted, but it did noting but use 2 consecutive HomNAND to implement the AND operations (You should consider HomNAND is just a NAND gate in logical circuit, and use two NAND gates can realize AND gate.). According to AND gate truth-table, when two inputs are 0, the output should be 0 too, but in my program, the result is 1, which causes the final error.

(2) Simplify the program and try to find the reproducible condition. Since the bug was related to HomNAND and my original program was a little complicated, I needed to do some simple tests and try to find the reproducible condition. I designed 3 experiments:

a) Use original single-thread FHEW, and execute one HomNAND continuously:

for (;;) {
    ......
    HomNAND();
    ......
}

b) Use modified multi-thread FHEW, and spawn 8 threads. Every thread run one HomNAND in dead-loop:

#pragma omp parallel for
for (int loop = 0; loop < 8; loop++) {
    ......
    HomNAND();
    ......
}

c) Use modified multi-thread FHEW, and spawn 8 threads. Every thread run two HomNANDs to achieve AND function in dead-loop:

#pragma omp parallel for
for (int loop = 0; loop < 8; loop++) {
    ......
    HomNAND()
    HomNAND();
    ......
}

After testing, I found case a) and b) were always OK, while c) would generate wrong results after running 1 or 2 hours.

(3) Now the thing became intricate. Because I modified the FHEW, and FHEW also used FFTW, it was not an easy task to locate where the problem was. Additionally, the thread-safe usage of FFTW also involved many discussions (like this) and pitfalls (like this), so I resort the authors of FFTW to make sure my code is not wrong.

(4) Back to the experiments, I suddenly found that I have tested single-thread-single-HomNAND, multiple-thread-single-HomNAND, multiple-thread-two-HomNAND cases, but how about single-thread-two-HomNAND? If single-thread-two-HomNAND case can fail, it can prove the my modification code was innocent, and the problem should reside on the original FHEW code:

for (;;) {
    ......
    HomNAND();
    HomNAND();
    ......
}

Yeah, I was very lucky! After one hour, the error occurred. The murder was found, so the next step was to seek solution with the author.

Based on this experience, I think analyzing log and simplifying test model are very important techniques during resolving parallel program bugs. Hope everyone can improve debugging ability, happy debugging!

Build FFTW supporting multi-thread

According to FFTW document:

Starting from FFTW-3.3.5, FFTW supports a new API to make the planner thread-safe:

void fftwmakeplannerthreadsafe(void);

This call installs a hook that wraps a lock around all planner calls.

To use fftw_make_planner_thread_safe() API, you need to add --enable-threads in configuration stage:

$ ./configure --enable-threads
$ make  
$ sudo make install

There will be two additional libraries (libfftw3_threads.a and libfftw3_threads.la) generated compared with no --enable-threads option:

$ cd /usr/local/lib/
$ ls -alt libfftw*
-rw-r--r-- 1 root root   44528 Jan 16 09:18 libfftw3_threads.a
-rwxr-xr-x 1 root root     946 Jan 16 09:18 libfftw3_threads.la
-rw-r--r-- 1 root root 2222828 Jan 16 09:18 libfftw3.a
-rwxr-xr-x 1 root root     886 Jan 16 09:18 libfftw3.la

BTW, although fftw_make_planner_thread_safe() API was introduced in FFTW-3.3.5, it didn’t really work until FFTW-3.3.6 who contains an important patch. Please refer the discussion and release note.

Generate carry in Carry-lookahead adder

Recently, I am implementing an 8-bit Carry-lookahead adder. But unfortunately, almost all materials in the Internet only provide the formula to get C1 toC4, so I write a simple python script to generate all carries:

#!/usr/bin/python3

import sys
num = 8

if len(sys.argv) != 1:
    num = int(sys.argv[1])

dict = {'C0': 'C0'}
for i in range(1, num + 1):
    dict['C' + str(i)] = 'G' + str(i - 1) + \
                        '+' + \
                        '+'.join([x + "*P" + str(i - 1) for x in dict['C' + str(i - 1)].split('+')])

for i in range(0, num + 1):
    key = 'C' + str(i)
    print(key, "=", dict[key])

By default, it will generate C1 to C8:

$ ./CarryFormulaInLookAheadAdder.py
C0 = C0
C1 = G0+C0*P0
C2 = G1+G0*P1+C0*P0*P1
C3 = G2+G1*P2+G0*P1*P2+C0*P0*P1*P2
C4 = G3+G2*P3+G1*P2*P3+G0*P1*P2*P3+C0*P0*P1*P2*P3
C5 = G4+G3*P4+G2*P3*P4+G1*P2*P3*P4+G0*P1*P2*P3*P4+C0*P0*P1*P2*P3*P4
C6 = G5+G4*P5+G3*P4*P5+G2*P3*P4*P5+G1*P2*P3*P4*P5+G0*P1*P2*P3*P4*P5+C0*P0*P1*P2*P3*P4*P5
C7 = G6+G5*P6+G4*P5*P6+G3*P4*P5*P6+G2*P3*P4*P5*P6+G1*P2*P3*P4*P5*P6+G0*P1*P2*P3*P4*P5*P6+C0*P0*P1*P2*P3*P4*P5*P6
C8 = G7+G6*P7+G5*P6*P7+G4*P5*P6*P7+G3*P4*P5*P6*P7+G2*P3*P4*P5*P6*P7+G1*P2*P3*P4*P5*P6*P7+G0*P1*P2*P3*P4*P5*P6*P7+C0*P0*P1*P2*P3*P4*P5*P6*P7

But you can pass a parameter to tell how many carries you want:

$ ./CarryFormulaInLookAheadAdder.py 6
C0 = C0
C1 = G0+C0*P0
C2 = G1+G0*P1+C0*P0*P1
C3 = G2+G1*P2+G0*P1*P2+C0*P0*P1*P2
C4 = G3+G2*P3+G1*P2*P3+G0*P1*P2*P3+C0*P0*P1*P2*P3
C5 = G4+G3*P4+G2*P3*P4+G1*P2*P3*P4+G0*P1*P2*P3*P4+C0*P0*P1*P2*P3*P4
C6 = G5+G4*P5+G3*P4*P5+G2*P3*P4*P5+G1*P2*P3*P4*P5+G0*P1*P2*P3*P4*P5+C0*P0*P1*P2*P3*P4*P5

P.S., the full code is here.