Fix “error: file owned by …” issue on Arch Linux

I tried to install caffe on my Arch Linux box, but the installation failed with the following errors:

$ yaourt -S caffe
......
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/BasicThreadPool.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/FFT.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/FacVec.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/GF2.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/GF2E.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/GF2EX.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/GF2EXFactoring.h'
error: file owned by 'ntl' and 'ntl-threading': 'usr/include/NTL/GF2X.h'
......

The reason is that I had installed NTL through pacman before, so its files are now also claimed by the ntl-threading package. (If you are unsure where a conflicting file comes from, pacman -Qo <path> reports which package owns it.) The solution is to uninstall NTL first:

$ sudo pacman -R ntl

Then install caffe again, and this time it succeeds:

$ yaourt -S caffe
......
:: Running post-transaction hooks...
(1/1) Arming ConditionNeedsUpdate...
No database errors have been found!

Reference:
pacman: a file owned by two packages.

Define multiple exclusive macros in CMake

My program needs either the FOO or the BAR macro defined, but not both. By default, FOO is defined. The following is the relevant snippet of CMakeLists.txt:

OPTION(BAR "BAR" OFF)
OPTION(FOO "FOO" ON)
IF(BAR)
    SET(FOO OFF CACHE BOOL "Use BAR" FORCE)
    MESSAGE(STATUS "Use BAR\n")
    ADD_DEFINITIONS(-DBAR)
ELSE()
    MESSAGE(STATUS "Use FOO\n")
    ADD_DEFINITIONS(-DFOO)
ENDIF()

Note: for simplicity, I use the same name for the macro and the option variable; that may not be what you want.
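
On the C/C++ side, it can be worth asserting the exclusivity at compile time as well, in case a stale build cache ends up defining both macros or neither. A minimal sketch (my own addition, not part of the original snippet):

#if defined(FOO) && defined(BAR)
#error "FOO and BAR are mutually exclusive; define exactly one."
#elif defined(FOO)
/* FOO-specific code path */
#elif defined(BAR)
/* BAR-specific code path */
#else
#error "Either FOO or BAR must be defined."
#endif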

Use boost on OpenBSD

Installing boost on OpenBSD is simple:

$ pkg_add boost

Write a simple test program:

#include <boost/asio.hpp>
#include <iostream>

int main()
{
    boost::system::error_code ec{};

    // Parse the string as an IP address; on failure, ec is set instead of throwing.
    auto server_addr{boost::asio::ip::make_address("127.0.0.1", ec)};
    if (ec.value())
    {
        std::cerr << "Failed to parse the IP address. Error code = "
                  << ec.value() << ". Message: " << ec.message();
        return ec.value();
    }

    // Build a TCP endpoint from the parsed address and port 3003.
    boost::asio::ip::tcp::endpoint ep{server_addr, 3003};
    return 0;
}

Compile it:

$ c++ client.cpp
/tmp/client-238877.o: In function `boost::asio::error::get_system_category()':
client.cpp:(.text._ZN5boost4asio5error19get_system_categoryEv[_ZN5boost4asio5error19get_system_categoryEv]+0x5): undefined reference to `boost::system::system_category()'
/tmp/client-238877.o: In function `boost::system::error_code::error_code()':
client.cpp:(.text._ZN5boost6system10error_codeC2Ev[_ZN5boost6system10error_codeC2Ev]+0x1b): undefined reference to `boost::system::system_category()'
/tmp/client-238877.o: In function `boost::system::error_category::std_category::equivalent(int, std::__1::error_condition const&) const':
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE[_ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE]+0x129): undefined reference to `boost::system::generic_category()'
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE[_ZNK5boost6system14error_category12std_category10equivalentEiRKNSt3__115error_conditionE]+0x16a): undefined reference to `boost::system::generic_category()'
/tmp/client-238877.o: In function `boost::system::error_category::std_category::equivalent(std::__1::error_code const&, int) const':
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi[_ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi]+0x137): undefined reference to `boost::system::generic_category()'
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi[_ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi]+0x178): undefined reference to `boost::system::generic_category()'
client.cpp:(.text._ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi[_ZNK5boost6system14error_category12std_category10equivalentERKNSt3__110error_codeEi]+0x2d2): undefined reference to `boost::system::generic_category()'
c++: error: linker command failed with exit code 1 (use -v to see invocation)

Whoops! This means the linker can't find the related library. The Boost libraries are installed in /usr/local/lib:

$ ls -lt /usr/local/lib/libboost*
-rw-r--r--  1 root  bin  2632774 Jul  2 05:58 /usr/local/lib/libboost_regex-mt.a
-rw-r--r--  1 root  bin  1398613 Jul  2 05:58 /usr/local/lib/libboost_regex-mt.so.8.0
-rw-r--r--  1 root  bin  2632774 Jul  2 05:58 /usr/local/lib/libboost_regex.a
-rw-r--r--  1 root  bin  1398613 Jul  2 05:58 /usr/local/lib/libboost_regex.so.8.0
-rw-r--r--  1 root  bin   994564 Jul  2 05:58 /usr/local/lib/libboost_serialization-mt.a
-rw-r--r--  1 root  bin   484918 Jul  2 05:58 /usr/local/lib/libboost_serialization-mt.so.8.0
-rw-r--r--  1 root  bin   994564 Jul  2 05:58 /usr/local/lib/libboost_serialization.a
-rw-r--r--  1 root  bin   484918 Jul  2 05:58 /usr/local/lib/libboost_serialization.so.8.0
-rw-r--r--  1 root  bin   260322 Jul  2 05:58 /usr/local/lib/libboost_signals-mt.a
-rw-r--r--  1 root  bin   154973 Jul  2 05:58 /usr/local/lib/libboost_signals-mt.so.8.0
-rw-r--r--  1 root  bin   260322 Jul  2 05:58 /usr/local/lib/libboost_signals.a
......

So I should specify the boost_system library during compilation:

$ c++ -L/usr/local/lib client.cpp -lboost_system
$

This time it works!

An experiment with OpenMP parallel loops

From my testing, OpenMP launches as many threads as there are “virtual CPUs”, though this is not 100% guaranteed. Today I ran a test to see whether the loop nesting level affects OpenMP performance.

Given that the “virtual CPU” count is 104 on my system, I define the following constants and variables:

#define CPU_NUM (104)
#define LOOP_NUM (100)
#define ARRAY_SIZE (CPU_NUM * LOOP_NUM)

double a[ARRAY_SIZE], b[ARRAY_SIZE], c[ARRAY_SIZE];  

(1) A one-level loop:

#pragma omp parallel for
for (int i = 0; i < ARRAY_SIZE; i++)
{
    func(a, b, c, i);
}

Execute 10 times consecutively:

$ cc -O2 -fopenmp parallel.c
$ ./a.out
Time consumed is 7.208773
Time consumed is 7.080540
Time consumed is 7.643123
Time consumed is 7.377163
Time consumed is 7.418053
Time consumed is 7.226235
Time consumed is 7.887611
Time consumed is 7.200167
Time consumed is 7.264515
Time consumed is 7.140937

(2) A two-level loop:

for (int i = 0; i < LOOP_NUM; i++)
{
    #pragma omp parallel for
    for (int j = 0; j < CPU_NUM; j++)
    {
        func(a, b, c, i * CPU_NUM + j);
    }
}

Execute 10 times consecutively:

$ cc -O2 -fopenmp parallel.c
$ ./a.out
Time consumed is 8.333529
Time consumed is 8.164226
Time consumed is 9.705631
Time consumed is 8.695201
Time consumed is 8.972555
Time consumed is 8.126084
Time consumed is 8.286818
Time consumed is 8.162565
Time consumed is 7.884917
Time consumed is 8.073982

At least in this test, the one-level loop performs better, presumably because the two-level version enters the parallel region LOOP_NUM times and pays the thread fork/join overhead on every outer iteration, while the one-level version pays it only once. If you are interested, the source code is here.
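
If you want to reproduce the comparison without the linked source, here is a self-contained sketch. Note that func and the timing code below are my assumptions (the post does not show them); the real workload is whatever the linked source does:

#include <omp.h>
#include <stdio.h>

#define CPU_NUM (104)
#define LOOP_NUM (100)
#define ARRAY_SIZE (CPU_NUM * LOOP_NUM)

double a[ARRAY_SIZE], b[ARRAY_SIZE], c[ARRAY_SIZE];

/* Hypothetical stand-in for the post's func(): some per-element work. */
static void func(double *x, double *y, double *z, int i)
{
    z[i] = x[i] * 1.5 + y[i] * 0.5 + (double)i;
}

int main(void)
{
    /* (1) One-level loop: a single parallel region over the whole array. */
    double start = omp_get_wtime();
    #pragma omp parallel for
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        func(a, b, c, i);
    }
    printf("One-level: time consumed is %f\n", omp_get_wtime() - start);

    /* (2) Two-level loop: the parallel region is re-entered LOOP_NUM times. */
    start = omp_get_wtime();
    for (int i = 0; i < LOOP_NUM; i++)
    {
        #pragma omp parallel for
        for (int j = 0; j < CPU_NUM; j++)
        {
            func(a, b, c, i * CPU_NUM + j);
        }
    }
    printf("Two-level: time consumed is %f\n", omp_get_wtime() - start);
    return 0;
}

Compile it with cc -O2 -fopenmp as above.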

CUDA P2P is not guaranteed to be faster than staging through the host

Today I wrote a simple test to verify whether CUDA peer-to-peer (P2P) memory copy is always faster than staging the transfer through the host with the CPU. At least on my platform, it is not:

(1) With P2P disabled, the CPU utilization is very high (86.7%), and the bandwidth is nearly 10.67 GB/s.

(2) With P2P enabled, the CPU utilization drops to only 1.3%, but the bandwidth falls behind by about 1.6 GB/s, at 9.00 GB/s.

P.S., the full code is here.
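
For reference, here is a rough sketch of the kind of test involved. The device IDs, the buffer size, and the lack of error checking are my simplifications, not the post's actual code; the relevant runtime calls are cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, and cudaMemcpyPeer:

#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    const size_t size = 256 * 1024 * 1024;  /* assumed 256 MiB test buffer */
    int can_access = 0;
    void *src = NULL, *dst = NULL;

    /* Can device 0 directly access device 1's memory? */
    cudaDeviceCanAccessPeer(&can_access, 0, 1);

    cudaSetDevice(0);
    cudaMalloc(&src, size);
    if (can_access)
    {
        /* Enable P2P so copies to device 1 bypass host memory. */
        cudaDeviceEnablePeerAccess(1, 0);
    }

    cudaSetDevice(1);
    cudaMalloc(&dst, size);

    /* With P2P enabled this copy goes directly over PCIe/NVLink;
       otherwise the runtime stages it through a host buffer,
       which is what drives the CPU utilization up. */
    cudaMemcpyPeer(dst, 1, src, 0, size);
    cudaDeviceSynchronize();

    printf("Copy done, P2P %s\n", can_access ? "enabled" : "disabled");

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}

Timing the copy over a number of iterations (for example, with cudaEvent records around cudaMemcpyPeer) is what yields the bandwidth figures above.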