A performance issue caused by NUMA

The essence of NUMA is accessing local memory fast while remote slow, and I was bit by it today.

The original code is like this:

/* Every thread create one partition of a big vector and process it*/
#pragma omp parallel for
for (...)
{
    ......
    vector<> local_partition = create_big_vector_partition();
    /* processing the vector partition*/
    ......
}

I tried to create a big vector out of OpenMP block, then every thread just grabs a partition and processes it:

vector<> big_vector = create_big_vector();

#pragma omp parallel for
for (...)
{
    ......
    vector<>& local_partition = get_partition(big_vector);
    /* processing the vector partition*/
    ......
}

I measure the execution time of OpenMP block:

#pragma omp parallel for
for (...)
{
    ......
}

Though in original code, every thread needs to create partition of vector itself, it is still faster than the modified code.

After some experiments and analysis, numastat helps me to pinpoint the problem:

$ numastat
                           node0           node1
numa_hit              6259740856      7850720376
numa_miss              120468683       128900132
numa_foreign           128900132       120468683
interleave_hit             32881           32290
local_node            6259609322      7850520401
other_node             120600217       129100106

In original solution, every thread creates vector partition in local memory of CPU. However, in second case, the threads often need to access memory in remote node, and this overhead is bigger than creating vector partition locally.

What I learn from Practical Binary Analysis as a non-reverse-engineering engineer?

I spent the past two weeks in reading Practical Binary Analysis. Since I am not a professional reverse engineer, I glossed over the “Part III: Advanced Binary Analysis”, so I only read half the book. Even though, I still get a big gain:

(1) Know better of ELF file. On *nix Operating system, ELF file is everywhere: executable file, object file, shared library and coredump. “Chapter 2: The ELF format” gives me a clear explanation of the composition of ELF. E.g., I know why some functions have “@plt” suffix when using gdb to debug it.

(2) Master many tricks about GNU Binutils. GNU Binutils is a toolbox which provides versatile command line programs to analyze ELF files. Literally it relies heavily on BFD library. I also get some sense about how to use BFD library to tweak ELF files.

(3) “Appendix A: A crash course on X86 assembly” is a good tutorial for refreshing X86 architecture and assembly language.

(4) Others: E.g., I understand how to use LD_PRELOAD environmental variable and dynamic linking functions to manipulate shared library.

All in all, if you are working on *nix (although this book is based on Linux, I think most knowledge are also applicable to other *nix), you should try to read this book. I promise it is not a waste of time and you can always learn something, believe me!

The display format of Bash’s built-in time command

Check the default output format of bash‘s built-in time command:

# time

real    0m0.000s
user    0m0.000s
sys     0m0.000s

You can use -p option to output in POSIX format:

# time -p
real 0.00
user 0.00
sys 0.00

From bash source code, we know the the definitions of these two formats:

#define POSIX_TIMEFORMAT "real %2R\nuser %2U\nsys %2S"
#define BASH_TIMEFORMAT  "\nreal\t%3lR\nuser\t%3lU\nsys\t%3lS"

To decipher the meanings of them, we need to refer bash manual:

TIMEFORMAT

The value of this parameter is used as a format string specifying how the timing information for pipelines prefixed with the time reserved word should be displayed. The ‘%’ character introduces an escape sequence that is expanded to a time value or other information. The escape sequences and their meanings are as follows; the braces denote optional portions.

%%
A literal ‘%’.

%[p][l]R
The elapsed time in seconds.

%[p][l]U
The number of CPU seconds spent in user mode.

%[p][l]S
The number of CPU seconds spent in system mode.

%P
The CPU percentage, computed as (%U + %S) / %R.

The optional p is a digit specifying the precision, the number of fractional digits after a decimal point. A value of 0 causes no decimal point or fraction to be output. At most three places after the decimal point may be specified; values of p greater than 3 are changed to 3. If p is not specified, the value 3 is used.

The optional l specifies a longer format, including minutes, of the form MMmSS.FFs. The value of p determines whether or not the fraction is included.
……

Take POSIX_TIMEFORMAT as an example: %2R denotes using second as time unit, and the precision is two digits after a decimal point; %2U and %2S are similar.

Now you can comprehend the output of time, correct? Try using BASH_TIMEFORMAT as a practice.

 

Downgrade boost on Ubuntu 16.04

In the past 2 days, I was tortured by boost. The default boostversion on Ubuntu 16.04 is 1.58, but I met following compile errors:

......
/usr/include/boost/multi_index/detail/bucket_array.hpp: In static member function ‘static std::size_t boost::multi_index::detail::bucket_array_base<_>::size_index(std::size_t)’:
/usr/include/boost/multi_index/detail/bucket_array.hpp:84:62: error: invalid use of non-lvalue array
     const std::size_t *bound=std::lower_bound(sizes,sizes+sizes_length,n);
                                                              ^~~~~~~~~~~~
/usr/include/boost/multi_index/detail/bucket_array.hpp:85:25: error: invalid use of non-lvalue array
     if(bound==sizes+sizes_length)--bound;
                         ^~~~~~~~~~~~
/usr/include/boost/multi_index/detail/bucket_array.hpp:86:22: error: invalid use of non-lvalue array
     return bound-sizes;
......

I downgraded boost to 1.55; and downloaded and built it:

$ ./bootstrap.sh --prefix=/usr/local  
$ sudo ./b2 -a -q install

This time I found the default gcc-5 could not compile successfully. So I followed this post to install gcc-6, and modified/home/nan/boost_1_55_0/tools/build/v2/user-config.jam file to use gcc-6 to compile boost:

......
# Configure specific gcc version, giving alternative name to use.
using gcc : 6 : g++-6 ;
......

Then my project can be compiled successfully. Check gcc search header file path and library path:

$ echo | gcc-6 -E -Wp,-v -
ignoring nonexistent directory "/usr/local/include/x86_64-linux-gnu"
ignoring nonexistent directory "/usr/lib/gcc/x86_64-linux-gnu/6/../../../../x86_64-linux-gnu/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/6/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/6/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
......

$ ldconfig -v 2>/dev/null | grep -v ^$'\t'
/usr/local/cuda-9.0/targets/x86_64-linux/lib:
/usr/lib/x86_64-linux-gnu/libfakeroot:
/usr/local/lib:
/lib/x86_64-linux-gnu:
/usr/lib/x86_64-linux-gnu:
/usr/lib/nvidia-384:
/usr/lib32/nvidia-384:
/lib32:
/usr/lib32:
/lib:
/usr/lib:
/usr/lib/nvidia-384/tls: (hwcap: 0x8000000000000000)
/usr/lib32/nvidia-384/tls: (hwcap: 0x8000000000000000)

You will find the fresh installed boost 1.55 (in /usr/local directory) always be found before default boost 1.58 (header files are in /usr/include/boostand libraries in /usr/lib/x86_64-linux-gnu).

Update keyring first if your Arch Linux is old enough

My Arch Linux is not updated for nearly 3 months. When running pacman -Syu, it prompts following errors:

$ sudo pacman -Syu
......
error: python-dnspython: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/python-dnspython-1.16.0-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python-distlib: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/python-distlib-0.2.8-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python-pytoml: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/python-pytoml-0.1.20-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python2-distlib: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/python2-distlib-0.2.8-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python2-dnspython: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/python2-dnspython-1.16.0-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: python2-pytoml: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/python2-pytoml-0.1.20-1-any.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: qbittorrent: signature from "Eli Schwartz <[email protected]>" is unknown trust
:: File /var/cache/pacman/pkg/qbittorrent-4.1.5-1-x86_64.pkg.tar.xz is corrupted (invalid or corrupted package (PGP signature)).
Do you want to delete it? [Y/n]
error: failed to commit transaction (invalid or corrupted package)
Errors occurred, no packages were upgraded.
......

The solution is updating archlinux-keyring first:

$ sudo pacman -S archlinux-keyring

Then all goes well!