The anatomy of “Hello World” python program in bcc

The bcc gives a simple Hello World program analysis, but I want to give a more detailed anatomy of it:

from bcc import BPF

BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()

(1)

from bcc import BPF  

bcc is a package, and it will be installed in Python‘s site-packages directory. Take my server as an example:

>>> import bcc
>>> bcc.__path__
['/usr/lib/python3.6/site-packages/bcc']

BPF is a class defined in __init__.py file of bcc:

......
class BPF(object):
# From bpf_prog_type in uapi/linux/bpf.h
    SOCKET_FILTER = 1
    KPROBE = 2
......

(2) Divide

BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()

into 2 parts:

BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }')

and

trace_print()

a)

BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }')

This statement will create a BPF object, and the code is simplified as following:

......
from .libbcc import lib, _CB_TYPE, bcc_symbol, _SYM_CB_TYPE 
......
class BPF(object):
......
    def __init__(self, src_file="", hdr_file="", text=None, cb=None, debug=0,
        cflags=[], usdt_contexts=[]):
        ......

        if text:
            self.module = lib.bpf_module_create_c_from_string(text.encode("ascii"),
            self.debug, cflags_array, len(cflags_array))
        else:
            ......

        self._trace_autoload()

Check libbcc.py:

lib = ct.CDLL("libbcc.so.0", use_errno=True)

We can see lib module is indeed libbcc dynamic library, and bpf_module_create_c_from_string is implemented as this:

void * bpf_module_create_c_from_string(const char *text, unsigned flags, const char *cflags[], int ncflags) {
    auto mod = new ebpf::BPFModule(flags);
    if (mod->load_string(text, cflags, ncflags) != 0) {
        delete mod;
        return nullptr;
    }
    return mod;
}

We won’t delve it too much, and just know it returns a ebpf module is enough.

For self._trace_autoload() part, we can check _trace_autoload method definition:

def _trace_autoload(self):
    for i in range(0, lib.bpf_num_functions(self.module)):
        func_name = str(lib.bpf_function_name(self.module, i).decode())
        if func_name.startswith("kprobe__"):
            fn = self.load_func(func_name, BPF.KPROBE)
            self.attach_kprobe(event=fn.name[8:], fn_name=fn.name)
        elif func_name.startswith("kretprobe__"):
            fn = self.load_func(func_name, BPF.KPROBE)
            self.attach_kretprobe(event=fn.name[11:], fn_name=fn.name)
        elif func_name.startswith("tracepoint__"):
            fn = self.load_func(func_name, BPF.TRACEPOINT)
            tp = fn.name[len("tracepoint__"):].replace("__", ":")
            self.attach_tracepoint(tp=tp, fn_name=fn.name)

Based on above code, we can see bcc now supports kprobe, kretprobe and tracepoint. For Hello World example, it actually execute attach_kprobemethod:

def attach_kprobe(self, event="", fn_name="", event_re="",
        pid=-1, cpu=0, group_fd=-1):

    ......

    event = str(event)
    self._check_probe_quota(1)
    fn = self.load_func(fn_name, BPF.KPROBE)
    ev_name = "p_" + event.replace("+", "_").replace(".", "_")
    res = lib.bpf_attach_kprobe(fn.fd, 0, ev_name.encode("ascii"),
            event.encode("ascii"), pid, cpu, group_fd,
            self._reader_cb_impl, ct.cast(id(self), ct.py_object))
    res = ct.cast(res, ct.c_void_p)
    ......
    self._add_kprobe(ev_name, res)
    return self

Finally, it will call libbcc‘s bpf_attach_kprobe function. Until now, the kprobe is attached successfully, and attach_kprobe will return a BPF object.

(b)

BPF(...).trace_print()

The trace_print method is not too hard:

def trace_print(self, fmt=None):
    ......
    try:
        while True:
            if fmt:
                fields = self.trace_fields(nonblocking=False)
                if not fields: continue
                line = fmt.format(*fields)
            else:
                line = self.trace_readline(nonblocking=False)
            print(line)
            sys.stdout.flush()
    except KeyboardInterrupt:
        exit()

For fmt=None case, it just calls trace_readline(), which read from trace pipe, and prints it on the stdout:

print(line)
sys.stdout.flush()

That’s all, and hope you enjoy bcc and ebpf!

Install bcc on ArchLinux

To install bcc on ArchLinux, firstly you need to setup yaourt from AUR:

$ git clone https://aur.archlinux.org/yaourt.git
$ cd yaourt
$ makepkg -si

Then execute yaourt bcc command:

# yaourt bcc
1 aur/bcc 0.3.0-1 [installed] (17) (2.51)
    BPF Compiler Collection - C library and examples
2 aur/bcc-git v0.1.8.r330.52cd371-1 (2) (0.06)
    BPF Compiler Collection - C library and examples
3 aur/bcc-tools 0.3.0-1 [installed] (17) (2.51)
    BPF Compiler Collection - Tools
4 aur/bcc-tools-git v0.1.8.r330.52cd371-1 (2) (0.06)
    BPF Compiler Collection - Tools
......

Select the order number of bcc, bcc-tools, python-bcc and python2-bcc, and install them.

Once finished, the bcc would be installed in the directory of /usr/share/bcc:

# ls
examples  man  tools

To facilitate your daily work, you can add man pages and tools in your .bashrc file:

MANPATH=/usr/share/bcc/man:$MANPATH
PATH=/usr/share/bcc/tools:$PATH

You can also install from source code:

git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake .. -DCMAKE_INSTALL_PREFIX=/usr
make
sudo make install

P.S. To run tools shipped in bcc, you need to install kernel header files:

sudo pacman -S linux-headers

Update on 28/10/2020
Currently, bcc is the official package and split to 3 pieces: bccbcc-tools and python-bcc. Please install all 3 packages. If you forgot to install python-bcc, when running commands from bcc-tools, you will come across following errors:

# /usr/share/bcc/tools/execsnoop
Traceback (most recent call last):
  File "/usr/share/bcc/tools/execsnoop", line 21, in <module>
    from bcc import BPF
ModuleNotFoundError: No module named 'bcc'

Be cautious of discriminating “Ctxt::multiplyBy()” and “Ctxt::*=”

In the past 2 Days, I was frustrated by an issue, which will cause program crashed at HElib‘s Ctxt::reLinearize() function:

assert(W.toKeyID>=0);       // verify that a switching matrix exists

After tough debugging, the root cause is found: I should use Ctxt::multiplyBy() instead of Ctxt::*=. From Ctxt::multiplyBy()‘s implementation:

void Ctxt::multiplyBy(const Ctxt& other)
{
  // Special case: if *this is empty then do nothing
  if (this->isEmpty()) return;

  *this *= other;  // perform the multiplication
  reLinearize();   // re-linearize
}

We can see besides it calls Ctxt::*=, it also invokes reLinearize(), and that’s point!

Clang may be a better choice than gcc in developing OpenMP program

As referred in The first gcc bug I ever meet, I upgraded gcc to the newest 7.1.0 version to conquer building OpenMP errors. But unfortunately, when using taskloop clause, weird issue happened again. My application utilizes HElib, and I just added following statement in a source file:

#pragma omp taskloop

Then the strange link error reported:

In function `EncryptedArray::EncryptedArray(EncryptedArray const&)':
/root/Project/../../HElib/src/EncryptedArray.h:539: undefined reference to `cloned_ptr<EncryptedArrayBase, deep_clone<EncryptedArrayBase> >::cloned_ptr(cloned_ptr<EncryptedArrayBase, deep_clone<EncryptedArrayBase> > const&)'
collect2: error: ld returned 1 exit status

I tried to debug it, nevertheless, nothing valuable was found.

So I attempted to use clang. Install it on ArchLinux like this:

# pacman -S clang
resolving dependencies...
looking for conflicting packages...

Packages (2) llvm-libs-4.0.0-3  clang-4.0.0-3

Total Download Size:53.24 MiB
Total Installed Size:  275.24 MiB

:: Proceed with installation? [Y/n] y
......
checking available disk space  [#########################################] 100%
:: Processing package changes...
(1/2) installing llvm-libs   [#########################################] 100%
(2/2) installing clang   [#########################################] 100%
Optional dependencies for clang
openmp: OpenMP support in clang with -fopenmp
python2: for scan-view and git-clang-format [installed]
:: Running post-transaction hooks...
(1/1) Arming ConditionNeedsUpdate...

Unlike gcc, to enable OpenMP feature in clang, we need to install an additional openmp package:

# pacman -S openmp

Write a simple program:

# cat parallel.cpp
#include <stdio.h>
#include <omp.h>

int main(void) {
    omp_set_num_threads(5);

    #pragma omp parallel for
    for (int i = 0; i < 5; i++) {

        #pragma omp taskloop
        for (int j = 0; j < 3; j++) {
            printf("%d\n", omp_get_thread_num());
        }

    }   
}

Compile and run it:

# clang++ -fopenmp parallel.cpp
# ./a.out
0
0
0
0
0
1
1
2
4
4
4
4
3
0
1

Clang OpenMP works as I expected. Build my project again, no eccentric errors! Work like a charm!

So according to my testing experience, clang may be a better choice than gcc in developing OpenMP program, especially for some new OpenMP features.

The first gcc bug I ever meet

I have used gcc for more than 10 years, but never met a bug before. In my mind, the gcc is one of the stable software in the world, but at yesterday, the myth ended.

I tried to use OpenMP to optimize my program, and all was OK until the taskloop construct was added:

#pragma omp taskloop

The build flow terminated with the following errors:

xxxxxxx.cpp:142:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.archlinux.org/> for instructions.

Whoops! It seemed I got the lucky draw! Since my project uses a lot of compile options:

... -g -O2 -fopenmp -fprofile-arcs -ftest-coverage ...

I must narrow down to find the root cause. Firstly, I only use -fopenmp, then everything was OK; Secondly, adding -g -O2, no problem; …. After combination trial, -fopenmp -fprofile-arcs can cause the problem.

To confirm it, I wrote a simple program:

int main(void) {
    #pragma omp taskloop
    for (int i = 0; i < 2; i++) {
    }
    return 0;
}

Compile it:

# gcc -fopenmp -fprofile-arcs parallel.c
parallel.c:6:1: internal compiler error: Segmentation fault
 }
 ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <https://bugs.archlinux.org/> for instructions.

Yeah, the bug was reproduced! It verified my assumption.

To bypass this issue, I decided to try the newest gcc. My OS is ArchLinux,and the gcc version is 6.3.1:

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 6.3.1 20170306 (GCC)

Now ArchLinux doesn’t provide gcc 7.1 installation package, so I need to download and build it myself:

# wget http://gcc.parentingamerica.com/releases/gcc-7.1.0/gcc-7.1.0.tar.gz
# tar xvf gcc-7.1.0.tar.gz
# cd gcc-7.1.0/
# mkdir build
# cd build

Select the configuration options is a headache task for me, and I decide to copy the current options for 6.3.1 (Please refer the above output from gcc -v):

# ../configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info .....

Since I don’t need to compile ada and lto, I remove these from --enable-languages:

--enable-languages=c,c++,fortran,go,objc,obj-c++

Besides this, I also need to build and install isl library myself or through ArchLinux isl package. Once configuration is successful, I can build and install it:

# make
# make install

Check the newest gcc:

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,fortran,go,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 7.1.0 (GCC)

Compile the program again:

# gcc -fopenmp -fprofile-arcs parallel.c
#

This time compilation is successful!

P.S. The gcc may have other bugs when supporting some new OpenMP directives, so please pay attention to it.