A trick of setting breakpoint in pdb

When using pdb to debug a python program:

python -m pdb foo.py

I want to set a breakpoint, but meet following error:

(Pdb) b bar.py:46
*** 'bar.py' not found from sys.path

A small trick is setting breakpoint in main first and run the program:

(Pdb) b main
Breakpoint 1 at ......
(Pdb) r
......

After breakpoint set for main is hit, set breakpoint again at bar.py:46. This time it should work:

(Pdb) b bar.py:46
Breakpoint 2 at ......

Rewrite a python program using C to boost performance

Recently I converted a python program to C. The python program will run for about 1 hour to finish the task:

$ /usr/bin/time -v taskset -c 35 python_program ....
......
        User time (seconds): 3553.48
        System time (seconds): 97.70
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:00:51
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 12048772
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 11434463
        Voluntary context switches: 58873
        Involuntary context switches: 21529
        Swaps: 0
        File system inputs: 1918744
        File system outputs: 4704
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

while the C program only needs about 5 minutes:

$ /usr/bin/time -v taskset -c 35 c_program ....
......
        User time (seconds): 282.45
        System time (seconds): 8.66
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 4:51.17
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 16430216
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 3962437
        Voluntary context switches: 14
        Involuntary context switches: 388
        Swaps: 0
        File system inputs: 1918744
        File system outputs: 4960
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

From the /usr/bin/time‘s output, we can see python program uses less memory than C program, but suffers more “page faults” and “context switches”.

Enhance Hyperscan’s pcapCorpus.py script

Hyperscan‘s pcapCorpus.py script can convert a pcap file containing UDP and TCP packets to a corpus file which can be processed by hsbench program, I did some improvements to this script:

a) Support parsing VLAN packets;
b) Continue to handle instead of exit when meeting exceptional packets.

P.S., the code can be downloaded here.

Create a symbol link for python after installing it on OpenBSD

After installing Python on FreeBSD:

# pkg_add python
quirks-2.304 signed on 2017-04-02T15:01:33Z
Ambiguous: choose package for python
a       0: <None>
        1: python-2.7.13p0
        2: python-3.4.5p2
        3: python-3.5.2p2
        4: python-3.6.0p0
Your choice: 4

It won’t create a symbol link by default:

# python
ksh: python: not found

So for using it handily, you can create a symbol link yourself:

# cd /usr/local/bin
# ln -s python3.6 python

Use pdb to help understand python program

As I have mentioned in Why do I need a debugger?:

(3) Debugger is a good tool to help you understand code.

So when I come across difficulty to understand vfscount.py code in bcc project, I know it is time to resort to pdb, python‘s debugger, to help me.

The thing which confuses me is here:

counts = b.get_table("counts")
for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
    print("%-16x %-26s %8d" % (k.ip, b.ksym(k.ip), v.value))

From previous code:

BPF_HASH(counts, struct key_t, u64, 256);

It seems the v‘s type is u64, and I can’t figure out why use v.value to fetch its data here.

pdb‘s manual is very succinct and its command is so similar with gdb‘s, so it is no learning curve for me. Just launch a debugging session and set breakpoint at “counts = b.get_table("counts")” line:

# python -m pdb vfscount.py
> /root/Project/bcc/tools/vfscount.py(14)<module>()
-> from __future__ import print_function
(Pdb) b vfscount.py:49

Start the program and press Ctrl-C after seconds; the breakpoint will be hit:

(Pdb) r
Tracing... Ctrl-C to end.
^C
ADDR             FUNC                          COUNT
> /root/Project/bcc/tools/vfscount.py(49)<module>()
-> counts = b.get_table("counts")

Step into get_table method, and single-step every line. Before leaving method, check the type of keytype and leaftype:

-> counts = b.get_table("counts")
(Pdb) s
--Call--
> /usr/lib/python3.6/site-packages/bcc/__init__.py(416)get_table()
-> def get_table(self, name, keytype=None, leaftype=None, reducer=None):
(Pdb) n
> /usr/lib/python3.6/site-packages/bcc/__init__.py(417)get_table()
-> map_id = lib.bpf_table_id(self.module, name.encode("ascii"))
......
(Pdb) p leaf_desc
b'"unsigned long long"'
(Pdb) n
> /usr/lib/python3.6/site-packages/bcc/__init__.py(430)get_table()
-> leaftype = BPF._decode_table_type(json.loads(leaf_desc.decode()))
(Pdb)
> /usr/lib/python3.6/site-packages/bcc/__init__.py(431)get_table()
-> return Table(self, map_id, map_fd, keytype, leaftype, reducer=reducer)
(Pdb) p leaftype
<class 'ctypes.c_ulong'>
(Pdb) p keytype
<class 'bcc.key_t'>

Yeah! The magic is here: leaftype‘s type is not pure u64, but ctypes.c_ulong. According to document:

>>> print(i.value)
42

We should use v.value to get its internal data.

Happy pdbing! Happy python debugging!