Nan Xiao's Blog

A system software / performance engineer's home

Tag: debug (Page 1 of 2)

Use pdb to help understand python program

As I have mentioned in Why do I need a debugger?:

(3) Debugger is a good tool to help you understand code.

So when I come across difficulty to understand vfscount.py code in bcc project, I know it is time to resort to pdb, python‘s debugger, to help me.

The thing which confuses me is here:

counts = b.get_table("counts")
for k, v in sorted(counts.items(), key=lambda counts: counts[1].value):
    print("%-16x %-26s %8d" % (k.ip, b.ksym(k.ip), v.value))

From previous code:

BPF_HASH(counts, struct key_t, u64, 256);

It seems the v‘s type is u64, and I can’t figure out why use v.value to fetch its data here.

pdb‘s manual is very succinct and its command is so similar with gdb‘s, so it is no learning curve for me. Just launch a debugging session and set breakpoint at “counts = b.get_table("counts")” line:

# python -m pdb vfscount.py
> /root/Project/bcc/tools/vfscount.py(14)<module>()
-> from __future__ import print_function
(Pdb) b vfscount.py:49

Start the program and press Ctrl-C after seconds; the breakpoint will be hit:

(Pdb) r
Tracing... Ctrl-C to end.
^C
ADDR             FUNC                          COUNT
> /root/Project/bcc/tools/vfscount.py(49)<module>()
-> counts = b.get_table("counts")

Step into get_table method, and single-step every line. Before leaving method, check the type of keytype and leaftype:

-> counts = b.get_table("counts")
(Pdb) s
--Call--
> /usr/lib/python3.6/site-packages/bcc/__init__.py(416)get_table()
-> def get_table(self, name, keytype=None, leaftype=None, reducer=None):
(Pdb) n
> /usr/lib/python3.6/site-packages/bcc/__init__.py(417)get_table()
-> map_id = lib.bpf_table_id(self.module, name.encode("ascii"))
......
(Pdb) p leaf_desc
b'"unsigned long long"'
(Pdb) n
> /usr/lib/python3.6/site-packages/bcc/__init__.py(430)get_table()
-> leaftype = BPF._decode_table_type(json.loads(leaf_desc.decode()))
(Pdb)
> /usr/lib/python3.6/site-packages/bcc/__init__.py(431)get_table()
-> return Table(self, map_id, map_fd, keytype, leaftype, reducer=reducer)
(Pdb) p leaftype
<class 'ctypes.c_ulong'>
(Pdb) p keytype
<class 'bcc.key_t'>

Yeah! The magic is here: leaftype‘s type is not pure u64, but ctypes.c_ulong. According to document:

>>> print(i.value)
42

We should use v.value to get its internal data.

Happy pdbing! Happy python debugging!

The tips of using and debugging C++ std::iostream

(1) Be cautious of '\n' and std::endl.

The std::endl will flush the output buffer while '\n' not. For example, in socket programming, if client sends message to server using '\n', like this:

out << "Hello World!" << '\n';
out << "I am coming" << '\n';

The server may still block in reading operation and no data is fetched. So you should use std::endl in this case:

out << "Hello World!" << std::endl;
out << "I am coming" << std::endl;

(2) Use std::ios::rdstate.

std::ios::rdstate is a handy function to check the stream state. You can use it in gdb debugging:

(gdb) p in.rdstate()
$45 = std::_S_goodbit
(gdb) n
350             return in;
(gdb) p in.rdstate()
$46 = std::_S_failbit

Through step-mode, we can see which operation crashes the stream.

(3) Serialize the data into file.

No matter you want to do test or debug issue, dump the data into a file and read it back is a very effective method:

std::ofstream ofs("data.txt");
ofs << output;
ofs.close();

std::ifstream ifs("data.txt");
ifs >> input;
ifs.close();

The above simple code can verify whether your serialization functions are correct or not. Trust me, it is a very brilliant trouble-shouting std::iostream issue trick, and it saved me just now!

An example of debugging parallel program

In the past 2 weeks, I was tortured by a nasty bug, yes, a not-always-reproducible, use-third-party-code, parallel-program issue. Thank Goodness! The root cause was found on this Thursday, and I think the whole progress is a perfect experience and tutorial of how to debug parallel program. So I record it here and hope it can give you some tips when facing similar situation.

Background: I am leveraging FHEW to implement some complex calculations and experiments. Now the FHEW only works in single-thread environment, since many functions share the same global variables, like this:

......
double *in;
fftw_complex *out;
fftw_plan plan_fft_forw, plan_fft_back;

void FFTsetup() {
  in = (double*) fftw_malloc(sizeof(double) * 2*N);
  out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * (N + 2));
  plan_fft_forw = fftw_plan_dft_r2c_1d(2*N, in, out,  FFTW_PATIENT);
  plan_fft_back = fftw_plan_dft_c2r_1d(2*N, out, in,  FFTW_PATIENT);
}

void FFTforward(Ring_FFT res, const Ring_ModQ val) {
  for (int k = 0; k < N; ++k)   {
    in[k] = (double) (val[k]);
    in[k+N] = 0.0;          
  }
  fftw_execute(plan_fft_forw); 
  for (int k = 0; k < N2; ++k) 
    res[k] = (double complex) out[2*k+1];               
}
.....

So to make it can be used in parallel, I first need to modify it to be used in multi-thread program. This work involves of allocating resources of every thread, adding lock of necessary global variables, etc. The multi-thread FHEW works always OK until I employed it to implement a complicated function. At most times the result was correct except some rare occasions. So I began the tough debugging work.

(1) Log. Undoubtedly, the log is the most effective one in all debugging tools. I tried to add as many traces as I can. An important attention you must pay is now that multiple threads are running simultaneously, you must identify the logs are from which thread. For example, if you use OpenMP, you can use following code to differentiate thread IDs:

......
printf("Thread(%d/%d) ......", omp_get_thread_num(), omp_get_num_threads(), ......);
......

After analyzing the log, a function was spotted, but it did noting but use 2 consecutive HomNAND to implement the AND operations (You should consider HomNAND is just a NAND gate in logical circuit, and use two NAND gates can realize AND gate.). According to AND gate truth-table, when two inputs are 0, the output should be 0 too, but in my program, the result is 1, which causes the final error.

(2) Simplify the program and try to find the reproducible condition. Since the bug was related to HomNAND and my original program was a little complicated, I needed to do some simple tests and try to find the reproducible condition. I designed 3 experiments:

a) Use original single-thread FHEW, and execute one HomNAND continuously:

for (;;) {
    ......
    HomNAND();
    ......
}

b) Use modified multi-thread FHEW, and spawn 8 threads. Every thread run one HomNAND in dead-loop:

#pragma omp parallel for
for (int loop = 0; loop < 8; loop++) {
    ......
    HomNAND();
    ......
}

c) Use modified multi-thread FHEW, and spawn 8 threads. Every thread run two HomNANDs to achieve AND function in dead-loop:

#pragma omp parallel for
for (int loop = 0; loop < 8; loop++) {
    ......
    HomNAND()
    HomNAND();
    ......
}

After testing, I found case a) and b) were always OK, while c) would generate wrong results after running 1 or 2 hours.

(3) Now the thing became intricate. Because I modified the FHEW, and FHEW also used FFTW, it was not an easy task to locate where the problem was. Additionally, the thread-safe usage of FFTW also involved many discussions (like this) and pitfalls (like this), so I resort the authors of FFTW to make sure my code is not wrong.

(4) Back to the experiments, I suddenly found that I have tested single-thread-single-HomNAND, multiple-thread-single-HomNAND, multiple-thread-two-HomNAND cases, but how about single-thread-two-HomNAND? If single-thread-two-HomNAND case can fail, it can prove the my modification code was innocent, and the problem should reside on the original FHEW code:

for (;;) {
    ......
    HomNAND();
    HomNAND();
    ......
}

Yeah, I was very lucky! After one hour, the error occurred. The murder was found, so the next step was to seek solution with the author.

Based on this experience, I think analyzing log and simplifying test model are very important techniques during resolving parallel program bugs. Hope everyone can improve debugging ability, happy debugging!

Why do I need a debugger?

When I begin to learn a new programming language, I will try and master the debugger for it as early as possible. For example, in 2013, while I touched the Go, there seems only gdb for use. Although gdb itself is not a good choice (From Debugging Go Code with GDB):

As a consequence, although GDB can be useful in some situations, it is not a reliable debugger for Go programs, particularly heavily concurrent ones.

But at that time there was no other choice. So after delve came out, I switched to it without hesitation. Though I am not an expert of delvecode, I still try my best do some contributions to make delve become better: improve documents, report issues, etc. Why am I so keen on debugger? The answer is it is a really irreplaceable and necessary tool for software engineers. The reasons are as follows:

(1) If the print statements are the only debugging method of a programming language, it will make feel upset. I.e., if a bug is fully reproduceable, but from the logs you can’t figure out the reason. If there is a debugger which can help you step into every statement and inspect value of every variable, I think you can get the root cause quickly.

Certainly, the debugger isn’t omnipotent. E.g., the nasty multi-threads bug (If you are interested in this topic, you can read this post which describes an experience I have undergone.). If you can’t find reproduce condition of this issue, and adding logs still fails, you can try to add some assert statements which are triggered when the issue happens again. Then you can get the core dump file which records the scene of crime, and use debugger to analyze it. You can see the debugger plays an important part in this scenario yet.

(2) “Debugger is a perfect unit-test tool”, the words come form my director when I got my first full-time job after leaving school. The reason is when you finish a code segment, you can use debugger to check whether it is correct through step-in mode: check value of every variable, mock the conditions which can’t easily be constructed in black-box test, inspect the stack and registers, etc. By means of this, you can fix many corner bugs.

(3) Debugger is a good tool to help you understand code. When I dive into some big Go projects, I find so many channels, interfaces, goroutines, and they sometimes make me crazy. But by way of using debugger, I can set breakpoints, then once the program is stopped, I can understand the code logic better through watching the calling stack.

Based on the above, debugger is an invaluable tool for everyone who lives on writing code. Try know and master it better. Maybe one day, a colleague can’t find reasons for one bug, then you use a small debugger trick and spot the root cause immediately. Isn’t it a cool stuff? 🙂

A brief intro of delve

delve is a debugger developed in Golang and also dedicated to help trouble-shooting Golang programs (Home page is here). Though it is still in pre-1.0 release, I think it is stable enough for daily use. BTW, if you find some bugs, you can report it to developers and help to make delve more stronger! P.S., Albeit fmt.Print buddies are useful in most cases, I strongly recommend you try to usedelve to inspect the internal mechanism of your code, because it can help you know Golang deeper, not just superficial stuff.

Installing delve is very simple: taking Linux platform as an example, it is no different from setting up other Golang projects, just “go get” is enough:

go get github.com/derekparker/delve/cmd/dlv 

Now, in $GOPATH/bin, there is an extra dlv executable binary file (Notice: the project is named delve, while the executable file is calleddlv. I even made a foolish mistake when began to use it).Run dlv command, and it will show you a detailed manual of delve:

# dlv
Delve is a source level debugger for Go programs.

Delve enables you to interact with your program by controlling the execution of the process,
evaluating variables, and providing information of thread / goroutine state, CPU register state and more.

The goal of this tool is to provide a simple yet powerful interface for debugging Go programs.

Usage:
  dlv [command]

Available Commands:
  version     Prints version.
  run         Deprecated command. Use 'debug' instead.
  debug       Compile and begin debugging program.
......

Let’s check this artificial Hello.go program:

package main

import "fmt"

func main() {
        var s []byte
        s = append(s, []byte("Hello, Debugging!")...)
        fmt.Println(string(s))
}

Use delve to debug it:

# dlv debug Hello.go
Type 'help' for list of commands.
(dlv) help
The following commands are available:
    help (alias: h) ------------- Prints the help message.
    break (alias: b) ------------ Sets a breakpoint.
    trace (alias: t) ------------ Set tracepoint.
    restart (alias: r) ---------- Restart process.
    continue (alias: c) --------- Run until breakpoint or program termination.
    step (alias: s) ------------- Single step through program.
    step-instruction (alias: si)  Single step a single cpu instruction.
    next (alias: n) ------------- Step over to next source line.
    threads --------------------- Print out info for every traced thread.
    thread (alias: tr) ---------- Switch to the specified thread.
......

If you are familiar with gdb, you will find the commands are very similar, and I promise you can master delve soon.

An interesting thing is that the delve doesn’t provide start command which gdb offers, so you should try to set breakpoints first, then run continue command:

(dlv) b Hello.go:8
Breakpoint 1 set at 0x4011ea for main.main() ./Hello.go:8
(dlv) c
> main.main() ./Hello.go:8 (hits goroutine(1):1 total:1) (PC: 0x4011ea)
     3: import "fmt"
     4:
     5: func main() {
     6:         var s []byte
     7:         s = append(s, []byte("Hello, Debugging!")...)
=>   8:         fmt.Println(string(s))
     9: }
(dlv) p s
[]uint8 len: 17, cap: 32, [72,101,108,108,111,44,32,68,101,98,117,103,103,105,110,103,33]
(dlv) goroutines
[4 goroutines]
* Goroutine 1 - User: ./Hello.go:8 main.main (0x4011ea)
  Goroutine 2 - User: /usr/local/go/src/runtime/proc.go:263 runtime.gopark (0x42a153)
  Goroutine 3 - User: /usr/local/go/src/runtime/proc.go:263 runtime.gopark (0x42a153)
  Goroutine 4 - User: /usr/local/go/src/runtime/mfinal.go:144 runtime.runfinq (0x413f80)

Cool! Isn’t it? Now You can observe almost everything you want to know about your program.

Happy Debugging! Happy delving!

 

Page 1 of 2

Powered by WordPress & Theme by Anders Norén