A trick of setting breakpoint in pdb

When using pdb to debug a python program:

python -m pdb foo.py

I want to set a breakpoint, but meet following error:

(Pdb) b bar.py:46
*** 'bar.py' not found from sys.path

A small trick is setting breakpoint in main first and run the program:

(Pdb) b main
Breakpoint 1 at ......
(Pdb) r
......

After breakpoint set for main is hit, set breakpoint again at bar.py:46. This time it should work:

(Pdb) b bar.py:46
Breakpoint 2 at ......

Use gdb’s convenience functions

Today I tried to set a conditional breakpoint in my program when a string variable is assigned one specific value:

b foo.c:488 if (int)strcmp(foo, "foo") == 0

But unfortunately, the gdb will exit with following error:

Unable to restore previously selected frame:
Selected thread is running.
terminate called after throwing an instance of 'gdb_exception_error'
Aborted

After checking in stackoverflow, I found gdb‘s convenience functions. So using $_streq instead of strcmp:

b foo.c:488 if $_streq(foo, "foo")

The gdb works like a charm!

An issue related to uninitialised memory

Today I met an interesting bug: A C program behaved differently between debug (gcc -O0) and release (gcc -O3) modes.

First of all, I compared the logs between two modes, and pinned down in which function, the logs began to diverge.

Secondly, I used gdb to debug two programs simultaneously, and checked the variables’ values, then found a variable which had disparate values that would cause two programs enter different branches in a if-else statement. Hmm, this was the root cause.

My gut feeling was the release mode program may fetch the staled value, but after reviewing code carefully, I found the reason is one block memory (the variable belonged to) allocated from heap was not initialised, so this will introduce notorious “undefined behaviour”.

As far as I know, the reasons for uninitialising variables:
(1) The programmer forgets;
(2) The programmer reckons the variable will be assigned correct value before use, and there may be performance penalty for initialising a block of memory.
Anyway, the lesson I learnt today is unless you are 100% sure it will be OK to uninitialise the specified variable, otherwise please initialise it, and this can save you several hours in the future.

Use libunwind to debug memory leak issue

In our project, there is a shared object with a reference counter, which will be increased if others acquire it and decreased if released. Once the reference counter is 0, the shared object can be reaped. Then we found the classical memory leak issue, i.e., the memory of shared object keeps growing. To debug this issue, I used libunwind.

The principle is simple: print the stack traces of every increment/decrement operations. I borrowed code from Programmatic access to the call stack in C++, and did some tweaks: mostly format the stack traces and output to file. The output is like this:

$ cat /tmp/backtrace.log
0x55ad59ec2556: (foo+0x9)
0x55ad59ec2562: (bar+0x9)
0x55ad59ec2579: (main+0x14)
0x7f941161ee0a: (__libc_start_main+0xea)
0x55ad59ec214a: (_start+0x2a)

A quick method to know the specific position in source code is through gdb: attach the program, then use “info line” command:

$ gdb program -p pid
......
(gdb) info line *0x55ad59ec2556
......

P.S., the code can be download here.

Why does SSL client report google’s certificate “self-signed”?

In previous post, I implemented a simple HTTPS client, but the program has a small flaw, i.e., when connecting to “www.google.com:443“, it will report following error in verifying certificate:

error code is 18:self signed certificate

error code is from SSL_get_verify_result:

long SSL_get_verify_result(const SSL *ssl)
{
    return ssl->verify_result;
}

and 18 is mapping to X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT, which means “self-signed certificate”. But for other websites, e.g., facebook.com, no error is outputted.

Use OpenSSL‘s client-arg program to test:

# LD_LIBRARY_PATH=/root/openssl/build gdb --args ./client-arg -connect "www.google.com:443"
......
Thread 2 hit Breakpoint 1, main (argc=3, argv=0xfffffc7fffdf4c38) at client-arg.c:99
99      BIO_puts(sbio, "GET / HTTP/1.0\n\n");
(gdb) p ssl->verify_result
$1 = 18
(gdb)

The same error code: 18. But openssl-s_client can guarantee the certificate is not “self-signed”:

# LD_LIBRARY_PATH=/root/openssl/build openssl s_client -connect google.com:443
CONNECTED(00000004)
depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
verify return:1
depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1
verify return:1
depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
   i:C = US, O = Google Trust Services, CN = GTS CA 1O1
 1 s:C = US, O = Google Trust Services, CN = GTS CA 1O1
   i:OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
---
......

Hmm, I need to find the root cause.

First of all, I searched the code to see when X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT is set, and found only one spot:

if (self_signed)
            return verify_cb_cert(ctx, NULL, num - 1,
                                  sk_X509_num(ctx->chain) == 1
                                  ? X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT
                                  : X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN);

The interesting thing is the amount of certificates in the chain is only 1, but from above openssl-s_client‘s output, there are 2 certificates in the chain. OK, let’s see the content of this “self-signed” certificate.

After some debugging, I finally found tls_process_server_certificate, which is used to process the server’s certificate. With the help of gdb, I can dump the content of certificate:

# gdb --args ./client www.google.com:443
.......
(gdb) b tls_process_server_certificate
......
Thread 2 hit Breakpoint 1, tls_process_server_certificate (s=0xf09e90, pkt=0xfffffc7fffdefe30)
    at ../ssl/statem/statem_clnt.c:1768
1768        X509 *x = NULL;
......
1806            if (certbytes != (certstart + cert_len)) {
(gdb)
1811            if (SSL_IS_TLS13(s)) {
(gdb) dump binary memory cert certstart certstart + cert_len
......

Try to check the cert file:

# cat cert
......
�0� *�H��       (No SNI provided; please fix your client.10Uinvalid2.invalid0�"0
��bO����
.....

The reason is obvious: “No SNI provided; please fix your client.”. Ah, I need to set SNI explicitly. After invoking SSL_set_tlsext_host_name, the certificate chain becomes correct (The new code can be downloaded here).

Summary: I am not an SSL/TLS expert, and OpenSSL project is complex and daunting. But with some basic SSL/TLS knowledge and the help of debugger, I can find the root cause of issues independently. Don’t give up, digest code bit by bit, finally you will win!