Why does SSL client report google’s certificate “self-signed”?

In previous post, I implemented a simple HTTPS client, but the program has a small flaw, i.e., when connecting to “www.google.com:443“, it will report following error in verifying certificate:

error code is 18:self signed certificate

error code is from SSL_get_verify_result:

long SSL_get_verify_result(const SSL *ssl)
{
    return ssl->verify_result;
}

and 18 is mapping to X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT, which means “self-signed certificate”. But for other websites, e.g., facebook.com, no error is outputted.

Use OpenSSL‘s client-arg program to test:

# LD_LIBRARY_PATH=/root/openssl/build gdb --args ./client-arg -connect "www.google.com:443"
......
Thread 2 hit Breakpoint 1, main (argc=3, argv=0xfffffc7fffdf4c38) at client-arg.c:99
99      BIO_puts(sbio, "GET / HTTP/1.0\n\n");
(gdb) p ssl->verify_result
$1 = 18
(gdb)

The same error code: 18. But openssl-s_client can guarantee the certificate is not “self-signed”:

# LD_LIBRARY_PATH=/root/openssl/build openssl s_client -connect google.com:443
CONNECTED(00000004)
depth=2 OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
verify return:1
depth=1 C = US, O = Google Trust Services, CN = GTS CA 1O1
verify return:1
depth=0 C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
verify return:1
---
Certificate chain
 0 s:C = US, ST = California, L = Mountain View, O = Google LLC, CN = *.google.com
   i:C = US, O = Google Trust Services, CN = GTS CA 1O1
 1 s:C = US, O = Google Trust Services, CN = GTS CA 1O1
   i:OU = GlobalSign Root CA - R2, O = GlobalSign, CN = GlobalSign
---
......

Hmm, I need to find the root cause.

First of all, I searched the code to see when X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT is set, and found only one spot:

if (self_signed)
            return verify_cb_cert(ctx, NULL, num - 1,
                                  sk_X509_num(ctx->chain) == 1
                                  ? X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT
                                  : X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN);

The interesting thing is the amount of certificates in the chain is only 1, but from above openssl-s_client‘s output, there are 2 certificates in the chain. OK, let’s see the content of this “self-signed” certificate.

After some debugging, I finally found tls_process_server_certificate, which is used to process the server’s certificate. With the help of gdb, I can dump the content of certificate:

# gdb --args ./client www.google.com:443
.......
(gdb) b tls_process_server_certificate
......
Thread 2 hit Breakpoint 1, tls_process_server_certificate (s=0xf09e90, pkt=0xfffffc7fffdefe30)
    at ../ssl/statem/statem_clnt.c:1768
1768        X509 *x = NULL;
......
1806            if (certbytes != (certstart + cert_len)) {
(gdb)
1811            if (SSL_IS_TLS13(s)) {
(gdb) dump binary memory cert certstart certstart + cert_len
......

Try to check the cert file:

# cat cert
......
�0� *�H��       (No SNI provided; please fix your client.10Uinvalid2.invalid0�"0
��bO����
.....

The reason is obvious: “No SNI provided; please fix your client.”. Ah, I need to set SNI explicitly. After invoking SSL_set_tlsext_host_name, the certificate chain becomes correct (The new code can be downloaded here).

Summary: I am not an SSL/TLS expert, and OpenSSL project is complex and daunting. But with some basic SSL/TLS knowledge and the help of debugger, I can find the root cause of issues independently. Don’t give up, digest code bit by bit, finally you will win!

Build OpenSSL on OmniOS

Clone OpenSSL and create a build directory:

# git clone https://github.com/openssl/openssl.git
# cd openssl
# mkdir build
# cd build

List the supported build platforms of openssl:

# ../Configure LIST
BS2000-OSD
BSD-generic32
BSD-generic64
......

Because there is no illumos option, use Solaris instead. At the same time, I want to build in debug mode:

# ../Configure solaris64-x86_64-gcc --debug
# make
# make test

That’s all!

The doubt about “perf record -p pid sleep 10”

In Brendan Gregg’s awesome perf Examples, there is one command which can be used to sample for assigned time, e.g., 10 seconds:

# Sample on-CPU functions for the specified PID, at 99 Hertz, for 10 seconds:
perf record -F 99 -p PID sleep 10

I have a doubt about it: wouldn’t perf sample two processes, one with specified PID and the other is sleep task? Yes, I know sleep should do nothing to affect sampling, but as someone who likes to find the root cause, I want to know whether the sleep will be sampled or not.

First of all, check the manual page, unfortunately, I couldn’t get the answer.

Secondly, experiment. I wrote a simple program which just prints something forever:

$ cat a.c
#include <stdio.h>
int main()
{
    while (1)
        printf("Hello\n");
}

Compile and use perf record to profile it for a while:

$ gcc a.c
$ sudo perf record ./a.out
Hello
Hello
.....;
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.106 MB perf.data (2409 samples) ]

Check the perf.data:

Samples: 2K of event 'cycles:ppp', Event count (approx.): 853929764
Overhead  Command  Shared Object      Symbol
   9.66%  a.out    [kernel.kallsyms]  [k] retint_userspace_restore_args
   7.69%  a.out    [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   4.29%  a.out    [kernel.kallsyms]  [k] n_tty_write
   3.75%  a.out    [kernel.kallsyms]  [k] system_call_after_swapgs
   3.63%  a.out    [kernel.kallsyms]  [k] irq_return
   3.01%  a.out    [kernel.kallsyms]  [k] native_write_msr_safe
......

Nothing special, just to make sure that a.out process can be profiled. Launch htop program whose PID is 39741, then execute following command for several seconds:

$ sudo perf record -p 39741 ./a.out
Hello
Hello
......
^CHello
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (98 samples) ]

a.out ran smoothly, but from perf report:

Samples: 98  of event 'cycles:ppp', Event count (approx.): 77850922
Overhead  Command  Shared Object       Symbol
   1.02%  htop     [kernel.kallsyms]   [k] sys_read                                                  ▒
   1.02%  htop     [kernel.kallsyms]   [k] getname                                                   ▒
   1.02%  htop     [kernel.kallsyms]   [k] __d_lookup_rcu                                            ▒
   1.02%  htop     [kernel.kallsyms]   [k] generic_permission                                        ▒
   1.02%  htop     [kernel.kallsyms]   [k] do_task_stat
......

I couldn’t see any samples from a.out, and all are from htop. So it seems a.out won’t be profiled in this scenario.

Third, check perf source code to verify my speculation. Roughly speaking, “perf record [options] command [options]” will fork a child process to run command, and it is in evlist__prepare_workload:

int evlist__prepare_workload(struct evlist *evlist, struct target *target, const char *argv[],
                 bool pipe_output, void (*exec_error)(int signo, siginfo_t *info, void *ucontext))
{
    ......
    if (target__none(target)) {
        if (evlist->core.threads == NULL) {
            fprintf(stderr, "FATAL: evlist->threads need to be set at this point (%s:%d).\n",
                __func__, __LINE__);
            goto out_close_pipes;
        }
        perf_thread_map__set_pid(evlist->core.threads, 0, evlist->workload.pid);
    }
    ......
}

For target__none:

static inline bool target__none(struct target *target)
{
    return !target__has_task(target) && !target__has_cpu(target);
}

Because I have already set interested process by pid before, target__has_task(target) will return true, which will cause target__none() returns false, therefore perf_thread_map__set_pid won’t be invoked and the forked process isn’t added into evlist->core.threads. That’s the reason why the command process won’t be profiled. Similarly, if target__has_cpu(target) returns true, the command process is not tracked either.

In summary, to find answer of doubt: read code, write code to verify, repeat, that’s all.