Tag Archives: gRPC

Leverage comprehensive debugging tricks in one shot

Published by nanxiao

Last Friday, a colleague told me that when connecting to an invalid address, a gRPC client will block forever. To verify this, I used the example code shipped with gRPC:

GreeterClient greeter(grpc::CreateChannel(
  "localhost:50051", grpc::InsecureChannelCredentials()));

Change "localhost:50051" to "badhost:50051", then compile and run the program. Sure enough, the client hung without any response. At the outset, I thought this must be a common issue with an existing solution, so I posted in the discussion group. Although there were some responses, none of them was a satisfactory explanation, so I knew I needed to troubleshoot it myself.

(1) The first thing I wanted to confirm was whether the machine had actually sent requests to badhost or not, so I used tcpdump to capture the packets:

$ sudo tcpdump -A -s 0 'port 50051' -i enp7s0f0

But no data was captured. To double-check, I also used the tcpconnect program:

$ sudo tcpconnect -P 50051
PID    COMM         IP SADDR            DADDR            DPORT

Still no output.

(2) Although I couldn’t find any connect request to port 50051, every application on *NIX that opens a TCP connection must eventually call the connect function. So I changed tactics and tried to find out who calls connect:

a) Build gRPC with debugging info (the reason for using “PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig” is explained here):

$ PKG_CONFIG_PATH=/usr/lib/openssl-1.0/pkgconfig CC=clang CXX=clang++ CFLAGS="-g -O0" CXXFLAGS="-g -O0" make

b) Modify the Makefile to build client program with debugging info:

CXXFLAGS += -g -std=c++11

c) Use gdb to debug the program; after starting it, set a breakpoint at the connect function:

$ gdb -q greeter_client
Reading symbols from greeter_client...done.
(gdb) start
Temporary breakpoint 1 at 0x146fe: file greeter_client.cc, line 74.
Starting program: /home/xiaonan/Project/grpc/examples/cpp/helloworld/greeter_client
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffea88) at greeter_client.cc:74
74      int main(int argc, char** argv) {
(gdb) b connect
Breakpoint 2 at 0x7ffff6619b80 (2 locations)

Then continue executing the program. When the breakpoint was hit, check the stack:

(gdb) c
Continuing.
[New Thread 0x7ffff4edc700 (LWP 28396)]
[New Thread 0x7ffff46db700 (LWP 28397)]
[Switching to Thread 0x7ffff4edc700 (LWP 28396)]

Thread 2 "greeter_client" hit Breakpoint 2, 0x00007ffff6619b80 in connect () from /usr/lib/libc.so.6

(gdb) bt
#0  0x00007ffff6619b80 in connect () from /usr/lib/libc.so.6
#1  0x00007ffff664e61e in open_socket () from /usr/lib/libc.so.6
#2  0x00007ffff664f156 in __nscd_open_socket () from /usr/lib/libc.so.6
#3  0x00007ffff664ccc6 in __nscd_getai () from /usr/lib/libc.so.6
#4  0x00007ffff66038bc in gaih_inet.constprop () from /usr/lib/libc.so.6
#5  0x00007ffff6604724 in getaddrinfo () from /usr/lib/libc.so.6
#6  0x00007ffff714ee1e in ?? () from /usr/lib/libgrpc.so.4
#7  0x00007ffff714f38c in ?? () from /usr/lib/libgrpc.so.4
#8  0x00007ffff714d020 in ?? () from /usr/lib/libgrpc.so.4
#9  0x00007ffff714cf12 in ?? () from /usr/lib/libgrpc.so.4
#10 0x00007ffff71fff57 in ?? () from /usr/lib/libgrpc.so.4
#11 0x00007ffff7755049 in start_thread () from /usr/lib/libpthread.so.0
#12 0x00007ffff6618f0f in clone () from /usr/lib/libc.so.6

Then continue running the program; the breakpoint was hit again:

(gdb) bt
#0  0x00007ffff6619b80 in connect () from /usr/lib/libc.so.6
#1  0x00007ffff664e61e in open_socket () from /usr/lib/libc.so.6
#2  0x00007ffff664f156 in __nscd_open_socket () from /usr/lib/libc.so.6
#3  0x00007ffff664ccc6 in __nscd_getai () from /usr/lib/libc.so.6
#4  0x00007ffff66038bc in gaih_inet.constprop () from /usr/lib/libc.so.6
#5  0x00007ffff6604724 in getaddrinfo () from /usr/lib/libc.so.6
#6  0x00007ffff714ee1e in ?? () from /usr/lib/libgrpc.so.4
#7  0x00007ffff714f38c in ?? () from /usr/lib/libgrpc.so.4
#8  0x00007ffff714d020 in ?? () from /usr/lib/libgrpc.so.4
#9  0x00007ffff714cf12 in ?? () from /usr/lib/libgrpc.so.4
#10 0x00007ffff71fff57 in ?? () from /usr/lib/libgrpc.so.4
#11 0x00007ffff7755049 in start_thread () from /usr/lib/libpthread.so.0
#12 0x00007ffff6618f0f in clone () from /usr/lib/libc.so.6
(gdb)

Oh, I see! The resolution of badhost must have failed, so there would be no subsequent connection to port 50051. But why was the client trying to resolve the name again and again? Finding the cause of that would explain why the client was blocking.

(3) Because of the ?? frames from /usr/lib/libgrpc.so.4, I couldn’t tell which function was the culprit. I could have gone over the code, but I wanted direct proof. Building gRPC with CC=clang CXX=clang++ CFLAGS="-g -O0" CXXFLAGS="-g -O0" was apparently not enough. After some tweaking, I came up with the following solutions:

a) According to the Makefile:

# TODO(nnoble): the strip target is stripping in-place, instead
# of copying files in a temporary folder.
# This prevents proper debugging after running make install.  

“make install” will strip the debugging information, so instead of executing the “make install” command, I set the LD_LIBRARY_PATH environment variable to make the client link against the library in the specified directory:

$ export LD_LIBRARY_PATH=/home/xiaonan/Project/grpc/libs/opt

b) Hardcode -g in the Makefile:

CFLAGS += -g -std=c99 -Wsign-conversion -Wconversion $(W_SHADOW) $(W_EXTRA_SEMI)
CXXFLAGS += -g -std=c++11

Then all the symbols could be resolved:

(gdb) bt
#0  0x00007ffff6486b80 in connect () from /usr/lib/libc.so.6
#1  0x00007ffff64bb61e in open_socket () from /usr/lib/libc.so.6
#2  0x00007ffff64bbae2 in __nscd_get_mapping () from /usr/lib/libc.so.6
#3  0x00007ffff64bbed5 in __nscd_get_map_ref () from /usr/lib/libc.so.6
#4  0x00007ffff64b9ba3 in __nscd_getai () from /usr/lib/libc.so.6
#5  0x00007ffff64708bc in gaih_inet.constprop () from /usr/lib/libc.so.6
#6  0x00007ffff6471724 in getaddrinfo () from /usr/lib/libc.so.6
#7  0x00007ffff7473ec5 in blocking_resolve_address_impl (name=0x55555578edf0 "badhost:50051",
    default_port=0x555555790220 "https", addresses=0x55555578f1f0) at src/core/lib/iomgr/resolve_address_posix.c:83
#8  0x00007ffff74742e3 in do_request_thread (exec_ctx=0x7ffff5043c30, rp=0x55555578e630, error=<optimized out>)
    at src/core/lib/iomgr/resolve_address_posix.c:157
#9  0x00007ffff7472b86 in run_closures (exec_ctx=<optimized out>, list=...) at src/core/lib/iomgr/executor.c:64
#10 executor_thread (arg=0x555555789fc0) at src/core/lib/iomgr/executor.c:152
#11 0x00007ffff74e5286 in thread_body (v=<optimized out>) at src/core/lib/support/thd_posix.c:42
#12 0x00007ffff6181049 in start_thread () from /usr/lib/../lib64/libpthread.so.0
#13 0x00007ffff6485f0f in clone () from /usr/lib/libc.so.6

Now I just need to step into the code. More information about this issue can also be found here.

During the whole process, I used a sniffer tool (tcpdump), a kernel tracing tool (tcpconnect, which belongs to bcc and utilizes eBPF), networking knowledge (setting a breakpoint on the connect function), a debugging tool (gdb), and a library-linking trick (setting LD_LIBRARY_PATH to bypass installing gRPC). That is why I call the whole procedure “leveraging comprehensive debugging tricks”.

 

Build static library of gRPC

Published by nanxiao

If you want to build static libraries of gRPC, you can follow these steps:

$ git clone https://github.com/grpc/grpc.git
$ cd grpc
$ git submodule update --init
$ CC=clang CXX=clang++ make REQUIRE_CUSTOM_LIBRARIES_opt=1 static

If you bump into the following error:

third_party/protobuf/src/google/protobuf/stubs/port.h:263:5: error: macro expansion producing 'defined' has undefined
  behavior [-Werror,-Wexpansion-to-defined]
#if GOOGLE_PROTOBUF_USE_UNALIGNED
    ^

Please check this patch. After compiling, you can see that all the libraries are static:

$ ls -alt libs/opt/*
-rw-r--r-- 1 xiaonan xiaonan  8109188 Oct  6 17:54 libs/opt/libgrpc++_unsecure.a
-rw-r--r-- 1 xiaonan xiaonan  3705192 Oct  6 17:54 libs/opt/libgrpc++_reflection.a
-rw-r--r-- 1 xiaonan xiaonan  2089436 Oct  6 17:54 libs/opt/libgrpc++_error_details.a
-rw-r--r-- 1 xiaonan xiaonan 24675866 Oct  6 17:54 libs/opt/libgrpc++_cronet.a
-rw-r--r-- 1 xiaonan xiaonan  9276572 Oct  6 17:54 libs/opt/libgrpc++.a
-rw-r--r-- 1 xiaonan xiaonan   917148 Oct  6 17:54 libs/opt/libgrpc++_core_stats.a
-rw-r--r-- 1 xiaonan xiaonan  7180018 Oct  6 17:54 libs/opt/libgrpc_plugin_support.a
-rw-r--r-- 1 xiaonan xiaonan  8163444 Oct  6 17:41 libs/opt/libgrpc_unsecure.a
-rw-r--r-- 1 xiaonan xiaonan 18854796 Oct  6 17:41 libs/opt/libgrpc_cronet.a
-rw-r--r-- 1 xiaonan xiaonan 20064046 Oct  6 17:41 libs/opt/libgrpc.a
-rw-r--r-- 1 xiaonan xiaonan 10669618 Oct  6 17:40 libs/opt/libboringssl.a
-rw-r--r-- 1 xiaonan xiaonan   456458 Oct  6 17:39 libs/opt/libgpr.a
-rw-r--r-- 1 xiaonan xiaonan   379730 Oct  6 17:39 libs/opt/libz.a
-rw-r--r-- 1 xiaonan xiaonan   629666 Oct  6 17:39 libs/opt/libares.a

libs/opt/protobuf:
total 114748
drwxr-xr-x 4 xiaonan xiaonan     4096 Oct  6 17:54 ..
-rw-r--r-- 1 xiaonan xiaonan 50492478 Oct  6 17:49 libprotobuf.a
drwxr-xr-x 2 xiaonan xiaonan     4096 Oct  6 17:49 .
-rw-r--r-- 1 xiaonan xiaonan 66986782 Oct  6 17:49 libprotoc.a

libs/opt/pkgconfig:
total 24
drwxr-xr-x 4 xiaonan xiaonan 4096 Oct  6 17:54 ..
drwxr-xr-x 2 xiaonan xiaonan 4096 Oct  6 17:41 .
-rw-r--r-- 1 xiaonan xiaonan  283 Oct  6 17:41 grpc++_unsecure.pc
-rw-r--r-- 1 xiaonan xiaonan  253 Oct  6 17:41 grpc++.pc
-rw-r--r-- 1 xiaonan xiaonan  283 Oct  6 17:39 grpc_unsecure.pc
-rw-r--r-- 1 xiaonan xiaonan  264 Oct  6 17:39 grpc.pc

The analysis of gRPC communication mode

Published by nanxiao

The route_guide program is a very good tutorial for learning gRPC, so I will use it as an example to analyze gRPC programming.

Prologue:
Compile the programs with debug information so they are easy to manipulate with gdb. Modify the Makefile:

CXXFLAGS += -std=c++11 -g

(1) Start the route_guide_server application, and use gdb to step through route_guide_client:

221     int main(int argc, char** argv) {
(gdb) n
223       std::string db = routeguide::GetDbFileContent(argc, argv);
(gdb)
226                               grpc::InsecureChannelCredentials()),
(gdb)
[New Thread 0x7ffff502f700 (LWP 25947)]
[New Thread 0x7ffff482e700 (LWP 25948)]
225           grpc::CreateChannel("localhost:50051",
(gdb)
227           db);
(gdb)
DB parsed, loaded 100 features.

Check current network status:

$ netstat -an | grep 50051
tcp6       0      0 :::50051                :::*                    LISTEN

From this, we can see that the grpc::CreateChannel() function doesn’t create a connection to the server; it just does some initialization work.

(2) Simple RPC is a one-request/one-response mode, and it’s really easy, so I won’t discuss it here.

(3) Server-side streaming RPC is a one-request/multiple-response mode. The client code skeleton is like this:

std::unique_ptr<ClientReader<Feature> > reader(
        stub_->ListFeatures(&context, rect));
while (reader->Read(&feature)) {
  ......
}
Status status = reader->Finish();
if (status.ok()) {
  ......
} else {
  ......
}

After the client sends the request (stub_->ListFeatures(&context, rect)), it reads all the responses from the server until reader->Read() returns false, then uses reader->Finish() to check the status. The server code is simple: just send all the responses:

for (......) {
    ......
    writer->Write(f);
 } 

But what if the client doesn’t want to keep processing the responses partway through? Like this:

while (reader->Read(&feature)) {
  ......
  if (some error) {
    break;
  } 
}
Status status = reader->Finish();
if (status.ok()) {
  ......
} else {
  ......
}

After experimenting, I found the client blocks in the reader->Finish() function. You can refer to Be careful of using grpc::ClientStreamingInterface::Finish() function. There are two methods I can think of to handle this case:
a) Don’t call reader->Finish():

while (reader->Read(&feature)) {
  ......
  if (some error) {
    break;
  } 
}

b) Although there is an error, still consume the remaining messages:

while (reader->Read(&feature)) {
  ......
  if (some error) {
    break;
  } 
}
while (reader->Read(&feature)) {
    // do nothing, just consume other messages.
}
Status status = reader->Finish();
if (status.ok()) {
  ......
} else {
  ......
}

Honestly, I haven’t dived into the gRPC code, but from testing, both methods above seem to work correctly.

(4) Client-side streaming RPC is a multiple-request/one-response mode. The client code framework is like this:

std::unique_ptr<ClientWriter<Point> > writer(
    stub_->RecordRoute(&context, &stats));
for (......) {
  ......
  if (!writer->Write(f.location())) {
    // Broken stream.
    break;
  }
  ......
}
writer->WritesDone();
Status status = writer->Finish();
if (status.ok()) {
  // Check stats from server
} else {
  ......
}

The client code is simple: it just tries its best to send all the requests. Once it finishes transmitting, it calls writer->WritesDone() and blocks in writer->Finish() to wait for the server’s response. The server code is also brief:

while (reader->Read(&point)) {
  ......
}

If the server breaks out of the while loop halfway:

while (reader->Read(&point)) {
  ......
  if (some error) {
    break;
  } 
}

That is OK too.

(5) The last flow is bidirectional streaming RPC, which is a multiple-request/multiple-response mode. The server code framework is neat: it just sends a response for each request:

while (stream->Read(&note)) {
  ......
  stream->Write(n);
  ......
}

But the client side is a little more complicated:

stream->Write(note);
stream->Read(&server_note);
stream->Write(note);
stream->Read(&server_note);
......
stream->WritesDone();
Status status = stream->Finish();
if (!status.ok()) {
  ......
} else {
  ......
}

You must pay attention to the same problem as in server-side streaming RPC: if there are pending server responses before stream->Finish(), the client will block forever:

while (reader->Read(&feature)) {
  ......
  if (some error) {
    break;
  } 
}
Status status = reader->Finish();

Epilogue:
In summary, gRPC is a brilliant communication framework which hides many dirty details and lets you focus on message definitions. If you are considering doing some network programming, you can try it; I don’t think it will disappoint you.

Be careful of using grpc::ClientStreamingInterface::Finish() function

Published by nanxiao

Whether it is server-side streaming RPC or bidirectional streaming RPC, if there are still messages for the client to read but the client calls grpc::ClientStreamingInterface::Finish(), it will block there forever. The call stack looks like this:

#0  0x00007ffff5b24076 in epoll_pwait () from /usr/lib/libc.so.6
#1  0x00007ffff71c6ae5 in ?? () from /usr/lib/libgrpc.so.4
#2  0x00007ffff71e3a26 in ?? () from /usr/lib/libgrpc.so.4
#3  0x000055555558300f in grpc::CompletionQueue::Pluck (this=0x555555fd5340, tag=0x7fffffffd800)
    at /usr/include/grpc++/impl/codegen/completion_queue.h:224
#4  0x0000555555585e46 in grpc::ClientReaderWriter<..., ...>::Finish (
    this=0x555555fd5320) at /usr/include/grpc++/impl/codegen/sync_stream.h:494

So please pay attention to this. Before calling grpc::ClientStreamingInterface::Finish(), make sure grpc::ReaderInterface::Read() has returned false:

while (stream->Read(&server_note)) {
  ...
}
Status status = stream->Finish();

First taste of gRPC

Published by nanxiao

In the past month, I got my hands dirty with gRPC and tried to use it in my current project. The results seem good so far, and I want to share some impressions here:

(1) gRPC lets you concentrate on message definitions, and it generates all the necessary code for you. This is really neat! The APIs are also simple and handy: they handle all kinds of exceptional cases, which saves users a lot of energy.

(2) Currently gRPC’s maximum message length is INT_MAX (please refer to this post), so on a typical 64-bit platform its value is 2^31 - 1, nearly 2GiB. Personally, I think LONG_MAX, 2^63 - 1, might be better, since that is nearly equivalent to saying there is no limit on message length. BTW, gRPC uses Protocol Buffers under the hood. From the Protocol Buffers documentation:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

I haven’t dived into the Protocol Buffers code, so I’m not sure what the negative effects of transmitting large messages are. But anyway, whether something is supported and whether it is supported well are two different things.