Be careful when using the grpc::ClientStreamingInterface::Finish() function

Whether it is a server-side streaming RPC or a bidirectional streaming RPC, if there are still messages for the client to read but the client calls grpc::ClientStreamingInterface::Finish(), the call will block forever. The call stack looks like this:

#0  0x00007ffff5b24076 in epoll_pwait () from /usr/lib/libc.so.6
#1  0x00007ffff71c6ae5 in ?? () from /usr/lib/libgrpc.so.4
#2  0x00007ffff71e3a26 in ?? () from /usr/lib/libgrpc.so.4
#3  0x000055555558300f in grpc::CompletionQueue::Pluck (this=0x555555fd5340, tag=0x7fffffffd800)
    at /usr/include/grpc++/impl/codegen/completion_queue.h:224
#4  0x0000555555585e46 in grpc::ClientReaderWriter<..., ...>::Finish (
    this=0x555555fd5320) at /usr/include/grpc++/impl/codegen/sync_stream.h:494

So please pay attention to this. Before calling grpc::ClientStreamingInterface::Finish(), make sure grpc::ReaderInterface::Read() has returned false:

// Drain the stream first: Read() returns false once there are no more
// messages to receive.
while (stream->Read(&server_note)) {
  ...
}
// Only now is it safe to call Finish() to obtain the final RPC status.
Status status = stream->Finish();

First taste of gRPC

In the past month, I got my hands dirty with gRPC and tried to use it in my current project. The results seem good so far, and I want to share some impressions here:

(1) gRPC lets you concentrate on message definitions and generates all the necessary code for you. This is really neat! The APIs are also simple and handy: they handle all kinds of exceptional cases, which saves users a lot of effort.

(2) Currently gRPC's maximum message length is INT_MAX (please refer to this post), so on a typical 64-bit platform its value is 2^31 - 1, nearly 2 GiB. Personally, I think LONG_MAX, 2^63 - 1, might be better, since that would effectively mean there is no limit on message length. BTW, gRPC uses Protocol Buffers under the hood. From the Protocol Buffers documentation:

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

I haven't dived into the Protocol Buffers code, so I am not sure what the negative effects of transmitting large messages are. But anyway, whether something is supported and whether it is supported well are two different things.
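
If you do need large messages, the per-message limit can be raised when creating the channel; by default the Go binding, for example, caps received messages at 4 MB. A minimal sketch using the Go binding (the snippets above are C++; the 64 MiB figure and server address are illustrative assumptions):

package main

import (
        "log"

        "google.golang.org/grpc"
)

func main() {
        const maxMsgSize = 64 * 1024 * 1024 // raise the per-message limit to 64 MiB

        // Apply the limits as default call options for every RPC on this channel.
        conn, err := grpc.Dial("localhost:50051",
                grpc.WithInsecure(),
                grpc.WithDefaultCallOptions(
                        grpc.MaxCallRecvMsgSize(maxMsgSize),
                        grpc.MaxCallSendMsgSize(maxMsgSize),
                ),
        )
        if err != nil {
                log.Fatalf("dial: %v", err)
        }
        defer conn.Close()
        // ... create a stub from conn and issue RPCs as usual ...
}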

 

The basics of Client/Server socket programming

While the Client/Server communication model is ubiquitous nowadays, most implementations involve socket programming knowledge. In this post, I will introduce some rudimentary aspects of it:

(1) Short/Long-lived TCP connection.
A short-lived TCP connection follows this pattern: the Client creates a connection to the Server, sends a message, then closes the connection. If the Client wants to transmit information again, it repeats these steps. Because establishing and tearing down TCP sessions has overhead, a long-lived connection may be a better choice if the Client needs frequent transactions with the Server: connect to the Server; deliver message, deliver message, …, then disconnect. A caveat about long-lived connections is that the Client may need to send heartbeat messages to the Server to keep the TCP session active.
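
For a long-lived connection, one option besides an application-level heartbeat is to enable TCP keep-alive on the socket so that a dead peer is eventually detected. A minimal Go sketch (the server address and intervals are made-up values for illustration):

package main

import (
        "log"
        "net"
        "time"
)

func main() {
        // Dial once and keep the connection around for many messages.
        conn, err := net.Dial("tcp", "example.com:7000") // hypothetical server
        if err != nil {
                log.Fatalf("dial: %v", err)
        }
        defer conn.Close()

        // Enable kernel-level TCP keep-alive so an idle but dead connection
        // does not linger forever.
        if tcpConn, ok := conn.(*net.TCPConn); ok {
                tcpConn.SetKeepAlive(true)
                tcpConn.SetKeepAlivePeriod(30 * time.Second)
        }

        // Reuse the same connection for multiple messages.
        for i := 0; i < 3; i++ {
                if _, err := conn.Write([]byte("ping\n")); err != nil {
                        log.Fatalf("write: %v", err)
                }
                time.Sleep(10 * time.Second)
        }
}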

(2) Synchronous/Asynchronous communication.
After the Client sends a request, it can block and wait for the Server's response; this is called synchronous mode. The Client should of course set a timer in case the response never comes. Alternatively, the Client can choose not to block and continue doing other things; this is called asynchronous mode. If the Client can send multiple requests before receiving any responses, it needs to add an ID to every message so that it can match each response to the corresponding request.
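
As an illustration of the synchronous case with a timeout, Go's net.Conn allows setting a read deadline before blocking on the response. A minimal sketch (the address and payload are placeholders, not from the post):

package main

import (
        "log"
        "net"
        "time"
)

func main() {
        conn, err := net.Dial("tcp", "example.com:7000") // hypothetical server
        if err != nil {
                log.Fatalf("dial: %v", err)
        }
        defer conn.Close()

        // Send the request.
        if _, err := conn.Write([]byte("request-1\n")); err != nil {
                log.Fatalf("write: %v", err)
        }

        // Synchronous mode: block waiting for the response, but give up
        // after 5 seconds instead of waiting forever.
        conn.SetReadDeadline(time.Now().Add(5 * time.Second))
        buf := make([]byte, 4096)
        n, err := conn.Read(buf)
        if err != nil {
                log.Fatalf("read (possibly timed out): %v", err)
        }
        log.Printf("response: %q", buf[:n])
}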

(3) Error handling.
A big pain point of socket programming is that you need to consider so many exceptional cases: for example, the Server suddenly crashes, the network cable is unplugged, or the response message is only half-received. Your code should handle as many of these abnormalities as possible. It is no exaggeration to say that the quality of the error-handling code is the cornerstone of a program's robustness.
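
The half-received case in particular is easy to get wrong, because a single Read may return fewer bytes than a full message. Below is a minimal Go sketch that reads an exact-length payload and distinguishes a clean close from a truncated message; the 4-byte length-prefix framing is an assumed wire format for illustration, not something from the post:

package main

import (
        "encoding/binary"
        "errors"
        "fmt"
        "io"
        "net"
)

// readMessage reads one length-prefixed message: a 4-byte big-endian
// length followed by that many bytes of payload.
func readMessage(conn net.Conn) ([]byte, error) {
        var hdr [4]byte
        if _, err := io.ReadFull(conn, hdr[:]); err != nil {
                return nil, fmt.Errorf("read header: %w", err)
        }
        payload := make([]byte, binary.BigEndian.Uint32(hdr[:]))
        if _, err := io.ReadFull(conn, payload); err != nil {
                if errors.Is(err, io.ErrUnexpectedEOF) {
                        // The peer went away mid-message: a half-received response.
                        return nil, fmt.Errorf("truncated message: %w", err)
                }
                return nil, fmt.Errorf("read payload: %w", err)
        }
        return payload, nil
}

func main() {
        // net.Pipe gives an in-memory connection pair, handy for a demo.
        client, server := net.Pipe()
        go func() {
                msg := []byte("hello")
                var hdr [4]byte
                binary.BigEndian.PutUint32(hdr[:], uint32(len(msg)))
                server.Write(hdr[:])
                server.Write(msg)
                server.Close()
        }()

        payload, err := readMessage(client)
        if err != nil {
                fmt.Println("error:", err)
                return
        }
        fmt.Printf("got %q\n", payload)
}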

(4) Portability.
Different *NIX flavors may have small divergences in socket programming, so a program that works well on Linux is not guaranteed to run as you expect on FreeBSD. BTW, I previously summarized a post with tips on Solaris/illumos socket programming, and you can read it if you happen to work on these platforms.

(5) Leverage sniffer tools.
Tcpdump, Wireshark, and snoop are amazing tools for debugging network programs, and they can tell you what really happens under the hood. Try to become proficient with these tools; they will save you one day, trust me!

 

A brief intro of TCP keep-alive in Go’s HTTP implementation

Let’s see a Go web program:

package main

import (
        "fmt"
        "io"
        "io/ioutil"
        "net/http"
        "os"
        "time"
)

func main() {
        for {
                resp, err := http.Get("https://www.google.com/")
                if err != nil {
                        fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
                        os.Exit(1)
                }
                _, err = io.Copy(ioutil.Discard, resp.Body)
                resp.Body.Close()
                if err != nil {
                        fmt.Fprintf(os.Stderr, "fetch: reading: %v\n", err)
                        os.Exit(1)
                }
                time.Sleep(45 * time.Second)
        }
}

The logic of the above application is not hard: it just retrieves information from the specified URL. I am a layman in web development, and I assumed HTTP communication would be “short-lived”: when the HTTP client issues a request, it starts a TCP connection with the HTTP server, and once the client receives the response, the session ends and the TCP connection is torn down. But is that really the case? I used the lsof command to check my guess:

# lsof -P -n -p 907
......
fetch   907 root    3u    IPv4 0xfffff80013677810              0t0     TCP 192.168.80.129:32618->xxx.xxx.xxx.xxx:8080 (ESTABLISHED)
......

Oh! My assumption is wrong, and there is a “long-lived” TCP connection. Why does this happen? When I run into network-related trouble, I always turn to tcpdump and Wireshark and capture packets for analysis. This time is no exception, and the communication flow is shown in the following picture:

[Packet capture: the first HTTP GET flow, followed by TCP keep-alive packets and the next HTTP session]

(1) The packet numbering starts at 4 because the first 3 packets are the TCP handshake, which is safe to ignore;
(2) Packets 4 ~ 43 form the first HTTP GET flow; this process lasts about 2 seconds and ends at 19:20:37;
(3) Then, after half a minute, at 19:21:07, a TCP keep-alive packet appears on the wire. Oh dear! The root cause has been found! Although the HTTP session is over, the TCP connection still exists and uses the keep-alive mechanism to stay alive, so this TCP path can be reused by subsequent HTTP messages;
(4) As expected, 15 seconds later a new HTTP session begins at packet 46, which is exactly 45 seconds after the first HTTP conversation.

That's the effect of TCP keep-alive: it keeps the TCP connection alive so that it can be reused by subsequent HTTP sessions. Another thing you should pay attention to when reusing connections is that the Response.Body must be read to completion and closed (please refer here):

type Response struct {
    ......

    // Body represents the response body.
    //
    // The http Client and Transport guarantee that Body is always
    // non-nil, even on responses without a body or responses with
    // a zero-length body. It is the caller's responsibility to
    // close Body. The default HTTP client's Transport does not
    // attempt to reuse HTTP/1.0 or HTTP/1.1 TCP connections
    // ("keep-alive") unless the Body is read to completion and is
    // closed.
    //
    // The Body is automatically dechunked if the server replied
    // with a "chunked" Transfer-Encoding.
    Body io.ReadCloser

    ......
}

As an example, modify the above program as follows:

package main

import (
        "fmt"
        "io"
        "io/ioutil"
        "net/http"
        "os"
        "time"
)

func closeResp(resp *http.Response) {
        _, err := io.Copy(ioutil.Discard, resp.Body)
        resp.Body.Close()
        if err != nil {
                fmt.Fprintf(os.Stderr, "fetch: reading: %v\n", err)
                os.Exit(1)
        }
}

func main() {
        for {
                resp1, err := http.Get("https://www.google.com/")
                if err != nil {
                        fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
                        os.Exit(1)
                }

                resp2, err := http.Get("https://www.facebook.com/")
                if err != nil {
                        fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
                        os.Exit(1)
                }

                time.Sleep(45 * time.Second)
                for _, v := range []*http.Response{resp1, resp2} {
                        closeResp(v)
                }
        }
}

While it is running, you will see 2 TCP connections instead of 1:

# lsof -P -n -p 1982
......
fetch   1982 root    3u    IPv4 0xfffff80013677810              0t0     TCP 192.168.80.129:43793->xxx.xxx.xxx.xxx:8080 (ESTABLISHED)
......
fetch   1982 root    6u    IPv4 0xfffff80013677000              0t0     TCP 192.168.80.129:12105->xxx.xxx.xxx.xxx:8080 (ESTABLISHED)

If you call the closeResp function before issuing a new HTTP request, the TCP connection can be reused:

package main

import (
        "fmt"
        "io"
        "io/ioutil"
        "net/http"
        "os"
        "time"
)

func closeResp(resp *http.Response) {
        _, err := io.Copy(ioutil.Discard, resp.Body)
        resp.Body.Close()
        if err != nil {
                fmt.Fprintf(os.Stderr, "fetch: reading: %v\n", err)
                os.Exit(1)
        }
}

func main() {
        for {
                resp1, err := http.Get("https://www.google.com/")
                if err != nil {
                        fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
                        os.Exit(1)
                }
                closeResp(resp1)


                time.Sleep(45 * time.Second)

                resp2, err := http.Get("https://www.facebook.com/")
                if err != nil {
                        fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
                        os.Exit(1)
                }
                closeResp(resp2)
        }
}
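
Conversely, if you really want the short-lived behavior I originally assumed, the Transport can be told not to reuse connections at all. A minimal sketch (an extra illustration using net/http's DisableKeepAlives field, not part of the original program):

package main

import (
        "fmt"
        "io"
        "io/ioutil"
        "net/http"
        "os"
)

func main() {
        // A client whose Transport closes the TCP connection after each
        // request instead of keeping it alive for reuse.
        client := &http.Client{
                Transport: &http.Transport{DisableKeepAlives: true},
        }

        resp, err := client.Get("https://www.google.com/")
        if err != nil {
                fmt.Fprintf(os.Stderr, "fetch: %v\n", err)
                os.Exit(1)
        }
        _, err = io.Copy(ioutil.Discard, resp.Body)
        resp.Body.Close()
        if err != nil {
                fmt.Fprintf(os.Stderr, "fetch: reading: %v\n", err)
                os.Exit(1)
        }
}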

P.S., the full code is here.

References:
Package http;
Reusing http connections in Golang;
Is HTTP Shortlived?.

Socket programming tips on Solaris

I started a topic on stackoverflow.com and hoped that programmers would share socket programming tips for different UNIX flavors. Unfortunately, the responses were few, so I can only share my own socket programming tips for Solaris here (the Chinese version can be found there):

  1. Use the following link options: “-lresolv -lnsl -lsocket”;
  2. Solaris doesn't provide the SO_SNDTIMEO and SO_RCVTIMEO socket options (Why does Solaris OS define SO_SNDTIMEO and SO_RCVTIMEO socket options in header file which actually not support by kernel?);
  3. In SCTP programming, you must call bind() before calling sctp_bindx() (sctp_bindx(Solaris sctp library) always return “Invalid argument”);
  4. Calling shutdown() on a listening socket causes an ENOTCONN error (Why shutdown a socket can’t let the select() return?).