Preface

Many Java programmers' understanding of TCP stops at the three-way handshake and the four-way close. I think there are two main reasons for this: the TCP protocol itself is somewhat abstract (compared with HTTP at the application layer), and developers who do not work on frameworks rarely need to touch TCP's details. Frankly, I do not fully understand every detail of TCP myself. This article mainly tries to give a unified summary of the long-connection and heartbeat questions raised by people in our WeChat discussion group.
When doing TCP communication in Java, you are likely to touch Socket and Netty, so this article uses some of their APIs and configuration parameters to assist the introduction.

Long and short connections

TCP itself does not distinguish between long and short connections; whether a connection is long or short depends entirely on how we use it. A short connection is opened for a single exchange and closed as soon as that exchange completes; a long connection is kept open and reused for many exchanges.
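To make the distinction concrete, here is a minimal sketch with plain java.net.Socket. The host, port, and the line-based request/response protocol are assumptions purely for illustration: the short connection closes after one exchange, while the long connection reuses the same socket for many exchanges.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

public class ConnectionStyles {

    // Short connection: open, exchange once, close.
    public static String shortLived(String host, int port, String req) throws IOException {
        try (Socket s = new Socket(host, port)) {
            s.getOutputStream().write((req + "\n").getBytes());
            return new BufferedReader(new InputStreamReader(s.getInputStream())).readLine();
        } // try-with-resources closes the socket: the TCP connection ends here
    }

    // Long connection: the same socket carries many request/response exchanges.
    public static void longLived(String host, int port, String[] reqs) throws IOException {
        try (Socket s = new Socket(host, port);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            OutputStream out = s.getOutputStream();
            for (String req : reqs) {            // N exchanges, one connection
                out.write((req + "\n").getBytes());
                System.out.println(in.readLine());
            }
        }
    }
}
```

The per-request connection setup (and TIME_WAIT on close) is exactly the cost that the long-lived variant avoids.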
The advantages of short and long connections are each other's disadvantages. If you want simplicity and do not pursue high performance, short connections are appropriate: there is no connection state to manage. If you pursue performance and use long connections, there are various issues to care about, such as maintaining the end-to-end connection and keeping it alive.

Long connections are also often used to push data. Most of the time we think of communication as a request/response model, but TCP is full-duplex by nature, so it can equally be used for two-way communication, and a push model is easy to implement on top of a long connection.

There is not much more to say about short connections, so below we focus on the issues around long connections. Pure theory would be monotonous, so I will use some practices of the RPC framework Dubbo to discuss TCP.

Long connections in a service governance framework

As mentioned above, when you pursue performance you will inevitably choose long connections, so Dubbo is a good vehicle for understanding TCP. We start two Dubbo applications: a server listening on local port 20880 (as many know, the default port of the Dubbo protocol) and a client sending requests in a loop. Executing lsof -i:20880 shows the usage of the port:

[Figure: lsof output showing the established connection on port 20880]
Maintaining long connections

Because the services a client requests may be distributed across multiple servers, the client naturally needs to create long connections to multiple peers. The first problem we run into when using long connections is how to maintain them.
In Dubbo, both the client and the server identify an end-to-end long connection by ip:port, and Channel is the abstraction of a connection. We mainly focus on the long-connection handling in NettyHandler. The server additionally maintains a collection of long connections; this is a deliberate design of Dubbo that we will come back to later.

Connection keepalive

This topic is worth discussing and touches many knowledge points. First, why do we need to keep a connection alive at all? When the two parties have established a connection but the link becomes unreachable due to network problems, the long connection can no longer be used. It should be made clear that checking for the ESTABLISHED state with commands such as netstat or lsof is not very reliable: the connection may already be dead without the system noticing, not to mention the harder problem of half-dead connections that merely look alive. Ensuring the availability of long connections is a real piece of engineering.

TCP KeepAlive

The first thing that comes to mind is TCP's KeepAlive mechanism. KeepAlive is not part of the TCP specification itself, but most operating systems implement it. With KeepAlive enabled, if no data is transmitted on the link for a certain period (7200 s by default, parameter tcp_keepalive_time), the TCP layer sends a KeepAlive probe to test the connection. If a probe gets no reply, it is retried up to 9 times by default (parameter tcp_keepalive_probes) at intervals of 75 s (parameter tcp_keepalive_intvl). Only when all probes fail is the connection considered unavailable.

Enabling KeepAlive in Netty:

bootstrap.option(ChannelOption.SO_KEEPALIVE, true)

On Linux, the KeepAlive parameters can be set by modifying the /etc/sysctl.conf file:
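For reference, the three kernel parameters named above live under the net.ipv4 prefix; a sysctl.conf fragment with the common Linux defaults looks like this (run sysctl -p to apply without a reboot):

```shell
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
```

Note that these values are system-wide defaults, not per-application settings.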
The KeepAlive mechanism guarantees the availability of the connection at the network level, but from the perspective of an application framework this is not enough, mainly for two reasons:

1. KeepAlive probes are answered by the kernel's TCP stack. Even if the application process is deadlocked or hung, the connection still looks alive, while from the framework's point of view the peer can no longer serve requests.
2. The KeepAlive kernel parameters are system-wide configuration; a framework can hardly require every deployment to tune them to match its own timeout requirements.
So keeping the connection alive at the application level is still necessary.

Connection keepalive: application-layer heartbeat

Finally we arrive at the topic. The heartbeat in the title is the other TCP-related knowledge point this article wants to emphasize. The previous section explained that network-level KeepAlive is not enough to guarantee connection availability at the application level, so in this section we discuss the application-layer heartbeat mechanism for keeping connections alive.

How should we understand an application-layer heartbeat? Simply put, the client starts a scheduled task and periodically sends a request to the peer it has a connection with (a special heartbeat request); the server handles this request specially and returns a response. If several consecutive heartbeats receive no response, the client considers the connection unavailable and actively disconnects. Different service governance frameworks have different strategies for heartbeats, connection establishment, disconnection, and blacklisting, but most of them perform heartbeats at the application layer, and Dubbo is no exception.

Design details of the application-layer heartbeat

Take Dubbo as an example. It supports application-layer heartbeats: both the client and the server start a HeartBeatTask, the client in HeaderExchangeClient and the server in HeaderExchangeServer. Earlier in the article we left a question open: why does Dubbo also maintain a connection Map on the server side? Mainly to serve the heartbeat. When the heartbeat timer task finds a connection unavailable, it takes different branches depending on whether it is the client or the server.
If the client finds the connection unavailable, it reconnects; if the server finds it unavailable, it closes the connection directly.
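The bookkeeping behind such a heartbeat task can be sketched as follows. All names here are hypothetical, not Dubbo's actual classes; the sketch assumes the framework records the last time data was read from or written to the channel, which is how the idle checks are usually driven.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class HeartbeatMonitor {
    private final long heartbeatMillis; // interval between heartbeats
    private final long timeoutMillis;   // idle-read time after which we give up
    private volatile long lastReadMillis;
    private volatile long lastWriteMillis;

    public HeartbeatMonitor(long heartbeatMillis, long timeoutMillis, long now) {
        this.heartbeatMillis = heartbeatMillis;
        this.timeoutMillis = timeoutMillis;
        this.lastReadMillis = now;
        this.lastWriteMillis = now;
    }

    // Call whenever any data (including responses) is read from the channel.
    public void onRead(long now)  { lastReadMillis = now; }

    // Call whenever any data (including requests) is written to the channel.
    public void onWrite(long now) { lastWriteMillis = now; }

    // Send a heartbeat only if the link has been idle in both directions.
    public boolean shouldSendHeartbeat(long now) {
        return now - Math.max(lastReadMillis, lastWriteMillis) >= heartbeatMillis;
    }

    // Nothing was read for a full timeout window: the client would reconnect,
    // the server would close the connection.
    public boolean shouldReconnect(long now) {
        return now - lastReadMillis >= timeoutMillis;
    }

    // Wiring the checks into a scheduled task, as described above.
    public void start(ScheduledExecutorService timer, Runnable sendHeartbeat, Runnable reconnect) {
        timer.scheduleWithFixedDelay(() -> {
            long now = System.currentTimeMillis();
            if (shouldReconnect(now)) {
                reconnect.run();
            } else if (shouldSendHeartbeat(now)) {
                sendHeartbeat.run();
            }
        }, heartbeatMillis, heartbeatMillis, TimeUnit.MILLISECONDS);
    }
}
```

Treating normal traffic as implicit heartbeats (via onRead/onWrite) is a common optimization: a busy connection never needs explicit heartbeat frames.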
Readers familiar with other RPC frameworks will find that heartbeat mechanisms differ greatly between frameworks. Heartbeat design is also tied to connection creation, reconnection, and blacklisting, and must be analyzed framework by framework. Besides the scheduled-task design, heartbeats also need support at the protocol level; for the simplest example, see nginx's health checks. The Dubbo protocol likewise needs explicit heartbeat support: if heartbeat requests were treated as normal traffic, they would add pressure on the server and interfere with rate limiting, among other problems.

The Dubbo protocol

Flag is the flag byte of the Dubbo protocol, 8 bits in total. The lower four bits indicate the serialization tool used for the message body (Hessian by default). Of the upper four bits, the first being 1 means a request, the second being 1 means two-way transmission (i.e. a response is expected), and the third being 1 means a heartbeat event. Heartbeat requests should be handled differently from normal requests. Note that this is different from HTTP's KeepAlive.
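The bit layout described above can be sketched as follows. The constant values mirror common descriptions of Dubbo's codec but should be treated as assumptions for illustration, not as an authoritative reading of Dubbo's source:

```java
public class DubboFlag {
    // Upper bits of the flag byte, as described above.
    static final int FLAG_REQUEST = 0x80; // first upper bit: request vs response
    static final int FLAG_TWOWAY  = 0x40; // second: two-way, a response is expected
    static final int FLAG_EVENT   = 0x20; // third: event frame, e.g. a heartbeat
    static final int SERIALIZATION_MASK = 0x0f; // lower four bits: serialization id

    static boolean isHeartbeatCandidate(int flag) {
        // An event frame; the codec would additionally inspect the event payload.
        return (flag & FLAG_EVENT) != 0;
    }

    static int serializationId(int flag) {
        return flag & SERIALIZATION_MASK;
    }

    public static void main(String[] args) {
        // 2 is an assumed serialization id, purely for illustration.
        int heartbeatRequest = FLAG_REQUEST | FLAG_TWOWAY | FLAG_EVENT | 2;
        System.out.println(Integer.toBinaryString(heartbeatRequest)); // 11100010
        System.out.println(isHeartbeatCandidate(heartbeatRequest));   // true
    }
}
```

This is why the server can cheaply divert heartbeat frames away from the normal request pipeline: a single bit test on the flag byte is enough.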
These are two completely different concepts: HTTP's Keep-Alive reuses the same underlying TCP connection for multiple HTTP requests, while TCP's KeepAlive is a liveness-probing mechanism.

Common KeepAlive errors

Applications that enable TCP KeepAlive can generally catch errors such as ETIMEDOUT (the probes timed out) or EHOSTUNREACH (an ICMP message reported the host unreachable).
Summary

In practice there are three ways of using KeepAlive:

1. Enable TCP KeepAlive with the default parameters.
2. Enable TCP KeepAlive with parameters tuned for the application.
3. Enable TCP KeepAlive and additionally implement an application-layer heartbeat.

By default the KeepAlive period is 2 hours. Enabling it without changing that is a misuse and wastes resources: the kernel opens a keepalive timer for each connection, so N connections mean N keepalive timers. The advantage of option 3 is obvious: the TCP layer and the application layer each provide a line of defense, so a dead connection is detected even when one of them fails to notice.
Each framework's design differs. For example, Dubbo uses option 3, while Alibaba's internal HSF framework does not set TCP KeepAlive at all and relies only on the application-layer heartbeat. Like the heartbeat strategy itself, this choice is tied to the framework's overall design.