I'm stunned! Why is the latency so high for a simple HTTP call?

Recently, a strange phenomenon occurred during project testing. In the test environment, the average time consumed when calling the backend HTTP service through Apache HTTP Client was close to 39.2ms.


At first glance this may seem normal. What's so strange about it? Actually, quite a lot, once you know some background.

The backend HTTP service does not have any business logic. It just converts a string into uppercase and then returns it. The string length is only 100 characters. In addition, the network Ping delay is only about 1.9ms.

Therefore, theoretically, the call should take about 2-3ms, but why does it take an average of 39.2ms?

[Figure: call latency]

[Figure: Ping latency]

Because of my job, slow calls are nothing new to me. I often help teams troubleshoot call timeouts in our internal RPC framework, but this is the first time I have run into a slow HTTP call.

However, the troubleshooting routine is the same; the main methodologies are simply outside-in and top-down. Let's first look at some peripheral indicators to see if we can find any clues.

Peripheral indicators

System indicators

Look mainly at peripheral system indicators (note: check both the calling and the called machine), such as load and CPU. A single top command gives a clear view of both.

Doing so, I confirmed that CPU and load were both idle on both machines. Since I didn't take a screenshot at the time, I won't post it here.

Process Indicators

For a Java process, the indicators to check are mainly GC behavior and thread stacks (note: on both the calling and the called machines).

Young GC was very rare and took less than 10ms each time, so there were no long stop-the-world (STW) pauses.

Because the average call time is 39.2ms, which is quite long, if the time consumption is caused by the code, the thread stack should be able to reveal something.

After looking, I found nothing. The service-related thread stacks mostly showed thread-pool threads waiting for tasks, which means the threads were not busy.

Do you feel like you have run out of tricks? What should you do next?

Local reproduction

If the problem can be reproduced locally (my local machine runs macOS), it is also very helpful for troubleshooting.

Therefore, I wrote a simple test program locally using Apache HTTP Client to directly call the backend HTTP service and found that the average time consumed was around 55ms.

Why is it a little different from the result of 39.2ms in the test environment? The main reason is that the local and test environment backend HTTP service machines are in different regions, and the Ping delay is about 26ms, so the delay is increased.

However, there are indeed problems locally, because the Ping delay is 26ms, and the backend HTTP service logic is simple and takes almost no time, so the average local call time should be around 26ms. Why is it 55ms?

Are you getting more and more confused and at a loss as to where to start? During this period, you may have suspected that there was something wrong with the Apache HTTP Client.

Therefore, I wrote a simple program using the HttpURLConnection that comes with JDK and did a test, and the results were the same.
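As a concrete sketch of that local experiment (the class and path names here are my own; the real backend simply uppercases a 100-character string), the following self-contained Java program starts a JDK HttpServer that uppercases the request body, then times one POST against it with HttpURLConnection:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class LatencyRepro {

    // A stand-in for the article's backend: uppercase the body and return it.
    public static HttpServer startServer(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/upper", exchange -> {
            byte[] body = exchange.getRequestBody().readAllBytes();
            byte[] resp = new String(body, StandardCharsets.UTF_8)
                    .toUpperCase().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, resp.length);
            exchange.getResponseBody().write(resp);
            exchange.close();
        });
        server.start();
        return server;
    }

    // One POST via the JDK's HttpURLConnection; returns the response body.
    public static String post(String url, String payload) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().write(payload.getBytes(StandardCharsets.UTF_8));
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = startServer(0); // ephemeral port
        String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/upper";
        String payload = "a".repeat(100);   // 100-char string, as in the article

        long start = System.nanoTime();
        post(url, payload);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.println("one call took " + micros + "us");
        server.stop(0);
    }
}
```

Against a loopback server this will of course not show a 40ms-scale stall; the point is only to have a reproducible harness you can aim at a remote host.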

Diagnosis

Locating the cause

In fact, judging from the external system indicators, process indicators, and local reproduction, it can be roughly determined that it is not a program problem. What about the TCP protocol level?

Anyone with network programming experience will know which TCP parameter can cause this phenomenon. Yes, you guessed it: TCP_NODELAY.

So which side, caller or callee, left it unset? The caller uses Apache HttpClient, where tcpNoDelay defaults to true.

Let's take a look at the callee, our backend HTTP service, which uses the HttpServer that ships with the JDK:

    HttpServer server = HttpServer.create(new InetSocketAddress(config.getPort()), BACKLOGS);

I didn't see tcpNoDelay being set anywhere directly, so I went through the source code. It turns out to be here.

In the ServerConfig class, there is a static block that is used to get the startup parameters. By default, ServerConfig.noDelay is false:

    static {
        AccessController.doPrivileged(new PrivilegedAction<Void>() {
            public Void run() {
                ServerConfig.idleInterval = Long.getLong("sun.net.httpserver.idleInterval", 30L) * 1000L;
                ServerConfig.clockTick = Integer.getInteger("sun.net.httpserver.clockTick", 10000);
                ServerConfig.maxIdleConnections = Integer.getInteger("sun.net.httpserver.maxIdleConnections", 200);
                ServerConfig.drainAmount = Long.getLong("sun.net.httpserver.drainAmount", 65536L);
                ServerConfig.maxReqHeaders = Integer.getInteger("sun.net.httpserver.maxReqHeaders", 200);
                ServerConfig.maxReqTime = Long.getLong("sun.net.httpserver.maxReqTime", -1L);
                ServerConfig.maxRspTime = Long.getLong("sun.net.httpserver.maxRspTime", -1L);
                ServerConfig.timerMillis = Long.getLong("sun.net.httpserver.timerMillis", 1000L);
                ServerConfig.debug = Boolean.getBoolean("sun.net.httpserver.debug");
                ServerConfig.noDelay = Boolean.getBoolean("sun.net.httpserver.nodelay");
                return null;
            }
        });
    }

Verify

In the backend HTTP service, add the startup parameter "-Dsun.net.httpserver.nodelay=true" and try again.

The effect was obvious: the average time dropped from 39.2ms to 2.8ms:

[Figure: call latency after optimization]

The problem is solved, but stopping here would sell this case short and waste a good learning opportunity.

Because there are still a lot of doubts waiting for you:

  • Why does the latency drop from 39.2ms to 2.8ms after adding TCP_NODELAY?
  • Why is the average latency of the local test 55ms instead of the 26ms of the Ping latency?
  • How does the TCP protocol send data packets?

Come on, let’s strike while the iron is hot.

Questions and Answers

① What is TCP_NODELAY?

In Socket programming, the TCP_NODELAY option is used to control whether to enable the Nagle algorithm.
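In Java this surfaces as the setTcpNoDelay option on a socket. A minimal sketch (the loopback ServerSocket exists only so the client has something to connect to):

```java
import java.net.ServerSocket;
import java.net.Socket;

public class NoDelayDemo {
    // Enables TCP_NODELAY on a freshly connected socket and returns the flag.
    public static boolean enableNoDelay() throws Exception {
        try (ServerSocket server = new ServerSocket(0); // ephemeral port
             Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
            // Default for a plain java.net.Socket is false (Nagle enabled).
            // Setting true disables Nagle: small writes go out immediately.
            client.setTcpNoDelay(true);
            return client.getTcpNoDelay();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("TCP_NODELAY = " + enableNoDelay());
    }
}
```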

In Java, true turns the Nagle algorithm off, and false turns it on. You may be asking: what is the Nagle algorithm?

②What is Nagle's algorithm?

Nagle's algorithm is a method for improving the efficiency of TCP/IP networks by reducing the number of packets sent across the network.

It is named after its inventor, John Nagle, who first used the algorithm in 1984 to try to solve network congestion problems at Ford Motor Company.

Imagine an application that generates 1 byte of data at a time and sends each byte to the remote server as its own network packet; the sheer number of packets can easily overload the network.

In this pathological case, transmitting a packet with only 1 byte of useful data requires an additional 40 bytes of header overhead (i.e., a 20-byte IP header + a 20-byte TCP header), so payload utilization is extremely low.

The content of Nagle's algorithm is relatively simple. The following is the pseudo code:

    if there is new data to send
        if the window size >= MSS and available data is >= MSS
            send complete MSS segment now
        else
            if there is unconfirmed data still in the pipe
                enqueue data in the buffer until an acknowledge is received
            else
                send data immediately
            end if
        end if
    end if

The specific approach is:

  • If the content to be sent is at least one MSS, send it immediately (MSS is the maximum segment size: the most data one TCP segment can carry).
  • If no previously sent packet is still awaiting an ACK, send immediately.
  • If an earlier packet has not yet been ACKed, buffer the content instead of sending it.
  • When the ACK arrives, immediately send the buffered content.
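The four rules above can be condensed into a small decision function. This is a pure-logic illustration of the pseudocode, not real kernel code:

```java
public class NagleDecision {
    // Returns true if Nagle's algorithm would let the sender transmit now.
    public static boolean canSendNow(int pendingBytes, int windowBytes,
                                     int mss, boolean unackedInFlight) {
        if (pendingBytes >= mss && windowBytes >= mss) {
            return true;            // a full MSS segment: always send
        }
        return !unackedInFlight;    // small segment: only if nothing is unacked
    }

    public static void main(String[] args) {
        // A 100-byte write while an earlier packet is still unacked: buffered.
        System.out.println(canSendNow(100, 65535, 1460, true));
        // The same write with nothing in flight: sent immediately.
        System.out.println(canSendNow(100, 65535, 1460, false));
    }
}
```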

③What is Delayed ACK?

As we all know, in order to ensure the reliability of transmission, the TCP protocol stipulates that a confirmation needs to be sent to the other party when a data packet is received.

Sending a bare acknowledgment on its own is relatively expensive: a 20-byte IP header plus a 20-byte TCP header carrying no payload.

TCP Delayed ACK (delayed confirmation) is designed to solve this problem in an effort to improve network performance.

It combines several ACK responses into one, or piggybacks the ACK on response data sent to the other side, thereby reducing protocol overhead.

The specific approach is:

  • When there is response data to be sent, ACK will be sent to the other party immediately along with the response data.
  • If there is no response data, the ACK is delayed briefly to see whether response data appears that it can ride along with. On Linux, the default delay is 40ms.
  • If the other party's second data packet arrives while waiting to send ACK, ACK should be sent immediately.

However, if three packets from the other side arrive one after another, whether an ACK is sent immediately when the third segment arrives again depends on the two rules above.

④What chemical reaction will happen when Nagle and Delayed ACK are combined?

Both Nagle and Delayed ACK can improve the efficiency of network transmission, but using them together can have the opposite effect.

For example, in the following scenario, A and B are transmitting data: A runs the Nagle algorithm and B runs the Delayed ACK algorithm.

If A sends a data packet to B, B will not respond immediately due to Delayed ACK. If A uses the Nagle algorithm, A will keep waiting for B's ACK and will not send the second data packet until ACK comes. If these two data packets are for the same request, the request will be delayed by 40ms.

⑤ Let's capture some packets

Let's capture a packet to verify it. Execute the following script on the backend HTTP service to easily complete the packet capture process.

    sudo tcpdump -i eth0 tcp and host 10.48.159.165 -s 0 -w traffic.pcap

As shown in the figure below, this is a display of using Wireshark to analyze the packet content. The red box is a complete POST request processing process.

[Figure: test environment packet capture analysis]

The gap between packet No. 130 and packet No. 149 is about 41ms (0.1859 - 0.1448 = 0.0411s). This is the chemical reaction of Nagle and Delayed ACK working together.

Among them, 10.48.159.165 runs Delayed ACK, and 10.22.29.180 runs the Nagle algorithm.

10.22.29.180 is waiting for ACK, and 10.48.159.165 triggers Delayed ACK, so it waits for 40ms.

This also explains why the test environment takes 39.2ms, because most of it is delayed by the 40ms of Delayed ACK.

But when reproducing locally, why is the average latency of the local test 55ms instead of the Ping latency of 26ms? Let's capture a packet as well.

As shown in the figure below, the red box shows a complete POST request. The gap between packet No. 8 and packet No. 9 is about 25ms. Subtracting half of the Ping round trip, roughly 13ms of one-way network delay, leaves about 12ms.

[Figure: local environment packet capture analysis]

That 12ms is the Delayed ACK time (macOS behaves somewhat differently from Linux here).

  • Linux controls the Delayed ACK time through the system configuration /proc/sys/net/ipv4/tcp_delack_min; the default on Linux is 40ms.
  • macOS controls Delayed ACK through the net.inet.tcp.delayed_ack sysctl:
    delayed_ack=0 responds after every packet (OFF)
    delayed_ack=1 always employs delayed ACK; up to 6 packets can share 1 ACK
    delayed_ack=2 immediate ACK after the 2nd packet; 2 packets per ACK (compatibility mode)
    delayed_ack=3 auto-detects when to employ delayed ACK; about 4 packets per ACK (DEFAULT)
  • In other words: 0 disables delayed ACK, 1 always delays, 2 replies with one ACK for every two packets, and 3 lets the system detect the right timing automatically.

⑥Why can TCP_NODELAY solve the problem?

tcpNoDelay disables the Nagle algorithm: even if the ACK for the previous packet has not arrived, the next packet is sent anyway, which breaks the bad interplay with Delayed ACK.

Generally in network programming, it is strongly recommended to enable tcpNoDelay to improve response speed.

Of course, you can also solve the problem by configuring the Delayed ACK related system, but since it is inconvenient to modify the machine configuration, this method is not recommended.
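For the JDK HttpServer specifically, an alternative to the -D startup flag is setting the property programmatically. This is a sketch under one assumption drawn from the static block shown earlier: since ServerConfig reads the property once in its static initializer, it must be set before the first HttpServer is created in the JVM:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;

public class NoDelayServer {
    public static void main(String[] args) throws Exception {
        // Must happen before any HttpServer is created in this JVM,
        // because sun.net.httpserver.ServerConfig caches it statically.
        System.setProperty("sun.net.httpserver.nodelay", "true");

        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.start();
        System.out.println("nodelay server on port " + server.getAddress().getPort());
        server.stop(0);
    }
}
```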

Summary

This article walked through troubleshooting a simple HTTP call with surprisingly high latency: first analyzing the problem from the outside in, then locating the cause and verifying the fix.

Finally, we gave a comprehensive explanation of Nagle and Delayed ACK in TCP transmission and analyzed the problem case more thoroughly.
