Why is the latency so high for a simple HTTP call? Let’s capture a packet and analyze it

1. Background

Recently we hit a strange phenomenon while testing a project. In the test environment, the average time for a call to the backend HTTP service through Apache HttpClient was close to 39.2 ms. At first glance that may look normal, so what's strange about it? Some context: the backend HTTP service has no business logic at all; it just converts a string of about 100 characters to uppercase and returns it. On top of that, the ping latency between the machines is only about 1.9 ms. Theoretically, the call should take about 2-3 ms, so why does it average 39.2 ms?

Because of my job, slow calls are nothing new to me: I often help teams troubleshoot timeouts in our internal RPC framework. This was, however, the first time I had chased down HTTP call latency. Still, the troubleshooting routine is the same; the methodology boils down to working from the outside in and from the top down. Let's start with some peripheral indicators and see whether they offer any clues.

2. Peripheral indicators

2.1 System Indicators

Start with the basic system indicators (note: check both the calling and the called machine), such as load and CPU. A single top command gives a clear view of both.

This confirmed that CPU and load were idle on both machines. I didn't take a screenshot at the time, so there is nothing to show here.

2.2 Process Indicators

For a Java process, the indicators to check are mainly GC behavior and the thread stacks (again, on both the calling and the called machine).

Young GC was rare and took under 10 ms, so there were no long stop-the-world pauses.

Since the average call takes 39.2 ms, which is fairly long, any code-level cause should show up in the thread stacks. But dumping them revealed nothing: the service threads were all sitting in the thread pool waiting for tasks, which means the threads were not busy at all.

Do you feel like you have run out of tricks? What should you do next?

3. Local reproduction

If a problem can be reproduced locally (my local machine runs macOS), that is a big help for troubleshooting.

So I wrote a simple local test program with Apache HttpClient that called the backend HTTP service directly, and measured an average of about 55 ms. Wait, why is that different from the 39.2 ms seen in the test environment? The main reason is that my machine and the test environment's backend are in different regions, with a ping latency of about 26 ms between them, which inflates the total. But the local numbers still point to a problem: with a 26 ms ping and a backend whose logic is trivial and takes almost no time, a local call should average around 26 ms. Why 55 ms?

Are you getting more and more confused, bewildered, and don’t know where to start?

During this period, I suspected Apache HttpClient itself might be at fault, so I wrote another simple test using the JDK's built-in HttpURLConnection. The results were the same.

4. Diagnosis

4.1 Positioning

In fact, the system indicators, the process indicators, and the local reproduction together make it fairly clear that this is not an application problem. What about the TCP protocol level?

Anyone with network programming experience will know which TCP option can cause this phenomenon. Yes, you guessed it: TCP_NODELAY.

So which side, the caller or the callee, failed to set it?

The caller uses Apache HttpClient, where tcpNoDelay defaults to true. So let's look at the callee, our backend HTTP service, which is built on the JDK's built-in HttpServer:

```java
HttpServer server = HttpServer.create(new InetSocketAddress(config.getPort()), BACKLOGS);
```

There is no public interface for setting tcpNoDelay, so I dug into the source. It turns out the ServerConfig class has a static block that reads startup parameters, and ServerConfig.noDelay defaults to false:

```java
static {
    AccessController.doPrivileged(new PrivilegedAction<Void>() {
        public Void run() {
            ServerConfig.idleInterval = Long.getLong("sun.net.httpserver.idleInterval", 30L) * 1000L;
            ServerConfig.clockTick = Integer.getInteger("sun.net.httpserver.clockTick", 10000);
            ServerConfig.maxIdleConnections = Integer.getInteger("sun.net.httpserver.maxIdleConnections", 200);
            ServerConfig.drainAmount = Long.getLong("sun.net.httpserver.drainAmount", 65536L);
            ServerConfig.maxReqHeaders = Integer.getInteger("sun.net.httpserver.maxReqHeaders", 200);
            ServerConfig.maxReqTime = Long.getLong("sun.net.httpserver.maxReqTime", -1L);
            ServerConfig.maxRspTime = Long.getLong("sun.net.httpserver.maxRspTime", -1L);
            ServerConfig.timerMillis = Long.getLong("sun.net.httpserver.timerMillis", 1000L);
            ServerConfig.debug = Boolean.getBoolean("sun.net.httpserver.debug");
            ServerConfig.noDelay = Boolean.getBoolean("sun.net.httpserver.nodelay");
            return null;
        }
    });
}
```

4.2 Verification

Restart the backend HTTP service with the "-Dsun.net.httpserver.nodelay=true" parameter and try again. The effect is obvious: the average time drops from 39.2 ms to 2.8 ms.
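If you control the server's startup code rather than its launch flags, the same property can also be set programmatically, as long as it happens before the HttpServer implementation classes load (the static block in ServerConfig reads it only once). A minimal sketch; the port and handler here are illustrative, not from the original service:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import com.sun.net.httpserver.HttpServer;

public class NoDelayServer {

    static HttpServer createServer() throws IOException {
        // Must run before ServerConfig's static initializer executes,
        // i.e. before the first HttpServer is created.
        System.setProperty("sun.net.httpserver.nodelay", "true");
        // Port 0 lets the OS pick a free port; a real service would use its configured port.
        return HttpServer.create(new InetSocketAddress(0), 0);
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = createServer();
        server.createContext("/", exchange -> {
            byte[] body = "OK".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

The -D flag is still the safer route for code you don't own, since a library might load the HttpServer classes before your property-setting code runs.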


The problem is solved, but stopping here would waste a good case, because several questions remain:

Why does the latency drop from 39.2ms to 2.8ms after adding TCP_NODELAY?

Why is the average latency of the local test 55ms instead of the ping latency of 26ms?

How does the TCP protocol send data packets?

Come on, let’s strike while the iron is hot.

5. Clear up doubts

5.1 What is TCP_NODELAY?

In socket programming, the TCP_NODELAY option controls whether the Nagle algorithm is enabled. In Java, true disables the Nagle algorithm and false enables it. Which raises the question: what is the Nagle algorithm?
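In plain Java socket code (a minimal illustration, independent of any HTTP framework), the option is toggled per socket via java.net.Socket:

```java
import java.io.IOException;
import java.net.Socket;

public class NoDelayExample {

    // Disable Nagle's algorithm on a socket.
    // true  = disable Nagle (small writes are sent immediately),
    // false = enable Nagle (the JDK default).
    static void disableNagle(Socket socket) throws IOException {
        socket.setTcpNoDelay(true);
    }

    public static void main(String[] args) throws IOException {
        // An unconnected socket; options can be set before connect().
        Socket socket = new Socket();
        disableNagle(socket);
        System.out.println("TCP_NODELAY = " + socket.getTcpNoDelay());
        socket.close();
    }
}
```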

5.2 What is Nagle's algorithm?

The Nagle algorithm is a method for improving the efficiency of TCP/IP networks by reducing the number of packets sent across the network. It is named after its inventor, John Nagle, who first used the algorithm in 1984 to try to solve network congestion problems at Ford Motor Company.

Imagine an application that generates data 1 byte at a time and sends each byte to a remote server as its own network packet. The sheer number of packets can easily overload the network. In this worst case, transmitting a packet with just 1 byte of payload incurs 40 bytes of header overhead (20 bytes of IP header + 20 bytes of TCP header), so payload utilization is extremely low.
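The overhead is easy to quantify (a toy calculation, ignoring link-layer framing and header options): with a 1-byte payload and 40 bytes of IP+TCP headers, under 2.5% of the packet is useful data, versus over 97% for a full-sized segment.

```java
public class HeaderOverhead {
    static final int IP_HEADER = 20;   // bytes, without options
    static final int TCP_HEADER = 20;  // bytes, without options

    // Fraction of the packet on the wire that is actual payload.
    static double payloadRatio(int payloadBytes) {
        return (double) payloadBytes / (payloadBytes + IP_HEADER + TCP_HEADER);
    }

    public static void main(String[] args) {
        System.out.printf("1-byte payload:    %.1f%% useful%n", payloadRatio(1) * 100);
        System.out.printf("1460-byte payload: %.1f%% useful%n", payloadRatio(1460) * 100);
    }
}
```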

The content of Nagle's algorithm is relatively simple. The following is the pseudo code:

```
if there is new data to send
    if the window size >= MSS and available data is >= MSS
        send complete MSS segment now
    else
        if there is unconfirmed data still in the pipe
            enqueue data in the buffer until an acknowledge is received
        else
            send data immediately
        end if
    end if
end if
```

The specific approach is:

  • If the content to be sent is at least 1 MSS, send it immediately;
  • If no previously sent packet is still awaiting an ACK, send immediately;
  • If a previously sent packet is still awaiting an ACK, buffer the content;
  • When the ACK arrives, send the buffered content immediately.

(MSS is the maximum amount of data that one TCP segment can carry.)
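The rules above can be condensed into a small decision function. This is a sketch of the sender-side logic with simplified names; a real TCP stack tracks this state inside the kernel:

```java
public class NagleDecision {
    enum Action { SEND_NOW, BUFFER }

    // dataLen: bytes ready to send; mss: maximum segment size;
    // window: receiver window; unackedData: is un-ACKed data in flight?
    static Action decide(int dataLen, int mss, int window, boolean unackedData) {
        if (dataLen >= mss && window >= mss) {
            return Action.SEND_NOW;   // a full segment can go out immediately
        }
        if (unackedData) {
            return Action.BUFFER;     // small write + un-ACKed data: wait for the ACK
        }
        return Action.SEND_NOW;       // small write but the pipe is empty: send it
    }

    public static void main(String[] args) {
        System.out.println(decide(2000, 1460, 65535, true));  // SEND_NOW (full MSS)
        System.out.println(decide(100, 1460, 65535, false));  // SEND_NOW (nothing in flight)
        System.out.println(decide(100, 1460, 65535, true));   // BUFFER (this is the trap)
    }
}
```

The third case is exactly the one in this incident: a small second write queued behind an un-ACKed first write.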

5.3 What is Delayed ACK?

As we all know, to guarantee reliable delivery, TCP requires an acknowledgment for every data packet received. But a standalone acknowledgment is expensive (20 bytes of IP header + 20 bytes of TCP header). TCP Delayed ACK was designed to ease this and improve network performance: it merges several ACKs into a single response, or piggybacks the ACK on outgoing response data, thereby reducing protocol overhead.

The specific approach is:

  • If there is response data to send, the ACK is sent immediately, piggybacked on the response;
  • If there is no response data, the ACK is delayed while waiting to see whether response data appears to ride along with. On Linux the default delay is 40 ms;
  • If the peer's second packet arrives while an ACK is pending, the ACK is sent immediately. If a third packet then arrives, whether it is ACKed immediately again depends on the two rules above.
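These rules too fit in a small decision function, a receiver-side sketch with simplified names (real stacks add more conditions, such as window updates):

```java
public class DelayedAckDecision {
    enum Action { ACK_NOW, DELAY }

    // hasResponseData: application data is ready to piggyback the ACK on;
    // unackedSegments: full segments received since our last ACK.
    static Action decide(boolean hasResponseData, int unackedSegments) {
        if (hasResponseData) {
            return Action.ACK_NOW;   // piggyback the ACK on the response
        }
        if (unackedSegments >= 2) {
            return Action.ACK_NOW;   // ACK at least every second segment
        }
        return Action.DELAY;         // wait (up to ~40 ms on Linux) for data to ride along
    }

    public static void main(String[] args) {
        System.out.println(decide(true, 1));   // ACK_NOW
        System.out.println(decide(false, 1));  // DELAY  <- the stall in this case
        System.out.println(decide(false, 2));  // ACK_NOW
    }
}
```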

5.4 What chemical reaction will occur when Nagle and Delayed ACK are combined?

Both Nagle and Delayed ACK can improve the efficiency of network transmission, but together they can do more harm than good. For example, in the following scenario:

A and B perform data transmission: A runs the Nagle algorithm and B runs the Delayed ACK algorithm.

If A sends a data packet to B, Delayed ACK keeps B from responding immediately. Meanwhile, Nagle keeps A from sending its second packet until B's ACK arrives. If those two packets belong to the same request, the request is delayed by 40 ms.

5.5 Let's capture some packets and see

Let's capture a packet to verify it. Execute the following script on the backend HTTP service to easily complete the capture process.

```shell
tcpdump host 10.48.159.165 -s 0 -w traffic.pcap
```

As shown in the figure below, Wireshark displays the captured packets. The red box marks a complete POST request. The gap between frame 130 and frame 149 is about 41 ms (0.1859 - 0.1448 = 0.0411 s). This is the chemical reaction of Nagle and Delayed ACK together: 10.48.159.165 is running Delayed ACK and 10.22.29.180 is running the Nagle algorithm. 10.22.29.180 is waiting for the ACK while 10.48.159.165's Delayed ACK holds it back, hence the roughly 40 ms wait.

This also explains the 39.2 ms in the test environment: most of it is the 40 ms Delayed ACK.

But when reproducing locally, why is the average latency of the local test 55ms instead of the ping latency of 26ms? Let's capture a packet as well.

As shown in the figure below, the red box again marks a complete POST request. The gap between frame 8 and frame 9 is about 25 ms. Subtracting the one-way network delay of about 13 ms (half the 26 ms ping), the Delayed ACK accounts for roughly 12 ms (macOS differs somewhat from Linux here).

  1. Linux controls the Delayed ACK time through the /proc/sys/net/ipv4/tcp_delack_min system configuration; the default is 40 ms.
  2. macOS controls Delayed ACK through the net.inet.tcp.delayed_ack sysctl:
     • delayed_ack=0: respond after every packet (off);
     • delayed_ack=1: always employ delayed ACK, up to 6 packets per ACK;
     • delayed_ack=2: immediate ACK after the 2nd packet, i.e. 2 packets per ACK (compatibility mode);
     • delayed_ack=3: auto-detect when to employ delayed ACK, about 4 packets per ACK (the default).

5.6 Why does TCP_NODELAY solve the problem?

TCP_NODELAY disables the Nagle algorithm: the next packet is sent even if the ACK for the previous one has not arrived, which breaks the harmful interaction with Delayed ACK. In network programming, it is generally strongly recommended to enable TCP_NODELAY to improve response times.

Of course, you could also address it through the Delayed ACK system configuration, but since modifying machine-level settings is inconvenient, that approach is not recommended.

6. Conclusion

This article walked through the troubleshooting of a simple HTTP call with abnormally high latency, analyzing the problem from the outside in, locating the cause, and verifying the fix. Along the way, it gave a thorough explanation of Nagle and Delayed ACK in TCP transmission and analyzed the case in depth.
