TCP retransmission problem troubleshooting ideas and practices

1. About TCP retransmission

TCP retransmission is a normal mechanism that ensures reliable data delivery. In a LAN environment, network quality is good, so retransmissions caused by network problems should be extremely rare. On the Internet or in a metropolitan area network, the lines are complex (picture a city's underground pipe network or its tangle of utility poles), network quality is not guaranteed, and retransmission is more likely.


TCP retransmission is not necessarily a network-level problem. It can also happen when the receiving end is unreachable or not listening, when the receiving end's receive buffer is full, or when the application holds abnormal connections that were not closed cleanly.

2. TCP/IP background

To troubleshoot network problems, you need to understand how TCP/IP works, because the truth is in each packet. The following are several key parameters related to TCP retransmission.

2.1 Parameters when establishing a TCP connection


2.2 TCP retransmission type

Timeout retransmission

When a segment is sent, a timer is started. If no ACK arrives before the timer expires, the segment is resent. This repeats until an ACK is received or the retransmission limit is reached.
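The timer behaviour described above can be sketched as a backoff schedule. This is a minimal illustration, not the kernel's implementation: it assumes the timeout doubles after every unanswered attempt (the exponential backoff described in RFC 6298) and is capped at a maximum. The parameters `initial_rto`, `max_retries`, and `max_rto` are hypothetical names chosen for the example.

```python
def retransmission_schedule(initial_rto=1.0, max_retries=5, max_rto=60.0):
    """Return the times (seconds after the first send) at which an
    unacknowledged segment would be resent.

    Assumption: the timeout doubles after each expiry and is capped at
    max_rto. Real stacks also derive the initial timeout from measured RTT.
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    for _ in range(max_retries):
        elapsed += rto               # timer expires without an ACK
        times.append(elapsed)        # the segment is resent at this moment
        rto = min(rto * 2, max_rto)  # back off before the next attempt
    return times


print(retransmission_schedule(initial_rto=1.0, max_retries=4))  # [1.0, 3.0, 7.0, 15.0]
```

The growing gaps between resends are why a capture of a timeout retransmission typically shows pauses that get longer with each attempt.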

Fast retransmit

When the receiver gets a segment whose sequence number is out of order, it repeats the ACK for the sequence number it is still expecting. If the sender receives three such duplicate ACKs in a row, it triggers fast retransmit and resends the segment that the ACK asks for, without waiting for the timer to expire.
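As a rough illustration of the duplicate-ACK rule, the sketch below counts repeated ACK numbers on the sender side. It is a simplification under stated assumptions: it looks only at ACK numbers and ignores SACK, window updates, and ACKs that carry data. The function name and the constant `DUP_ACK_THRESHOLD` are made up for this example.

```python
DUP_ACK_THRESHOLD = 3  # three duplicate ACKs trigger fast retransmit


def fast_retransmit_points(acks, threshold=DUP_ACK_THRESHOLD):
    """Given the ACK numbers a sender receives in order, return the ACK
    numbers for which fast retransmit would fire.

    A duplicate ACK repeats the previous ACK number, so the threshold is
    reached on the fourth identical ACK in a row.
    """
    triggered = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                triggered.append(ack)  # resend the segment starting at `ack`
        else:
            last_ack = ack
            dup_count = 0
    return triggered


print(fast_retransmit_points([1000, 2000, 2000, 2000, 2000, 3000]))  # [2000]
```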


3. Common problems and solutions

3.1 TCP retransmission on a single machine or a single application

The server or port that the machine connects to may be unreachable.
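One way to confirm whether a remote server or port is reachable is to attempt a TCP handshake with a short timeout. The sketch below is only an illustration of that check, not necessarily the procedure used in the original investigation; `port_reachable` is a hypothetical helper name.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_reachable("10.0.0.5", 3306)` (a placeholder address and port) from the affected machine shows whether the destination accepts connections at all. If it does not, the retransmitted packets are most likely SYNs that never get an answer.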

Troubleshooting ideas


3.2 TCP retransmission on multiple machines or multiple applications simultaneously

It may be network jitter.

Troubleshooting ideas

1. Check the monitoring points for the network area and the network equipment alarms to see whether there is regional network jitter.
2. If the regional network is fine, use the methods described for the other common problems to narrow the scope of the investigation.

3.3 Bandwidth is full

Troubleshooting ideas

1. Check host monitoring to see whether bandwidth usage has reached the limit.
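If host monitoring only exposes raw byte counters, utilization can be estimated from two samples. The sketch below assumes the counters come from the interface statistics (on Linux, for example, `/proc/net/dev`), that the link speed is known, and that the counter does not wrap between samples. `link_utilization` is a hypothetical helper.

```python
def link_utilization(bytes_before, bytes_after, interval_s, link_mbps):
    """Return link utilization as a fraction (1.0 means the link is full),
    computed from two byte-counter samples taken interval_s seconds apart."""
    if interval_s <= 0 or link_mbps <= 0:
        raise ValueError("interval_s and link_mbps must be positive")
    bits_per_second = (bytes_after - bytes_before) * 8 / interval_s
    return bits_per_second / (link_mbps * 1_000_000)


# 125,000,000 bytes in one second saturates a 1000 Mbps link.
print(link_utilization(0, 125_000_000, 1, 1000))  # 1.0
```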

3.4 Uncommon problems

1. Packet checksum failures caused by a faulty network device port or optical module.
2. Convergence jitter in network routing.
3. Bugs in the host network driver, bugs in network devices, and so on.

4. How to monitor

Use tsar --tcp -C to monitor the retran attribute of TCP, which tracks TCP retransmissions.

  tsar --tcp -C | sed 's/:/_/g;s/=/ /g' | xargs -n 2
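The pipeline above turns each `name:field=value` token into a `name_field value` pair. The same transformation can be sketched in Python for use inside a monitoring script. The sample line is an assumed shape of the output, inferred from the sed expression rather than captured from a real host, and `parse_tsar_check` is a hypothetical helper.

```python
def parse_tsar_check(line):
    """Parse one line of `tsar --tcp -C` style output into a dict.

    Assumption: metrics appear as whitespace-separated `module:field=value`
    tokens; tokens without '=' (such as a hostname prefix) are skipped.
    """
    metrics = {}
    for token in line.split():
        if "=" not in token:
            continue
        key, _, value = token.partition("=")
        try:
            metrics[key.replace(":", "_")] = float(value)
        except ValueError:
            continue  # ignore non-numeric values
    return metrics


sample = "host1 tsar tcp:active=1.5 tcp:retran=0.25"  # hypothetical output
print(parse_tsar_check(sample)["tcp_retran"])  # 0.25
```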

Interested readers can run the following monitoring script directly to collect TCP status metrics. It is suitable for use with open-falcon.
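As a minimal sketch of what such a script might produce, and not the author's script, the function below converts a dictionary of metrics into the JSON item shape used by the open-falcon agent push interface (`endpoint`, `metric`, `timestamp`, `step`, `value`, `counterType`, `tags`). The function name, the default `step`, and the sample metric are assumptions for illustration.

```python
import json
import time


def build_falcon_payload(metrics, endpoint, step=60, tags=""):
    """Convert a {metric_name: value} dict into a list of items in the
    JSON shape accepted by the open-falcon agent push interface."""
    now = int(time.time())
    return [
        {
            "endpoint": endpoint,
            "metric": name,
            "timestamp": now,
            "step": step,
            "value": value,
            "counterType": "GAUGE",
            "tags": tags,
        }
        for name, value in sorted(metrics.items())
    ]


payload = build_falcon_payload({"tcp_retran": 0.25}, endpoint="host1")
print(json.dumps(payload, indent=2))
```

The resulting JSON would then be sent with an HTTP POST to the local agent. The address commonly used for this is `http://127.0.0.1:1988/v1/push`, but that should be checked against your own deployment.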


5. Case practice

(1) Capture packets on the machine that is experiencing packet loss and retransmission, and analyze them with Wireshark. Because retransmission does not happen all the time, the capture command must be kept running in order to catch the retransmitted packets. Open the tcpdump output in Wireshark and enter tcp.analysis.retransmission in the display filter box to get the following results:


Figure 1 shows that the server has retransmitted three times.
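Because the retransmission is intermittent, the capture in step (1) may have to run for a long time. One way to do that without filling the disk is to let tcpdump rotate through a fixed ring of files. The sketch below only builds such a command line. The file prefix, sizes, and the host and port in the usage line are placeholders, and `build_capture_command` is a hypothetical helper rather than the command used in the original case.

```python
import shlex


def build_capture_command(interface="any", host=None, port=None,
                          out_prefix="retrans.pcap",
                          file_size_mb=100, file_count=10):
    """Build a tcpdump command that captures continuously into a ring
    of files, so old data is overwritten instead of filling the disk."""
    cmd = [
        "tcpdump", "-n",
        "-i", interface,
        "-s", "0",                # capture full packets
        "-C", str(file_size_mb),  # rotate after roughly this many MB
        "-W", str(file_count),    # keep at most this many files
        "-w", out_prefix,
    ]
    filters = ["tcp"]
    if host:
        filters.append(f"host {host}")
    if port:
        filters.append(f"port {port}")
    cmd.append(" and ".join(filters))  # capture filter as one argument
    return cmd


print(shlex.join(build_capture_command(host="10.0.0.5", port=8080)))
```

The returned list can be passed to `subprocess.run` on the affected machine. Narrowing the filter to the suspect host and port keeps the files small enough to open in Wireshark.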

(2) Since there are many packets, we can use Wireshark's Follow Stream function to isolate the TCP stream involved in the retransmission.


Figure 2: Follow --> TCP Stream shows the packets related to the retransmission


Figure 3 shows the request and response of the client and server.

(3) Analyze the retransmission

The following points need particular explanation:

No. 67 and 68: the client did not receive the expected packet data for some reason and sent duplicate ACKs to the server. See fast retransmit in section 2.2.

No. 68 and 69: the time difference between them is 200 ms (note the time column; the other packets are less than 1 ms apart). The server waited for the timeout and then retransmitted.

No. 73 and 74: the client sent a FIN packet and actively closed the connection.
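Spotting a pause like the 200 ms one between No. 68 and No. 69 by eye is tedious in a large capture. A small sketch like the following can flag such gaps from a list of packet numbers and timestamps, for example one exported from Wireshark. The sample timestamps are invented to mirror the gap described above; they are not the values from the original capture.

```python
def find_gaps(packets, threshold_s=0.2):
    """Return (prev_no, next_no, delta) for consecutive packets whose
    timestamps differ by at least threshold_s seconds.

    packets: iterable of (packet_number, timestamp_seconds) in capture order.
    """
    gaps = []
    prev = None
    for no, ts in packets:
        if prev is not None:
            delta = ts - prev[1]
            if delta >= threshold_s:
                gaps.append((prev[0], no, round(delta, 6)))
        prev = (no, ts)
    return gaps


# Hypothetical timestamps in seconds for packets 67 to 70.
sample = [(67, 1.0000), (68, 1.0005), (69, 1.2010), (70, 1.2012)]
print(find_gaps(sample))  # [(68, 69, 0.2005)]
```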

This case occurred only once and could not be reproduced, so packet capture and analysis did not lead to a clear conclusion.

6. Summary

This article summarizes how I have approached the TCP retransmission problems encountered in my work, focusing on general troubleshooting ideas and specific practices. It contains little theory. If you are interested, you can read related articles to gain a deeper understanding of how TCP works.

