Why TCP will be replaced by UDP

Why's THE Design is a series of articles about design decisions in software engineering. Each article poses a specific question and discusses the advantages and disadvantages of the chosen design, as well as its impact on concrete implementations, from different perspectives.

The TCP protocol can fairly be called the cornerstone of today's Internet. As a reliable transport protocol, it carries almost all of the data transmitted today. However, TCP was not designed with today's complex network environments in mind. When you are struggling with an intermittent connection on the subway or a train, you may not realize that the TCP protocol itself may be to blame. This article analyzes why TCP suffers serious performance problems in weak network environments[^1].

Any underlying transport protocol must trade off bandwidth utilization against communication latency, so no design can solve every problem in production. TCP chooses to make full use of bandwidth: it is designed for throughput, aiming to transmit as much data as possible in the shortest time[^2].

In network communications, the time from when the sender sends data to when it receives confirmation from the receiver is called the round-trip time (RTT).

A weak network environment is a special scenario with a high packet loss rate, and TCP performs poorly in it. At an RTT of 30 ms, a packet loss rate of just 2% reduces TCP throughput by 89.9%[^3]; the measurements cited there show that packet loss has a dramatic impact on TCP throughput.
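The 89.9% figure above comes from measurements; as a rough analytical cross-check (an assumption of this sketch, not something the article uses), the well-known Mathis model bounds steady-state TCP throughput by MSS / (RTT × √p):

```python
import math

def mathis_throughput(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis model: an upper bound on steady-state TCP throughput
    (bytes per second) given segment size, RTT, and loss rate p:
    throughput <= MSS / (RTT * sqrt(p))."""
    return mss_bytes / (rtt_s * math.sqrt(loss_rate))

# At a 30 ms RTT, compare a nearly clean link (0.01% loss, an assumed
# baseline) with a lossy one (2% loss).
baseline = mathis_throughput(1460, 0.030, 0.0001)
lossy = mathis_throughput(1460, 0.030, 0.02)
drop = (1 - lossy / baseline) * 100  # on the order of 90%, like the cited figure
```

The model predicts that throughput falls with the square root of the loss rate, which matches the order of magnitude of the cited measurement.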

This article will analyze three reasons that affect TCP performance in a weak network environment (high packet loss rate):

TCP’s congestion control algorithm actively reduces throughput when packets are lost;

TCP's three-way handshake adds latency and overhead to data transmission;

TCP's cumulative acknowledgment mechanism can cause data segments that were already received to be retransmitted;

Of the three reasons above, the congestion control algorithm is the primary one behind TCP's poor performance in weak networks; the three-way handshake and cumulative acknowledgments have progressively smaller effects, but they too aggravate TCP's performance problems.

Congestion Control

TCP's congestion control algorithm is the main congestion control measure on the Internet. It uses a set of methods based on additive increase/multiplicative decrease (AIMD) to control congestion[^4], and it is also the main cause of TCP's performance problems.

The first congestive collapse on the Internet was observed in 1986, when the effective throughput of the NSFnet Phase I backbone dropped from 32,000 bit/s to 40 bit/s; the problem was not resolved until 1987 and 1988, when TCP implemented congestion control[^5]. Precisely because of this congestion-induced collapse, the TCP congestion control algorithm assumes that whenever packet loss occurs, the network is congested. Based on this assumption, TCP implements congestion control through slow start and additive increase/multiplicative decrease[^6].

tcp-congestion-control

Figure 1 - TCP congestion control mechanism

Each TCP connection maintains a congestion control window, which has two functions:

  1. Prevent the sender from sending too much data to the receiver, causing the receiver to be unable to process it;
  2. Prevent any one direction of the TCP connection from sending a large amount of data to the network, causing network congestion and collapse;

In addition to the congestion window size (cwnd), each party to a TCP connection also has a receive window size (rwnd). When a TCP connection is first established, neither side knows the other's receive window size, so the two parties need a dynamic mechanism to estimate and adjust the data transmission rate. During the three-way handshake, each party advertises its receive window size in its ACK segments. The receive window size is generally determined by the bandwidth-delay product (BDP)[^7], but we will not elaborate on that here.

The maximum number of data segments a client can have in flight is the minimum of the receive window and the congestion window, i.e., min(rwnd, cwnd). The initial congestion window of a TCP connection is a relatively small value, defined in Linux by TCP_INIT_CWND[^8]:

```c
/* TCP initial congestion window as per rfc6928 */
#define TCP_INIT_CWND 10
```

The initial congestion window size has been revised several times since it was introduced. A series of RFCs titled Increasing TCP's Initial Window (RFC2414[^9], RFC3390[^10], and RFC6928[^11]) increased the initcwnd value to keep up with ever-growing transmission speeds and bandwidth.

tcp-congestion-window

Figure 2 - Additive increase/multiplicative decrease in TCP congestion control

As shown in the figure above, the congestion control window size of the sender of a TCP connection changes according to the response of the receiver:

  1. Additive increase: with each RTT in which data is delivered successfully, the congestion window grows by one segment;
  2. Multiplicative decrease: when a packet sent by the sender is lost, the slow start threshold is halved;
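The two rules above can be sketched as a toy update function (a simplification along the lines of Reno's fallback to the slow start threshold; real stacks also distinguish slow start, fast recovery, and timeouts):

```python
def aimd_update(cwnd: float, ssthresh: float, lost: bool) -> tuple[float, float]:
    """One simplified AIMD step, returning (new cwnd, new ssthresh):
    on loss, halve the slow start threshold and fall back to it;
    otherwise grow the window additively by one segment per RTT."""
    if lost:
        ssthresh = max(cwnd / 2, 2.0)
        return ssthresh, ssthresh
    return cwnd + 1, ssthresh

cwnd, ssthresh = 35.0, 64.0
cwnd, ssthresh = aimd_update(cwnd, ssthresh, lost=False)  # window grows to 36
cwnd, ssthresh = aimd_update(cwnd, ssthresh, lost=True)   # window halves to 18
```

A single loss event wipes out many round trips of additive growth, which is exactly why a high loss rate keeps throughput pinned low.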

On a freshly established TCP connection, the Linux defaults allow the client to send 10 data segments at once. Suppose the network bandwidth is 10 Mbps, the RTT is 40 ms, and each data segment is 1460 bytes; then the window size needed to fully utilize the bandwidth, calculated from the BDP, is 10 Mbps × 40 ms / 8 / 1460 B ≈ 34.2, i.e., about 35 segments.

However, it takes 2 RTTs for the congestion window to grow from 10 to 35. The process is as follows:

  1. The sender sends initcwnd = 10 data segments to the receiver (0.5 RTT);
  2. After receiving the 10 data segments, the receiver sends ACKs back to the sender (0.5 RTT);
  3. The sender receives the ACKs from the receiver; since 10 segments were delivered successfully, the congestion window grows by 10 and reaches 20;
  4. The sender sends 20 data segments to the receiver (0.5 RTT);
  5. After receiving the 20 data segments, the receiver sends ACKs back to the sender (0.5 RTT);
  6. The sender receives the ACKs from the receiver; since 20 segments were delivered successfully, the congestion window grows by 20 and reaches 40;

From the three-way handshake establishing the connection until the congestion window can cover the 35-segment BDP, 3.5 RTT elapse, i.e., 140 ms under the assumed network conditions, which is a fairly long time.
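The growth process above can be sketched as a short loop (a sketch under the article's assumptions: no loss, the window doubling each RTT, a 35-segment BDP, and a 40 ms RTT):

```python
def rtts_to_cover(initcwnd: int, target_segments: int) -> int:
    """Round trips of slow start needed before the congestion window
    covers `target_segments`, assuming it doubles every RTT (each
    acknowledged segment adds one to cwnd) and nothing is lost."""
    cwnd, rtts = initcwnd, 0
    while cwnd < target_segments:
        cwnd *= 2
        rtts += 1
    return rtts

# 10 -> 20 -> 40: two round trips, plus 1.5 RTT for the handshake.
total_rtts = 1.5 + rtts_to_cover(10, 35)   # 3.5 RTT
print(total_rtts * 40)                     # 140.0 ms at a 40 ms RTT
```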

In the early days of the Internet, most computing devices were connected via wired networks, and the possibility of network instability was relatively low. Therefore, the designers of the TCP protocol believed that packet loss meant network congestion. Once packet loss occurred, the client's frantic retries could cause congestion and collapse of the Internet, so they invented the congestion control algorithm to solve this problem.

However, today's network environment is more complex. The introduction of wireless networks has made network instability a normal phenomenon in some scenarios. Therefore, packet loss does not necessarily mean network congestion. If more aggressive strategies are used to transmit data, better results will be achieved in some scenarios.

Three-way handshake

The three-way handshake TCP uses to establish connections is familiar to virtually every engineer. Its main purposes are to prevent stale, erroneous historical connections from being established and to let the two parties agree on initial sequence numbers[^12]. But the handshake is quite costly: even with no packet loss, establishing a TCP connection requires three messages between the two parties.

basic-3-way-handshake

Figure 3 - Common TCP three-way handshake

If we want to access a server in Shanghai from Beijing, then since the straight-line distance between the two cities is about 1,000 kilometers and the speed of light is the current upper bound on communication speed, the RTT must be greater than 2 × 1,000 km / (300,000 km/s) ≈ 6.7 ms.

However, light does not travel in a straight line in optical fiber, where the actual propagation speed is ~31% slower than the speed of light in vacuum[^13], and the data must also hop through various network devices, so the theoretical limit is hard to reach. In a production environment, the RTT from Beijing to Shanghai is about 40 ms, so the shortest time in which TCP can establish a connection is 60 ms (1.5 RTT).
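The latency figures above follow from simple arithmetic; here is a sketch (the distance and the ~31% fiber slowdown are taken from the text, the rest is standard physics):

```python
C_VACUUM = 299_792_458  # speed of light in vacuum, m/s

def min_rtt_ms(distance_m: float, slowdown: float = 0.0) -> float:
    """Lower bound on RTT: the signal must cover the distance twice,
    optionally slowed down (e.g. ~31% slower inside optical fiber)."""
    speed = C_VACUUM * (1.0 - slowdown)
    return 2 * distance_m / speed * 1000

straight_line = min_rtt_ms(1_000_000)      # ~6.7 ms, vacuum limit
in_fiber = min_rtt_ms(1_000_000, 0.31)     # ~9.7 ms in fiber
handshake_ms = 1.5 * 40                    # 60 ms at the measured 40 ms RTT
```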

In subways and stations with poor network conditions, the high packet loss rate makes it hard for a client to complete the three messages with the server quickly and establish a TCP connection. When the client receives no response from the server for a long time, it can only keep retrying, and as the number of requests grows, access latency grows further.

Most HTTP requests carry little data: uncompressed request and response headers run from ~200 B to 2 KB, while the TCP three-way handshake alone introduces 222 bytes of overhead, of which the Ethernet frame headers account for 3 × 14 = 42 bytes, the IP headers for 3 × 20 = 60 bytes, and the TCP segments for 120 bytes:

tcp-three-way-handshake-overhead

Figure 4 - TCP three-way handshake overhead

Although TCP does not perform a handshake for every data segment it sends, establishing a connection via the three-way handshake is still quite expensive: it costs an extra 1.5 RTT of network latency plus an extra 222 bytes. In a weak network environment, this aggravates TCP's performance problems.
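The 222-byte figure decomposes as shown in Figure 4; as simple arithmetic (header sizes per the article; the TCP handshake segments carry options, which is why they exceed the 20-byte minimum header):

```python
ETHERNET_HEADER = 14   # bytes per Ethernet frame header
IP_HEADER = 20         # bytes per IPv4 header without options
TCP_HANDSHAKE = 120    # total TCP bytes across the three handshake
                       # segments; options (e.g. MSS, window scale)
                       # push them past the 20-byte minimum header

overhead = 3 * (ETHERNET_HEADER + IP_HEADER) + TCP_HANDSHAKE
print(overhead)  # 222 bytes before any application data is sent
```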

Retransmission mechanism

The reliability of TCP transmission is guaranteed by sequence numbers and the receiver's ACK. When TCP transmits a data segment, it puts a copy of the data segment on the retransmission queue and starts a timer [^14]:

  • If the sender receives an ACK response corresponding to the data segment, the current data segment will be deleted from the retransmission queue;
  • If the sender does not receive the ACK corresponding to the data segment before the timer expires, it will resend the current data segment;

TCP's ACK mechanism may cause the sender to retransmit data segments that the receiver has already received. A TCP ACK indicates that the acknowledged segment and all segments before it have been successfully received and processed, for example:

  1. The sender sent messages numbered 1-10 to the receiver;
  2. The receiver sends an ACK 8 response to the sender;
  3. The sender believes that messages numbered 1-8 have been successfully received;

This ACK scheme is simpler to implement and makes it easier to preserve message order, but in the following situation it can cause the sender to retransmit data that was already received:

tcp-retransmission-al

Figure 5 - TCP retransmission strategy

As shown in the figure above, the receiver has received the segments with sequence numbers 2-5, but because a TCP ACK means that every segment up to the acknowledged one has been received and processed, the receiver cannot acknowledge them. Since the sender receives no ACK, the timers for all of the segments expire and the data is retransmitted. In a network with heavy packet loss, this retransmission mechanism wastes a great deal of bandwidth.
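The scenario in Figure 5 can be sketched as follows (a toy model of cumulative ACKs only; real TCP also sends duplicate ACKs, and extensions such as SACK exist precisely to avoid this waste):

```python
def cumulative_ack(received: set[int]) -> int:
    """The highest sequence number a cumulative ACK may carry: every
    segment up to and including it must have arrived."""
    ack = 0
    while ack + 1 in received:
        ack += 1
    return ack

def retransmitted(sent: range, received: set[int]) -> list[int]:
    """Segments the sender resends after a timeout: everything past
    the cumulative ACK point, even segments the receiver already holds."""
    ack = cumulative_ack(received)
    return [seq for seq in sent if seq > ack]

# Segment 1 is lost but 2-5 arrive: nothing can be acknowledged, so
# all five segments are sent again.
resent = retransmitted(range(1, 6), {2, 3, 4, 5})  # [1, 2, 3, 4, 5]
```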

Summary

Although some aspects of TCP's design remain valuable today, it does not fit all scenarios. To address TCP's performance problems, the industry currently has two kinds of solutions:

  1. Use UDP to build a more performant and flexible transport protocol, such as QUIC[^15];
  2. Optimize the performance of the TCP protocol through various means, such as Selective ACK (SACK)[^16], TCP Fast Open (TFO)[^17];

Because the TCP protocol lives in the operating system kernel, it is hard to update; for this reason the first approach is currently developing faster, and HTTP/3 uses QUIC as its transport protocol[^18]. Let us review the three important causes of TCP's performance problems:

  • TCP congestion control backs off when packets are lost, reducing the number of segments that can be sent, but packet loss does not necessarily mean the network is congested; it may simply mean the link quality is poor;
  • TCP's three-way handshake brings additional overhead, which not only includes the need to transmit more data, but also increases the network delay of the first data transmission;
  • TCP's retransmission mechanism may retransmit successfully received data segments when data packets are lost, resulting in a waste of bandwidth;

TCP deservedly remains the cornerstone of Internet data transmission. Although it does show problems in particular scenarios, its design contains many ideas that are well worth learning from.

Finally, let's look at some more open related questions. Interested readers can think carefully about the following questions:

  • Can the QUIC protocol guarantee transmission performance when the packet loss rate is high?
  • In addition to SACK and TFO, what other methods can be used to optimize TCP performance?

If you have any questions about the content in the article or want to know more about the reasons behind some design decisions in software engineering, you can leave a message below the blog. The author will promptly respond to questions related to this article and select appropriate topics as follow-up content.

[^1]: TCP Selective Acknowledgment Options, October 1996 https://tools.ietf.org/html/rfc2018

[^2]: KCP - A Fast and Reliable ARQ Protocol https://github.com/skywind3000/kcp

[^3]: Measuring Network Performance: Links Between Latency, Throughput and Packet Loss https://accedian.com/enterprises/blog/measuring-network-performance-latency-throughput-packet-loss/

[^4]: Wikipedia: TCP congestion control https://en.wikipedia.org/wiki/TCP_congestion_control

[^5]: Wikipedia: Network congestion https://en.wikipedia.org/wiki/Network_congestion#Congestive_collapse

[^6]: Wikipedia: Additive increase/multiplicative decrease https://en.wikipedia.org/wiki/Additive_increase/multiplicative_decrease

[^7]: Bandwidth-delay product https://en.wikipedia.org/wiki/Bandwidth-delay_product

[^8]: TCP_INIT_CWND https://github.com/torvalds/linux/blob/738d2902773e30939a982c8df7a7f94293659810/include/net/tcp.h#L226

[^9]: RFC2414 Increasing TCP's Initial Window https://tools.ietf.org/html/rfc2414

[^10]: RFC3390 Increasing TCP's Initial Window https://tools.ietf.org/html/rfc3390

[^11]: RFC6928 Increasing TCP's Initial Window https://tools.ietf.org/html/rfc6928

[^12]: Why TCP requires a three-way handshake to establish a connection, October 2019 https://draveness.me/whys-the-design-tcp-three-way-handshake

[^13]: Researchers create fiber network that operates at 99.7% speed of light, smashes speed and latency records, March 2013 https://www.extremetech.com/computing/151498-researchers-create-fiber-network-that-operates-at-99-7-speed-of-light-smashes-speed-and-latency-records

[^14]: RFC793 Transmission Control Protocol, September 1981 RFC793 https://tools.ietf.org/html/rfc793

[^15]: Wikipedia: QUIC https://en.wikipedia.org/wiki/QUIC

[^16]: RFC2018 TCP Selective Acknowledgment Options, October 1996 https://tools.ietf.org/html/rfc2018

[^17]: RFC7413 TCP Fast Open, December 2014 https://tools.ietf.org/html/rfc7413

[^18]: HTTP-over-QUIC to be renamed HTTP/3, November 2018 https://www.zdnet.com/article/http-over-quic-to-be-renamed-http3/
