Is HTTP really that difficult?

HTTP is the most important and most widely used protocol in browsers: it is the language browsers and servers use to talk to each other. As browsers have evolved, HTTP has evolved with them, passing through several stages: 0.9, 1.0, 1.1, 2.0, and the upcoming 3.0.


Let me start with a short story. It may leave you with a face full of question marks, but don't worry: finish the article, then come back and the story will make sense. (It is a memory aid built on Chinese wordplay, so some of the puns are inevitably lost in translation.)

I have multiple houses whose loans need to be repaid.

Then someone suddenly came to collect the money. I faked a few moves with a water pipe, but he pushed me to the ground and beat me until I cried.

Refusing to accept that, I challenged him to a slow race. He slapped me again and I backed off and shrank (slow race) (duo-you-tui-suo).

After that I stared at him until our gazes froze.

And then he went at my face even harder (gong-jia-duo-wu).

HTTP 0.9

Appearance time

1991

Cause

Used to transfer HTML hypertext content across the network.

Implementation

It adopts a request-response model, where the client sends a request and the server returns data.

Process

  • Because HTTP is built on top of TCP, the client must first establish a TCP connection with the server using its IP address and port; establishing that connection is TCP's three-way handshake.
  • After the connection is established, a GET request line is sent, such as GET /index.html to get index.html.
  • After receiving the request information, the server reads the corresponding HTML file and returns the data to the client in an ASCII character stream.
  • After the HTML document transfer is complete, the connection is closed.
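The steps above can be sketched in Python. This only builds the on-wire request text (modern servers no longer speak HTTP/0.9, so it is an illustration, not something to run against a real site):

```python
def http09_request(path):
    # An HTTP/0.9 request is a single request line: the method and the
    # path, terminated by CRLF. No version string, no headers, no body.
    return f"GET {path}\r\n"

# After sending this line, the client simply reads the raw HTML as an
# ASCII stream until the server closes the TCP connection.
wire = http09_request("/index.html")
```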

Diagram:

Features

  • There is only one request line, and no HTTP request header or request body because only one request line is needed to fully express the client's needs.
  • The server also does not return any header information. This is because the server does not need to tell the client too much information, it only needs to return data.
  • The returned file content is transmitted as an ASCII character stream.

HTTP 1.0

Appearance time

1994

Cause

With the development of browsers, not only HTML files are displayed in browsers, but also different types of files such as JavaScript, CSS, pictures, audio, video, etc. Therefore, it is necessary to support the download of multiple types of files.

The file format is no longer limited to ASCII encoding; many other encodings need to be supported.

Diagram:

New features:

It provides good support for multiple file formats and different types of data. HTTP/1.0 solves this by negotiating through request headers and response headers: when initiating a request, the browser uses request headers to tell the server what file type it expects, what form of compression it accepts, what language it prefers, and what character encoding it understands.

  1. accept: text/html // expected return type
  2. accept-encoding: gzip, deflate, br // compression methods
  3. accept-charset: ISO-8859-1,utf-8 // character encodings
  4. accept-language: zh-CN,zh // language
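A sketch of how a client might assemble such a negotiated request using the header values listed above (`build_request` is a hypothetical helper for illustration, not a real library API):

```python
def build_request(path, headers):
    # HTTP/1.0 adds a version to the request line and a block of request
    # headers; an empty line terminates the header block.
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"

req = build_request("/index.html", {
    "accept": "text/html",
    "accept-encoding": "gzip, deflate, br",
    "accept-charset": "ISO-8859-1,utf-8",
    "accept-language": "zh-CN,zh",
})
```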

Status codes were introduced. The server may fail to process some requests, or process them incorrectly, so the browser needs to be told how the server finally handled each request. The status code carries this information to the browser in the response line.

Provides a cache mechanism to cache downloaded data and reduce server pressure.

Added a user agent field to count basic client information, such as the number of Windows and macOS users.

Memory: multiple conditions and delays (wordplay in the Chinese original: you have multiple houses whose delayed loans need to be repaid).

HTTP 1.1

Appearance time

1999

Cause

As technology continues to develop, requirements are constantly iterating and updating, and soon HTTP/1.0 can no longer meet the needs.

New features:

  • Improved persistent connections.
  1. Since HTTP/1.0 uses short connections, each HTTP communication goes through three stages: establishing a TCP connection, transmitting the HTTP data, and closing the TCP connection, which adds a lot of overhead. To solve this, HTTP/1.1 added persistent connections: multiple HTTP requests can be transmitted over a single TCP connection, and the TCP connection stays open as long as neither the browser nor the server explicitly closes it. Persistent connections are enabled by default in HTTP/1.1; if you do not want one, add Connection: close to the HTTP request header.
  2. Currently, for the same domain name, the browser allows 6 TCP persistent connections to be established simultaneously by default. (TODO: Add a comparison chart here)
  3. Domain sharding, often combined with a CDN, is used to work around this per-domain connection limit.
  • Immature HTTP pipelining

Pipelining in HTTP/1.1 is the technique of submitting multiple HTTP requests to the server in a batch. Although the requests can be sent in batches, the server must still reply to them in the order they were sent. Persistent connections reduce the number of TCP setups and teardowns, but each request has to wait for the previous one to return before the next can proceed: if one request in the TCP channel fails to return in time for any reason, it blocks everything queued behind it. This is the famous head-of-line blocking problem, which HTTP/1.1 attempted to mitigate with pipelining.
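The persistent-connection behavior described in point 1 can be observed with Python's standard library: two requests travel over one TCP socket (a throwaway local server is spun up purely for the demonstration):

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"       # HTTP/1.1 -> keep-alive by default
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):       # silence request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/a")
r1 = conn.getresponse(); r1.read()
sock1 = conn.sock                       # the underlying TCP socket
conn.request("GET", "/b")               # second request, same connection
r2 = conn.getresponse(); r2.read()
reused = conn.sock is sock1             # True: the socket was not reopened
server.shutdown()
```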

  • Provide virtual host support

In HTTP/1.0, each domain name is bound to a unique IP address, so a server can only support one domain name. However, with the development of virtual host technology, it is necessary to bind multiple virtual hosts to a physical host, each virtual host has its own separate domain name, and these separate domain names all share the same IP address. Therefore, the Host field is added to the HTTP/1.1 request header to indicate the current domain name address, so that the server can do different processing according to different Host values.
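The server-side idea can be sketched as dispatch on the Host header (the site table and `pick_site` helper are hypothetical, not a real framework API):

```python
def pick_site(headers, sites):
    # The Host header names the domain the client asked for;
    # strip any :port suffix before the lookup.
    host = headers.get("Host", "").split(":")[0]
    return sites.get(host, sites["default"])

# Several virtual hosts sharing one IP address / one physical server.
sites = {
    "a.example.com": "site A",
    "b.example.com": "site B",
    "default": "404 page",
}
```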

  • Provides perfect support for dynamically generated content

When using HTTP/1.0, you need to set the complete data size in the response header, such as Content-Length: 901, so that the browser can receive data according to the set data size. However, with the development of server-side technology, the content of many pages is dynamically generated, so the final data size is not known before transmitting the data, which causes the browser to not know when all the file data will be received.

HTTP/1.1 solves this problem by introducing chunked transfer encoding. The server splits the data into blocks of arbitrary size; each block is sent prefixed with its own length, and a final zero-length block marks the end of the data. This provides support for dynamic content.
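The chunk format can be sketched directly: each chunk is its own size in hex, CRLF, the data bytes, CRLF, and a zero-length chunk terminates the body (a simplified codec; real parsers also handle chunk extensions and trailers):

```python
def encode_chunked(blocks):
    out = b""
    for block in blocks:
        # each chunk: its own size in hex, CRLF, the data, CRLF
        out += f"{len(block):X}\r\n".encode() + block + b"\r\n"
    return out + b"0\r\n\r\n"           # zero-length chunk = end of body

def decode_chunked(data):
    body, pos = b"", 0
    while True:
        nl = data.index(b"\r\n", pos)
        size = int(data[pos:nl], 16)    # hex size line
        if size == 0:
            return body
        body += data[nl + 2 : nl + 2 + size]
        pos = nl + 2 + size + 2         # skip the data and trailing CRLF
```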

  • Client Cookie, Security Mechanism

HTTP/1.1 also introduced the client-side cookie mechanism and security mechanisms.

Memory: faking a few moves with a pipe, then getting beaten until I cried (the pipe is pipelining; wordplay in the Chinese original).

HTTP 2.0

Appearance time

Standardized in 2015; most major browsers supported it by the end of that year.

Cause

Although HTTP/1.1 has adopted many strategies to optimize resource loading speed and has achieved certain results, the bandwidth utilization of HTTP/1.1 is not ideal. This is mainly due to the following reasons:

  • TCP Slow Start

Once a TCP connection is established, it enters the data sending state. At the beginning, the TCP protocol will use a very slow speed to send data, and then slowly increase the speed of sending data until the speed of sending data reaches an ideal state. We call this process slow start. Slow start is a strategy of TCP to reduce network congestion, and we cannot change it. Because some key resource files commonly used in pages are not large, such as HTML files, CSS files, and JavaScript files, these files usually initiate requests after the TCP connection is established, but this process is slow start, so it takes much more time than normal, which increases the time it takes to render the page for the first time.

  • Multiple TCP connections opened at the same time compete for a fixed bandwidth

The system establishes multiple TCP connections at the same time. When the bandwidth is sufficient, the sending or receiving speed of each connection will slowly increase. Once the bandwidth is insufficient, these TCP connections will slow down the sending or receiving speed. This will cause a problem, because some TCP connections download some key resources, such as CSS files, JavaScript files, etc., while some TCP connections download ordinary resource files such as pictures and videos. However, multiple TCP connections cannot negotiate which key resources to download first, which may affect the download speed of those key resources.

  • HTTP/1.1 Head-of-line blocking problem

When using persistent connections in HTTP/1.1, although a TCP pipeline can be shared, only one request can be processed in a pipeline at the same time. Before the current request is completed, other requests can only be blocked. This means that we cannot send requests and receive content in a pipeline at will. This is a very serious problem, because there are many factors that block requests, and they are all uncertain factors. If a request is blocked for 5 seconds, then the subsequent queued requests will have to wait for 5 seconds. During this waiting process, bandwidth and CPU are wasted. In addition, head-of-line blocking prevents data from being requested in parallel, so head-of-line blocking is very unfavorable to browser optimization.

Memory: a slow race (wordplay in the Chinese original: slow start, bandwidth competition, head-of-line blocking).

Implementation ideas

The idea of HTTP/2 is that each domain uses a single long-lived TCP connection to transmit data, so downloading all of a page's resources requires only one slow start, and the connections no longer compete for bandwidth. Then there is head-of-line blocking: waiting for one request to finish before issuing the next is the slowest possible approach, so HTTP/2 needs parallel resource requests. Requests can be sent to the server at any time without waiting for other requests to complete, and the server can return processed responses to the browser at any time. In short: one long-lived TCP connection per domain, with HTTP-level head-of-line blocking eliminated.

Diagram:

New Features

  • Multiplexing, by introducing the binary framing layer, the HTTP multiplexing technology is realized.
  • First, the browser prepares the request data, including the request line, request header and other information. If it is a POST method, there must also be a request body. After these data are processed by the binary framing layer, they will be converted into frames with request ID numbers and sent to the server through the protocol stack. After the server receives all the frames, it will merge all the frames with the same ID into a complete request message. Then the server processes the request and sends the processed response line, response header and response body to the binary framing layer respectively. Similarly, the binary framing layer will convert these response data into frames with request ID numbers and send them to the browser through the protocol stack. After the browser receives the response frame, it will submit the frame data to the corresponding request according to the ID number.
  • Setting the priority of a request

We know that some data in the browser is very important, but when sending requests, important requests may be later than those less important requests. If the server replies to the data in the order of the requests, then the important data may be delayed for a long time before being delivered to the browser. To solve this problem, HTTP/2 provides request priority. When sending a request, you can mark the priority of the request, so that after receiving the request, the server will give priority to the request with a high priority.
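The binary framing layer described above prefixes every frame with a fixed 9-byte header (per RFC 7540): a 24-bit payload length, an 8-bit type, 8-bit flags, and a reserved bit plus a 31-bit stream identifier that ties the frame back to its request. A round-trip sketch:

```python
def pack_frame_header(length, ftype, flags, stream_id):
    # 3-byte length, 1-byte type, 1-byte flags,
    # then 1 reserved bit + 31-bit stream identifier
    return (length.to_bytes(3, "big")
            + bytes([ftype, flags])
            + (stream_id & 0x7FFFFFFF).to_bytes(4, "big"))

def unpack_frame_header(header):
    length = int.from_bytes(header[0:3], "big")
    ftype, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, ftype, flags, stream_id

# A DATA frame (type 0x0) with the END_STREAM flag (0x1) on stream 3.
hdr = pack_frame_header(16384, 0x0, 0x1, 3)
```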

  • Server Push

In addition to setting the priority of requests, HTTP/2 can also push data directly to the browser in advance.

  • Header Compression

HTTP/2 compresses request and response headers. You might think a single HTTP header is not very large, so compression hardly matters. But consider: when the browser sends a request, it almost always sends request headers and rarely a request body, and a typical page involves about 100 resources. If those 100 sets of headers are compressed to 20% of their original size, transmission efficiency improves substantially.
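To put rough numbers on that claim (the 100 requests and 20% ratio come from the text; the 500-byte average header size is an assumption for illustration):

```python
requests = 100        # resources on a typical page (figure from the text)
header_bytes = 500    # assumed average size of one set of request headers
ratio = 0.20          # compressed to 20% of the original (figure from the text)

uncompressed = requests * header_bytes   # 50,000 bytes of header traffic
saved = uncompressed * (1 - ratio)       # bytes that no longer cross the wire
```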

Memory: duo-you-tui-suo (wordplay in the Chinese original; the syllables stand for multiplexing, priority, server push, and header compression: one more slap and he pushed me back and I shrank).

HTTP 3.0

Cause

  • Head-of-line blocking still exists at the TCP level

During TCP transmission, the loss of a single data packet will cause congestion. As the packet loss rate increases, the transmission efficiency of HTTP/2 will become worse and worse. Test data shows that when the system reaches a 2% packet loss rate, the transmission efficiency of HTTP/1.1 is better than that of HTTP/2.

  • TCP connection establishment delay

The TCP handshake process also affects the transmission efficiency. We know that both HTTP/1 and HTTP/2 use the TCP protocol for transmission, and if HTTPS is used, the TLS protocol must be used for secure transmission, and the use of TLS also requires a handshake process, so there are two handshake delay processes. In short, before transmitting data, we need to spend 3 to 4 RTTs. If the servers are far apart, then 1 RTT may take more than 100 milliseconds. In this case, the entire handshake process takes 300 to 400 milliseconds, and users can clearly feel the "slowness".

  • TCP protocol rigidity

The rigidity of the intermediate devices: If we upgrade the TCP protocol on the client side, but when the data packets of the new protocol pass through these intermediate devices, they may not understand the content of the packet, so the data will be discarded. This is the rigidity of the intermediate devices, which is a major obstacle to TCP updates.

The operating system is another reason for the rigidity of the TCP protocol, because the TCP protocol is implemented through the operating system kernel, and the application can only use it but not modify it. Usually the operating system update lags behind the software update, so it is very difficult to freely update the TCP protocol in the kernel.

Memory: the two stared at each other until their gazes froze (in the Chinese original, "frozen" puns on "rigid").

Implementation ideas

HTTP/3 chose a compromise: the UDP protocol. On top of UDP it implements TCP-like features such as multiple data streams and transmission reliability; this set of functions is called the QUIC protocol.

Diagram: HTTP/2 and HTTP/3 protocol stacks

Characteristics

  • Implemented flow control and transmission reliability functions similar to TCP

Although UDP does not provide reliable transmission, QUIC adds a layer on top of UDP to ensure reliable data transmission. It provides packet retransmission, congestion control, and other features that exist in TCP.
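A toy illustration of the idea: graft an acknowledgement/retransmission layer onto a plain UDP socket. This is a deliberately minimal sketch; real QUIC also does congestion control, encryption, independent streams, and much more:

```python
import socket
import threading

recv_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
recv_sock.bind(("127.0.0.1", 0))
addr = recv_sock.getsockname()

received = []

def receiver():
    seen = set()
    while True:
        data, peer = recv_sock.recvfrom(1024)
        seq, payload = data.split(b"|", 1)
        if seq not in seen:                    # drop retransmitted duplicates
            seen.add(seq)
            received.append(payload)
        recv_sock.sendto(b"ACK" + seq, peer)   # acknowledge every packet
        if payload == b"done":
            break

t = threading.Thread(target=receiver)
t.start()

send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_sock.settimeout(0.2)
for seq, payload in enumerate([b"hello", b"world", b"done"]):
    tag = str(seq).encode()
    while True:                                # resend until acknowledged
        send_sock.sendto(tag + b"|" + payload, addr)
        try:
            ack, _ = send_sock.recvfrom(1024)
            if ack == b"ACK" + tag:
                break                          # acknowledged: next packet
        except socket.timeout:
            pass                               # lost or late: retransmit

t.join()
```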

  • Integrated TLS encryption

Currently, QUIC uses TLS 1.3, which has several advantages over the earlier TLS 1.2; the most important is that it reduces the number of RTTs spent on the handshake.

  • Implemented multiplexing in HTTP/2

Unlike TCP, QUIC implements multiple independent logical data streams on the same physical connection. By implementing separate transmission of data streams, the problem of head-of-line blocking in TCP is solved.

  • Implemented fast handshake function

Since QUIC is based on UDP, it can establish a connection with 0-RTT or 1-RTT, which means QUIC can start sending and receiving data almost immediately and greatly improves the speed of opening a page for the first time.

Memory: gong-jia-duo-wu (wordplay in the Chinese original; the syllables stand for flow control, encryption, multiplexing, and handshake: he went at my face even harder).

Problems

Neither servers nor browsers yet provide complete support for HTTP/3.

The operating system kernel's optimization of UDP falls far short of its TCP optimization, which is another important obstacle for QUIC.

Intermediate devices are rigid here too: they are far less optimized for UDP than for TCP. According to statistics, the QUIC protocol sees a packet loss rate of roughly 3% to 7%.

Future

There is still a long way to go from standardization to practice to protocol optimization, and because HTTP/3 changes the underlying transport protocol, its adoption will grow slowly, a fundamental difference from HTTP/2. Still, companies such as Tencent have already tried HTTP/3 in production, for example in QQ Interest Tribe.

In early May 2020, Microsoft announced the open-sourcing of its internal QUIC library, MsQuic, and will fully recommend the QUIC protocol as a replacement for TCP.

So overall, HTTP/3 has a promising future.

Author: A rookie siege lion

Source: https://www.cnblogs.com/suihang/p/13265136.html
