Learn the history of HTTP in 6 minutes


HTTP/0.9

HTTP/0.9 was proposed in 1991 and was mainly used for academic exchange. Its requirements were simple: transfer HTML hypertext content between networked machines, hence the name Hypertext Transfer Protocol. Its implementation was equally simple, built on a request-response model: the client sends a request and the server returns data.

Complete request process

  • Because HTTP is built on top of TCP, the client first establishes a TCP connection with the server using its IP address and port; establishing the connection is TCP's three-way handshake.
  • Once the connection is established, the client sends a single GET request line, such as GET /index.html to fetch index.html.
  • After receiving the request, the server reads the corresponding HTML file and returns the data to the client as an ASCII character stream.
  • Once the HTML document has been transferred, the connection is closed.
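The whole exchange fits in a few lines; here is a rough sketch using a raw Python socket (example.com is a placeholder, and real servers today no longer accept bare HTTP/0.9 request lines):

    import socket

    # Establish the TCP connection (the three-way handshake happens here).
    with socket.create_connection(("example.com", 80)) as sock:
        # HTTP/0.9: a single request line, no headers, no body.
        sock.sendall(b"GET /index.html\r\n")
        response = b""
        while chunk := sock.recv(4096):   # read until the server closes the connection
            response += chunk
    print(response.decode("ascii", errors="replace"))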

(Figure: HTTP/0.9 request process)

Features

  • First, there is only a request line; there are no HTTP request headers or request body, because a single request line fully expresses the client's needs.
  • Second, the server returns no header information, because it has nothing to tell the client beyond the data itself.
  • Third, the file content is transmitted as an ASCII character stream. Since all files were in HTML format, ASCII was the natural choice.

HTTP/1.0

HTTP/0.9 has many problems, such as the following:

  • Only HTML files are supported; JS, CSS, fonts, images, videos, and other file types cannot be transferred.
  • The transfer format is limited to ASCII; files in other encodings cannot be delivered.
  • Only a request line is sent to the server, which carries too little information.
  • The server can only respond with the requested data; it cannot send any additional information to the browser.

It could no longer meet the needs of the time, so HTTP/1.0 came along, bringing the following:

  • Request headers and a request body were added so that more information can be sent to the server, for example the header fields Accept (acceptable file types), Accept-Encoding (compression formats), Accept-Charset (character encodings), and Accept-Language (languages):

    Accept: text/html
    Accept-Encoding: gzip, deflate, br
    Accept-Charset: ISO-8859-1,utf-8
    Accept-Language: zh-CN,zh

  • The User-Agent field was added to the request header so that the server can collect client information:

    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36

  • Response headers were added to tell the browser more, such as Content-Encoding, which indicates how the returned file is compressed, and Content-Type, which tells the browser what type of file the server is returning and which character encoding it uses:

    Content-Encoding: gzip
    Content-Type: text/html; charset=utf-8

  • A status code was added to the response line to tell the browser how the request went, such as 200 for success:

    HTTP/1.0 200 OK

  • A cache mechanism was added so that already-downloaded resources can be cached, reducing pressure on the server.
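A request carrying these header fields can be sketched with Python's standard library (example.com is a placeholder; note that http.client actually speaks HTTP/1.1 on the wire, so this illustrates the header fields rather than the 1.0 protocol itself):

    import http.client

    conn = http.client.HTTPConnection("example.com", 80)
    conn.request("GET", "/index.html", headers={
        "Accept": "text/html",
        "Accept-Language": "zh-CN,zh",
    })
    resp = conn.getresponse()
    print(resp.status, resp.reason)           # e.g. 200 OK
    print(resp.getheader("Content-Type"))     # e.g. text/html; charset=utf-8
    conn.close()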

From the perspective of the request process, the biggest difference between HTTP/1.0 and HTTP/0.9 is that many new fields were added to requests and responses so that the browser and server can exchange more information.

HTTP/1.1

Although HTTP/1.0 can transmit different types of files, it still has shortcomings. For example, every HTTP request has to go through the following stages:

  • establish a TCP connection;
  • send the HTTP request;
  • receive the HTTP response;
  • close the TCP connection.

With HTTP/1.0, multiple requests to the same domain name are sent like this:

(Figure: HTTP/1.0 short-lived connections)

Notice that every request has to establish a new TCP connection and tear it down afterwards, which adds network overhead and delays page display.

HTTP/1.1 added the Connection field to the request header to provide TCP persistent connections:

    Connection: keep-alive

Persistent connections are enabled by default in HTTP/1.1, and the browser maintains up to 6 TCP persistent connections per domain name. With persistent connections enabled, multiple requests under the same domain name are sent as follows:


(Figure: HTTP/1.1 persistent connections)
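Reusing a single connection can be sketched with http.client, which keeps the underlying socket open across sequential requests when the server allows it (example.com is again a placeholder):

    import http.client

    # One TCP connection, several request/response cycles.
    conn = http.client.HTTPConnection("example.com")
    for path in ("/index.html", "/style.css", "/app.js"):
        conn.request("GET", path)     # reuses the same socket each time
        resp = conn.getresponse()
        resp.read()                   # drain the body before sending the next request
        print(path, resp.status)
    conn.close()                      # one connect, one close, three requests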

HTTP/1.1 added the Host field to support virtual hosting:

    Host: bubuzou.com

  • Virtual hosting: multiple virtual hosts are bound to one physical machine; each virtual host has its own domain name, and all of these domain names share a single IP address.

HTTP/1.1 supports dynamic content by introducing the chunked transfer mechanism: the server splits the data into chunks of arbitrary size, each chunk is sent prefixed with its own length, and a zero-length chunk marks the end of the data.
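The wire format is simple enough to sketch directly: each chunk is its length in hexadecimal, a CRLF, the chunk bytes, and another CRLF, with a zero-length chunk at the end. A toy encoder:

    def chunked(parts):
        """Encode an iterable of byte strings as an HTTP/1.1 chunked body."""
        out = b""
        for part in parts:
            # each chunk: its own length in hex, CRLF, the bytes, CRLF
            out += f"{len(part):x}".encode() + b"\r\n" + part + b"\r\n"
        return out + b"0\r\n\r\n"   # zero-length chunk marks the end

    print(chunked([b"Hello, ", b"world!"]))
    # b'7\r\nHello, \r\n6\r\nworld!\r\n0\r\n\r\n'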

HTTP/1.1 also introduced the client-side cookie mechanism and security mechanisms.

HTTP/2

We know that HTTP/1.1 made many optimizations for network efficiency. The core ones are the following three:

  • persistent connections were added;
  • the browser maintains up to 6 TCP persistent connections per domain name at the same time;
  • CDNs are used to implement domain name sharding.

Problems that remain in HTTP/1.1

Although HTTP/1.1 adopted many strategies to optimize resource loading and achieved real results, its bandwidth utilization is far from ideal, and this is a core problem of HTTP/1.1.

Bandwidth refers to the maximum number of bytes that can be sent or received per second. We call the maximum number of bytes that can be sent per second the upstream bandwidth, and the maximum number of bytes that can be received per second the downstream bandwidth.

The reason is that HTTP/1.1 finds it hard to use all the available bandwidth. For example, the 100 Mbps bandwidth we often talk about corresponds to a maximum download speed of 12.5 MB/s, but when loading page resources over HTTP/1.1 the actual speed may peak at around 2.5 MB/s; it is difficult to saturate the full 12.5 MB/s.

This is mainly caused by three problems.

The first reason is TCP's slow start

Once a TCP connection is established, data transfer begins. TCP starts by sending data very slowly and then gradually ramps up until the sending rate reaches an ideal level; this process is called slow start. You can picture it like a car starting off: slow at first, then accelerating once it gets going.

Slow start causes performance problems because the key resources on a page are often small: HTML, CSS, and JavaScript files. These are usually requested right after the TCP connection is established, while the connection is still in slow start, so they take much longer to arrive than they should, delaying the precious first render of the page.

The second reason is that multiple TCP connections are opened at the same time and compete for a fixed amount of bandwidth.

Imagine the system has several TCP connections open at once. While bandwidth is sufficient, each connection's sending or receiving rate slowly increases; once bandwidth is insufficient, the connections slow each other down.

This causes a problem: some TCP connections are downloading key resources such as CSS and JavaScript files, while others are downloading ordinary resources such as images and videos, and the connections cannot negotiate among themselves which key resources should be downloaded first. This can hurt the download speed of exactly those key resources.

The third reason is HTTP/1.1 head-of-line blocking

We know that with persistent connections in HTTP/1.1, although a TCP pipeline can be shared, only one request can be processed in it at a time; other requests are blocked until the current request completes. This means we cannot freely send requests and receive responses within one pipeline.

This is a serious problem, because many factors can block a request, and they are all unpredictable. If one request is blocked for 5 seconds, every queued request behind it is delayed by 5 seconds as well, wasting bandwidth and CPU the whole time.

HTTP/2 Multiplexing

To solve the problems of HTTP/1.1, HTTP/2 adopts its most disruptive feature: the multiplexing mechanism.


What is HTTP/2 multiplexing?

In simple terms, HTTP/2 multiplexing means the browser establishes only one TCP connection for each domain name, and all requests to that domain are completed within that one channel.

In addition, data is no longer transmitted as text; it is split into streams and frames and encoded in binary. A single TCP connection can carry any number of bidirectional streams, and these streams are parallel, interleaved, and independent of one another. What travels inside a stream are binary frames, the smallest unit of data transmission in HTTP/2. Frames within one stream are transmitted in order, but frames of different streams can be freely interleaved, so streams do not have to wait for one another.

What problem was solved?

Because only one TCP connection is used, the time consumed by TCP slow start is reduced; and since there is just one connection, there is no problem of multiple TCP connections competing for network bandwidth.

After a client request passes through the binary framing layer, it is no longer a complete HTTP message but a set of interleaved frames (frames of different streams are interleaved, while frames of the same stream remain in order). Frames therefore need not be transmitted strictly one request after another, and nothing has to wait, which solves HTTP's head-of-line blocking.

How it is achieved

  • First, the browser prepares the request data: the request line, request headers, and, for a POST, the request body.
  • The binary framing layer converts this data into frames tagged with a request ID (the stream ID) and sends them to the server through the protocol stack: header information goes into HEADERS frames, and the request body goes into DATA frames.
  • After receiving the frames, the server merges all frames with the same ID into one complete request message.
  • The server then processes the request and hands the response line, response headers, and response body to its binary framing layer.
  • Likewise, the framing layer converts the response data into frames tagged with the request ID and sends them to the browser through the protocol stack.
  • On receiving the response frames, the browser uses the ID to deliver each frame's data to the corresponding request.
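The demultiplexing idea can be illustrated with a toy sketch (this is not the real HTTP/2 wire format; the frame contents are invented for illustration):

    from collections import defaultdict

    # Interleaved frames as they might arrive: (stream_id, frame_type, payload).
    frames = [
        (1, "HEADERS", b":method: GET, :path: /style.css"),
        (3, "HEADERS", b":method: GET, :path: /app.js"),
        (3, "DATA", b"..."),
        (1, "DATA", b"..."),
    ]

    streams = defaultdict(list)
    for stream_id, frame_type, payload in frames:
        streams[stream_id].append((frame_type, payload))  # frames within a stream stay ordered

    for stream_id, msgs in sorted(streams.items()):
        print(f"stream {stream_id}: reassembled from {len(msgs)} frames")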

Other HTTP/2 features

1. You can set the priority of the request

In the browser, some data is more important than the rest, such as critical CSS or JS; if it reaches the browser late, the user experience clearly suffers.

HTTP/2 therefore supports setting a priority on requests, so that the server handles high-priority requests first.

2. Server Push

In HTTP/2, when the server receives a request for an HTML page, it knows which resources the page references, such as CSS and JS files, so it can proactively push those resources to the browser and reduce the client's waiting time.

3. Header Compression

HTTP/2 uses the HPACK algorithm to compress request and response headers. The effect on a single request is not dramatic, but if a page makes 100 requests and each one shrinks by 20%, the speed-up becomes noticeable.

HPACK's compression boils down to two points:

  • The client and server each maintain and update an index table of previously seen header fields (establishing a shared compression context). During transmission, a header field that already exists in the static or dynamic table is replaced by its index, shrinking each request.
  • Header fields that are sent literally can be encoded with a static Huffman code, further reducing their transfer size.
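The indexing idea can be illustrated with a toy sketch (this is not the real HPACK wire format; it only shows how a shared table turns repeated fields into small indexes):

    # Both sides keep an identical table; repeats become small integer indexes.
    table = []   # toy dynamic table, shared by encoder and decoder

    def encode(headers):
        out = []
        for field in headers:
            if field in table:
                out.append(table.index(field))   # already seen: send the index
            else:
                table.append(field)              # first time: send the literal, index it
                out.append(field)
        return out

    request = [(":method", "GET"), ("user-agent", "toy-browser/1.0")]
    print(encode(request))   # first time: the literals are sent
    print(encode(request))   # second time: indexes only, [0, 1]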

HTTP/3

HTTP/2 is still based on TCP, so the following problems remain.

TCP Head-of-Line Blocking

In HTTP/2, multiple requests run over one TCP connection, so if a packet is lost in any stream, all requests on that TCP connection stall. This is different from HTTP/1.1, where the browser opens 6 TCP connections per domain name: if one connection suffers head-of-line blocking, the other 5 can keep transmitting data.

TCP connection establishment delay

Before any data can be transmitted, the TCP three-way handshake must complete, which takes 1.5 RTTs; with HTTPS, a TLS handshake is also required, costing another 1 to 2 RTTs.

  • Network latency is also called RTT (Round Trip Time): the total round-trip time for a data packet to travel from the browser to the server and back.

In short, 3 to 4 RTTs pass before data flows. If client and server are close, one RTT is about 10 ms; if they are far apart, it may be 100 ms, so roughly 300 ms can elapse before any data is transmitted, and at that point the slowness is noticeable.
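The arithmetic is easy to check with a back-of-the-envelope sketch, using the handshake costs quoted above:

    rtt_ms = 100                      # a distant server: 1 RTT ≈ 100 ms
    tcp_handshake = 1.5               # TCP three-way handshake, in RTTs
    tls_low, tls_high = 1, 2          # TLS handshake: 1 to 2 RTTs
    low = (tcp_handshake + tls_low) * rtt_ms
    high = (tcp_handshake + tls_high) * rtt_ms
    print(f"{low:.0f} to {high:.0f} ms before any data flows")   # 250 to 350 ms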

TCP protocol rigidity

We know that the TCP protocol has problems with head-of-line blocking and connection establishment delays, but there is no way to improve the TCP protocol for the following two reasons:

Intermediate devices are rigid. Routers, switches, firewalls, and NAT boxes rely on software that makes heavy use of TCP's features, and once deployed they are rarely updated. If the TCP protocol were upgraded on the client, packets in the new format passing through these devices might not be understood and would be dropped.

The operating system is the other cause of TCP's rigidity: TCP is implemented in the OS kernel, so any protocol change has to wait for kernel upgrades, which reach users very slowly.

QUIC Protocol

HTTP/3 is implemented on top of UDP and re-creates TCP-like functionality such as multiple data streams and reliable transmission. This set of functions is called the QUIC protocol.

  • It implements flow control and reliable transmission similar to TCP's. Although UDP itself is unreliable, QUIC adds a layer on top of UDP that guarantees reliable delivery, providing packet retransmission, congestion control, and other features found in TCP.
  • It integrates TLS encryption. QUIC currently uses TLS 1.3, which has several advantages over the earlier TLS 1.2, the most important being fewer RTTs spent on the handshake.
  • It implements HTTP/2-style multiplexing. Unlike TCP, QUIC runs multiple independent logical data streams over the same physical connection, and by transmitting the streams separately it eliminates TCP's head-of-line blocking.
  • It implements a fast handshake. Because QUIC is based on UDP, it can establish a connection in 0-RTT or 1-RTT, meaning data can start flowing almost immediately, which greatly speeds up first page loads.

Challenges of HTTP/3

  • First, as things stand, neither servers nor browsers offer complete HTTP/3 support. Chrome has supported Google's version of QUIC for several years, but that version differs significantly from the standardized QUIC.
  • Second, deploying HTTP/3 is also a huge problem, because the operating system kernel's UDP path is far less optimized than its TCP path, and this remains an important obstacle for QUIC.
  • Third, there is again the rigidity of intermediate devices: they are far less tuned for UDP than for TCP, and statistics show a packet drop rate of roughly 3% to 7% for QUIC traffic.
