Understanding the TCP/IP Protocol Stack: HTTP/2.0


1. Introduction

Today, let's study the HTTP protocol. Through this article, you will learn the following:

  • A comparison of the various HTTP protocol versions, with their advantages and disadvantages
  • The basic principles behind the HTTP/2.0 protocol: the SPDY protocol, the binary framing protocol, multiplexing, header compression, and server push

Let's ride the wind and waves to the ocean of knowledge. Captain Dabai is about to set sail!

2. Comparison of HTTP protocol versions

HTTP, the Hypertext Transfer Protocol, is like air: you can't feel its existence, but it is everywhere. The author extracted some brief notes on the development of the HTTP protocol from Wikipedia. Let's take a look:

The Hypertext Transfer Protocol is an application protocol for distributed collaborative hypermedia information systems. The Hypertext Transfer Protocol is the basis for data communications on the World Wide Web, where hypertext documents include hyperlinks to other resources that users can easily access.

Tim Berners-Lee initiated the development of the Hypertext Transfer Protocol at CERN in 1989. The development of the early Hypertext Transfer Protocol Requests for Comments (RFCs) was a joint effort of the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), with the work later transferred to the IETF.

Introduction to Tim Berners-Lee, the Father of the World Wide Web

Tim Berners-Lee is a British engineer and computer scientist, best known as the inventor of the World Wide Web. He is a professor of computer science at Oxford University and a professor at MIT.

He proposed an information management system on March 12, 1989, and in mid-November of the same year implemented the first successful communication between an HTTP client and a server over the Internet.

He is the head of the World Wide Web Consortium (W3C), which oversees the continued development of the Web, and the founder of the World Wide Web Foundation. He holds the 3Com Founders Chair and is a senior researcher at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). He is also the Director of the Web Science Research Initiative (WSRI), a member of the Advisory Board of the MIT Center for Collective Intelligence, the founder and president of the Open Data Institute, and currently an advisor to the social network MeWe.

In 2004, Berners-Lee was knighted by Queen Elizabeth II for his groundbreaking work. In April 2009, he was elected a foreign associate of the United States National Academy of Sciences. He was named one of Time magazine's 100 Most Important People of the 20th Century, hailed as the "inventor of the World Wide Web", and received the 2016 Turing Award.


Basic information about each version of HTTP

After more than 20 years of evolution, the HTTP protocol has five major versions: 0.9, 1.0, 1.1, 2.0, and 3.0. The author drew a picture for you to see:

A. HTTP/0.9

0.9 is the original version, and its main features include:

  • Limited request methods: only GET is supported, so the amount of information a client can send to the server is very limited. The commonly used POST, for example, is unavailable.
  • No headers or version number: a request cannot specify a protocol version or carry header fields, and the server can only return an HTML string.
  • Short-lived connections: the server closes the TCP connection immediately after sending its response.
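To see just how minimal 0.9 was, here is an entire 0.9 request next to its 1.0 equivalent (the path and header are illustrative, not from any real exchange):

```python
# The whole of an HTTP/0.9 request: a single request line, no headers,
# no version number, GET only. The server replies with raw HTML and
# then closes the TCP connection.
request_09 = b"GET /index.html\r\n"

# For contrast, the same request in HTTP/1.0 form, with a version
# number and header lines terminated by a blank line:
request_10 = (
    b"GET /index.html HTTP/1.0\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)

print(request_09)
print(request_10)
```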

B. HTTP/1.0

Version 1.0 is mainly an enhancement of version 0.9, and the improvements are obvious. The main features and shortcomings include:

  • Richer request methods: new request methods such as POST and HEAD were added, increasing the amount of information the client can send to the server.
  • Request and response headers: the concepts of request headers and response headers were introduced. The HTTP protocol version number and other header information can be specified in the communication, making client/server interaction more flexible and convenient.
  • Richer content types: images, audio, video, and binary data can all be transmitted, compared with 0.9, which could only carry HTML. This opened up many more application scenarios for HTTP.
  • Poor connection reuse: in version 1.0, each TCP connection can carry only one request. Once the response is sent, the connection is closed; requesting another resource means establishing a new connection. To ensure correctness and reliability, TCP requires a three-way handshake to establish a connection and a four-way handshake to tear it down, so the cost of each connection is high. TCP slow start also keeps the sending rate low at the beginning of a connection, so the performance of version 1.0 is not ideal.
  • Stateless and connectionless: the server does not track or record the state of requests, and the client must establish a new TCP connection for every request, with no reuse. In addition, 1.0 stipulates that the next request can only be sent after the response to the previous one arrives; if one request is blocked, all subsequent requests are blocked too. Packet loss, reordering, and the high cost of connection setup cause many multiplexing and head-of-line blocking problems, so being connectionless and stateless is a weak point of version 1.0.

C. HTTP/1.1

Version 1.1 was released about a year after version 1.0. It is an optimization and improvement of version 1.0. The main features of version 1.1 include:

  • Persistent connections: a new Connection header was added, and setting it to keep-alive keeps the connection open. That is, the TCP connection is no longer closed by default and can be reused by multiple requests, which is a very important optimization in version 1.1. However, the server only starts on the next response after finishing the current one; if one response is particularly slow, many requests queue up behind it, so the head-of-line blocking problem remains.
  • Pipelining: building on persistent connections, the client can send subsequent requests without waiting for the response to the first one, although responses are still returned in request order. That is, within the same TCP connection the client can have multiple requests in flight, further improving the transmission efficiency of the HTTP protocol.
  • More request methods were added, including PUT, PATCH, OPTIONS, and DELETE.
  • The Host header specifies the server's domain name, so requests for different websites can be sent to the same server. This virtual hosting improves machine utilization and is another important optimization.
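To make the 1.1 additions concrete, here is a minimal sketch that builds a raw HTTP/1.1 request with the mandatory Host header and an explicit Connection header (the host name and paths are illustrative):

```python
def build_request(method: str, path: str, host: str, keep_alive: bool = True) -> bytes:
    """Build a raw HTTP/1.1 request. Host is mandatory in 1.1 so that
    one server (one IP address) can serve several websites; persistent
    connections are the default, shown explicitly here via Connection."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: " + ("keep-alive" if keep_alive else "close"),
    ]
    # A blank line terminates the header section.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

# Two requests that could reuse the same TCP connection:
print(build_request("GET", "/index.html", "example.com"))
print(build_request("GET", "/style.css", "example.com"))
```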

D. HTTP/2.0

Version 2.0 is a milestone version. Compared with version 1.x, it has many optimizations to adapt to the current network scenarios. Some important features include:

  • Binary format: 1.x is a text protocol, while 2.0 takes the binary frame as its basic unit and can be called a binary protocol. All transmitted information is divided into messages and frames and encoded in binary. A frame contains a payload and identifiers, making network transmission efficient and flexible.
  • Multiplexing: a very important improvement. In 1.x, establishing multiple connections was costly and inefficient. In version 2.0, multiple requests share one connection: they can run concurrently over a single TCP connection, using the stream identifiers in the binary frames to tell them apart, thereby achieving connection multiplexing.
  • Header compression: version 2.0 uses the HPACK algorithm to compress header data, reducing request size and improving efficiency. This is easy to understand: previously the same headers had to be sent with every request, which was redundant. Version 2.0 updates header information incrementally, effectively reducing the amount of header data transmitted.
  • Server push: a rather interesting feature. In the 1.x versions, the server acted only passively, after receiving a request. In version 2.0, the server is allowed to proactively send resources to the client, which can speed up the client.

3. HTTP/2.0 in Detail

We have compared the evolution and optimization processes of several versions. Next, we will take a deep look at some of the features of version 2.0 and its basic implementation principles.

In comparison, version 2.0 is not an optimization of version 1.1 but an innovation, because 2.0 carries more performance target tasks. Although 1.1 adds long connections and pipelining, it does not fundamentally achieve true high performance.

The design goal of 2.0 is to provide users with a faster, simpler, and safer experience while being compatible with 1.x semantics and operations, and to efficiently utilize the current network bandwidth. To this end, 2.0 has made many adjustments, mainly including: binary framing, multiplexing, header compression, etc.

Akamai provides a demo comparing the loading behavior of HTTP/2.0 and HTTP/1.1 (in the author's test, loading 379 small image fragments took 0.99 s over HTTP/2 versus 5.80 s over HTTP/1.1):

https://http2.akamai.com/demo

3.1 SPDY Protocol

To talk about the 2.0 version standard and new features, we must mention Google's SPDY protocol. Take a look at Baidu Encyclopedia:

SPDY is a TCP-based session layer protocol developed by Google to minimize network latency, increase network speed, and optimize the user's network experience. SPDY is not a protocol to replace HTTP, but an enhancement of the HTTP protocol.

The new protocol features include data stream multiplexing, request prioritization, and HTTP header compression. Google said that after the introduction of the SPDY protocol, page loading speeds in lab tests were 64% faster than before.

Subsequently, the SPDY protocol was supported by major browsers such as Chrome and Firefox, and was deployed on some large and small websites. This efficient protocol attracted the attention of the HTTP working group, and the official Http2.0 standard was formulated on this basis.

In the following years, SPDY and Http2.0 continued to evolve and promote each other. Http2.0 allowed servers, browsers, and website developers to have a better experience with the new protocol and was quickly recognized by the public.

3.2 Binary Framing Layer

The binary framing layer redesigns the encoding mechanism without changing the request method and semantics. The figure shows the http2.0 layered structure (picture from reference 4):

The binary encoding mechanism enables communication to take place over a single TCP connection that remains active for the duration of the conversation.

The binary protocol breaks down the communication data into smaller frames. The data frames are filled in the bidirectional data flow between the client and the server, just like a two-way multi-lane highway with constant flow of traffic:

To understand the binary framing layer, you need to know four concepts:

  • Connection: a TCP connection between client and server; the basic data highway.
  • Stream: a bidirectional flow of bytes within an established TCP connection. A connection can carry one or more streams, and a stream can carry one or more messages.
  • Message: a complete sequence of frames corresponding to one logical request or response. That is, frames make up a message.
  • Frame: the smallest unit of communication. Each frame contains a frame header, which identifies the stream the frame belongs to, and a payload.

The four form a one-to-many containment hierarchy: a connection contains streams, a stream contains messages, and a message contains frames. The author drew a picture:

Let's take a look at the structure of a HEADERS frame: its fields include the length, type, flags, stream identifier, and payload. If you are interested, you can read the relevant RFC 7540 documents.

  1. https://httpwg.org/specs/rfc7540.html

In short, version 2.0 breaks down communication data into binary coded frames for exchange. Each frame corresponds to a specific message in a specific data stream. All frames and streams are multiplexed within a TCP connection. The binary framing protocol is an important foundation for other functions and performance optimizations of 2.0.
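As a sketch grounded in RFC 7540 (section 4.1), the fixed 9-byte frame header can be decoded like this; the sample frame bytes below are constructed purely for illustration:

```python
def parse_frame_header(buf: bytes):
    """Decode the fixed 9-byte HTTP/2 frame header (RFC 7540, sec. 4.1):
    a 24-bit payload length, an 8-bit type, an 8-bit flags field, and
    one reserved bit followed by a 31-bit stream identifier."""
    if len(buf) < 9:
        raise ValueError("need at least 9 bytes")
    length = int.from_bytes(buf[0:3], "big")
    frame_type = buf[3]
    flags = buf[4]
    stream_id = int.from_bytes(buf[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, frame_type, flags, stream_id

# A HEADERS frame (type 0x1) with the END_HEADERS flag (0x4) on
# stream 1, announcing a 16-byte payload:
sample = (16).to_bytes(3, "big") + bytes([0x01, 0x04]) + (1).to_bytes(4, "big")
print(parse_frame_header(sample))  # (16, 1, 4, 1)
```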

3.3 Multiplexing

There is a head-of-line blocking problem in version 1.1. Therefore, if the client wants to issue multiple requests in parallel to improve performance, it must open multiple TCP connections, which incurs extra latency plus connection setup and teardown costs, and uses TCP connections inefficiently.

The use of a new binary framing protocol in version 2.0 breaks through many limitations of version 1.0 and fundamentally achieves true request and response multiplexing.

The client and server break down the interactive data into independent frames, transmit them interleavedly without affecting each other, and finally reassemble them at the other end based on the stream identifier in the frame header, thereby achieving multiplexing of the TCP link.
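The reassembly step described above can be sketched as follows: frames from different streams arrive interleaved on one connection, and the receiver regroups them by stream identifier (the payloads here are invented for illustration):

```python
from collections import defaultdict

def reassemble(frames):
    """Regroup interleaved (stream_id, payload) frames by stream.
    On a real connection the frame header carries the stream id."""
    streams = defaultdict(bytearray)
    for stream_id, payload in frames:
        streams[stream_id] += payload
    return {sid: bytes(body) for sid, body in streams.items()}

# Two responses interleaved on one TCP connection:
interleaved = [(1, b"<html>"), (3, b'{"ok": '), (1, b"</html>"), (3, b"true}")]
print(reassemble(interleaved))  # {1: b'<html></html>', 3: b'{"ok": true}'}
```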

The figure shows the frame-based message communication process of version 2.0 (picture from reference 4):

3.4 Header Compression

A.Header redundant transmission

We all know that HTTP requests have a header section. Every request on a connection carries one, and for a given connection most of those headers are identical. Transmitting the same bytes every time is simply wasteful.

In the modern network, each web page contains an average of more than 100 http requests, and each request header has an average of 300-500 bytes, with a total data volume of more than tens of KB. This may cause data delays, especially in complex WiFi environments or cellular networks. In this case, you can only see the phone spinning in circles, but there is usually almost no change between these request headers. It is indeed not an efficient approach to transmit the same data part multiple times in an already crowded link.

TCP congestion control follows AIMD (additive increase, multiplicative decrease): when packet loss occurs, the transmission rate drops sharply. In a crowded network environment, large headers mean that the slow transmission caused by congestion control is aggravated.

B. HTTP compression and the CRIME attack

Before HPACK arrived in version 2.0, HTTP compression used gzip. SPDY later introduced a design specialized for headers, but it still relied on the DEFLATE algorithm.

In subsequent practical use, it was found that these DEFLATE-based schemes (in both SPDY and TLS compression) are vulnerable to the CRIME attack. Because DEFLATE uses backward string matching and dynamic Huffman coding, an attacker who controls part of the request can inject text and observe how much the compressed size changes. If it shrinks, the attacker knows the injected text repeats something already present in the request.

This process is a bit like the elimination process of Tetris. After a period of attempts, the data content may be completely figured out. Due to the existence of this risk, safer compression algorithms have been developed.

C.HPACK algorithm

In version 2.0, the HPACK algorithm maintains a header table on both client and server that stores previously sent key-value pairs. Common key-value pairs that hardly change over the course of a connection only need to be sent once.

In extreme cases, if the request header does not change each time, the header is not included in the transmission, that is, the header overhead is zero bytes. If the header key-value pair changes, only the changed data needs to be sent, and the newly added or modified header frame will be appended to the header table. The header table always exists during the life of the connection and is updated and maintained by the client and server.

Simply put, the client and server jointly maintain a key-value structure. When changes occur, they are updated and transmitted, otherwise they are not transmitted. This is equivalent to the initial full transmission followed by incremental updates and transmissions. This idea is also very common in daily development, so don't think too much about it.
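The idea can be sketched as a toy diffing scheme. This is not the real HPACK wire format, which uses static and dynamic table indices plus Huffman coding, and the header names below are illustrative:

```python
def encode_delta(headers: dict, table: dict) -> dict:
    """Send only headers that are new or changed relative to the table
    both sides share, then update the table so they stay in sync."""
    delta = {k: v for k, v in headers.items() if table.get(k) != v}
    table.update(headers)
    return delta

table = {}
first = {":method": "GET", ":path": "/", "user-agent": "demo-client"}
second = {":method": "GET", ":path": "/style.css", "user-agent": "demo-client"}

print(encode_delta(first, table))   # first request: the full header set
print(encode_delta(second, table))  # afterwards: only the changed :path
```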

The figure shows the update process of the header table (picture from reference 4):

Related documents of hpack algorithm:

  1. https://tools.ietf.org/html/draft-ietf-httpbis-header-compression-12

3.5 Server Push

Server push is a powerful new feature added in version 2.0. Different from the general question-and-answer C/S interaction, in push-based interaction, the server can send multiple responses to a client request. In addition to the response to the initial request, it also pushes additional resources to the client without the client's explicit request.

For example:

Imagine that you go to a restaurant to eat. A fast food restaurant with good service will provide you with napkins, chopsticks, spoons and even seasonings after you order a bowl of beef noodles. Such proactive service saves guests’ time and improves the dining experience.

This method of actively pushing additional resources is very effective in actual C/S interactions, because almost every network application contains multiple resources, and the client needs to obtain them all one by one. At this time, if the server pushes these resources in advance, it can effectively reduce the additional delay time, because the server can know what resources the client will request next.
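A toy sketch of the flow (not a real HTTP/2 implementation; the resource names and the frame-tuple representation are invented): the server announces each push with a PUSH_PROMISE before sending any response data, so the client knows not to request that resource itself.

```python
# Map each page to the sub-resources the server knows it will need.
PUSH_MAP = {"/index.html": ["/style.css", "/app.js"]}

def handle_request(path):
    """Return the frames the server emits for one client request:
    PUSH_PROMISE frames first, then the response data frames."""
    pushed = PUSH_MAP.get(path, [])
    frames = [("PUSH_PROMISE", p) for p in pushed]
    frames += [("DATA", path)] + [("DATA", p) for p in pushed]
    return frames

for frame in handle_request("/index.html"):
    print(frame)
```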

The following figure shows the simple process of server push (picture from reference 4):

4. Conclusion

This article introduces the historical evolution of the HTTP protocol, the main features and advantages and disadvantages of each version, and focuses on some features of the HTTP 2.0 protocol, including: SPDY protocol, binary framing protocol, multiplexing, header compression, server push and other important functions. Due to limited space, I cannot expand on it too much.

Although the HTTP/2.0 protocol has many excellent features and was officially released in 2015, and major providers now use HTTP/2.0 to handle at least some requests, it is still not universally adopted.

Meanwhile, HTTP/3.0 was introduced in 2018. The promotion and popularization of HTTP/2.0 and HTTP/3.0 will take time, but we firmly believe that our network can become safer, faster, and more economical.
