A brief history of the development of the HTTP protocol and analysis of common interview questions

This article is reprinted from WeChat public account "Java Big Factory Interviewer", author laker. Please contact the WeChat public account of Java Big Factory Interviewer to reprint this article.

Table of contents

  • What is HTTP protocol
  • A brief history of HTTP protocol development
    • HTTP 0.9 One-line protocol
    • HTTP 1.0 builds extensibility
    • HTTP 1.1 Standardized Protocol
    • HTTP 2.0 A higher performance protocol
  • Questions
    • 1. What is the difference between HTTP/1.1 keep-alive and HTTP/2 multiplexing?
    • 2. What is a pipeline?
    • 3. After a modern browser establishes a TCP connection with a server, will it disconnect after an HTTP request is completed?
    • 4. How many HTTP requests can a TCP connection correspond to?
    • 5. Can HTTP requests be sent together in one TCP connection (for example, sending three requests together and receiving three responses together)?

What is HTTP protocol

The Hypertext Transfer Protocol (HTTP) is one of the most ubiquitous and widely adopted application protocols on the Internet: it is the common language between clients and servers that enables the modern Web. From simple beginnings with a single keyword and a document path, it has become the protocol of choice not only for browsers, but for nearly all Internet-connected software and hardware applications.

This article covers the first four versions of HTTP:

  • HTTP/0.9
  • HTTP/1.0
  • HTTP/1.1
  • HTTP/2 (often written as HTTP/2.0)

Today, HTTP/1.1 is still in widespread use, and HTTP/2 is its higher-performance successor.

A brief history of HTTP protocol development

HTTP 0.9 One-line protocol

The first simple implementation of the HTTP protocol only supported fetching web pages. It had no version number at the beginning and was later called 0.9 to distinguish it from subsequent versions. HTTP/0.9 is very simple: the request consists of a single line that starts with the only available method, GET, followed by the path to the resource. The response is just the document itself.

  GET /mypage.html

  <HTML>
  A very simple HTML page
  </HTML>

HTTP has taken on a life of its own since 1991 and has grown rapidly over the following years.

It’s fate❤️, I was also born in 1991.

Core features:

  • Simple client-server, request-response protocol.
  • Supported methods: GET only.
  • ASCII protocol, running over a TCP/IP link.
  • Designed for transmitting hypertext documents (HTML).
  • After each request, the connection between the server and the client is closed.
  • No HTTP headers (cannot transfer other content type files), no status/error codes, no URLs, no versioning

Living in modern times, I have never run into HTTP/0.9 in practice, so feel free to skip it and not study it👄

HTTP 1.0 builds extensibility

The HTTP/0.9 protocol was very limited, and browsers and servers were quickly adding extensibility to make it more general.

In May 1996, the HTTP Working Group (HTTP-WG) published RFC 1945, which documented a number of additional data fields, called headers. Headers allow extra metadata to be passed between the client and the server, in both requests and responses.

  GET /mypage.html HTTP/1.0
  User-Agent: NCSA_Mosaic/2.0 (Windows 3.1)

  HTTP/1.0 200 OK
  Date: Tue, 15 Nov 1994 08:12:31 GMT
  Server: CERN/3.0 libwww/2.17
  Content-Type: text/html

  <HTML>
  A page with an image
    <IMG SRC="/myimage.gif">
  </HTML>

Core features:

  • The provided header fields include rich metadata about the request and response (HTTP version number, status code, content type)
  • Response: Not limited to hypertext (the Content-Type header provides the ability to transfer files other than normal HTML files, such as scripts, style sheets, media)
  • Supported methods: GET, HEAD, POST
  • A request may contain multiple newline-delimited header fields.
  • Response objects are prefixed with a response status line.
  • The response object has its own set of header fields separated by newline characters.
  • After each request, the connection between the server and the client is closed.
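
The request/response structure described above (a status line, then newline-delimited header fields, then a blank line and the body) can be illustrated with a small parser. This is only a minimal sketch: the function name `parse_response` and the sample bytes are made up for illustration, and it ignores many details a real HTTP library handles (folded headers, repeated header names, malformed input).

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into its parts (illustrative only)."""
    # Headers and body are separated by an empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: HTTP-Version SP Status-Code SP Reason-Phrase
    version, status, reason = lines[0].split(" ", 2)

    # Each remaining line is one "Name: value" header field.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


# Hypothetical sample response, modelled on the HTTP/1.0 example above.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Server: CERN/3.0 libwww/2.17\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<HTML>A page with an image</HTML>"
)
version, status, reason, headers, body = parse_response(raw)
print(version, status, reason, headers["content-type"], len(body))
```

The Content-Type header is what lets the client treat the body as something other than plain HTML.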

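When acting as a reverse proxy, Nginx has traditionally talked to upstream servers over HTTP/1.0 by default (see the proxy_http_version directive). I ran into a problem with this a few days ago, because HTTP/1.0 does not support chunked transfer encoding. For details, please refer to: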

Nginx download file upstream sent invalid chunked response while reading upstream error

HTTP 1.1 Standardized Protocol

The HTTP/1.1 standard resolved many of the protocol ambiguities in earlier versions and introduced a number of key performance optimizations:

  • Keep-alive (persistent) connections
  • Chunked transfer encoding
  • Byte range requests
  • Additional caching mechanisms
  • Transfer encodings
  • Request pipelining (the pipeline mechanism)

Today, most browsers support both 1.0 and 1.1 implementations, with new browsers using 1.1 by default, but being able to fall back to earlier versions if needed. One thing the RFC definition clearly states is that all implementations of the HTTP protocol should be backward compatible. That is, a browser that implements the HTTP/1.1 specification should be able to receive a 1.0 response from a server. Conversely, a server-side 1.1 implementation should also be able to respond to requests from a 1.0 browser.

The work to turn HTTP into a formal IETF Internet standard took place in parallel with the documentation work around HTTP/1.0 and lasted for about four years: from 1995 to 1999.

In fact, the first official HTTP/1.1 standard was RFC 2068, which was published in January 1997, about eight months after the HTTP/1.0 document (RFC 1945). Then, two and a half years later, in June 1999, many improvements and updates were incorporated into the standard and published as RFC 2616.

  GET /static/img/header-background.png HTTP/1.1
  Host: developer.cdn.mozilla.net
  User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:50.0) Gecko/20100101 Firefox/50.0
  Accept: */*
  Accept-Language: en-US,en;q=0.5
  Accept-Encoding: gzip, deflate, br
  Referer: https://developer.mozilla.org/en-US/docs/Glossary/Simple_header

  HTTP/1.1 200 OK
  Age: 9578461
  Cache-Control: public, max-age=315360000
  Connection: keep-alive
  Content-Length: 3077
  Content-Type: image/png
  Date: Thu, 31 Mar 2016 13:34:46 GMT
  Last-Modified: Wed, 21 Oct 2015 18:27:50 GMT
  Server: Apache

  (image content of 3077 bytes)

Core features:

  • Connections can be reused, saving the time of reopening the connection multiple times.
  • Pipelining was added, allowing a second request to be sent before the answer to the first request has been fully transmitted, thus reducing communication latency.
  • Chunked encoding transfers are now also supported.
  • Other cache control mechanisms have been introduced.
  • Content negotiation, including language, encoding or type, was introduced and enables the client and server to reach a consensus on the most appropriate content.
  • Thanks to the Host header, different domains can be hosted on the same IP address, which enables server colocation.
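
To make the chunked encoding feature from the list above more concrete, here is a minimal sketch of how a body is framed as chunks: each chunk is its size in hexadecimal, CRLF, the data, CRLF, and a zero-size chunk ends the body. The function names `encode_chunked` and `decode_chunked` are hypothetical, and trailers are not handled.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings using chunked transfer encoding."""
    out = b""
    for part in parts:
        if part:  # an empty chunk would be read as the terminator
            out += f"{len(part):X}".encode("ascii") + b"\r\n" + part + b"\r\n"
    # A zero-length chunk followed by an empty line marks the end of the body.
    return out + b"0\r\n\r\n"


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body (no trailer support in this sketch)."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry extensions after ';', which are ignored here.
        size = int(data[pos:line_end].split(b";")[0].decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


encoded = encode_chunked([b"Wiki", b"pedia"])
print(encoded)
print(decode_chunked(encoded))
```

Because each chunk announces its own length, the server can start sending a response before it knows the total size, which is why no Content-Length header is needed in this mode.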

HTTP/1.1 changed the semantics of the HTTP protocol to use long connections by default. This means that, unless otherwise specified (via the Connection: close header), the server should keep the connection open by default.

However, this functionality has also been backported to HTTP/1.0 and is enabled via the Connection: Keep-Alive header. So if you are using HTTP/1.1, technically you don't need the Connection: Keep-Alive header, but many clients still choose to provide it.
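
The default described above can be summarized as a small decision function. This is a simplified sketch: the helper name `is_persistent` is made up, and headers are assumed to be given as a dict with lowercase names.

```python
def is_persistent(version: str, headers: dict) -> bool:
    """Decide whether the connection stays open after this message.

    `headers` is assumed to map lowercase header names to values.
    """
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if "close" in tokens:
        return False          # an explicit close always wins
    if version == "HTTP/1.1":
        return True           # persistent by default
    # HTTP/1.0 keeps the connection only when keep-alive is requested.
    return version == "HTTP/1.0" and "keep-alive" in tokens


print(is_persistent("HTTP/1.1", {}))
print(is_persistent("HTTP/1.1", {"connection": "close"}))
print(is_persistent("HTTP/1.0", {}))
print(is_persistent("HTTP/1.0", {"connection": "Keep-Alive"}))
```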

Since 2005, the set of APIs available for web pages has grown significantly, some of which create HTTP protocol extensions for specific purposes, mainly new specific HTTP headers:

  • Server-Sent Events, a server can push occasional messages to the browser.
  • WebSocket is a new protocol that can be set up by upgrading an existing HTTP connection.

For more on this, see my earlier system design basics article on long polling, WebSocket, and Server-Sent Events (SSE).

Differences between short and long connections between HTTP/1.0 and HTTP/1.1

[Figure: short connections in HTTP/1.0 (left) compared with a long connection in HTTP/1.1 (right)]

In the figure, HTTP/1.1 on the right establishes a single long connection, so the repeated TCP handshakes between requests are no longer needed.

HTTP pipelining and multiple parallel connections

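HTTP pipelining, like many other HTTP/1.1 improvements, builds on persistent (keep-alive) connections. Browsers also open multiple parallel connections to the same server so that resources can be fetched concurrently.

[Figure: HTTP pipelining and multiple parallel connections]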

HTTP 2.0 A higher performance protocol

Over the years, web pages have become more complex, even becoming standalone applications. The amount of visual media displayed and the number and size of scripts adding interactivity have also increased: more data is transferred via significantly more HTTP requests. HTTP/1.1 connections require requests to be sent in the correct order. In theory, several parallel connections can be used (usually between 5 and 8), but this introduces considerable overhead and complexity. HTTP pipelining, for example, turned out to be a burden in web development.

In the first half of the 2010s, Google demonstrated an alternative way to exchange data between clients and servers by implementing the experimental protocol SPDY. This sparked the interest of developers working on both browsers and servers. SPDY improved responsiveness and addressed the problem of duplicated data being transmitted, and it became the basis for the HTTP/2 protocol.

The HTTP/2 protocol has several major differences from the HTTP/1.1 version:

  • It is a binary protocol, not text. It is no longer possible to read and create it manually. Despite this obstacle, improved optimization techniques can now be implemented.
  • It is a multiplexing protocol that can handle parallel requests on the same connection, eliminating the ordering and blocking constraints of the HTTP/1.x protocol.
  • It compresses headers. Since headers are often similar across a set of requests, this removes the overhead of transmitting duplicated data.
  • It allows the server to populate data in the client cache before it is needed, through a mechanism called server push.
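
To give a feel for what "binary protocol" means in the list above: every HTTP/2 frame starts with a fixed 9-byte header made of a 24-bit payload length, an 8-bit type, an 8-bit flags field, and a reserved bit followed by a 31-bit stream identifier. The sketch below packs and parses that header. The helper names are hypothetical, and it does not implement HPACK header compression or any frame payloads.

```python
# Frame type codes as defined by the HTTP/2 specification.
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def pack_frame_header(length: int, frame_type: int, flags: int, stream_id: int) -> bytes:
    """Build the 9-byte frame header: length(24) type(8) flags(8) R(1)+stream(31)."""
    return (
        length.to_bytes(3, "big")
        + bytes([frame_type, flags])
        + (stream_id & 0x7FFFFFFF).to_bytes(4, "big")  # reserved bit left as 0
    )


def parse_frame_header(header: bytes):
    """Decode a 9-byte frame header into (length, type name, flags, stream id)."""
    length = int.from_bytes(header[0:3], "big")
    frame_type, flags = header[3], header[4]
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF  # drop reserved bit
    return length, FRAME_TYPES.get(frame_type, "UNKNOWN"), flags, stream_id


# A HEADERS frame (type 0x1) with END_HEADERS (0x4) on stream 3, 16-byte payload.
header = pack_frame_header(16, 0x1, 0x4, 3)
print(header.hex(), parse_frame_header(header))
```

The stream identifier in every frame is what lets frames from different requests share one connection.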

[Figure: how requests and responses happen in parallel]

The figure shows how requests and responses happen in parallel. It also shows how multiple requests and responses are split into separate frames that are interleaved on the same connection and sent asynchronously.

After being formally standardized in May 2015, HTTP/2 has achieved great success. By July 2016, 8.7% of all websites were using it, accounting for more than 68% of all requests. High-traffic websites adopted it the fastest, saving a lot of data transfer overhead and subsequent budgets.

This rapid rate of adoption is likely due to the fact that HTTP/2 does not require adaptation of websites and applications: using HTTP/1.1 or HTTP/2 is transparent to them. Using up-to-date servers communicating with up-to-date browsers is enough to enable it: only a limited set of groups is needed to trigger adoption, and as older browser and server versions are updated, usage increases naturally, without requiring additional effort from web developers.

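The HTTP/2 specification itself does not require TLS (it also defines a cleartext mode, h2c), but mainstream browsers only support HTTP/2 over TLS. In practice, browser traffic over HTTP/2 is therefore always HTTPS.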

Questions

1. What is the difference between http1.1 keep-alive and http2.0 multiplexing?

  • HTTP/1.1 keep-alive means the TCP connection is not closed after a request, that is, it is a long connection:
    • Without the pipeline mechanism, the interaction is strictly serial: the client must wait for the response to the previous request before sending a new request.
    • With the pipeline mechanism, sending requests is non-blocking, but the responses must still be returned strictly in the order of the requests.
  • HTTP/2 multiplexing is based on streams. Each request and response is split into frames tagged with a stream identifier, so frames from different streams can be interleaved on the same connection. If two requests do not depend on each other, the second can be sent, and answered, without waiting for the first to complete.
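
The stream-based interleaving described above can be simulated in a few lines. This is a toy model, not an HTTP/2 implementation: frames are plain `(stream_id, bytes)` tuples, the scheduling is simple round-robin, and all function names are made up for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def split_into_frames(stream_id, payload, frame_size):
    """Cut one message into frames, each tagged with its stream id."""
    return [(stream_id, payload[i:i + frame_size])
            for i in range(0, len(payload), frame_size)]


def interleave(*streams):
    """Round-robin frames from several streams onto one 'connection'."""
    wire = []
    for group in zip_longest(*streams):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire):
    """Receiver side: group frames by stream id to rebuild each message."""
    bodies = defaultdict(bytes)
    for stream_id, chunk in wire:
        bodies[stream_id] += chunk
    return dict(bodies)


a = split_into_frames(1, b"AAAAAAAA", 4)  # response on stream 1
b = split_into_frames(3, b"BBBB", 2)      # response on stream 3
wire = interleave(a, b)
print(wire)
print(reassemble(wire))
```

Because every frame carries its stream id, the receiver can rebuild both messages even though their frames arrive mixed together. HTTP/1.1 has no such tag, which is why its responses must arrive in request order.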

2. What is a pipeline?

By default, HTTP requests are sent sequentially. The next request is sent only after the current request receives a response. Due to network latency and bandwidth limitations, it may take a long time before the next request is sent to the server.

Pipelining means sending consecutive requests on the same long connection without waiting for the responses to come back. This avoids paying one round trip of latency per request. In theory, performance also improves because two or more HTTP requests may be packed into a single TCP segment. Even as HTTP requests keep growing in size, a typical TCP MSS (Maximum Segment Size) is still large enough to hold several simple requests.
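
As a rough illustration of the point about segment size, the sketch below builds three small GET requests back to back, as a pipelining client would, and compares the total size against a typical MSS. The value 1460 bytes (common on Ethernet with a 1500-byte MTU), the host name, and the paths are assumptions chosen for the example.

```python
TYPICAL_MSS = 1460  # bytes; an assumed value, the real MSS is negotiated per connection


def build_request(path: str, host: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")


# Hypothetical resources requested one after another without waiting.
paths = ["/index.html", "/style.css", "/app.js"]
pipeline = b"".join(build_request(p, "example.com") for p in paths)

fits = len(pipeline) <= TYPICAL_MSS
print(len(pipeline), "bytes for", len(paths), "requests; fits in one segment:", fits)
```

Real browser requests carry many more headers (cookies, User-Agent, Accept and so on), so fewer of them fit in one segment than this minimal example suggests.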

Not all types of HTTP requests can be pipelined. Only idempotent methods such as GET, HEAD, PUT and DELETE can be safely retried, and if a failure occurs the contents of the pipeline must be easy to send again.

Today, all HTTP/1.1-compliant proxies and servers should support pipelining, although there are still many limitations in practice: one important reason is that no browser currently enables this feature by default.

On this point, RFC 2616 states:

A client that supports persistent connections can send multiple requests in one connection (without waiting for a response to any request). The server that receives the request must send the response in the order in which the requests were received.

As for why the standard is written this way, we can roughly speculate about one reason: an HTTP/1.1 response carries no identifier that says which request it belongs to, so the order must be kept consistent. For example, if you send two requests to the server, GET /query?q=A and GET /query?q=B, and the server returns two results, the browser has no way to tell from the responses alone which request each one answers.

Pipelining is a good idea, but there are many problems in practice:

  • Some proxy servers do not handle HTTP Pipelining correctly.
  • Correct pipelining implementation is complex.
  • Head-of-line Blocking: After a TCP connection is established, suppose the client sends several requests to the server in succession. According to the standard, the server should return the results in the order in which the requests were received. If the server spends a lot of time processing the first request, then all subsequent requests will need to wait for the first request to complete before responding.
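
Head-of-line blocking can be shown with a small timing model. The sketch below assumes the server works on all requests concurrently, that transmission time is negligible, and that the processing times are invented numbers. With pipelining, a response cannot be delivered before all earlier responses. With HTTP/2-style multiplexing, each response is delivered as soon as it is ready.

```python
def pipelined_completion(processing_times):
    """Responses must leave in request order, so a slow one delays the rest."""
    finished = []
    previous = 0.0
    for t in processing_times:
        previous = max(previous, t)  # cannot finish before the one ahead of it
        finished.append(previous)
    return finished


def multiplexed_completion(processing_times):
    """Each response is delivered independently as soon as it is ready."""
    return list(processing_times)


# Hypothetical processing times in seconds: the first request is slow.
times = [3.0, 0.1, 0.2]
print("pipelined:  ", pipelined_completion(times))
print("multiplexed:", multiplexed_completion(times))
```

In this model the two fast requests wait for the full three seconds under pipelining, even though the server had their answers almost immediately.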

Therefore, modern browsers do not enable HTTP Pipelining by default.

For these reasons, pipelining has been superseded by a better mechanism, multiplexing, which is used in HTTP/2.

HTTP1.X connection management

  • Left: short connections
  • Middle: long (persistent) connection
  • Right: pipelining

[Figure: HTTP/1.x connection management, showing short connections, a persistent connection, and pipelining]

3. After a modern browser establishes a TCP connection with a server, will it disconnect after an HTTP request is completed?

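If you have read the sections above, you already know the answer. Modern browsers use at least HTTP/1.1, where connections are persistent by default, so the TCP connection is not closed after a single request completes. It is closed only when one side sends Connection: close or when the connection has been idle for too long. To verify this, open the Chrome developer tools (F12), visit the same website twice, and compare the request timing of the two visits.

[Screenshots: Chrome DevTools request timing for the first and the second visit]

Result analysis:

On the second visit, the initial connection and SSL overhead disappear, which indicates that the same TCP connection is reused.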

4. How many HTTP requests can one TCP connection correspond to?

One TCP connection can send multiple HTTP requests.

This can be proven from the screenshot of question 3.
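
The same observation can be reproduced without a browser. The sketch below starts a throwaway HTTP/1.1 server on localhost using Python's standard library, sends several requests through one `http.client.HTTPConnection`, and records the client's local TCP port after each request. If the port never changes, the requests shared one TCP connection. The handler, the function name `client_ports_for_requests`, and the response body are all made up for the demonstration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets http.server keep the connection open

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def client_ports_for_requests(n):
    """Send n GET requests over one connection and record the local port each time."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    ports = []
    try:
        for _ in range(n):
            conn.request("GET", "/")
            conn.getresponse().read()  # the body must be read before reusing
            ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return ports


ports = client_ports_for_requests(3)
print(ports, "-> same connection:", len(set(ports)) == 1)
```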

5. Can HTTP requests be sent together in one TCP connection (for example, sending three requests together and receiving three responses together)?

HTTP/1.1 has a limitation here: without pipelining, a single TCP connection handles only one request at a time. This means the lifetimes of any two HTTP requests on the same TCP connection cannot overlap.

Although the HTTP/1.1 specification defines pipelining to try to solve this problem, this feature is disabled by default in browsers. The reason for this has been explained in question 2.

HTTP/2 provides multiplexing, which allows multiple HTTP requests to be in flight at the same time on one TCP connection.

References:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Evolution_of_HTTP

http://qnimate.com/what-is-multiplexing-in-http2/

https://developer.mozilla.org/zh-CN/docs/Web/HTTP/Connection_management_in_HTTP_1.x

https://medium.com/platform-engineer/evolution-of-http-69cfe6531ba0

https://blog.csdn.net/ywlmsm1224811/article/details/96436768
