Soul-searching question: How many HTTP requests can be sent through a TCP connection?

A classic interview question asks what happens between entering a URL in the browser and the page being displayed. Most answers cover how, once the response arrives, the DOM is constructed and painted.

But if the received HTML contains dozens of image tags, have you ever thought about how these images are downloaded: in what order, over how many connections, and with which protocol?

To answer this, we first need to answer the following five questions:

  • After a modern browser establishes a TCP connection with a server, will it disconnect after an HTTP request is completed? Under what circumstances will it disconnect?
  • How many HTTP requests can a TCP connection correspond to?
  • Can HTTP requests be sent together in one TCP connection (for example, three requests are sent together and three responses are received together)?
  • Why does refreshing a page sometimes not require re-establishing the SSL connection?
  • Is there any limit on the number of TCP connections that a browser can establish to the same host?

Let's talk about the first question first: After a modern browser establishes a TCP connection with the server, will it disconnect after an HTTP request is completed? Under what circumstances will it disconnect?

In HTTP 1.0, the server closes the TCP connection after sending each HTTP response. Every request therefore has to establish and tear down its own TCP connection, which is expensive.

Therefore, although it was not part of the standard, some servers supported the Connection: keep-alive header.

It means: after completing this HTTP request, do not close the TCP connection it used.

The advantage is that the connection can be reused, so later HTTP requests do not need to establish a new TCP connection. If the connection is kept open, the SSL overhead is avoided as well. The two pictures show the timing statistics of my two visits to github.com within a short period of time:

First visit: there is initial connection and SSL overhead.

Second visit: the initial connection and SSL overhead disappear, which indicates that the same TCP connection was reused.

Persistent connection: Since maintaining a TCP connection has so many benefits, HTTP 1.1 includes the Connection header in the standard and enables persistent connections by default.

Unless the request states Connection: close, the TCP connection between the browser and the server will be maintained for a period of time and will not be disconnected when a request is completed.

So the answer to the first question is: by default, an established TCP connection is not closed when a request completes. It is closed right after the request only when Connection: close is declared in the request (or response) header; otherwise it stays open until it has been idle for too long.

See the following link for detailed documentation:

  1. https://tools.ietf.org/html/rfc2616#section-8.1

Second question: How many HTTP requests can one TCP connection correspond to?

After understanding the first question, in fact, this question already has an answer. If the connection is maintained, a TCP connection can send multiple HTTP requests.
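The first two answers can be checked locally. Below is a minimal sketch using only the Python standard library: it starts a throwaway HTTP 1.1 server on localhost and counts how many TCP connections the server accepts while a client sends several requests. The names EchoHandler and count_connections, as well as the /img0.png style paths, are made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    # http.server only keeps connections open when it speaks HTTP/1.1.
    protocol_version = "HTTP/1.1"
    connections = []  # one entry per accepted TCP connection

    def setup(self):
        # A handler instance is created once per TCP connection.
        super().setup()
        EchoHandler.connections.append(self.client_address)

    def do_GET(self):
        body = self.path.encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        if self.close_connection:
            # The client asked for "Connection: close"; say so in the response.
            self.send_header("Connection", "close")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def count_connections(n_requests, close_each_time=False):
    """Send n_requests GETs and return how many TCP connections the server saw."""
    EchoHandler.connections = []
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    headers = {"Connection": "close"} if close_each_time else {}
    try:
        for i in range(n_requests):
            conn.request("GET", f"/img{i}.png", headers=headers)
            conn.getresponse().read()  # drain the body before the next request
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return len(EchoHandler.connections)
```

With this sketch, count_connections(3) should report a single connection, while count_connections(3, close_each_time=True) should report three, because the client has to reconnect after every response.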

The third question: Can HTTP requests be sent together in one TCP connection (for example, sending three requests together and receiving three responses together)?

HTTP 1.1 has a limitation: a single TCP connection can process only one request at a time. In other words, the lifetimes of any two HTTP requests on the same TCP connection cannot overlap.

Although Pipelining is specified in the HTTP 1.1 specification to try to solve this problem, this feature is turned off by default in browsers.

Let's first take a look at what Pipelining is. RFC 2616 stipulates: A client that supports persistent connections MAY "pipeline" its requests (i.e., send multiple requests without waiting for each response). A server MUST send its responses to those requests in the same order that the requests were received.

In other words, a client that supports persistent connections can send multiple requests on one connection without waiting for the response to any of them. The server that receives the requests must send the responses in the order in which the requests were received.

As for why the standard is written this way, we can roughly guess one reason: HTTP 1.1 is a text protocol, and a response carries nothing that identifies the request it belongs to, so the responses must come back in the same order as the requests.

For example, if you send two requests to the server, GET /query?q=A and GET /query?q=B, and the server returns two results, the browser has no way to determine which request the response corresponds to based on the response results.
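To make the ordering rule concrete, here is a small sketch of pipelining over a raw socket, again against a throwaway local server built with the Python standard library. QueryHandler, read_response and pipeline are hypothetical names chosen for this example, and the "result for ..." body is invented. Note that the client can only pair responses with requests by position.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


class QueryHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self):
        q = parse_qs(urlparse(self.path).query).get("q", ["?"])[0]
        body = f"result for {q}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def read_response(stream):
    """Read one HTTP response from a binary stream; return (status line, body)."""
    status_line = stream.readline().decode().strip()
    length = 0
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line ends the headers
            break
        name, _, value = line.decode().partition(":")
        if name.strip().lower() == "content-length":
            length = int(value.strip())
    return status_line, stream.read(length).decode()


def pipeline(queries):
    """Send all requests before reading anything, then read responses in order."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.create_connection(server.server_address) as sock:
            batch = "".join(
                f"GET /query?q={q} HTTP/1.1\r\nHost: localhost\r\n\r\n"
                for q in queries
            )
            sock.sendall(batch.encode())  # every request goes out up front
            with sock.makefile("rb") as stream:
                # The i-th response is assumed to answer the i-th request.
                return [read_response(stream)[1] for _ in queries]
    finally:
        server.shutdown()
        server.server_close()
```

Calling pipeline(["A", "B"]) sends both requests in one write. Nothing in the response headers says which query a response belongs to, which is why the order has to be preserved.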

Pipelining is a good idea, but there are many problems in practice:

  • Some proxy servers do not handle HTTP Pipelining correctly.
  • Correct pipelining implementation is complex.
  • Head-of-line Blocking: After establishing a TCP connection, suppose the client sends several requests to the server in succession on this connection. According to the standard, the server must return the results in the order in which the requests were received. If the server spends a long time processing the first request, the responses to all subsequent requests have to wait until the first one is completed.
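Head-of-line blocking can be illustrated with a tiny model. The sketch below assumes the best case, where the server works on all pipelined requests in parallel, and ignores transfer time; even then, the in-order rule means a response can only leave once every earlier response has left. The function names and the sample processing times are invented for this illustration.

```python
from itertools import accumulate


def pipelined_finish_times(processing_times):
    """In-order responses: request i finishes at the max of times 0..i."""
    return list(accumulate(processing_times, max))


def independent_finish_times(processing_times):
    """No ordering constraint: each request finishes after its own time."""
    return list(processing_times)


# One slow request (3.0 s) followed by two fast ones.
times = [3.0, 0.1, 0.2]
print(pipelined_finish_times(times))    # the fast requests are held back
print(independent_finish_times(times))  # the fast requests finish early
```

In this model the two fast requests are stuck behind the slow one for the full three seconds, although their own processing finished almost immediately.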

Therefore, modern browsers do not enable HTTP Pipelining by default.

However, HTTP2 provides the Multiplexing feature, which can complete multiple HTTP requests simultaneously in one TCP connection. As for how to implement Multiplexing, that is another question.

Let's take a look at the effect of using HTTP2:

Green is the waiting time from sending the request until the response starts to arrive, and blue is the download time of the response. You can see that they are all completed in parallel on the same connection.

So this question has an answer: HTTP 1.1 has Pipelining, which allows several requests to be sent without waiting for their responses, but since browsers turn it off by default, it can be considered infeasible in practice.

In HTTP2, thanks to Multiplexing, multiple HTTP requests can proceed in parallel on the same TCP connection.

So how do browsers improve page loading efficiency in the HTTP 1.1 era? There are two main techniques:

  • Maintain the established TCP connection with the server and process multiple requests sequentially on the same connection.
  • Establish multiple TCP connections with the server.

Fourth question: Why does refreshing the page sometimes not require re-establishing the SSL connection?

The answer follows from the first question. The browser and the server sometimes keep the TCP connection open for a period of time. When TCP does not need to be re-established, the SSL session that already exists on that connection is simply reused.

Question 5: Is there any limit on the number of TCP connections that a browser can establish to the same host?

Assume we are still in the HTTP 1.1 era, when there was no multiplexing. What should the browser do when it gets a web page with dozens of images?

You certainly can't just open one TCP connection to download sequentially, because that would make the user wait uncomfortably. But if you open a TCP connection and send an HTTP request for each image, the computer or server might not be able to handle it.

If there are 1,000 pictures, you can't open 1,000 TCP connections. Even if your computer could handle it, the NAT device might not.

So the answer is: Yes. Chrome allows up to six TCP connections to the same host. There are some differences between different browsers.

  1. https://developers.google.com/web/tools/chrome-devtools/network/issues#queued-or-stalled-requests

So back to the original question, if the received HTML contains dozens of image tags, how are these images downloaded, in what order, how many connections are established, and what protocol is used?

If the images are all served over HTTPS and are under the same domain name, the browser negotiates with the server as part of the SSL handshake (through the ALPN extension) whether HTTP2 can be used. If it can, the browser uses Multiplexing to run multiple transfers on this one connection.
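The negotiation itself can be sketched with Python's ssl module, which exposes ALPN, the TLS extension in which the client lists the protocols it speaks ("h2" for HTTP2, "http/1.1") and the server picks one. The helper names alpn_context, negotiated_protocol and download_plan are made up for this example, and negotiated_protocol needs real network access to whatever host you pass in.

```python
import socket
import ssl


def alpn_context():
    """TLS client context that offers HTTP2 first, then HTTP 1.1, via ALPN."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def negotiated_protocol(host, port=443, timeout=5.0):
    """Do a TLS handshake with host and return the protocol the server picked."""
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with alpn_context().wrap_socket(raw, server_hostname=host) as tls:
            return tls.selected_alpn_protocol()  # "h2", "http/1.1", or None


def download_plan(alpn_result):
    """Map the ALPN result to the strategy described in the text."""
    if alpn_result == "h2":
        return "multiplex requests on one connection"
    return "HTTP 1.1 over several parallel connections"
```

For example, download_plan(negotiated_protocol("example.org")) would describe the strategy for that host; the hostname here is only a placeholder.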

However, this does not necessarily mean that every resource on this domain name is fetched over a single TCP connection, although Multiplexing will almost certainly be used.

What if HTTP2 turns out to be unavailable, or HTTPS cannot be used? (In practice, browsers only implement HTTP2 over HTTPS, so without HTTPS you can only use HTTP 1.1.)

Then the browser will establish multiple TCP connections to the host. The maximum number of connections depends on the browser settings. Whenever one of these connections is idle, the browser uses it to send a new request. What if all connections are busy? Then the other requests can only wait.
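This queueing behaviour can be sketched as a small simulation. It assumes requests are issued in document order, uses the limit of six connections that the text quotes for Chrome as the default, and takes made-up download times; the function name schedule is hypothetical.

```python
import heapq


def schedule(download_times, max_connections=6):
    """Return each download's finish time with at most max_connections in use."""
    # Min-heap holding the time at which each connection becomes idle.
    free_at = [0.0] * max_connections
    heapq.heapify(free_at)
    finish_times = []
    for duration in download_times:
        start = heapq.heappop(free_at)  # earliest idle connection; waits if all busy
        finish = start + duration
        finish_times.append(finish)
        heapq.heappush(free_at, finish)
    return finish_times


# Twelve images that each take one second to download.
print(schedule([1.0] * 12))
```

With twelve one-second images, the first six finish after one second and the remaining six after two, since each of them had to wait for a connection to become idle.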
