A classic interview question asks what happens between typing a URL into a browser and seeing the page displayed. Most answers focus on how the DOM is built and rendered once the response arrives. But if the received HTML contains dozens of image tags, have you ever thought about how those images are downloaded: in what order, over how many connections, and using which protocol? To understand this, we need to answer five questions, taken in turn below.
Let's start with the first question: after a modern browser establishes a TCP connection with the server, does it disconnect once an HTTP request is completed? Under what circumstances does it disconnect?

In HTTP/1.0, the server closes the TCP connection after sending an HTTP response. That means every request has to set up and tear down its own TCP connection, which is too costly. So although it was not part of the standard, some servers supported the Connection: keep-alive header, which means: after completing this HTTP request, do not close the TCP connection it used. The advantage is that the connection can be reused, so later HTTP requests do not need a new TCP connection. If the connection is kept open, the SSL overhead is avoided as well.

(Figures: timing breakdown of two visits to github.com a short time apart. The first visit shows initial connection and SSL overhead. On the second visit both disappear, indicating that the same TCP connection was reused.)

Persistent connections: since keeping a TCP connection open has so many benefits, HTTP/1.1 brought the Connection header into the standard and made persistent connections the default. Unless Connection: close is specified, the TCP connection between the browser and the server is kept open for a period of time rather than being closed when a request completes.

So the answer to the first question is: by default, an established TCP connection is not closed after a request. It is closed right after a request completes only if Connection: close is declared in the header; otherwise it stays open until the browser or the server closes it after an idle period.

Second question: how many HTTP requests can one TCP connection carry? Once the first question is understood, this one already has an answer. If the connection is kept open, one TCP connection can carry multiple HTTP requests.
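To make connection reuse concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not a picture of how a browser works internally. It starts a throwaway local server that replies with the client's source port, then sends three requests with and without Connection: close. The names EchoPortHandler, fetch_ports and run_demo are made up for this example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: replies with the client's TCP source port."""

    # Speaking HTTP/1.1 makes the stdlib server keep connections open by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        # The source port identifies which TCP connection carried the request.
        body = str(self.client_address[1]).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_ports(host, port, paths, close_each_time):
    """Return the source port the server saw for each request."""
    headers = {"Connection": "close"} if close_each_time else {}
    conn = http.client.HTTPConnection(host, port, timeout=5)
    ports = []
    for path in paths:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        ports.append(int(response.read()))
        if close_each_time:
            # Drop the socket; the next request() opens a new TCP connection.
            conn.close()
    conn.close()
    return ports


def run_demo():
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        reused = fetch_ports(host, port, ["/a", "/b", "/c"], close_each_time=False)
        fresh = fetch_ports(host, port, ["/a", "/b", "/c"], close_each_time=True)
    finally:
        server.shutdown()
        server.server_close()
    return reused, fresh


if __name__ == "__main__":
    reused, fresh = run_demo()
    print("keep-alive source ports:", reused)
    print("Connection: close source ports:", fresh)
```

With keep-alive, all three requests should report the same source port, meaning a single TCP connection served them. With Connection: close, each request normally arrives from a different port, because a new connection is opened every time.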
The third question: can HTTP requests be sent together over one TCP connection (for example, sending three requests at once and receiving three responses at once)? Here HTTP/1.1 has a limitation: a single TCP connection can handle only one request at a time, so the lifetimes of any two HTTP requests on the same connection cannot overlap. The HTTP/1.1 specification does define pipelining to try to address this, but browsers turn the feature off by default. Let's first look at what pipelining is. RFC 2616 stipulates:
A client that supports persistent connections can send multiple requests over one connection without waiting for the response to each request. A server that receives these requests must send its responses in the same order in which the requests were received.

As for why the standard is written this way, we can guess at one reason: HTTP/1.1 is a text protocol, and a response carries nothing that identifies which request it answers, so the order has to match. For example, if you send two requests to the server, GET /query?q=A and GET /query?q=B, and the server returns two results, the browser cannot tell from the results alone which response belongs to which request.

Pipelining is a good idea, but it runs into several problems in practice: some proxy servers do not handle pipelined requests correctly, a correct implementation is complex, and a slow first response holds up every response queued behind it (head-of-line blocking).
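The ordering rule can be illustrated with a toy model. The sketch below is not real networking code; the class PipelinedConnection is hypothetical and only shows that, without a request identifier in the response, a client can do nothing except pair each response with the oldest unanswered request.

```python
from collections import deque


class PipelinedConnection:
    """Toy model of one HTTP/1.1 connection with pipelining enabled."""

    def __init__(self):
        # Requests sent but not yet answered, oldest first.
        self._pending = deque()

    def send(self, request_line):
        self._pending.append(request_line)

    def receive(self, response_body):
        # The response names no request, so it is assigned to the oldest
        # request that is still waiting for an answer.
        request_line = self._pending.popleft()
        return request_line, response_body


# Server answers in request order: every response lands on the right request.
conn = PipelinedConnection()
conn.send("GET /query?q=A")
conn.send("GET /query?q=B")
in_order = [conn.receive("result for A"), conn.receive("result for B")]

# Server answers out of order: the client silently pairs them up wrongly.
conn = PipelinedConnection()
conn.send("GET /query?q=A")
conn.send("GET /query?q=B")
out_of_order = [conn.receive("result for B"), conn.receive("result for A")]

print(in_order)
print(out_of_order)
```

In the second run the result for B is attributed to the request for A, which is exactly the mix-up the in-order requirement is meant to prevent.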
Therefore, modern browsers do not enable HTTP pipelining by default. HTTP/2, however, provides multiplexing, which lets multiple HTTP requests proceed at the same time over one TCP connection. How multiplexing is implemented is a separate topic. Let's look at the effect of using HTTP/2.

(Figure: request timeline over HTTP/2. Green is the waiting time from sending a request until the response starts to arrive, and blue is the download time of the response. All requests complete in parallel on the same connection.)

So this question has an answer: HTTP/1.1 defines pipelining, which allows several requests to be sent at once, but because browsers disable it by default it can be considered unavailable. In HTTP/2, thanks to multiplexing, multiple HTTP requests can run in parallel over the same TCP connection.

So how did browsers improve page loading efficiency in the HTTP/1.1 era? There are two main approaches: keep the TCP connection to the server open and send requests over it one after another, and open several TCP connections to the same server.
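The difference between plain HTTP/1.1, pipelining and multiplexing on a single connection can be sketched with a toy timing model. The model rests on simplifying assumptions: each request has a fixed server processing time, transfer time and bandwidth sharing are ignored, and a pipelining server works on all requests concurrently but must release responses in order. The function names are invented for this illustration.

```python
from itertools import accumulate


def one_at_a_time(processing_ms):
    """Plain HTTP/1.1: the next request starts only after the previous response."""
    return list(accumulate(processing_ms))


def pipelined(processing_ms):
    """Pipelining: responses must leave in request order, so a response is
    released no earlier than the slowest request ahead of it."""
    return list(accumulate(processing_ms, max))


def multiplexed(processing_ms):
    """HTTP/2-style multiplexing: each response is released when it is ready."""
    return list(processing_ms)


# One slow request (300 ms) followed by two fast ones (100 ms each).
times = [300, 100, 100]
print("one at a time:", one_at_a_time(times))
print("pipelined:    ", pipelined(times))
print("multiplexed:  ", multiplexed(times))
```

Under these assumptions, the two fast responses are stuck behind the slow one when pipelined, while with multiplexing they complete as soon as they are ready.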
Fourth question: why does refreshing a page sometimes not require re-establishing the SSL connection? The answer was already given under the first question. The browser and the server sometimes keep the TCP connection open for a while, so TCP does not need to be re-established, and the SSL session running on top of it is reused as well.

Fifth question: is there a limit on the number of TCP connections a browser can open to the same host? Suppose we are still in the HTTP/1.1 era, without multiplexing. What should the browser do when it receives a page with dozens of images? It certainly cannot open a single TCP connection and download them one after another, because the user would wait far too long. But opening one TCP connection per image is not an option either, since the client or the server may not be able to cope. With 1,000 images you cannot open 1,000 TCP connections: even if your computer tolerated it, a NAT device along the way might not.

So the answer is yes. Chrome allows up to six TCP connections to the same host, and other browsers differ slightly.

Back to the original question: if the received HTML contains dozens of image tags, how are these images downloaded, in what order, over how many connections, and using which protocol?

If all the images are served over HTTPS from the same domain, the browser negotiates with the server, as part of the SSL handshake, whether HTTP/2 can be used. If it can, the browser uses multiplexing to fetch many resources over one connection. This does not guarantee that every resource on that domain is fetched over a single TCP connection, but multiplexing will very likely be used.

What if HTTP/2 is not available, or HTTPS is not available (in practice, browsers only use HTTP/2 over HTTPS, so without it you are left with HTTP/1.1)? Then the browser opens multiple TCP connections to the same host.
The maximum number of connections depends on the browser's settings. The browser sends new requests over whichever of these connections is idle. What if every connection is busy with a request? Then the remaining requests simply have to wait.
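The effect of a per-host connection limit can be sketched with a small scheduling model. It assumes every download occupies one connection for a known duration and that a waiting request is handed to whichever connection becomes idle first. The default of six follows the Chrome figure mentioned above. The function name schedule_downloads is invented for this example, and real browsers also apply priorities that this sketch ignores.

```python
import heapq


def schedule_downloads(durations_ms, max_connections=6):
    """Return (finish time of each request, time when the last one finishes)."""
    # Each heap entry is the time at which one connection becomes idle.
    idle_at = [0] * max_connections
    heapq.heapify(idle_at)

    finish_times = []
    for duration in durations_ms:
        start = heapq.heappop(idle_at)  # earliest idle connection
        finish = start + duration
        finish_times.append(finish)
        heapq.heappush(idle_at, finish)  # connection is busy until `finish`
    return finish_times, max(finish_times, default=0)


# 20 images, each taking 100 ms to download.
images = [100] * 20
_, with_six = schedule_downloads(images, max_connections=6)
_, with_one = schedule_downloads(images, max_connections=1)
print("6 connections:", with_six, "ms")
print("1 connection: ", with_one, "ms")
```

In this model the seventh image cannot start until one of the first six finishes, which is the waiting behaviour described above.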