The HTTP protocol matters to every programmer: no matter which language you use, HTTP is something you need to know. This is not an article that simply introduces the basic concepts of HTTP. If you are not yet familiar with those basics, I recommend first reading cxuan's article on HTTP fundamentals, "After reading this article on HTTP, you will have no problem arguing with the interviewer." From here on, we assume you already have some knowledge and understanding of HTTP. Let's begin.
TCP and HTTP

As we all know, HTTP is an application layer protocol that transmits data over TCP. When you want to access a resource (a resource is a URL on the Internet), you first resolve the IP address and port number of the resource, then establish a TCP connection with the server at that IP address and port. The HTTP client then sends a request message (for example, a GET), and the server replies with a response message. When no more messages need to be exchanged, the client closes the connection. The following diagram illustrates this process well.

The picture above shows the whole HTTP flow of establishing a connection -> sending a request message -> closing the connection, but it omits a very important step: how TCP establishes the connection. TCP needs a three-way handshake, exchanging three messages, to establish a connection. I believe everyone is familiar with this process; if you are still unclear about it, you can first read cxuan's article on TCP connection management. Since HTTP sits on top of TCP, the timeliness (performance) of the HTTP request -> response process depends largely on the performance of the underlying TCP. Only after understanding the performance of TCP connections can we better understand the performance of HTTP connections and thus build high-performance HTTP applications. We usually call one complete request -> response exchange an HTTP transaction, and I will use that term from here on. Our next focus is the performance of TCP.

HTTP latency loss

Let's review the HTTP transaction process above. Which steps do you think introduce latency into an HTTP transaction? As shown in the following figure, several factors mainly affect the latency of HTTP transactions.
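The transaction flow just described (connect -> request -> response -> close) can be sketched with raw sockets. This is a minimal, self-contained sketch: a throwaway local server stands in for the remote host, and all names here are illustrative.

```python
import socket
import threading

# Throwaway one-shot server so the sketch is self-contained;
# in a real transaction this would be a remote host.
def run_server(listener):
    conn, _ = listener.accept()
    conn.recv(4096)                      # read the client's GET request
    conn.sendall(b"HTTP/1.1 200 OK\r\n"
                 b"Content-Length: 5\r\n"
                 b"Connection: close\r\n"
                 b"\r\n"
                 b"hello")
    conn.close()                         # server closes after responding

listener = socket.socket()
listener.bind(("127.0.0.1", 0))          # let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=run_server, args=(listener,)).start()

# 1. establish the TCP connection (the three-way handshake happens here)
client = socket.create_connection(("127.0.0.1", port))
# 2. send the HTTP request message
client.sendall(b"GET /index.html HTTP/1.1\r\n"
               b"Host: 127.0.0.1\r\n"
               b"Connection: close\r\n\r\n")
# 3. read the response until the server closes the connection
response = b""
while chunk := client.recv(4096):
    response += chunk
# 4. close our side of the connection: the HTTP transaction is over
client.close()
listener.close()
print(response.split(b"\r\n")[0].decode())
```

Note that steps 1 and 4 (TCP setup and teardown) are pure overhead from HTTP's point of view, which is exactly why connection management matters.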
The optimization of the last point is also a focus of this article.

HTTP Connection Management

Imagine this problem: suppose a page has five resources (elements), each resource requires the client to open a TCP connection, fetch the resource, and disconnect, and each connection is opened serially, as shown in the following figure. Serial means these five connections happen strictly in sequence; no two connections are ever open at the same time. Five resources require five connections to be opened. That is not a big deal when resources are few, as the CPU can handle it, but what if a page has hundreds of resources or more? Opening a separate connection for each resource obviously increases processing pressure dramatically and adds a lot of delay, which is unnecessary. Another disadvantage of serial loading is that some browsers cannot know an object's size before it is loaded, and the browser needs the size information to place objects in reasonable positions on the screen. The screen therefore shows nothing until enough objects have loaded; objects keep loading, but to the user the browser looks stuck. So, is there a way to optimize HTTP performance? Good question; of course there is.

(1) Parallel connections

This is the most common and most obvious approach. HTTP allows the client to open multiple connections and execute multiple HTTP transactions in parallel. With parallel connections, the request process of the entire HTTP transaction looks like the following figure. Parallel connections overcome the idle time and bandwidth limitations of a single connection. Because each transaction has its own connection, the delays can overlap, which speeds up page loading. However, parallel connections are not necessarily fast.
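As a sketch, parallel connections amount to opening several TCP connections at once, one per resource. The tiny local server below stands in for a real site, and the `fetch` helper and resource names are made up for the example.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor

# Tiny multi-connection server: each accepted connection gets its own thread.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(5)
port = srv.getsockname()[1]

def serve_forever():
    while True:
        try:
            conn, _ = srv.accept()
        except OSError:                  # listening socket closed, stop serving
            return
        def handle(c=conn):
            c.recv(4096)
            c.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
            c.close()
        threading.Thread(target=handle).start()

threading.Thread(target=serve_forever, daemon=True).start()

def fetch(path):
    # one TCP connection per resource, opened and torn down each time
    s = socket.create_connection(("127.0.0.1", port))
    s.sendall(f"GET {path} HTTP/1.0\r\n\r\n".encode())
    data = b""
    while chunk := s.recv(4096):
        data += chunk
    s.close()
    return data

resources = ["/a.css", "/b.js", "/c.png", "/d.png", "/e.html"]

# parallel connections: several TCP connections are open at the same time
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(fetch, resources))

srv.close()
print(len(results))
```

Each worker still pays the full connect/close cost; parallelism only overlaps those costs, which is the caveat discussed next.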
If bandwidth is insufficient, the page may even respond more slowly than over serial connections. This is because the parallel connections compete for the limited bandwidth, and each object loads more slowly: connection 1 may be 95% loaded, connection 2 may be occupying 80% of the bandwidth, and so on for connections 3 and 4. Every object is loading, yet nothing appears on the page. Moreover, opening a large number of connections consumes a lot of memory and causes performance problems of its own. The five connections discussed above are relatively few; a complex web page may have dozens or even hundreds of embedded objects. In other words, a client could open hundreds of connections, and with many clients sending requests at the same time, this easily becomes a performance bottleneck. So parallel connections are not necessarily "fast". In fact, parallel connections do not speed up the transmission of the page itself; they largely create an illusion of speed, which is a common property of all parallelism.

(2) Persistent connections

Web clients often open connections to the same site, and an application that sends a request to a server is likely to send more requests to that server in the near future, for example to fetch more images. This property is called site locality. Therefore, HTTP/1.1 (and some HTTP/1.0 implementations) allow the connection to stay open after a transaction completes. The open state here refers to the underlying TCP connection staying open, so that the next HTTP transaction can reuse it. A TCP connection that remains open after an HTTP transaction ends is called a persistent connection. Non-persistent connections are closed after each transaction, whereas persistent connections stay open between transactions until the client or server decides to close them. Persistent connections also have disadvantages.
If each client does not send requests very often but many clients stay connected, the server will sooner or later run out of resources. There are generally two forms of persistent connection: HTTP/1.0 + keep-alive and HTTP/1.1 + persistent. Connections in versions before HTTP/1.1 are non-persistent by default; to use persistent connections on those older versions, you must set the Connection header value to Keep-Alive. HTTP/1.1 connections are persistent by default; to close one, you set the Connection header value to close. This version difference is the reason for the two forms mentioned above. The following figure compares HTTP transactions over persistent connections with serial HTTP transactions: persistent connections save the time spent opening and closing connections, so the total time cost is reduced. Another interesting aspect of persistent connections is the Connection option. Connection is a general header that both the client and the server can send. The following request-response diagram shows a client and server using a persistent connection. As the picture shows, persistent connections rely mainly on the Connection header; in other words, Connection is how persistent connections are implemented, so below we discuss the Connection header in detail.

Connection header

The Connection header has two purposes:
(1) Used together with Upgrade to upgrade the protocol

HTTP provides a special mechanism that allows an established connection to be upgraded to a new protocol. The general form pairs a Connection: upgrade header with an Upgrade header that names the target protocol.
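A sketch of the exchange, assuming WebSocket as the target protocol (a real WebSocket upgrade also needs Sec-WebSocket-* headers, omitted here); `is_upgrade_accepted` is a hypothetical helper name:

```python
# Client asks to upgrade the established HTTP/1.1 connection.
upgrade_request = (
    "GET /chat HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: Upgrade\r\n"
    "Upgrade: websocket\r\n"
    "\r\n"
)

# If the server agrees, it answers 101 and names the new protocol.
upgrade_response = (
    "HTTP/1.1 101 Switching Protocols\r\n"
    "Connection: Upgrade\r\n"
    "Upgrade: websocket\r\n"
    "\r\n"
)

def is_upgrade_accepted(response: str) -> bool:
    # Only a 101 status means the connection switched protocols;
    # any ordinary status (e.g. 200) means the upgrade was ignored.
    status_line = response.split("\r\n", 1)[0]
    return status_line.startswith("HTTP/1.1 101")

print(is_upgrade_accepted(upgrade_response))
```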
HTTP/2 explicitly prohibits this mechanism; it belongs only to HTTP/1.1. A client sending Connection: upgrade indicates that it is requesting a connection upgrade. If the server decides to upgrade the connection, it returns a 101 Switching Protocols status code along with an Upgrade header naming the protocol being switched to. If the server does not (or cannot) upgrade the connection, it ignores the Upgrade header sent by the client and returns an ordinary response, for example 200.

(2) Managing persistent connections

As noted above, there are two forms of persistent connection: one is HTTP/1.0 + Keep-Alive; the other is HTTP/1.1 + persistent.
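The difference between the two forms shows up directly in the request headers. A small sketch (the `request_headers` helper is hypothetical):

```python
def request_headers(http_version: str) -> bytes:
    """Build a request for a persistent connection under each HTTP version."""
    if http_version == "1.0":
        # HTTP/1.0: connections close by default; opt in with Keep-Alive
        conn = "Connection: Keep-Alive"
    else:
        # HTTP/1.1: connections persist by default; send nothing extra,
        # or send "Connection: close" to opt out after this transaction
        conn = ""
    lines = [f"GET / HTTP/{http_version}", "Host: example.com"]
    if conn:
        lines.append(conn)
    return ("\r\n".join(lines) + "\r\n\r\n").encode()

print(request_headers("1.0").decode())
print(request_headers("1.1").decode())
```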
In HTTP/1.0 + Keep-Alive mode, the client can include a Connection: Keep-Alive header in its request to ask that the connection stay open. One thing to note here: the Keep-Alive header is only a request. After the client sends it, the client and server have not necessarily agreed to a Keep-Alive session; either side can close an idle Keep-Alive connection at any time, and either side can limit the number of transactions handled on a Keep-Alive connection. The Keep-Alive header has the following options:
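The commonly seen options are timeout (an estimate of how long the server will keep an idle connection open) and max (roughly how many more transactions it will serve on it); both are hints, not guarantees. A small parsing sketch with a hypothetical helper name:

```python
def parse_keep_alive(value: str) -> dict:
    """Parse a Keep-Alive header value such as 'timeout=5, max=100'."""
    options = {}
    for part in value.split(","):
        part = part.strip()
        if "=" in part:
            key, val = part.split("=", 1)
            options[key.strip().lower()] = int(val)
    return options

print(parse_keep_alive("timeout=5, max=100"))
```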
The Keep-Alive header itself is optional, but it may only be used when Connection: Keep-Alive is also present. There are certain limitations and rules on the use of Keep-Alive; let's discuss them.

Keep-Alive Usage Limitations and Rules

The Keep-Alive header may only be sent together with Connection: Keep-Alive, and it must be included with every message that wants to keep the connection open. Messages on a Keep-Alive connection must carry a correct Content-Length, otherwise the end of one message cannot be told from the start of the next. And either side may still close the connection at any time, so the client must be prepared to retry a request.
Keep-Alive and the dumb proxy problem

First let me explain what a proxy server is, and then we can talk about the dumb proxy problem.

(1) What is a proxy server?

A proxy server is an intermediary that obtains network information on behalf of the client; in layman's terms, it is a transit station for network traffic.

(2) Why do we need a proxy server?

The most common reason is to access websites that our clients cannot reach directly. Beyond that, a proxy server has many other functions, such as caching, which reduces cost and saves bandwidth, and real-time monitoring and filtering of traffic. Relative to the target server (the server that ultimately supplies the information), the proxy is a client: it fetches the information the server provides. Relative to the client, the proxy is a server: it decides what information to pass on to the client, which is how monitoring and filtering are achieved. The dumb proxy problem occurs at the proxy server; more specifically, at a proxy that does not understand the Connection header and therefore forwards it verbatim instead of deleting it before passing the request on. Suppose a web client is talking to a web server through such a dumb proxy, as shown in the following diagram. Let me explain the picture above:
This is the Keep-Alive dumb proxy problem. So how do we solve it? Use Proxy-Connection.

Proxy-Connection solves the dumb proxy problem

Netscape proposed a workaround using a Proxy-Connection extension header. The browser sends the Proxy-Connection header to the proxy instead of the officially supported Connection header. If the proxy is a dumb proxy, it forwards Proxy-Connection to the server unchanged; the server ignores this header when it receives it, so no harm is done. If the proxy is a smart proxy, when it receives Proxy-Connection it replaces it with a Connection header and sends that to the server, so the persistent connection works as intended.

HTTP/1.1 Persistent Connections

HTTP/1.1 phased out support for Keep-Alive connections and replaced them with an improved design called persistent connections. This is still a persistent-connection mechanism, but with a better working model than HTTP/1.0's. Unlike the Keep-Alive connections of HTTP/1.0, HTTP/1.1 uses persistent connections by default: unless otherwise specified, HTTP/1.1 assumes all connections are persistent. If you want to close the connection after a transaction ends, you must explicitly add a Connection: close header to the message. This is a very important difference from earlier HTTP versions. There are also some restrictions and rules for using persistent connections.
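The smart-proxy behavior can be sketched as a header rewrite (the `forward_headers` helper is hypothetical):

```python
def forward_headers(headers: dict, smart_proxy: bool) -> dict:
    """Sketch of how headers leave a proxy. A smart proxy replaces
    Proxy-Connection with a real Connection header; a dumb proxy
    forwards everything verbatim (the origin server then ignores
    the unknown Proxy-Connection header)."""
    out = dict(headers)
    if smart_proxy and "Proxy-Connection" in out:
        out["Connection"] = out.pop("Proxy-Connection")
    return out

client_headers = {"Host": "example.com", "Proxy-Connection": "Keep-Alive"}
print(forward_headers(client_headers, smart_proxy=True))
print(forward_headers(client_headers, smart_proxy=False))
```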
Pipelined Connections

HTTP/1.1 allows request pipelining on persistent connections, another performance optimization over Keep-Alive connections. A pipeline is a carrier for HTTP requests: we can place multiple HTTP requests into the pipeline, which reduces network round-trip time and improves performance. The following figure compares serial connections, parallel connections, and pipelined connections. There are several limitations on pipelined connections: they may only be used on persistent connections; responses must be returned in the same order the requests were sent; the client must be prepared for the connection to close at any time and to re-send any unfinished pipelined requests; and non-idempotent requests such as POST should not be pipelined.
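A sketch of pipelining with raw sockets: both requests are written before the first response is read, and both responses come back in order on the same connection. The throwaway local server exists only to make the example self-contained.

```python
import socket
import threading

srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def serve_two():
    # Serve exactly two requests on one persistent connection, in order.
    conn, _ = srv.accept()
    served, buf = 0, b""
    while served < 2:
        buf += conn.recv(4096)
        # each pipelined request ends with a blank line
        while b"\r\n\r\n" in buf and served < 2:
            _, buf = buf.split(b"\r\n\r\n", 1)
            served += 1
            body = f"r{served}".encode()
            conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\n" + body)
    conn.close()

threading.Thread(target=serve_two).start()

c = socket.create_connection(("127.0.0.1", port))
# pipelining: the second GET goes out before the first response arrives
c.sendall(b"GET /1 HTTP/1.1\r\nHost: x\r\n\r\n"
          b"GET /2 HTTP/1.1\r\nHost: x\r\n\r\n")
data = b""
while chunk := c.recv(4096):
    data += chunk
c.close()
srv.close()
print(data.count(b"HTTP/1.1 200 OK"))
```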
HTTP Connection Closing

Any HTTP client, server, or proxy can close an HTTP transport connection at any time. Usually the connection is closed after a response, but it can also happen in the middle of an HTTP transaction. However, the closing side cannot be sure the other side has nothing left to send at that moment; if it does, the peer gets a write error during transmission. Even without errors, the connection can be closed at any time, and if it closes mid-transaction, the connection must be reopened and the transaction retried. With a single connection this is not a big deal, but with a pipelined connection it is worse: the pipeline may hold a large number of outstanding requests, and if the connection is closed at that point, all of them go unanswered and must be rescheduled. If an HTTP transaction yields the same result whether it is executed once or n times, we call the transaction idempotent. Generally, the GET, HEAD, PUT, DELETE, TRACE and OPTIONS methods are considered idempotent. The client should not send non-idempotent requests, such as POST, in a pipelined manner, otherwise the consequences are uncertain. Since HTTP uses TCP as its transport layer protocol, closing an HTTP connection is really the same process as closing a TCP connection. There are three ways to close: full close, half close, and graceful close. An application can close either the TCP input channel, the output channel, or both at once. Calling the socket close() method closes both channels at the same time, which is a full close. Calling the socket shutdown() method closes the input or output channel separately, which is a half close.
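The difference between full close and half close can be sketched with a socket pair, standing in for the two ends of an HTTP connection:

```python
import socket

# socketpair() gives two connected endpoints: "a" plays one end of the
# connection, "b" plays the other.
a, b = socket.socketpair()

# Half close: shut down only our output channel with shutdown().
a.shutdown(socket.SHUT_WR)

eof = b.recv(16)        # the peer now sees end-of-stream on its input ...
b.sendall(b"bye")       # ... but its own output channel still works,
reply = a.recv(16)      # and our input channel is still open to read it.

# Full close: close() takes down both channels at once.
a.close()
b.close()
print(eof, reply)
```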
The HTTP specification recommends that when a client or server needs to close a connection unexpectedly, it should close it gracefully, but it does not say how to do so.