Understand the HTTP request process in one article. If you don't believe it, you still don't know it

Prerequisites

The OSI architecture and the related TCP/IP protocols at each layer:

  • Application layer: HTTP, Telnet, FTP, etc.
  • Presentation layer
  • Session layer
  • Transport layer: TCP, UDP
  • Network layer: IP
  • Data link layer
  • Physical layer

HTTP is built on top of TCP connections. It is the protocol that lets browsers obtain resources from servers, and it is the foundation of the Web. HTTP requests are usually initiated by the browser to fetch different types of files, such as HTML, CSS and JavaScript files, images, and videos. HTTP is also the protocol browsers use most widely.

If we don't know much about HTTP, some questions may puzzle us: why is visiting the same site a second time faster than the first, and why is a website still in the logged-in state when we return after logging in once? Analyzing the HTTP request process answers both.

How the browser initiates an HTTP request

Suppose we enter the URL http://time.geekbang.org/index.html in the browser. What steps happen next?

1. Build a request

First, the browser builds the request line, shown below. With that in place, the browser is ready to initiate the network request.

  GET /index.html HTTP/1.1
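
As a rough illustration of this step, the sketch below derives a request line from a URL using Python's standard urllib.parse. The function name build_request_line is hypothetical, and real browsers do much more (normalization, percent-encoding); this only shows how method, path, and version fit together.

```python
from urllib.parse import urlsplit

def build_request_line(url: str, method: str = "GET", version: str = "HTTP/1.1") -> str:
    """Build a request line (method, request target, version) from a URL."""
    parts = urlsplit(url)
    # The request target is the path plus any query string; "/" if the path is empty.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} {version}"

print(build_request_line("http://time.geekbang.org/index.html"))
# prints: GET /index.html HTTP/1.1
```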

2. Find the cache

Before actually initiating a network request, the browser first checks the browser cache for the requested file. The browser cache keeps a local copy of a resource so that the copy can be used directly the next time the same resource is requested.

If the requested resource is already in the browser cache, the browser intercepts the request and returns the cached copy, ending the request there. If the cache lookup fails, the request goes out to the network. Caching therefore:

  • Relieves pressure on the server and improves performance.
  • Speeds up resource loading for websites, since it cuts the time needed to obtain resources.
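
The hit/miss logic can be sketched as a toy in-memory cache keyed by URL. BrowserCache and fake_fetch are hypothetical names, and fake_fetch stands in for a real network request; a real browser cache also honors expiry and validation rules, which this sketch ignores.

```python
from typing import Callable, Dict

class BrowserCache:
    """Toy in-memory resource cache keyed by URL (hypothetical, for illustration only)."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch  # stands in for the real network request

    def get(self, url: str) -> bytes:
        # Cache hit: intercept the request and return the stored copy.
        if url in self._store:
            return self._store[url]
        # Cache miss: go to the network, then keep a copy for next time.
        body = self._fetch(url)
        self._store[url] = body
        return body

network_calls = []

def fake_fetch(url: str) -> bytes:
    network_calls.append(url)
    return b"<html>...</html>"

cache = BrowserCache(fake_fetch)
cache.get("http://time.geekbang.org/index.html")  # miss: goes to the "network"
cache.get("http://time.geekbang.org/index.html")  # hit: served from the cache
print(len(network_calls))  # 1
```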

3. Prepare IP address and port

From the prerequisites at the beginning and the text above, we have a rough picture of how HTTP relates to TCP. The browser uses HTTP as the application-layer protocol to package the text of the request, and uses TCP/IP to send it over the network. So before HTTP can start working, the browser must establish a TCP connection with the server. In other words, HTTP content is carried during TCP's data-transfer phase.

[Figure: schematic diagram of the relationship between TCP and HTTP]

It follows that making an HTTP request first requires resolving the URL to obtain the IP address and port, and then establishing a TCP connection with the server. As mentioned in the previous article "TCP Protocol", data packets reach their recipient via IP addresses. Websites, however, are usually addressed by domain name, so the domain name must be mapped to an IP address; this is the job of the Domain Name System (DNS). The browser asks DNS for the IP address that corresponds to the domain name. Before doing so, it checks its DNS data cache to see whether the domain has already been resolved; if it has, the cached result is used directly. Once the IP address is known, the browser checks whether the URL specifies a port number. If it does not, HTTP uses the default port 80 (HTTPS uses 443).
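
Here is a minimal sketch of this step, assuming a small DNS cache in front of a lookup function. Resolver and DEFAULT_PORTS are hypothetical names; the default lookup is Python's socket.gethostbyname (IPv4 only), standing in for the browser's real resolver. Note that the port comes from the URL or the scheme default, not from DNS.

```python
import socket
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

class Resolver:
    """Resolve a URL to (ip, port), caching DNS answers per host name."""

    def __init__(self, lookup: Callable[[str], str] = socket.gethostbyname) -> None:
        self._lookup = lookup
        self._dns_cache: Dict[str, str] = {}

    def resolve(self, url: str) -> Tuple[str, int]:
        parts = urlsplit(url)
        host = parts.hostname
        if host is None:
            raise ValueError(f"no host in URL: {url!r}")
        # Reuse the cached answer if this domain was already resolved.
        if host not in self._dns_cache:
            self._dns_cache[host] = self._lookup(host)
        ip = self._dns_cache[host]
        # An explicit port in the URL wins; otherwise use the scheme's default.
        port = parts.port or DEFAULT_PORTS[parts.scheme]
        return ip, port
```

Passing a fake lookup function (as in a test) keeps the example offline; 203.0.113.7 below would be a documentation-range address, not a real answer for any domain.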

4. Wait in the TCP queue

Chrome allows at most 6 simultaneous TCP connections to the same domain name (for HTTP/1.1). If 10 requests to the same domain are made at once, 4 of them enter a queue and wait until some of the ongoing requests finish. If fewer than 6 requests are in progress, a new request proceeds directly to the next step and establishes a TCP connection.
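
The per-domain limit behaves like a counting semaphore. The sketch below simulates it with asyncio.Semaphore; run_requests and MAX_CONNECTIONS_PER_HOST are hypothetical names, and asyncio.sleep stands in for the network round trip. It reports the peak number of requests in flight.

```python
import asyncio

MAX_CONNECTIONS_PER_HOST = 6  # the per-domain limit described above

async def run_requests(n: int, limit: int = MAX_CONNECTIONS_PER_HOST) -> int:
    """Simulate n concurrent requests to one domain; return the peak number in flight."""
    slots = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def request() -> None:
        nonlocal in_flight, peak
        async with slots:  # requests beyond the limit wait here, in the queue
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stands in for the network round trip
            in_flight -= 1

    await asyncio.gather(*(request() for _ in range(n)))
    return peak

print(asyncio.run(run_requests(10)))  # 6
```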

5. Establish a TCP connection

Once the request leaves the queue, the browser and the server perform the TCP "three-way handshake" (described in the previous article on TCP): the client and server exchange three packets (SYN, SYN-ACK, ACK) to confirm the connection, which establishes the link between the browser and the server.
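
In application code the handshake is not written by hand: the operating system performs it inside connect(). The sketch below shows this with Python's socket module against a local listening socket on an OS-chosen loopback port; open_tcp_connection is a hypothetical helper name.

```python
import socket

def open_tcp_connection(host: str, port: int, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to host:port.

    The operating system performs the three-way handshake (SYN, SYN-ACK, ACK)
    inside this call; it returns only after the connection is established.
    """
    return socket.create_connection((host, port), timeout=timeout)

# Local demo: listen on a loopback port chosen by the OS, then connect to it.
with socket.create_server(("127.0.0.1", 0)) as server:
    port = server.getsockname()[1]
    with open_tcp_connection("127.0.0.1", port) as client:
        conn, _addr = server.accept()
        with conn:
            same_pair = client.getpeername() == conn.getsockname()
print(same_pair)  # True
```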

6. Send HTTP request

Once the TCP connection is established, the browser can communicate with the server, and the HTTP data is transmitted over this connection.

[Figure: HTTP request data format]

First, the browser sends the server a request line, which contains the request method, the request URI (Uniform Resource Identifier), and the HTTP protocol version.

Request methods include GET, POST, PUT, DELETE, and others. The commonly used POST method sends data to the server, for example the user's information when logging in to a website. This data is usually sent in the request body.

After the request line, the browser sends further information as request headers, telling the server basic facts about the browser, such as the operating system it runs on, the browser engine, the domain name of the current request, and cookies.
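
To make the layout concrete, the sketch below serializes a request as raw bytes: request line, headers, an empty line, then the body. build_request is a hypothetical helper, and the /login path, header values, and form fields are made-up examples, not what any real site expects.

```python
from typing import Dict

def build_request(method: str, target: str, headers: Dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    if body:
        # Tell the server how many body bytes follow the headers.
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Each line ends with CRLF; an empty line separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

req = build_request(
    "POST",
    "/login",  # hypothetical endpoint
    {
        "Host": "time.geekbang.org",
        "User-Agent": "demo",
        "Content-Type": "application/x-www-form-urlencoded",
    },
    b"user=alice&password=secret",  # made-up form data in the request body
)
print(req.decode())
```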

How the server processes the HTTP request

1. Return the response

  curl -i https://time.geekbang.org/

Using curl (or the browser's Network panel), we can see the format of the data the server returns:

First, the server returns a response line containing the protocol version and the status code.

The status code in the response line tells the browser the result of processing the request, including any error, for example:

  • 200, the most common status code, indicating that the request was processed successfully;
  • 404, meaning the page was not found;
  • 500, indicating a server error.

Just as the browser sends request headers along with the request, the server sends response headers along with the response. The response headers contain information about the server itself, such as the time the server generated the response, the type of data returned (JSON, HTML, streaming media, etc.), and any cookies the server wants saved on the client.

After the response headers, the server sends the response body, which usually contains the actual HTML content. This completes the server's response to the browser.
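
The browser's side of this can be sketched as splitting a raw response into the response line, headers, and body. parse_response is a hypothetical helper and deliberately simplified: it assumes the whole response is already in memory, ignores chunked encoding, and keeps only the last value of a repeated header. Header names are lowercased because HTTP header names are case-insensitive.

```python
from typing import Dict, Tuple

def parse_response(raw: bytes) -> Tuple[str, int, Dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into version, status code, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # Response line: "<version> <status code> <reason phrase>"
    version, code = status_line.split(" ", 2)[:2]
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html; charset=utf-8\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>")
version, status, headers, body = parse_response(raw)
print(status, headers["content-type"])  # 200 text/html; charset=utf-8
```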

2. Disconnect

Under the HTTP/1.0 default behavior, once the server has returned the response data to the client, it closes the TCP connection. However, if the browser or server adds the following to its headers:

  Connection: Keep-Alive

then the TCP connection stays open after the response is sent, so the browser can send further requests over the same connection. Keeping the connection open saves the time needed to set up a new one for the next request and speeds up resource loading. For example, if the images embedded in a page all come from the same website, a persistent connection lets them share one TCP connection instead of opening many.
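
A persistent connection can be observed with Python's standard http.client and http.server. In HTTP/1.1, connections are persistent by default, so the sketch below sets the handler's protocol_version to HTTP/1.1 and fetches two hypothetical image paths over one HTTPConnection. The local port of the client socket identifies the TCP connection; if it is the same for both requests, the connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep the connection open between requests

    def do_GET(self) -> None:
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args) -> None:  # silence default request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
ports = []
for path in ("/a.png", "/b.png"):  # hypothetical image paths
    conn.request("GET", path)
    resp = conn.getresponse()
    resp.read()  # the body must be fully read before the connection is reused
    ports.append(conn.sock.getsockname()[1])  # local port identifies the TCP connection
conn.close()
server.shutdown()
server.server_close()
print(ports[0] == ports[1])  # True: both requests used one TCP connection
```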

3. Redirection

[Figure: response line and response headers returned by a redirect]

Status code 301 tells the browser that it must redirect to another URL, and that URL is given in the Location field of the response headers. The browser then reads the address in the Location field and navigates to it. This is the complete redirection process.
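
The redirect loop can be sketched as: fetch, and if the status is a redirect, resolve Location against the current URL and fetch again. navigate is a hypothetical helper, fetch is any function returning (status, headers, body) with lowercased header names, and the example.com URLs are made up. Besides 301, the sketch also treats 302, 303, 307, and 308 as redirects, and caps the number of hops to avoid loops.

```python
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

Response = Tuple[int, Dict[str, str], bytes]
REDIRECT_CODES = {301, 302, 303, 307, 308}

def navigate(url: str, fetch: Callable[[str], Response], max_hops: int = 10) -> Tuple[str, Response]:
    """Follow redirects: on a redirect status, read Location and navigate again."""
    for _ in range(max_hops + 1):
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES:
            return url, (status, headers, body)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, headers["location"])
    raise RuntimeError("too many redirects")

# Hypothetical site: /old permanently moved to /new.
site = {
    "http://example.com/old": (301, {"location": "/new"}, b""),
    "http://example.com/new": (200, {}, b"new page"),
}
final_url, (status, _, body) = navigate("http://example.com/old", site.__getitem__)
print(final_url, status)  # http://example.com/new 200
```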

Summary

Walking through the complete HTTP request process shows that the browser caches DNS results and page resources along the way. This reduces the number of requests sent to the server, which is why revisiting a site is faster.

[Figure: browser resource cache processing flow]

As the first request in the figure above shows, when the server returns the HTTP response headers, the browser uses the Cache-Control field in those headers to decide whether to cache the resource. Usually a cache expiration time is also set for the resource, via the max-age directive in Cache-Control.

Therefore, if the resource is requested again before the cached copy expires, the copy in the cache is returned directly to the browser.
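
The freshness rule can be sketched in a few lines: read max-age from Cache-Control and compare the age of the stored copy against it. parse_max_age and CachedEntry are hypothetical names, the timestamps are made-up numbers, and real caches also consider headers such as Age and directives such as no-store, which this sketch ignores.

```python
import time
from dataclasses import dataclass
from typing import Optional

def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None

@dataclass
class CachedEntry:
    body: bytes
    stored_at: float  # time.time() when the response was stored
    max_age: int      # seconds for which the stored copy stays fresh

    def is_fresh(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return now - self.stored_at < self.max_age

entry = CachedEntry(b"<html>", stored_at=1000.0, max_age=parse_max_age("public, max-age=2000") or 0)
print(entry.is_fresh(now=2999.0), entry.is_fresh(now=3000.0))  # True False
```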

If the cached copy has expired, the browser issues a network request again and includes an If-None-Match header, carrying the ETag value the server returned earlier. On receiving this header, the server uses the If-None-Match value to determine whether the requested resource has been updated.

  • If it has not been updated, the server returns a 304 status code, which tells the browser that it can keep using its cached copy.
  • If it has been updated, the server returns the latest version of the resource directly.
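
The server-side decision can be sketched as comparing the incoming If-None-Match value with the current ETag. make_etag and handle_get are hypothetical; the ETag scheme (a truncated SHA-256 of the body) is just one possible choice, and real If-None-Match handling also accepts lists of tags and "*", which this sketch skips.

```python
import hashlib
from typing import Dict, Optional, Tuple

def make_etag(body: bytes) -> str:
    # Hypothetical ETag scheme: a quoted, truncated hash of the content.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def handle_get(current_body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Server side: answer 304 if the client's cached version is still current."""
    etag = make_etag(current_body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""         # not modified: keep using the cache
    return 200, {"ETag": etag}, current_body    # updated: send the latest resource

status, headers, body = handle_get(b"v1", None)          # first visit: 200 + ETag
status2, _, _ = handle_get(b"v1", headers["ETag"])       # unchanged: 304
status3, _, body3 = handle_get(b"v2", headers["ETag"])   # changed: 200 + new body
print(status, status2, status3)  # 200 304 200
```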

Now consider logging in. The browser submits the user's login information to the server via POST. The server receives it, verifies it, and, if it is correct, generates a string that identifies the user, writes it into the Set-Cookie field of the response header, and returns it to the browser.

When the browser parses the response headers and finds a Set-Cookie field, it saves the value locally. When the user visits again, the browser reads the saved cookie data and writes it into the request headers before sending the HTTP request. The server checks this information again and, if it is valid, shows the user as logged in along with their user information.
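
The round trip can be sketched with Python's http.cookies.SimpleCookie. The session store, the session_id cookie name, and the helper functions are hypothetical, and credential checking is omitted; the point is the flow: Set-Cookie in the response, the browser keeping it, and a Cookie header on the next request that the server maps back to a user.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, Optional

sessions: Dict[str, str] = {}  # hypothetical server-side store: session id -> user name

def login(username: str, password: str) -> str:
    """Server: after a successful login, return the value for a Set-Cookie header."""
    # Credential verification is omitted here; we assume it succeeded.
    session_id = secrets.token_hex(16)
    sessions[session_id] = username
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()  # e.g. "session_id=...; HttpOnly; Path=/"

def browser_store(set_cookie_value: str) -> SimpleCookie:
    """Browser: parse a Set-Cookie value and keep it in a local cookie jar."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    return jar

def cookie_header(jar: SimpleCookie) -> str:
    """Browser: build the Cookie request header (name=value pairs only)."""
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

def current_user(cookie_header_value: str) -> Optional[str]:
    """Server: look up the logged-in user from an incoming Cookie header."""
    jar = SimpleCookie()
    jar.load(cookie_header_value)
    morsel = jar.get("session_id")
    return sessions.get(morsel.value) if morsel is not None else None

set_cookie = login("alice", "secret")
jar = browser_store(set_cookie)
print(current_user(cookie_header(jar)))  # alice
```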

In conclusion, an HTTP request in the browser goes through nine stages from initiation to completion: building the request, searching the cache, preparing the IP address and port, waiting in the TCP queue, establishing the TCP connection, sending the HTTP request, the server processing the request, the server returning the response, and disconnecting.

[Figure: detailed HTTP request process]
