What happens from URL input to page display?

Preface

When you open a browser, enter a URL, and the web page appears in front of you, what exactly happens behind the scenes? What process does the request go through? The overall flow is outlined first; for the specific steps, see the breakdown below.

Generally speaking, it can be divided into the following processes:

  • DNS resolution: resolve domain names into IP addresses
  • TCP connection: TCP three-way handshake
  • Sending HTTP Requests
  • The server processes the request and returns an HTTP message
  • The browser parses and renders the page
  • Disconnection: TCP four-way wave

1. What is a URL?

A URL (Uniform Resource Locator) is used to locate resources on the Internet and is commonly known as a web address. For example, http://www.w3school.com.cn/html/index.asp complies with the following grammatical rules:

scheme://host.domain:port/path/filename

The explanation of each part is as follows:

  • scheme - defines the type of Internet service. Common schemes include http, https, ftp, and file. The most common is http, while https is used for encrypted network transmission.
  • host - defines the domain host (the default host for http is www)
  • domain - defines the Internet domain name, such as w3school.com.cn
  • port - defines the port number on the host (the default port number for http is 80)
  • path - defines the path on the server (if omitted, the document must be located in the root directory of the website)
  • filename - defines the name of the document/resource
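The grammar above can be checked against the example URL with Python's standard `urllib.parse` module. This is only a minimal sketch: the helper name `describe_url` and the `DEFAULT_PORTS` table are illustrative, and splitting the path into a directory and a filename is a simplification that assumes the last path segment names a document.

```python
from urllib.parse import urlsplit

# Default ports per scheme; only the two most common schemes are listed.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Split a URL into the parts named in the grammar above."""
    parts = urlsplit(url)
    # Everything up to the last "/" is the path, the rest is the filename.
    directory, _, filename = parts.path.rpartition("/")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # urlsplit reports None when the URL does not spell out a port,
        # so fall back to the default for the scheme.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": directory + "/",
        "filename": filename,
    }


info = describe_url("http://www.w3school.com.cn/html/index.asp")
print(info)
```

Because the example URL does not write a port, the sketch falls back to 80, which matches the default described above.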

2. Domain Name Resolution (DNS)

After the URL is entered in the browser, the domain name must first be resolved, because the browser cannot locate the server by domain name directly; it needs the IP address. You may have a question here: a computer can be assigned an IP address as well as a host name and domain name, for example www.hackr.jp. So why not just use the IP address from the beginning and save the trouble of resolution? Let's first understand what an IP address is.

1. IP address

IP address is short for Internet Protocol address. It is a unified address format provided by the IP protocol, which assigns a logical address to each network and each host on the Internet to mask the differences between physical addresses. An IPv4 address is a 32-bit binary number, usually written in dotted decimal form; for example, 127.0.0.1 is the local loopback address.

A domain name is like an IP address wearing a mask. Its purpose is to make a group of server addresses easier to remember and communicate. Users usually reach other computers by host name or domain name rather than directly by IP address, because a name made of letters and numbers fits human memory better than a string of pure numbers. Names, however, are relatively difficult for computers to work with, since computers are better at processing long strings of numbers. DNS was created to solve this problem.
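To make the "32-bit binary number" concrete, the following sketch uses Python's standard `ipaddress` module to convert the loopback address between dotted decimal, integer, and binary form. The variable names are illustrative only.

```python
import ipaddress

addr = ipaddress.IPv4Address("127.0.0.1")

# The same address as a single 32-bit integer.
as_int = int(addr)

# The 32 bits, grouped into four octets for readability.
as_bits = format(as_int, "032b")
dotted_bits = ".".join(as_bits[i:i + 8] for i in range(0, 32, 8))

print(as_int)        # 2130706433
print(dotted_bits)   # 01111111.00000000.00000000.00000001

# Converting back from the integer yields the original address.
round_trip = ipaddress.IPv4Address(as_int)
print(round_trip)    # 127.0.0.1
```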

2. What is domain name resolution?

The DNS protocol provides the service of finding IP addresses from domain names, or, in reverse, looking up domain names from IP addresses. DNS runs on network servers, and setting up domain name resolution simply means adding a record on a DNS server.

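For example, a record may map baidu.com to 220.114.23.56 (the server's external IP address). The server port number, 80 for http, is not part of this address record; it comes from the URL or from the protocol default.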

3. How does the browser query the IP corresponding to the URL through the domain name?

  • Browser cache: Browsers cache DNS records for a certain period of time.
  • Operating system cache: If the required DNS record cannot be found in the browser cache, the operating system's cache is checked next.
  • Router cache: Routers also have a DNS cache.
  • ISP's DNS server: ISP is the abbreviation of Internet Service Provider. The ISP has a dedicated DNS server to respond to DNS query requests.
  • Root server: If the ISP's DNS server still cannot find the record, it resolves the name on the client's behalf, starting from the root server (the DNS server first asks the root name server for the address of the .com name server, then asks the .com name server for the baidu.com name server, and so on)
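The lookup order above can be sketched as a chain of caches with a fallback walk from the root. This is a toy model, not real DNS: every server name and table below (`CACHE_LAYERS`, `ROOT`, `TLD`, `AUTHORITATIVE`) is hypothetical, the address is the sample value used earlier in this article, and real caches also expire entries after a time limit.

```python
from typing import Dict, List, Tuple

# Caches are checked from nearest to farthest; all start empty here.
CACHE_LAYERS: List[Tuple[str, Dict[str, str]]] = [
    ("browser cache", {}),
    ("operating system cache", {}),
    ("router cache", {}),
    ("ISP DNS server cache", {}),
]

# Toy name server hierarchy (hypothetical server names).
ROOT = {"com": "com-name-server"}
TLD = {"com-name-server": {"baidu.com": "baidu-name-server"}}
AUTHORITATIVE = {"baidu-name-server": {"baidu.com": "220.114.23.56"}}


def resolve_from_root(domain: str) -> str:
    """Walk root -> .com server -> the domain's own name server."""
    tld_label = domain.rsplit(".", 1)[-1]
    tld_server = ROOT[tld_label]                    # root points at the .com server
    authoritative_server = TLD[tld_server][domain]  # .com points at the domain's server
    return AUTHORITATIVE[authoritative_server][domain]


def lookup(domain: str) -> Tuple[str, str]:
    """Return (ip, where the answer came from)."""
    for layer_name, cache in CACHE_LAYERS:
        if domain in cache:
            return cache[domain], layer_name
    ip = resolve_from_root(domain)
    # Each layer remembers the answer so the next lookup is faster.
    for _, cache in CACHE_LAYERS:
        cache[domain] = ip
    return ip, "root server walk"


first = lookup("baidu.com")
second = lookup("baidu.com")
print(first)
print(second)
```

The first lookup has to walk the hierarchy, while the second is answered from the nearest cache, which is why repeated visits to the same site usually skip most of the resolution cost.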

4. Summary

The browser sends the domain name to the DNS server, which looks up the corresponding IP address and returns it to the browser. The browser then uses that IP address as the destination of the request, carries the request parameters in the protocol message, and sends it to the corresponding server. Next, we introduce the stage of sending the HTTP request to the server. This stage is divided into three parts: the TCP three-way handshake, the HTTP request and response messages, and closing the TCP connection.

3. TCP three-way handshake

Before the client sends data, it initiates a TCP three-way handshake to synchronize the sequence numbers and acknowledgment numbers of the client and server, and to exchange TCP window size information.

1. The process of TCP three-way handshake is as follows:

  • The client sends a packet with SYN=1, Seq=X to the server port (the first handshake, initiated by the browser, telling the server: "I want to send a request")
  • The server sends back a response packet with SYN=1, ACK=X+1, Seq=Y to confirm (the second handshake, initiated by the server, telling the browser: "I am ready to accept it, go ahead and send")
  • The client then sends back a packet with ACK=Y+1, Seq=X+1, indicating that the handshake is over (the third handshake, sent by the browser, telling the server: "I am about to send, get ready to accept it")
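The number bookkeeping in the three steps above can be sketched as a small simulation. This is not a real TCP stack: the function name and the dictionary layout are illustrative, `ack` holds the acknowledgment number that the steps write as ACK=..., and the third segment uses sequence number X+1, as in standard TCP where the SYN itself consumes one sequence number.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(x: int, y: int) -> list:
    """Return the three segments exchanged for initial sequence numbers x and y."""
    # 1st handshake: client -> server, SYN with the client's initial sequence number.
    syn = {"from": "client", "SYN": 1, "seq": x}

    # 2nd handshake: server -> client, acknowledges x and announces y.
    syn_ack = {
        "from": "server", "SYN": 1, "ACK": 1,
        "seq": y, "ack": (syn["seq"] + 1) % SEQ_SPACE,
    }

    # 3rd handshake: client -> server, acknowledges y.
    ack = {
        "from": "client", "ACK": 1,
        "seq": (x + 1) % SEQ_SPACE, "ack": (syn_ack["seq"] + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(x=100, y=300)
for segment in segments:
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm that their sequence numbers are synchronized before any data is sent.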

2. Why is a three-way handshake required?

In "Computer Networks" written by Xie Xiren, it is said that the purpose of the "three-way handshake" is "to prevent an invalid connection request segment from being suddenly transmitted to the server, thereby causing an error."

4. Send HTTP request

After the TCP three-way handshake is completed, the HTTP request message is sent. The request message consists of three parts: the request line, the request headers, and the request body (a blank line separates the headers from the body), as shown in the following figure:

1. The request line contains the request method, URL, and protocol version

  • Common request methods include GET, POST, PUT, DELETE, PATCH, HEAD, OPTIONS, and TRACE.
  • URL is the request address, which consists of <protocol>://<host>:<port>/<path>?<parameter>
  • Protocol version, i.e., the HTTP version number

    POST /chapter17/user.html HTTP/1.1

In the above code, "POST" represents the request method, "/chapter17/user.html" represents the URL, and "HTTP/1.1" represents the protocol and protocol version. HTTP/1.1 is the more widely used version.

2. The request header contains additional information about the request. It consists of keyword/value pairs, one pair per line, with each keyword and value separated by a colon “:”.

The request header tells the server about the client's request. It contains a lot of useful information about the client environment and the request body. For example: Host indicates the host name and virtual host; Connection, which in HTTP/1.1 uses keep-alive, requests a persistent connection on which multiple requests can be sent; User-Agent identifies the sender of the request, so the server can handle compatibility and customization requirements.

3. The request body can carry data of multiple request parameters, including carriage return characters, line feed characters, and request data. Not all requests have request data.

    name=tom&password=1234&realName=tomson

The above code carries three request parameters: name, password, and realName.
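Putting the three parts together, the following sketch assembles a complete request message as text. It reuses the request line and parameters from the examples above; the host name `www.example.com`, the function name `build_request`, and the exact set of headers are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_request(method: str, path: str, host: str, params: dict) -> str:
    """Assemble request line, headers, blank line, and body into one message."""
    body = urlencode(params)  # e.g. name=tom&password=1234
    lines = [
        f"{method} {path} HTTP/1.1",                          # request line
        f"Host: {host}",                                      # request headers
        "Connection: keep-alive",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body.encode('ascii'))}",
        "",                                                   # blank line ends the headers
        body,                                                 # request body
    ]
    # HTTP separates lines with carriage return + line feed.
    return "\r\n".join(lines)


request = build_request(
    "POST",
    "/chapter17/user.html",
    "www.example.com",  # hypothetical host
    {"name": "tom", "password": "1234", "realName": "tomson"},
)
print(request)
```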

5. The server processes the request and returns the HTTP message

1. Server

A server is a high-performance computer in a network environment. It listens for service requests submitted by other computers (clients) on the network and provides the corresponding services, such as web page, file download, mail, and video services. A client, by contrast, is mainly used to browse web pages, watch videos, listen to music, and so on, so the two roles are completely different.

Each server runs an application that processes requests: the web server. Common web server products include Apache, Nginx, IIS, and Lighttpd. The web server plays a management and control role. Based on its configuration files, it hands each incoming request to the program on the server responsible for that kind of request (such as CGI scripts, JSP scripts, servlets, ASP scripts, server-side JavaScript, or some other server-side technology), and then returns the result produced by that backend program as the response.

2. MVC backend processing stage

There are many frameworks for backend development now, but most of them are still built according to the MVC design pattern. MVC is a design pattern that divides an application into three core components, model, view, and controller, each of which handles its own tasks to achieve the separation of input, processing, and output.

1. View

It is the operating interface provided to users and is the shell of the program.

2. Model

The model is mainly responsible for data interaction. Among the three components of MVC, the model has the most processing tasks. A model can provide data for multiple views.

3. Controller

The controller selects data in the model layer according to the instructions the user enters in the view layer, and performs the corresponding operations on it to produce the final result. The controller plays the manager role: it receives requests from the view, decides which model component to call to process the request, and then determines which view to use to display the data returned by the model.

These three layers are closely linked yet independent of each other. Changes within one layer do not affect the other layers, and each layer exposes an interface for the layer above it to call.

So what happens at this stage? In short, the request sent by the browser first passes through the controller, which performs logical processing and request dispatch and then calls the model. The model fetches data from stores such as Redis and MySQL, and the page is rendered with that data. The result is returned to the client as a response message, and finally the browser presents the web page to the user through its rendering engine.
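The division of labor described above can be sketched in a few lines of Python. Everything here is hypothetical: the class and function names are invented for illustration, and an in-memory dictionary stands in for the Redis or MySQL store a real model would query.

```python
import html


class UserModel:
    """Model: owns the data access."""

    def __init__(self):
        # In-memory stand-in for a database table.
        self._rows = {1: {"name": "tom"}}

    def find(self, user_id: int):
        return self._rows.get(user_id)


def user_view(user: dict) -> str:
    """View: turns model data into the page shown to the user."""
    return f"<h1>Hello, {html.escape(user['name'])}</h1>"


def not_found_view() -> str:
    return "<h1>User not found</h1>"


class UserController:
    """Controller: receives the request, calls the model, picks the view."""

    def __init__(self, model: UserModel):
        self.model = model

    def show(self, user_id: int):
        user = self.model.find(user_id)
        if user is None:
            return 404, not_found_view()
        return 200, user_view(user)


controller = UserController(UserModel())
print(controller.show(1))
print(controller.show(2))
```

The controller never builds HTML and the view never touches the data store, so either side can change without affecting the other, which is the independence between layers that the pattern aims for.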

3. HTTP response message

The response message consists of three parts: the response line, the response header, and the response body, as shown in the following figure:

(1) The response line contains: protocol version, status code, status code description

The status code rules are as follows:

  • 1xx: Informational - the request has been received and processing continues.
  • 2xx: Success - the request has been successfully received, understood, and accepted.
  • 3xx: Redirection - further action must be taken to complete the request.
  • 4xx: Client error - the request has a syntax error or cannot be fulfilled.
  • 5xx: Server error - the server failed to fulfill a valid request.

(2) The response header contains additional information about the response message, consisting of name/value pairs.

(3) The response body contains carriage return characters, line feed characters, and response return data. Not all response messages contain response data.
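As a rough illustration of the three parts, the sketch below splits a raw response into response line, headers, and body, and maps the status code to its class. The sample response text and the function names are made up for this example, and the parser assumes a well-formed message with no repeated headers.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    return STATUS_CLASSES[code // 100]


def parse_response(raw: str) -> dict:
    """Split a response into response line, headers, and body."""
    # The first blank line separates the head from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return {
        "version": version,
        "status": int(code),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "<p>Hello</p>"
)
response = parse_response(raw)
print(response["status"], status_class(response["status"]))
print(response["headers"])
print(response["body"])
```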

6. Browser parsing and rendering pages

After the browser receives the HTML in the response, it parses and renders it. The browser rendering mechanism is introduced below.

The browser parses and renders the page in five steps:

  • Parse the DOM tree based on HTML
  • Generate CSS rule tree based on CSS parsing
  • Combine the DOM tree and CSS rule tree to generate a rendering tree
  • Calculate the information of each node according to the rendering tree
  • Draw the page based on the calculated information

1. Parse the DOM tree based on HTML

  • The HTML tags are parsed into a DOM tree that mirrors the structure of the document. DOM tree construction is a depth-first traversal, that is, all child nodes of the current node are constructed first, and then the next sibling node is constructed.
  • When reading an HTML document and constructing a DOM tree, if a script tag is encountered, the construction of the DOM tree will be paused until the script is executed.
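The depth-first construction order can be observed with Python's standard `html.parser` module. This is a simplified sketch, not how a browser engine is implemented: the `DomBuilder` class is hypothetical, it ignores text and attributes, and it assumes well-formed markup in which every opened tag is explicitly closed.

```python
from html.parser import HTMLParser


class DomBuilder(HTMLParser):
    """Build a nested dict tree and record the order in which nodes are created."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self._stack = [self.root]   # path from the root to the current node
        self.order = []             # creation order of element nodes

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self._stack[-1]["children"].append(node)  # attach to the current parent
        self._stack.append(node)                  # descend into the new node
        self.order.append(tag)

    def handle_endtag(self, tag):
        # Closing a tag returns to the parent (assumes well-formed nesting).
        if len(self._stack) > 1:
            self._stack.pop()


builder = DomBuilder()
builder.feed(
    "<html><body>"
    "<div><p>a</p><span>b</span></div>"
    "<ul><li>c</li></ul>"
    "</body></html>"
)
print(builder.order)
```

In the recorded order, `p` and `span` (the children of `div`) appear before `ul` (the sibling of `div`), which is the child-before-sibling order described above.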

2. Generate CSS rule tree based on CSS parsing

  • While the CSS rule tree is being parsed, JavaScript execution is paused until the CSS rule tree is ready.
  • The browser will not render until the CSS rule tree is generated.

3. Combine the DOM tree and CSS rule tree to generate a rendering tree

  • After the DOM tree and CSS rule tree are ready, the browser will start building the rendering tree.
  • Simplifying the CSS speeds up the construction of the CSS rule tree, thereby speeding up page response.

4. Calculate the information of each node according to the rendering tree (layout)

  • Layout: Calculate the position and size of each rendering object based on the information of the rendering objects in the rendering tree
  • Reflow: After the layout is completed, if some part changes in a way that affects the layout, the layout must be computed and the page rendered again.

5. Draw the page based on the calculated information

During the drawing phase, the system traverses the rendering tree and calls the renderer's "paint" method to display the renderer's contents on the screen.

Repaint (redraw): Changing properties such as the background color or text color of an element, which do not affect the layout around or inside the element, only causes the browser to repaint.

Reflow: If the size of an element changes, the rendering tree needs to be recalculated and the page re-rendered.
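The distinction between the two can be sketched as a simple decision rule. This is only an illustration: the property set below is deliberately tiny and non-exhaustive, the function name is hypothetical, and any property not known to be paint-only is conservatively treated as affecting layout. Real browsers decide this per property with far more nuance.

```python
# Properties that change appearance without changing geometry (illustrative subset).
PAINT_ONLY = {"color", "background-color"}


def required_work(changed_properties) -> list:
    """Return the rendering steps triggered by a set of changed CSS properties."""
    changed = set(changed_properties)
    if not changed:
        return []
    if changed <= PAINT_ONLY:
        # Geometry is untouched, so only the pixels need redrawing.
        return ["repaint"]
    # Anything else may change size or position: lay out again, then repaint.
    return ["reflow", "repaint"]


print(required_work({"color"}))
print(required_work({"width"}))
print(required_work({"background-color", "height"}))
```

A reflow is always followed by a repaint, while a repaint alone skips the layout step, which is why changes limited to colors are cheaper than changes to size.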

7. Disconnect

When data transmission is complete, the TCP connection needs to be closed, which takes a four-way wave.

  • The initiator sends a segment (FIN, ACK, Seq) to the passive party, indicating that it has no more data to send, and enters the FIN_WAIT_1 state. (The first wave: initiated by the browser and sent to the server: "I have finished sending the request message, get ready to close")
  • The passive party sends a segment (ACK, Seq), indicating that it agrees to the close request. On receiving it, the initiator enters the FIN_WAIT_2 state. (The second wave: initiated by the server, telling the browser: "I have received the request message and am preparing to close, you should prepare too")
  • The passive party sends a segment (FIN, ACK, Seq) to request closing the connection, and enters the LAST_ACK state. (The third wave: initiated by the server, telling the browser: "I have finished sending the response message, get ready to close")
  • The initiator sends a segment (ACK, Seq) to the passive party and enters the TIME_WAIT state. The passive party closes the connection after receiving this segment. The initiator waits for a period of time (twice the maximum segment lifetime), and if nothing further arrives from the passive party, it closes as well. (The fourth wave: initiated by the browser, telling the server: "I have received the response message and am preparing to close, you should prepare too")
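The state changes during the four waves can be traced with a small table-driven simulation. This is a sketch rather than a real TCP implementation: the function names are hypothetical, only the transitions needed for an orderly close are listed, and CLOSE_WAIT (the state the passive party is in between the second and third wave) comes from the standard TCP state diagram rather than from the steps above.

```python
# (current state, event) -> next state, for an orderly close only.
TRANSITIONS = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("ESTABLISHED", "recv FIN"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("CLOSE_WAIT", "send FIN"): "LAST_ACK",
    ("FIN_WAIT_2", "recv FIN"): "TIME_WAIT",
    ("LAST_ACK", "recv ACK"): "CLOSED",
    ("TIME_WAIT", "timeout"): "CLOSED",
}


def step(state: str, event: str) -> str:
    """Apply one event; events with no listed transition leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def four_way_close() -> list:
    """Return the (initiator, passive) states after each wave and the final timeout."""
    initiator = passive = "ESTABLISHED"
    history = [(initiator, passive)]
    waves = [
        ("initiator", "FIN"),  # 1st wave
        ("passive", "ACK"),    # 2nd wave
        ("passive", "FIN"),    # 3rd wave
        ("initiator", "ACK"),  # 4th wave
    ]
    for sender, flag in waves:
        if sender == "initiator":
            initiator = step(initiator, f"send {flag}")
            passive = step(passive, f"recv {flag}")
        else:
            passive = step(passive, f"send {flag}")
            initiator = step(initiator, f"recv {flag}")
        history.append((initiator, passive))
    # The initiator leaves TIME_WAIT only after its timer expires.
    initiator = step(initiator, "timeout")
    history.append((initiator, passive))
    return history


history = four_way_close()
for states in history:
    print(states)
```

The trace shows why the close takes four segments instead of three: the passive party acknowledges the close first and sends its own FIN separately, once it has finished sending its remaining data.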
