Preface

What happens when we enter a URL into the browser? Have you ever wondered what goes on behind the scenes? Today I will peel back the layers one by one. This article starts with how the browser generates an HTTP message, then explains how DNS servers help us look up IP addresses, and finally shows how the protocol stack actually sends the message. The article is long, so please bear with me.

1. Generate the HTTP request message

1. Parsing the URL

What we usually call a website address is properly called a URL. Most URLs start with "http://", but there are other prefixes as well, such as "ftp://" and "file://". This prefix tells the browser which access method to use: the HTTP protocol when accessing a Web server, the FTP protocol when accessing an FTP server, and so on. After the protocol, the URL contains the server's domain name and the path of the file to be accessed, as shown in the following figure:

Take the HTTP URL http://www.lab.glasscom.com/dir/file1.html as an example: www.lab.glasscom.com is the address of the server to access, and the path /dir/file1.html means accessing the file file1.html in the /dir directory on that server.

Some of you may wonder: in daily life, the addresses we visit often contain only a domain name and no file name at all. In that case the server falls back to a default file it has been configured with, typically index.html or default.htm.

This is the first step of the browser's work: parsing the URL.

2. Basic working principle of HTTP

By parsing the URL, we now know the destination. Next, the browser accesses the Web server via the HTTP protocol. HTTP is a very important topic, and I will write a dedicated column about it later; here I will give only a brief introduction so that everyone has a basic idea.
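The URL-parsing step above can be sketched with Python's standard urllib.parse module. This is a minimal illustration, not what any particular browser does internally: the function names parse_url and server_file_for are made up, and DEFAULT_FILE = "index.html" is an assumed server setting (real servers let you configure it).

```python
from urllib.parse import urlsplit

def parse_url(url):
    """Browser side: split a URL into (scheme, host, path)."""
    parts = urlsplit(url)
    # A bare domain has an empty path; the request then asks for "/".
    return parts.scheme, parts.hostname, parts.path or "/"

# Hypothetical server-side setting; the default file name is configurable.
DEFAULT_FILE = "index.html"

def server_file_for(path):
    """Server side: map a directory path (ending in "/") to the default file."""
    return path + DEFAULT_FILE if path.endswith("/") else path

print(parse_url("http://www.lab.glasscom.com/dir/file1.html"))
# ('http', 'www.lab.glasscom.com', '/dir/file1.html')
print(server_file_for(parse_url("http://www.lab.glasscom.com")[2]))
# /index.html
```

Note that the browser only sends "/" for a bare domain; choosing the actual default file is the server's job, which is why the sketch keeps the two steps in separate functions.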
The HTTP protocol defines the content of the messages exchanged between client and server and the order of the exchange. As shown in the figure above, the client sends a request message to the server. A request can ask for different operations, and HTTP uses methods (such as GET and POST) to express them. After receiving the request, the Web server performs the corresponding processing and puts the result into a response message, which it sends back to the client. The client then reads the result and displays it.

3. Generating the HTTP request message

HTTP request messages must follow a prescribed format, so the browser builds the request message according to that format.
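To make the format concrete, here is a minimal sketch that assembles a raw HTTP/1.1 request as bytes: a request line (method, path, protocol version), header lines, an empty line, and an optional body. The function build_request is a made-up name for illustration; real browsers add many more headers.

```python
def build_request(method, host, path="/", headers=None, body=b""):
    """Assemble a raw HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]  # Host is required in HTTP/1.1
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Tell the server how many body bytes follow the blank line.
        lines.append(f"Content-Length: {len(body)}")
    # Every line ends with CRLF; an extra CRLF separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

print(build_request("GET", "www.lab.glasscom.com", "/dir/file1.html",
                    headers={"Connection": "close"}))
```

A GET request usually carries no body, so the message simply ends after the blank line; a POST with form data would pass the data as body and get a Content-Length header.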
Here is a real example to make this concrete. Take a visit to www.baidu.com. The first line of the request is the request line: from it we can see that this is a GET request, the requested path is /, and the protocol version is HTTP/1.1. Every line after the first is a request header. Since there is no data to send, there is no message body.

4. Receiving the response

The response message has roughly the same format as the request message; the only difference is the first line. The first line of a response, the status line, contains the protocol version, a status code, and a reason phrase, which together indicate whether the request succeeded or what error occurred.

2. How to look up the IP address

1. Basic knowledge of IP addresses

After generating the HTTP message, the browser asks the operating system to send it to the Web server. Before that can happen, there is one important thing to do: look up the IP address that corresponds to the domain name.

The Internet and the local area networks attached to it are built on TCP/IP. A large network is formed by connecting many small subnets with routers. Every device on the network is assigned an address, much like a postal address of the form "Building xx, Room xx": the building number is assigned to the subnet as a whole, and the room number is assigned to a computer within that subnet. Together they form the IP address. A message sent by the sender is first forwarded through the subnet's hub (or switch) to the nearest router; that router forwards it to the next router according to the destination address, and this repeats until the message reaches its destination.

2. Why use both domain names and IP addresses?

First, let's consider two questions:

(1) Why not just use IP addresses and do without names?
(2) Why not use names directly and do without IP addresses?
Let's answer the first question. An IP address is just a string of numbers. If you had to type an IP address every time you visited a website, you would find it hard to remember; names are much easier to remember and recognize.

Now the second question. Identifying the destination directly by domain name, with no IP address involved, is not practical in terms of efficiency. An IPv4 address is a fixed 4 bytes long, whereas a domain name is variable-length and can run to dozens of bytes (up to 255). The longer the address, the longer a router takes to process each packet, and routers already run close to the limits of their performance, so making them work directly with names would slow down forwarding.

Is there a good solution? Yes: let people use names and let routers use IP addresses. Who connects the two? That bridge is DNS.

3. How to look up the IP address

We find the IP address by asking a DNS server. Our computer contains a DNS client that sends requests to the DNS server; this client is called the DNS resolver. Looking up an IP address through DNS is called domain name resolution.

3. DNS server in detail

1. Basic working process of a DNS server

A DNS server's basic job is to receive query messages from clients and return a response based on the content of each query. A client's query message contains three parts:

(1) the domain name, such as www.lab.glasscom.com;
(2) the class, which today is almost always IN, meaning the Internet;
(3) the record type, such as A for an IPv4 address or MX for a mail server.
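The three parts of a query map directly onto the question section of a DNS message as defined in RFC 1035 (the QNAME, QTYPE and QCLASS fields). The sketch below encodes only that question section; a complete query would also need the 12-byte header in front of it, which is omitted here. The function name encode_question is made up, and label-length validation (at most 63 bytes per label) is left out for brevity.

```python
import struct

# Numeric codes from RFC 1035.
TYPE_A, TYPE_MX = 1, 15
CLASS_IN = 1

def encode_question(name, qtype=TYPE_A, qclass=CLASS_IN):
    """Encode (domain name, type, class) as the wire bytes of a DNS question."""
    qname = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        # Each label is prefixed by its length in a single byte.
        qname += bytes([len(raw)]) + raw
    qname += b"\x00"  # a zero-length label stands for the root and ends the name
    # Type and class follow as two big-endian 16-bit integers.
    return qname + struct.pack("!HH", qtype, qclass)

q = encode_question("www.lab.glasscom.com")
print(q)       # b'\x03www\x03lab\x08glasscom\x03com\x00\x00\x01\x00\x01'
print(len(q))  # 26
```

The encoded name alone takes 22 bytes, against 4 bytes for an IPv4 address, which illustrates the size argument made above for letting routers work with addresses rather than names.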
The DNS server looks up the matching record in its table of domain names and IP addresses and returns it.

2. How to search efficiently using the domain-name hierarchy

There are so many domain names today that storing them all on one DNS server is impossible. Instead, the information is distributed across many DNS servers, which work together to find the final answer.

The parts of a domain name are separated by periods, as in www.lab.glasscom.com. By analogy with a company's organizational structure, com is the group, glasscom is the division, and lab is the team. Each such level is called a domain. The information for a domain is stored as a unit on a DNS server, and a single server can hold the information for several domains.

How do we find out which DNS server holds the information for the server we want to reach? The IP address of the DNS server that manages a lower-level domain is registered with the DNS server of its parent domain; the parent's DNS server is in turn registered with the DNS server one level higher, and so on up the tree. The benefit is that, to look up www.lab.glasscom.com, we can ask the com domain's DNS server for the server that stores glasscom.com, continue downward level by level, and finally obtain the IP address of the name we want.

At the very top sits the root domain. It is one level above com; it is normally not written in domain names, but it does exist. The root DNS servers hold the registrations of the DNS servers for the domains directly below the root, such as com. The root servers are reached through only 13 IP addresses worldwide, and because these addresses change very rarely, every DNS server is configured with them in advance.

Let's look at how the target DNS server is found. The client first contacts its nearest DNS server.
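The delegation scheme described above (each parent domain's server records the server of the domain below it, starting from the root) can be modeled in a few lines. This is a toy, in-memory sketch: the server names and the address 192.0.2.10 (a documentation range) are made up, and a real resolver exchanges DNS messages over the network instead of reading a dictionary. The resolve function plays the role of the nearest DNS server, including the cache that the article describes shortly.

```python
# Toy model of the DNS hierarchy. All server names and the address are made up.
# Each table maps a domain to either the DNS server registered for it one level
# down, or (at the bottom) to the final IP address.
SERVERS = {
    "root": {"com": "com-server"},
    "com-server": {"glasscom.com": "glasscom-server"},
    "glasscom-server": {"lab.glasscom.com": "lab-server"},
    "lab-server": {"www.lab.glasscom.com": "192.0.2.10"},
}

def resolve(name, cache):
    """Play the nearest DNS server: answer from cache, else walk down from the root."""
    if name in cache:
        return cache[name], ["cache"]
    server, asked = "root", []
    while True:
        asked.append(server)
        table = SERVERS[server]
        # Choose the most specific domain this server knows that contains the name.
        key = max((k for k in table if name == k or name.endswith("." + k)),
                  key=len, default=None)
        if key is None:
            raise LookupError(f"{server} has no data for {name}")
        target = table[key]
        if target in SERVERS:
            server = target          # referral: ask the lower-level DNS server next
        else:
            cache[name] = target     # final answer: remember it for next time
            return target, asked

cache = {}
print(resolve("www.lab.glasscom.com", cache))
# ('192.0.2.10', ['root', 'com-server', 'glasscom-server', 'lab-server'])
print(resolve("www.lab.glasscom.com", cache))
# ('192.0.2.10', ['cache'])
```

Real DNS caches also expire entries after a time-to-live set by the domain's owner; the sketch keeps entries forever for simplicity.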
That nearest DNS server usually does not store the IP address we need, so the search proceeds from the top down: starting from a root server, it follows the registrations level by level until it reaches the DNS server in charge of the target domain and obtains the IP address.

For names that are queried frequently, DNS servers also keep a cache of names looked up before. When the requested information is in the cache, the DNS server responds immediately, saving a search from the root every time and reducing lookup time.

4. Delegating the protocol stack to send the message

1. The data sending and receiving process

Once the DNS server has given us the IP address, the browser can ask the protocol stack inside the operating system to send the message to that IP address. Data is sent and received by calling the Socket library, as shown below:

Before any data can be sent or received, the client and the server must first set up a connection, which we can picture as a pipe. The openings at each end of the pipe, where data goes in and comes out, are called sockets. So a socket must be created before the pipe can be connected. The server creates its socket first and waits; the client then creates its own socket and connects to the server. When all the data has been sent, the pipe is disconnected and the communication is over. We can divide this process into 4 stages:

(1) create a socket;
(2) connect the client's socket to the server's socket;
(3) send and receive data;
(4) disconnect.
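The four stages can be tried out with Python's socket module, which is a thin wrapper over the operating system's socket API. This is a minimal sketch under stated assumptions: it starts its own throwaway server on the loopback address 127.0.0.1 (the OS picks a free port), and the function names serve_once and fetch, as well as the canned "hi" reply, are made up for illustration. Python's sendall and recv stand in for the write and read components mentioned in the text.

```python
import socket
import threading

def serve_once(listener):
    """Server side: accept one connection, read the request, reply, then close."""
    conn, _ = listener.accept()
    with conn:                      # leaving the block closes the server's end
        conn.recv(1024)             # read the (small) request
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")

def fetch(host, port, message):
    # Stage 1: create a socket; fileno() is the descriptor the OS handed back.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("descriptor:", sock.fileno())
    try:
        # Stage 2: connect, giving the server's IP address and port number.
        sock.connect((host, port))
        # Stage 3: send the request, then read until the server disconnects.
        sock.sendall(message)
        chunks = []
        while True:
            data = sock.recv(1024)
            if not data:            # empty read: the server has closed its side
                break
            chunks.append(data)
        return b"".join(chunks)
    finally:
        # Stage 4: the client closes its end as well.
        sock.close()

# The server creates its socket first and listens; port 0 lets the OS choose.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
threading.Thread(target=serve_once, args=(listener,), daemon=True).start()
reply = fetch("127.0.0.1", port, b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
listener.close()
print(reply)
```

The port number, not the descriptor, is what tells the server side which service to connect to: the descriptor printed above is meaningful only inside the client's own process.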
2. Creating a socket

How is a socket created? The application calls the socket component of the Socket library. Once the socket is created, the protocol stack returns a descriptor, which the program stores in memory. The descriptor identifies each socket: a browser may issue several requests at once and therefore create several sockets, so it needs a way to tell them apart. It works like a hotel: when several guests check in at the same time, each is given a room card so that everyone gets a different room and the staff can find each guest by their card.

3. Connecting the pipe

After the socket is created, we need to connect to the server. This is done by calling the connect component of the Socket library, which takes three parameters: the descriptor, the server's IP address, and a port number. We already know the first two, but what is the port number for? The IP address lets us find the right server, but that server may run several applications, for example two different Web services, and the IP address alone cannot tell them apart. The port number identifies the specific service. Some may ask: isn't the descriptor already unique? It cannot serve this purpose, because the descriptor exists only on the client's own machine and the server has no way of knowing it.

4. Sending and receiving data, and disconnecting

Passing messages is simple: the program writes data into the socket, and it is delivered to the other party's socket. This is done by calling the write component of the Socket library, and the reply is received by calling the read component. In the simple case described here, once the server has sent the response message, it actively disconnects by calling the close component.
When the client has received the data, it also calls close to disconnect.

Summary

When we enter a URL, the browser first parses it and then generates an HTTP request message; along the way we introduced the basic concepts of the HTTP protocol. Because we access the site by domain name, DNS is needed to obtain the IP address of the target. Finally, we saw how the protocol stack (TCP/IP) is used to actually send the message to the server and receive the data. A great deal of knowledge lies behind a single URL request, and only by understanding both what happens and why can we truly learn something valuable.