What are the excellent designs worth learning in DNS?

I used to be a student, and now when I think back, I realize that boys in school had very good memories. They could always remember a complex and mysterious string of alphanumeric domain names, and some experts could even access the Internet directly by typing in the IP address.

Every night, when they climbed over the school wall to go to the Internet cafe, you could always find them hunting for open source learning materials on some forum, and when they were done, they never forgot to wish the poster a safe life at the bottom of the page.

It turned out that at that time, they were already learning the most important open source and sharing spirit of the Internet.

Whenever I think of it, I am deeply moved.

I was moved.

We will find that there are several technical issues worth discussing here.

For example, why can we access the Internet using both domain name and IP?

What is the relationship between them?

Going deeper, we can talk about the principles of DNS and what is worth learning from its design.

Today, let’s start with why we need DNS.

Why DNS?

If we want to visit a certain website, we can enter its IP address, such as 112.80.248.76, in the browser's address bar and go directly to the page.

Accessing web pages via IP

This behavior is legal, but sick.

Most people can’t even remember their partner’s phone number, so how can they possibly remember an IP address like this?

Oh, sorry to hurt you brothers, you don’t have a partner.

But I'm assuming you do.

Think about it: even though you can't remember your partner's phone number, that doesn't stop you from calling her. You open the address book, type "rich woman", a phone number pops up, and you tap to call.

In the computer world, you probably can't remember IP addresses either, so you need a similar address book. For example, you only need to enter www.baidu.com, and it will find the corresponding 112.80.248.76 for you and then access it.

Access by domain name

Here, www.baidu.com is the domain name, and through it we can obtain the IP address behind it, 112.80.248.76.

Just like a person can have multiple phone numbers, a domain name can also correspond to multiple IP addresses.

The process of resolving a domain name into an IP, that is, the process of looking up the "address book", is actually what the DNS (Domain Name System) protocol needs to do.

Note that the IP address above was reachable when I wrote this article, which does not mean it will be when you read it, because the IP address behind a domain name may change. You can get the current IP address by running ping www.baidu.com.

Ping to obtain IP

But here comes the problem.

If an ordinary person's address book holds a thousand phone numbers, he already counts as a social expert, and a phone's address book handles that many entries with ease.

However, the situation is different for website domain names. It is said that they exceeded 300 million in 2015.

If these 300 million records are placed in one server, there will be two problems.

• The data set is too large: more than 300 million domain name records, and the number keeps growing.

• It has to bear an enormous number of read requests. If each domain gets thousands of lookups per second, 300 million domains add up to hundreds of billions of qps.

Obviously, if DNS is made into a single-point service like a mobile phone address book, it is impossible to achieve such capabilities. It must be a distributed system.

So, the question becomes how to design a large-scale distributed system that supports hundreds of billions of qps requests.

I know someone will say: "Is this something that people who only have a 10qps service should consider?"

Although the service we provide may only have 10qps, this does not prevent us from learning the excellent design in DNS.

Let's start with the domain name hierarchy.

Domain Name Hierarchy

Take a common domain name, such as www.baidu.com.

As you can see, there are two periods in this domain name. The periods divide the domain name into three parts.

Among them, com is called the first-level domain or top-level domain. Other common top-level domains include cn, co, etc. baidu is the second-level domain, and www is the third-level domain.

In addition, there is actually an omitted period after com. It is called the root domain.

Domain name hierarchy

When there are more and more domain names, their common parts are extracted, and multiple domain names can be turned into a tree-like hierarchical structure like this.

Hierarchy

At this point we can see that there is actually a hierarchical relationship between these domains, just like schools, grades, and classes.

When you want to locate a specific domain name, you can find the corresponding domain name through such a hierarchy.
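To make the hierarchy concrete, here is a minimal Python sketch that splits a domain name into its levels from the root downward and merges several names into one tree. The function names and the extra sample domains (map.baidu.com, www.example.cn) are made up for illustration; only www.baidu.com comes from the text above.

```python
def split_levels(domain: str) -> list[str]:
    """Return the labels of a domain name, ordered from the root downward."""
    labels = domain.rstrip(".").split(".")
    # The root is the (usually omitted) trailing dot; represent it as ".".
    return ["."] + labels[::-1]


def build_tree(domains: list[str]) -> dict:
    """Merge several domain names into one nested dict keyed by label."""
    tree: dict = {}
    for domain in domains:
        node = tree
        for label in split_levels(domain):
            # Shared prefixes (".", "com", "baidu") reuse the same node.
            node = node.setdefault(label, {})
    return tree


levels = split_levels("www.baidu.com")
tree = build_tree(["www.baidu.com", "map.baidu.com", "www.example.cn"])
print(levels)
print(tree)
```

Because the common parts share one node, www.baidu.com and map.baidu.com hang off the same baidu branch under com, which is exactly the "school, grade, class" idea.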

For example, everyone should still remember the advertising slogan, "Student Li Xiaoming from Class 2, Grade 3, your mother brought two cans of Wangzai milk for you." In fact, Li Xiaoming's mother found people through the hierarchy of schools, grades, and classes.

How DNS Works

Let's take a look at how the big guys designed DNS.

Let me first state the most important conclusion.

  • Use a hierarchical structure to split services
  • Add multi-level cache

Let's expand on each.

Use the domain name hierarchy to split services

DNS carries enormous traffic, so it must be built as a distributed service. The key question then becomes how to split the service.

Since domain names have a tree-like hierarchical structure, the services that store them can naturally be broken down into a tree as well.

Each server maintains information about one or more domains, and the service as a whole takes the hierarchical form shown below.

When we need to visit www.baidu.com, the query process is as shown in the figure below.

DNS query process

The request first goes to the nearest DNS server (such as your home router). If that server does not have the record, it queries a root domain server. The root domain server has no record for www.baidu.com, but it knows that the name belongs to the com domain, so it returns the IP address of the com domain server. The nearest DNS server then queries the com domain server, which in the same way points it to the server responsible for the baidu domain. This continues until the record for www.baidu.com is found, and the corresponding IP address is finally returned.
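The walk from the root down to the final record can be sketched as a toy simulation. This is not how a real resolver is implemented: there is no network here, the server names (root, com-server, baidu-server) are hypothetical, and each "server" is just a dict that holds either the final record or a referral to the next server.

```python
# Toy zone data. Server names are hypothetical; the IP is the one quoted
# in the article and may no longer be current.
SERVERS = {
    "root": {"referrals": {"com": "com-server"}},
    "com-server": {"referrals": {"baidu.com": "baidu-server"}},
    "baidu-server": {"records": {"www.baidu.com": "112.80.248.76"}},
}


def resolve(name: str, start: str = "root") -> tuple[str, list[str]]:
    """Follow referrals from the root until some server holds the record."""
    server = start
    path = []
    while True:
        path.append(server)
        zone = SERVERS[server]
        if name in zone.get("records", {}):
            return zone["records"][name], path
        # Pick the referral whose domain suffix matches the queried name.
        for suffix, next_server in zone.get("referrals", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server
                break
        else:
            raise LookupError(f"{name} not found at {server}")


ip, path = resolve("www.baidu.com")
print(ip, path)
```

Each server only needs to know about its own level and who is responsible for the level below it, which is what lets the load be spread over many machines.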

As you can see, the principle is relatively simple, but there are two issues involved here.

• How does this machine know what the nearest DNS server IP is?

• How does the nearest DNS server know the IP address of the root domain?

Let’s answer them one by one.

How does this machine know the IP of the nearest DNS server?

This was mentioned in the previous article "How does the computer know its IP address when it is just plugged into the network?" When the network cable is plugged in, the computer will obtain the local IP address, subnet mask, router address, and DNS server IP address through the DHCP protocol.

DHCP protocol

Below is a screenshot of the second phase DHCP Offer packet on my Mac. As you can see, the information returned includes the IP address of the DNS server.

Offer stage

You can also view the IP address of the DNS server by clicking the Apple icon in the upper left corner -> System Preferences -> Network -> Advanced -> DNS.

There is a small detail here. From the packet capture image above, you can see that the router address, DNS server address, and DHCP server address are all 192.168.31.1. This is actually the IP address of my home router. In other words, most home routers have these functions built-in.

On a cloud server, the DNS server address is likewise obtained through the DHCP protocol. Viewing it is just as convenient: execute cat /etc/resolv.conf.

In the nameserver entries of that file, you can see that there are two DNS servers. The machine sends requests in the order they appear in the file: if the first server does not respond, it tries the second one.
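As a rough illustration of "try the servers in file order", the sketch below parses nameserver lines from resolv.conf-style text and asks each server in turn. The sample file content is made up (192.168.31.1 is the router address mentioned earlier, 8.8.8.8 is just a well-known public resolver), and ask is a stand-in for a real DNS query, not a library function.

```python
def parse_nameservers(text: str) -> list[str]:
    """Return nameserver addresses in the order they appear in the text."""
    servers = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


def query_in_order(servers, ask):
    """Ask each server in turn; `ask` returns None when a server is silent."""
    for server in servers:
        answer = ask(server)
        if answer is not None:
            return server, answer
    raise TimeoutError("no nameserver responded")


# Hypothetical file contents, for illustration only.
SAMPLE = """\
# generated by a DHCP client
nameserver 192.168.31.1
nameserver 8.8.8.8
options timeout:2
"""


def fake_ask(server):
    # Pretend the first server is down and the second one answers.
    return None if server == "192.168.31.1" else "112.80.248.76"


servers = parse_nameservers(SAMPLE)
print(servers)
print(query_in_order(servers, fake_ask))
```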

How does the nearest DNS server know the IP address of the root domain?

We also know that the root domain is the top level of the domain name tree. Since it is the top level, it holds relatively little information. There are only 13 root server names, each with a single IPv4 address and, today, a single IPv6 address.

We can use the +trace option of the dig command to view the DNS resolution process of a domain name.

The legendary 13 root domains mentioned above, named with the letters a through m, are all in the picture above.

But this raises another question: what we see above are all domain names.

This...

"I originally wanted to find the IP address by domain name, but you asked me to find the IP address of another domain name?"

It sounds unscientific. Isn't this a vicious circle?

Yes, so the IPs corresponding to these root domain names will be placed in each domain name server in the form of configuration files.

In other words, there is no need to request the IP corresponding to the root domain name, you can just read it directly in the configuration.

The screenshot below shows the configuration content in the domain name server.

You can see the root domain starting with A, and its IPv4 address is 198.41.0.4.
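Reading root addresses out of configuration instead of resolving them can be sketched like this. The two-line excerpt follows the usual zone-file layout of a root hints file and contains only the A root server with the address quoted above; the function name load_root_hints is made up, and a real name server does this internally rather than through code like this.

```python
# A tiny excerpt in the style of a root hints file (name, TTL, type, value).
# Only the "A" root server from the article is included here.
HINTS = """\
; root hints excerpt, for illustration
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
"""


def load_root_hints(text: str) -> dict[str, list[str]]:
    """Map each root server name to the addresses listed for it."""
    hints: dict[str, list[str]] = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop comments
        if not line:
            continue
        name, _ttl, rtype, value = line.split()
        if rtype in ("A", "AAAA"):  # address records only
            hints.setdefault(name.lower(), []).append(value)
    return hints


print(load_root_hints(HINTS))
```

No DNS query is needed to get these addresses, which is how the "domain name to find a domain name" loop is broken.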

Add multi-level cache

For high-concurrency scenarios with more reads than writes, adding cache is almost standard.

DNS is no exception. It adds caching, and not just one layer.

When you enter a URL in the browser's address bar, the lookup goes through the browser cache, the operating system cache and the /etc/hosts file, and then the cache of the nearest DNS server. Only if none of them has the record does the query go on to the DNS servers of the root domain, the top-level (first-level) domain, the second-level domain, and so on.

DNS query order after adding cache

So the request process looks like the following figure. You can see that I added a small green file icon to the cached locations mentioned above, and prioritize queries in the cache.

DNS query process after adding cache

Since the tree structure information above is cached as well, the nearest DNS server no longer needs to start from the root domain every time. For example, if the IP of the server for baidu.com can be found in the cache, it can jump straight to that second-level domain server to continue the search.

Because of the existence of multi-level cache, the number of requests actually received by each layer is greatly reduced. And everyone only visits a few websites on a daily basis, so most of the time the cache can be hit and the IP address can be directly returned.
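The effect of stacking caches in front of the slow path can be sketched as follows. This is a simplified model, not real resolver code: the class and function names are made up, every layer is filled with the same expiry time, and the 300-second lifetime is an assumption (the article does not discuss how long entries live).

```python
import time


class TTLCache:
    """One cache layer: maps a name to (ip, expiry time)."""

    def __init__(self, label: str):
        self.label = label
        self._entries: dict[str, tuple[str, float]] = {}

    def get(self, name: str, now: float):
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]
        return None  # missing or expired

    def put(self, name: str, ip: str, ttl: float, now: float) -> None:
        self._entries[name] = (ip, now + ttl)


def lookup(name, layers, resolve_upstream, ttl=300.0, now=None):
    """Check each layer in order; only a full miss reaches the slow path."""
    now = time.monotonic() if now is None else now
    for layer in layers:
        ip = layer.get(name, now)
        if ip is not None:
            return ip, layer.label
    ip = resolve_upstream(name)
    for layer in layers:
        layer.put(name, ip, ttl, now)
    return ip, "upstream"


layers = [TTLCache("browser"), TTLCache("os"), TTLCache("nearest-dns")]
upstream_calls = []


def slow_resolve(name):
    # Stand-in for the root -> com -> baidu walk.
    upstream_calls.append(name)
    return "112.80.248.76"


print(lookup("www.baidu.com", layers, slow_resolve, now=0.0))
print(lookup("www.baidu.com", layers, slow_resolve, now=1.0))
print(len(upstream_calls))
```

The second lookup is answered by the first layer and never touches the upstream servers, which is the whole point of the design.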

Let me briefly summarize.

In the design of DNS, services are split through a hierarchical structure and traffic is dispersed to multiple servers.

By adding multi-level cache, the number of requests actually received by each level is greatly reduced, thereby greatly improving the performance of the system.

These two points are excellent designs that we can refer to in the process of business development.

But there is one more thing that we are unlikely to learn, called Anycast. It also provides important support for DNS to achieve high concurrent processing capabilities. I will talk about it in the next article.

Protocol Format

DNS is a domain name resolution system, and the protocol running on this system is called the DNS protocol.

Like HTTP, the DNS protocol is an application layer protocol.

DNS is an application layer protocol

The figure below shows its message format.

DNS Messages

Too many fields? That's right.

Let’s just talk about a few key points.

Transaction ID is a 2-byte identifier for the exchange. A request and its corresponding response carry the same transaction ID, similar to log_id in a microservice system.

The Flags field is 2 bytes (16 bits) of flag bits. The ones that need attention are QR, OpCode, and RCode.

• QR indicates whether this is a query or a response message: 0 is query and 1 is response.

• OpCode is the operation code. Normal queries are all 0, whether they look up an IP by domain name or a domain name by IP. It can be roughly assumed that we usually only see 0.

• RCode is the response code, similar to the status codes 404 and 502 in HTTP. It indicates whether the request succeeded. 0 means everything is normal, 1 means the message format is wrong, and 2 means a failure inside the domain name server.
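To see where QR, OpCode and RCode sit inside the 16 flag bits, here is a small sketch that packs and unpacks the 12-byte DNS header with Python's struct module. The function names are made up. The rd (recursion desired) bit is not discussed in this article; it is included only because ordinary queries typically set it.

```python
import struct


def build_header(txid: int, qr: int = 0, opcode: int = 0, rd: int = 1,
                 rcode: int = 0, qdcount: int = 1) -> bytes:
    """Pack a 12-byte DNS header: ID, flags, and four section counts."""
    # Bit layout (high to low): QR(1) OpCode(4) AA TC RD RA Z(3) RCode(4)
    flags = (qr << 15) | (opcode << 11) | (rd << 8) | rcode
    return struct.pack("!HHHHHH", txid, flags, qdcount, 0, 0, 0)


def parse_header(data: bytes) -> dict:
    """Unpack the header fields discussed in the text."""
    txid, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    return {
        "id": txid,
        "qr": flags >> 15,
        "opcode": (flags >> 11) & 0xF,
        "rcode": flags & 0xF,
        "qdcount": qdcount,
        "ancount": ancount,
    }


header = build_header(0x1234)
print(header.hex())
print(parse_header(header))
```

A query built this way has qr 0, opcode 0 and rcode 0; in a response, the server flips qr to 1 and fills in rcode.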

The Queries field refers to the actual query content. It contains three parts of information: Name, Type, and Class.

The query content is divided into three parts of information

• Name can contain a domain name or an IP. For example, if you want to look up the IP corresponding to the domain name baidu.com, the domain name is placed in it. If you want to look up the domain name corresponding to an IP, the IP is placed in the Name field, written in reversed order under the in-addr.arpa domain.

• Type refers to the type of information you want to look up. For example, if you want the IP address corresponding to this domain name, fill in A (address). If you want to check whether this domain name is an alias for another name, fill in CNAME (Canonical Name). If you want the mail server address for the domain of an email address (such as gmail.com), fill in MX (Mail Exchanger). There are many other types besides these. The following is a common Type table.

• The Class field is quite interesting. You can simply assume that we will only ever see it filled with IN (Internet). The DNS protocol was originally designed with other scenarios in mind, such as CH and HS, but you don't even need to know what they mean, because over time they have become fossils. The only use we know of for this field today is showing off a little in an interview and then quietly walking away.
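The three parts of a query can be encoded by hand to see what actually goes on the wire. In this sketch the function names are made up; the numeric codes (A = 1, CNAME = 5, MX = 15, IN = 1) are the standard DNS values. The name is written as a sequence of length-prefixed labels ending in a zero byte.

```python
import struct

TYPE_CODES = {"A": 1, "CNAME": 5, "MX": 15}
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending with 0x00."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError(f"bad label: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"  # the zero-length root label


def encode_question(name: str, qtype: str = "A") -> bytes:
    """Name, then 2-byte Type and 2-byte Class in network byte order."""
    return encode_name(name) + struct.pack("!HH", TYPE_CODES[qtype], CLASS_IN)


print(encode_question("www.baidu.com"))
print(encode_question("gmail.com", "MX"))
```

Note how the trailing zero byte is the root domain from the hierarchy section: the "omitted period" is always present on the wire.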

The Answers field, as the name suggests, corresponds to Queries, one question and one answer. Its function is to return the query results. For example, if you look up the corresponding IP address through the domain name, this field will put the specific IP information.

Packet capture

Now that we have covered the principle, let's capture some packets.

Open Wireshark, then execute

 dig www.baidu.com

At this time, the operating system will send a DNS request to query the IP address corresponding to www.baidu.com.

DNS_Query

The picture above shows the content of the DNS query (request). You can see that DNS is an application layer protocol and that the transport layer uses UDP for data transmission. The red parts in the screenshot are the message fields discussed above. The Flags field is displayed bit by bit, so it spans several lines in the capture.

Next, let’s look at the contents of the response data packet.

DNS_Response

It can be seen that the transaction ID is consistent with the DNS request message. And the Answers field contains two IP addresses. After trying, both IP addresses can be accessed normally.
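What the capture shows can also be checked in code: match the response to the query by transaction ID, then read the A records out of the Answers section. The sketch below works on hand-built bytes rather than a real capture. The second address (192.0.2.1) is a documentation placeholder, not one of the addresses actually returned for www.baidu.com, and the function names are made up.

```python
import socket
import struct

QUESTION = b"\x03www\x05baidu\x03com\x00" + struct.pack("!HH", 1, 1)
QUERY = struct.pack("!HHHHHH", 0x1234, 0x0100, 1, 0, 0, 0) + QUESTION


def make_answer(ip: str, ttl: int = 300) -> bytes:
    # 0xC00C is a compression pointer back to the name at offset 12.
    return b"\xc0\x0c" + struct.pack("!HHIH", 1, 1, ttl, 4) + socket.inet_aton(ip)


RESPONSE = (
    struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 2, 0, 0)
    + QUESTION
    + make_answer("112.80.248.76")
    + make_answer("192.0.2.1")  # placeholder address
)


def skip_name(data: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = data[offset]
        if length & 0xC0 == 0xC0:  # two-byte pointer ends the name
            return offset + 2
        if length == 0:
            return offset + 1
        offset += 1 + length


def parse_a_records(query: bytes, response: bytes) -> list[str]:
    """Check the transaction ID, then collect IPv4 addresses from Answers."""
    (query_id,) = struct.unpack("!H", query[:2])
    resp_id, _flags, qdcount, ancount = struct.unpack("!HHHH", response[:8])
    if resp_id != query_id:
        raise ValueError("transaction ID mismatch")
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(response, offset) + 4  # skip Type and Class
    ips = []
    for _ in range(ancount):
        offset = skip_name(response, offset)
        rtype, _rclass, _ttl, rdlen = struct.unpack("!HHIH", response[offset:offset + 10])
        offset += 10
        if rtype == 1 and rdlen == 4:  # A record
            ips.append(socket.inet_ntoa(response[offset:offset + 4]))
        offset += rdlen
    return ips


print(parse_a_records(QUERY, RESPONSE))
```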

Summary

• DNS is an excellent high-concurrency distributed system. It splits services through a hierarchical structure and distributes traffic across multiple servers. By adding multi-level cache, the number of requests actually received by each level is greatly reduced, which greatly improves the performance of the system. Both points are worth borrowing in business development.

• When the network cable is plugged in, the machine obtains the address of the DNS server through the DHCP protocol.

• The IP address of the root domain server will be loaded into each DNS server in the form of configuration. Therefore, you can easily find the IP address corresponding to the root domain by accessing any DNS server.

Finally

I leave you with two questions.

DNS is based on UDP protocol

• From the packet capture, we can see that DNS uses the UDP protocol at the transport layer. Does it only use UDP?

• As mentioned above, DNS has only 13 IPv4 root server addresses, and many of them are actually deployed in the United States. Does that mean that if they are unhappy one day and cut off our access, our network will be paralyzed?

I have been away from Guangdong for a long time, and no one has called me handsome for a long time.

Can you guys call me a handsome guy in the comment section?

Recently, more and more brothers have called me diaomao in the comment section.

So emo. There is nothing diaoxiao about it. The person in front of you is just a poor worker who is wandering outside and missing his hometown.

so.

Can such a kind and simple wish of mine be fulfilled?
