What programmers need to know about CDNs: this article should be enough

I recently started learning about edge computing and found that the CDN, which we hear about so often, is one part of it. When it comes to CDNs, many of us know little more than that the name stands for content delivery network. So how does a CDN actually work, and what benefits does it bring to users browsing a website? Answering these two questions is the purpose of this article.

CDN Concept

CDN stands for “Content Delivery Network”, also known as a content distribution network.

The concept of the CDN was proposed by a research team at the Massachusetts Institute of Technology in 1996 as a way to improve the quality of Internet services. So how does it do that?

Principle Analysis

When we use a domain name to access a website, we are actually sending request packets (HTTP requests, for example) across the network to a server. For example, when we access "www.baidu.com":

  1. First, the domain name is resolved to its IP address (DNS resolution).
  2. Then the HTTP request packets are routed through the network to the server at that IP address.

Strictly speaking, the phrase "the IP address of the server" is not quite accurate. An IP address is bound to a network card, and a server can have several network cards, so it may have several IP addresses.

Let's look at the first step: domain name resolution

Domain name resolution

There are two types of domain name resolution:

  1. Resolve a domain name to an IP address
  2. Resolve one domain name to another

The idea behind resolution is not difficult. After we purchase a domain name from a domain name service provider, we need to map it to an IP address. We can represent this relationship as a map: {domain name: IP}.

We can also give a domain name an alias, for example giving "www.baidu.com" the alias "test.baidu.com". This relationship can also be expressed as a map: {domain name: alias}. In DNS terms this mapping is a CNAME record, a word most readers will have come across.

Domain name resolution, then, means finding either the IP address or the CNAME that corresponds to a given domain name.

Resolution is handled by DNS. The DNS service accepts external requests and extracts the domain name from each request.

  • If the domain name corresponds to an IP address, that IP address is returned.
  • If the domain name corresponds to a CNAME, the IP address of the CNAME's target domain is looked up and then returned to the requester.

Once the requester has the IP address, it can send the actual request.
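
The two maps and the lookup rules above can be sketched in a few lines of Python. This is only an illustration of the idea, not how a real DNS server is implemented. The record tables, the IP address (taken from the 192.0.2.0/24 range reserved for documentation), and the function name `resolve` are all assumptions made up for this example.

```python
# Hypothetical record tables; the names and addresses are illustrative only.
A_RECORDS = {"test.baidu.com": "192.0.2.10"}         # {domain name: IP}
CNAME_RECORDS = {"www.baidu.com": "test.baidu.com"}  # {domain name: alias}


def resolve(domain: str, max_hops: int = 8) -> str:
    """Follow CNAME records until a domain with an IP address is found."""
    for _ in range(max_hops):  # guard against CNAME loops
        if domain in A_RECORDS:
            return A_RECORDS[domain]
        if domain in CNAME_RECORDS:
            domain = CNAME_RECORDS[domain]
            continue
        raise LookupError(f"no record for {domain}")
    raise LookupError("too many CNAME hops")


print(resolve("test.baidu.com"))  # direct lookup
print(resolve("www.baidu.com"))   # goes through the CNAME first
```

Both calls end up at the same address. The second one simply takes one extra step through the alias.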

DNS as a whole is a very large system, and I will not go into details here. You can treat it as a black box that behaves as described above. The two simple diagrams below illustrate it.

[Diagram: resolution without CNAME]

[Diagram: resolution with CNAME]

Special note: in the CNAME case, the CNAME acts as a middleman (or agent) in the resolution process, and this is the key to how a CDN is implemented.

CDN Principle

First of all, the goal of a CDN is to improve the quality of Internet services. Put simply, it makes access faster.

Assume that Baidu has only one server and that someone in Shanghai is visiting Baidu. If the server is also in Shanghai, access is usually fast. If the server is in Lhasa, access is relatively slow. The root cause is that network transmission depends on physical links: the longer the path a packet has to travel, the longer it takes.
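
To get a feel for the numbers, here is a rough back-of-the-envelope sketch. It assumes signals travel through optical fiber at about 200,000 km/s (roughly two thirds of the speed of light in a vacuum), and the two distances are round illustrative figures rather than measured cable lengths. Real latency is higher because of routing detours, queuing, and processing at each hop.

```python
FIBER_SPEED_KM_PER_S = 200_000  # assumed propagation speed in fiber


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay in milliseconds."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


# Illustrative distances, not measurements.
for label, km in [("same city", 50), ("across the country", 3000)]:
    print(f"{label}: {round_trip_ms(km):.1f} ms per round trip")
```

Loading a page usually takes many round trips (DNS lookup, connection setup, then the resources themselves), so the per-trip difference adds up.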

So how do we solve this problem? In fact, the idea is very simple. Baidu can just deploy identical servers all over the country. In professional terms, this is called redundancy.

The idea is simple, but the implementation is still quite troublesome. There are two types of resources on the server: static resources and dynamic resources.

  • Static resources: These resources rarely change, such as pictures, videos, CSS, JavaScript, etc.
  • Dynamic resources: These resources usually differ depending on which user accesses them and when, such as pages rendered from ftl (FreeMarker) or jsp templates.

If Baidu deployed servers all over the country and each one served the same dynamic resources, each would probably also need its own database, because the information behind dynamic resources is usually stored in a database. That raises data synchronization and related problems, which makes the cost very high. In more technical terms this approach is a cluster, and at present cluster architectures typically go no further than "three locations, five centers". Clusters spread over many more locations across the country are not impossible; the main obstacle is cost.

If you want to know more about the three-locations, five-centers architecture, you can read this article, which I also wrote: https://mp.weixin.qq.com/s/uGyGldbwmShDDPDau5pAPw

So is there a lower-cost way? Yes: deploy only the static resources on each server. Static resources usually do not involve a database, so the cost is relatively low, and user access still gets faster.
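
As a small illustration, deciding which requests could be served from such static-only servers might come down to a check like the one below. The extension list and the function name `is_static` are assumptions for this sketch. Real sites usually configure this per path or per domain instead.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed set of extensions treated as static; adjust for a real site.
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".mp4", ".css", ".js"}


def is_static(url: str) -> bool:
    """Return True if the URL path ends in a known static file extension."""
    path = urlparse(url).path  # drops the query string, e.g. "?v=3"
    return PurePosixPath(path).suffix.lower() in STATIC_EXTENSIONS


print(is_static("https://www.baidu.com/img/logo.png"))      # True
print(is_static("https://www.baidu.com/user/profile.jsp"))  # False
```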

So far we have covered what a CDN is trying to achieve. So how is that goal achieved?

If we want to compare CDN systems, we can consider two points:

  1. The performance and network speed of the servers that store static resources in the CDN system.
  2. The number and geographic distribution of the CDN system's server nodes across the country, or even the world.

The first point is easy to understand, and the second should be too. If there are many server nodes holding static resources, users do not have to "travel a long distance" to fetch them, and that is naturally an advantage for the CDN system.

Some companies saw this demand, so there are now many CDN providers, such as Alibaba and Tencent, each with its own CDN service. Once your system is connected to one of these services, you upload your static resources to it, and they are then distributed automatically to nodes around the world.

The remaining question is this: users also access static resources through a domain name, and that domain name is resolved to some IP address. How can DNS resolve it to the IP address closest to the user?

An ordinary DNS server cannot do this. A special DNS server is required, and it needs to know:

  1. The user's current location.
  2. Which IP addresses correspond to the domain name the user is accessing, and where those IP addresses are located.

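The first problem is easy to solve. The user's IP address can be extracted from the request and mapped to a region and carrier, for example Beijing Telecom or Shanghai Mobile. (In practice the DNS server often sees the address of the user's DNS resolver instead of the user's own address, but the resolver is usually close to the user.)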
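
A minimal sketch of that lookup, assuming the DNS server holds a table of network prefixes and the region and carrier each one belongs to. The prefixes below come from the ranges reserved for documentation, and the labels attached to them are invented. A real CDN would rely on a large, regularly updated IP location database.

```python
import ipaddress

# Hypothetical prefix table: documentation ranges with made-up locations.
PREFIX_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): "Beijing Telecom",
    ipaddress.ip_network("198.51.100.0/24"): "Shanghai Mobile",
}


def locate(ip: str) -> str:
    """Map a client IP address to a region/carrier label."""
    addr = ipaddress.ip_address(ip)
    for network, label in PREFIX_TABLE.items():
        if addr in network:
            return label
    return "unknown"


print(locate("198.51.100.7"))  # Shanghai Mobile
```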

The second problem raises the question of who can solve it. Since we are talking about a CDN, the CDN provider certainly knows where its machines are deployed and what their IP addresses are, so only the CDN provider can solve it. The provider runs this special DNS server, which we will call the CDN-dedicated DNS server.

If users pointed their computers' DNS settings directly at the CDN-dedicated DNS server, the problem would be solved. But we cannot ask every user in the world to change the DNS settings on their computers. This is where the DNS CNAME comes in.

Suppose a user accesses static resources through the domain name "image.baidu.com" (Alibaba's CDN service calls such a domain an "accelerated domain name"), and that domain name has the CNAME "cdn.ali.com". When resolving "image.baidu.com", the ordinary DNS server (as opposed to the CDN-dedicated DNS server) first resolves it to "cdn.ali.com". It then finds that "cdn.ali.com" is itself handled by another DNS server, so it hands the rest of the resolution over to that server, which is the CDN-dedicated DNS server. The CDN-dedicated DNS server resolves "cdn.ali.com": using the address information it holds for all CDN servers, it selects the one closest to the user and returns its address, so the user ends up accessing the nearest CDN server.
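
The whole flow can be sketched as follows. Everything in this block is hypothetical: the node list, the coarse region labels, and the function names stand in for whatever a real CDN provider uses, and "closest" is reduced to a simple region match with a fallback node.

```python
# Ordinary DNS: the accelerated domain is only an alias for the CDN's domain.
CNAME_RECORDS = {"image.baidu.com": "cdn.ali.com"}

# CDN-dedicated DNS: knows every node for a domain and where it is deployed.
# Addresses are from the documentation ranges; locations are invented.
CDN_NODES = {
    "cdn.ali.com": [
        {"ip": "192.0.2.1", "region": "Beijing"},
        {"ip": "198.51.100.1", "region": "Shanghai"},
    ]
}


def cdn_dns_resolve(domain: str, user_region: str) -> str:
    """Pick the node in the user's region, else fall back to the first node."""
    nodes = CDN_NODES[domain]
    for node in nodes:
        if node["region"] == user_region:
            return node["ip"]
    return nodes[0]["ip"]


def resolve(domain: str, user_region: str) -> str:
    """Ordinary DNS follows the CNAME, then hands over to the CDN's DNS."""
    target = CNAME_RECORDS.get(domain, domain)
    return cdn_dns_resolve(target, user_region)


print(resolve("image.baidu.com", "Shanghai"))  # 198.51.100.1
print(resolve("image.baidu.com", "Lhasa"))     # falls back to 192.0.2.1
```

Two users asking for the same domain name get different answers depending on where they are, which is exactly what a plain {domain name: IP} map cannot express.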

Additional note:

DNS has many record types. The most commonly used are:

  • A record: maps a domain name to an IP address
  • CNAME: maps a domain name to another domain name
  • NS: delegates resolution of a subdomain to other DNS servers

Summary

As this article shows, the implementation of a CDN relies on DNS. Networking is not my specialty, so if anything here is inaccurate, please point it out.
