An article to understand the principles of CDN technology

Overview

The rapid development of the Internet has brought great convenience to people's work and life, and expectations for Internet service quality and access speed keep rising. Although bandwidth keeps growing and the number of users keeps increasing, slow response times remain a frequent complaint, driven by factors such as Web server load and transmission distance.

The solution is to apply caching technology to network transmission so that Web data streams can be served locally. Caching is a very effective way to optimize network data transmission, delivering a fast experience with assured quality.

The purpose of network caching is to minimize the repeated transmission of redundant data across the network by converting wide-area transfers into local or nearby access. Much of the content transmitted on the Internet is repeated Web/FTP data. Cache servers and caching-enabled network devices can greatly improve data link performance and eliminate the congestion that peak access would otherwise cause at node devices.

Because a cache server stores copies of content, most Web page objects (html, htm, PHP and other page files; gif, tif, png, bmp and other image files; and files in other formats) do not need to be re-fetched from the origin site for repeat visits within the validity period (TTL). Instead, after a simple freshness validation, which exchanges only a few dozen bytes of headers, the local copy is served directly to the visitor.
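
The freshness check described above can be sketched in a few lines of Python. This is an illustrative model only; the field names, TTL value, and URL are hypothetical, not any particular cache's implementation:

```python
import time

# Hypothetical cache entry for one page object (all fields illustrative).
cache_entry = {
    "url": "http://example.com/logo.png",
    "body": b"...cached bytes...",
    "stored_at": time.time(),   # when the copy was fetched
    "ttl": 3600,                # seconds the copy counts as fresh
    "last_modified": "Wed, 01 May 2024 10:00:00 GMT",
}

def is_fresh(entry):
    """Within the TTL, serve the local copy without contacting the origin."""
    return time.time() - entry["stored_at"] < entry["ttl"]

def revalidation_request(entry):
    """Past the TTL, revalidate with a small conditional GET: only headers
    travel, and a 304 Not Modified reply means the local copy is still good."""
    return {
        "method": "GET",
        "url": entry["url"],
        "headers": {"If-Modified-Since": entry["last_modified"]},
    }
```

A 304 response to such a conditional request is exactly the few-dozen-byte header exchange mentioned above.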

Since cache servers are usually deployed close to the user end, they can achieve a response speed close to that of a local area network and effectively reduce the consumption of wide area bandwidth. According to statistics, more than 80% of Internet users repeatedly access 20% of information resources, which provides a prerequisite for the application of cache technology.

A cache server's architecture differs from that of a Web server, which allows it to achieve higher performance. A cache server not only improves response speed and saves bandwidth; it also accelerates the Web server itself by effectively reducing the load on the origin server.

A cache server is a professional, function-specific server with tightly integrated software and hardware. It mainly provides cache acceleration services and is generally deployed at the edge of the network. By acceleration target, deployments fall into client-side acceleration and server-side acceleration. For client-side acceleration, the cache is deployed at the network egress and stores frequently accessed content locally to improve response speed and save bandwidth. For server-side acceleration, the cache is deployed in front of the Web server as a front-end machine to improve the Web server's performance and speed up access. When multiple cache acceleration servers are distributed across different regions, an effective mechanism is needed to manage the cache network, guide users to the nearest node, and balance traffic globally. This is the basic idea of the CDN content delivery network.

What is a CDN content delivery network?

CDN stands for Content Delivery Network. Its purpose is to add a new layer of network architecture to the existing Internet that publishes a website's content to the network "edge" closest to users, so that users can obtain the content they need nearby. This relieves Internet congestion and improves the response speed of websites. CDN is a comprehensive technical solution that addresses the root causes of slow website response: limited network bandwidth, heavy user traffic, and uneven distribution of network points.

In a narrow sense, a content distribution network (CDN) is a new type of network construction method. It is a network coverage layer specially optimized for publishing broadband rich media on traditional IP networks. From a broad perspective, CDN represents a network service model based on quality and order.

Simply put, a content distribution network (CDN) is a strategically deployed overall system that includes distributed storage, load balancing, network request redirection, and content management. Content management and global network traffic management (Traffic Management) are the core of CDN. By judging the proximity of users and server load, CDN ensures that content serves user requests in an extremely efficient manner.

In general, content services are based on cache servers, also known as surrogates (proxy caches), which sit at the edge of the network, only a single hop away from the user. The proxy cache is a transparent mirror of the content provider's origin server (usually located in the CDN service provider's data center). This architecture enables CDN service providers to deliver the best possible experience to end users on behalf of their customers, the content providers, who cannot tolerate any delay in request response time.

According to statistics, the use of CDN technology can handle 70% to 95% of the content visits of the entire website page, reduce the pressure on the server, and improve the performance and scalability of the website.

Compared with the existing content publishing model, CDN emphasizes the importance of the network in content publishing. By introducing active content management layer and global load balancing, CDN is fundamentally different from the traditional content publishing model. In the traditional content publishing model, content publishing is completed by the ICP application server, and the network only acts as a transparent data transmission channel. This transparency is reflected in the fact that the quality assurance of the network only stays at the data packet level, and cannot distinguish the service quality according to different content objects.

In addition, due to the "best effort" nature of IP networks, quality assurance relies on providing sufficient bandwidth throughput between users and application servers, which is far greater than the actual required bandwidth. In such a content publishing model, not only a large amount of valuable backbone bandwidth is occupied, but the load on ICP application servers also becomes very heavy and unpredictable.

When some hot events or traffic surges occur, local hot spot effects will occur, causing the application server to be overloaded and out of service. Another drawback of this central application server-based content publishing model is the lack of personalized services and the distortion of the broadband service value chain. Content providers are responsible for content publishing services that they should not do and cannot do well.

Looking at the entire value chain of broadband services, content providers and users are located at the two ends of the entire value chain, and network service providers connect them in the middle. With the maturity of the Internet industry and the transformation of business models, the roles in this value chain are becoming more and more numerous and more and more segmented.

For example: content/application operators, hosting service providers, backbone network service providers, and access service providers. Each role in this value chain must cooperate and perform its own duties to provide customers with good service, producing a win-win situation.

From the perspective of combining content with the network, content publishing has gone through two stages: the ICP content (application) server and the IDC. The IDC boom also gave rise to the hosting service provider role. However, IDC cannot solve the problem of effective content publishing: content located at the center of the network can neither relieve the occupation of backbone bandwidth nor establish traffic order on the IP network.

Pushing content to the edge of the network and serving users nearby has therefore become the obvious choice, ensuring both service quality and orderly access across the whole network. This is the CDN service model. The establishment of CDN resolves the "centralization versus decentralization" dilemma that has troubled content operators, and it is undoubtedly valuable, indeed indispensable, for building a healthy Internet value chain.

New CDN Applications and Customers

At present, CDN services are mainly used in securities, finance and insurance, ISPs, ICPs, online transactions, portals, media sites, large and medium-sized companies, online teaching, and similar fields. They can also be used in industry-specific networks and on the Internet at large, and can even be used to optimize a local area network. With a CDN, these websites do not need to invest in expensive servers of various types or set up sub-sites. Especially for bandwidth-hungry media such as streaming content and remote teaching courseware, the CDN network can replicate the content to the edge of the network, minimizing the distance between the request point and the delivery point and thereby significantly improving Web site performance.

CDN networks are built in several ways: CDN networks built by enterprises to serve themselves; IDC CDN networks, which mainly serve IDC customers and value-added services; CDN networks built by network operators, which mainly provide content push services; and dedicated CDN service providers, which build CDNs specifically to sell as a service. Website operators cooperate with a CDN organization; the CDN is responsible for transmitting the information, ensuring it is delivered normally, and maintaining the transmission network. The website only needs to maintain its content and no longer needs to worry about traffic.

CDN can ensure the network's speed, security, stability, and scalability.

An IDC can establish its own CDN network. IDC operators generally have multiple data centers in different regions, and the service targets are the customers hosted in those centers. This approach uses existing network resources, requires little investment, and is easy to build. For example, if an IDC has 10 computer rooms across the country and they all join the IDC's CDN network, a customer hosting a Web server in one node effectively gains 10 mirror servers that users can access nearby.

In a broadband metropolitan area network, the network speed within the metro area is very fast, but the bandwidth out of the city is generally a bottleneck. To preserve the high-speed experience of the metropolitan area network, the solution is to cache Internet content locally, deploying caches at the various POP points of the metro network. This forms an efficient and orderly network in which users can reach most content within a single hop. It is also a way of applying CDN acceleration to all websites.

How CDN works

Before describing the implementation principle of CDN, let's first look at the traditional access process of non-cached services to understand the difference between CDN cache access and non-cached access:

As can be seen from the figure above, the process of a user accessing a website that does not use CDN caching is as follows:

  • The user gives the browser the domain name to visit;
  • The browser calls the domain name resolution library to resolve the domain name and obtain the corresponding IP address;
  • The browser sends a data access request to the host at the obtained IP address;
  • The browser renders the web page from the data returned by that host.

Through these four steps, the browser completes the whole process from receiving the domain name the user wants to visit to fetching data from that domain's host. The CDN network inserts a cache layer between the user and the server; directing the user's request to the cache, which in turn fetches data from the origin server, is achieved mainly by taking over DNS resolution. Let's look at the process of accessing a website after the CDN cache is in place:

From the above figure, we can see that the access process of the website after using CDN cache becomes:

  • The user gives the browser the domain name to visit;
  • The browser calls the domain name resolution library to resolve the domain name. Because the CDN has adjusted the resolution process, the library generally obtains a CNAME record for the domain. To get an actual IP address, the browser must resolve the returned CNAME domain again. In this step, global load-balancing DNS is used, for example resolving to an IP address based on geographic location so the user is served nearby. This resolution yields the IP address of a CDN cache server;
  • Having obtained the actual IP address, the browser sends an access request to that cache server;
  • The cache server resolves the real IP address of the requested domain through the CDN's internal DNS, then forwards the access request to that real IP address;
  • After the cache server obtains the content from the origin, it stores a local copy for later use and returns the data to the client, completing the data service;
  • The client receives the data returned by the cache server, displays it, and the browsing request is complete.
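
The resolution chain in the steps above can be modeled as a toy lookup. Everything here is made up for illustration (the domain, the CDN provider's name, the regions, and the IP addresses); real CDNs use far richer policies than a simple region table:

```python
# Toy model of the CDN resolution chain: public DNS answers with a CNAME,
# and the CDN's global load-balancing DNS maps that CNAME to the nearest
# edge cache based on where the client's query comes from.

CNAME_TABLE = {"www.example.com": "www.example.com.cdnprovider.net"}

# Hypothetical edge nodes keyed by client region.
EDGE_NODES = {
    "north": "203.0.113.10",
    "south": "198.51.100.20",
}

def resolve(domain, client_region):
    # Step 1: the public answer for the site's domain is a CNAME to the CDN.
    target = CNAME_TABLE.get(domain, domain)
    # Step 2: the CDN's DNS resolves the CNAME to the nearest cache node.
    if target.endswith(".cdnprovider.net"):
        return EDGE_NODES[client_region]
    return None  # domain is not on this CDN

print(resolve("www.example.com", "north"))  # the nearest edge for a northern client
```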

From this analysis we can conclude that to stay transparent to ordinary users (that is, after adding the cache, the client needs no configuration and can access the accelerated website by its original domain name), and to minimize impact on the ICP while accelerating designated websites, only the domain name resolution part of the access process needs to change. The following is the specific operating procedure of a CDN network.

  • The ICP only needs to delegate resolution authority for its domain name to the CDN operator; no other change is required. In operation, the ICP modifies its domain's resolution record, generally using a CNAME to point at the CDN network's cache server address.
  • The CDN operator first provides public resolution for the ICP's domain. The ICP domain's resolution result is usually pointed at a CNAME record so that location-aware answers (such as a DNS sortlist) can be applied.
  • When location-aware answers are needed, the CDN operator applies special handling in DNS to the resolution of the domain the CNAME points to, so that when the DNS server receives a client request it can return different IP addresses for the same domain name depending on the client's IP address;
  • Because the request that reaches the cache carries only the host name, the cache must still learn the origin server's IP address, so the CDN operator maintains an internal DNS server to resolve the real IP address of the domain the user is accessing;
  • Alongside the internal DNS server, an authoritative server must also be maintained to control which domain names may be cached and which may not, to avoid operating as an open proxy.
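
The ICP's side of this setup amounts to a single DNS record change. The sketch below is a hypothetical BIND-style zone fragment (the domain and the CDN operator's name are made up); the site's address record is replaced by a CNAME that hands resolution over to the CDN:

```
; Hypothetical zone fragment for the ICP's domain example-icp.com.
; The A record for www is replaced by a CNAME pointing into the
; CDN operator's namespace, whose DNS then answers with the
; nearest cache node's IP address.
www    IN  CNAME  www.example-icp.com.cdnprovider.net.
```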

Technical means of CDN

The main technical means of implementing CDN are high-speed caching and mirror servers. A CDN can work via DNS resolution or HTTP redirection, and completes content transmission and synchronized updates through cache servers or remote mirror sites.

With the DNS method, the accuracy of user location judgment exceeds 85%; with the HTTP method it exceeds 99%. In general, the ratio of the data volume exchanged between users and the cache server group to the data volume the cache servers fetch from the origin site is between 2:1 and 3:1, meaning that 50% to 70% of repeat traffic to the origin site is absorbed (mainly images, streaming media files, and so on). For mirroring, apart from the traffic used for data synchronization, everything is served locally without touching the origin server.
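
The 2:1 to 3:1 figure maps directly onto the 50% to 70% savings claim: for a user-to-origin byte ratio r, the share served from cache is 1 - 1/r. A quick check:

```python
def cache_hit_share(ratio):
    """Fraction of user-facing traffic served locally, given the ratio of
    user-side volume to origin-side volume (e.g. 2:1 -> ratio = 2)."""
    return 1 - 1 / ratio

print(f"{cache_hit_share(2):.0%}")  # 50%
print(f"{cache_hit_share(3):.0%}")  # 67%, close to the 70% upper bound
```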

Mirror site servers are a familiar sight. They distribute content directly and are suitable for synchronizing static and quasi-dynamic data. However, purchasing and maintaining additional servers is expensive: mirror servers must be set up in each region and staffed with professional technicians for management and maintenance, and a large website must also keep the servers in every region updated at all times, significantly increasing its bandwidth requirements. For these reasons, general Internet companies do not set up many mirror servers.

High-speed caching is inexpensive and well suited to static content. Internet statistics show that more than 80% of users repeatedly visit 20% of website content. Under this rule, cache servers can handle most static requests, while the origin WWW server only needs to handle the roughly 20% of requests that are non-cacheable or dynamic, greatly speeding up responses to client requests and reducing the origin server's load. According to an IDC survey, the cache market, an important indicator for CDN, is growing at nearly 100% per year, with global turnover expected to reach US$4.5 billion in 2004. The growth of network streaming media will further stimulate demand in this market.

CDN network architecture

The CDN network architecture consists of two main parts: the center and the edge. The center refers to the CDN network management center and the DNS redirection resolution center, which are responsible for global load balancing; its equipment is installed in the management center's computer room. The edge refers to the remote nodes that carry CDN distribution, composed mainly of caches and load balancers.

When a user visits a website that has joined the CDN service, the domain name resolution request is ultimately handed to the global load-balancing DNS for processing. Through a set of predefined strategies, the global load-balancing DNS gives the user the address of the node closest to them at that moment, so the user receives fast service. At the same time, it maintains communication with all the CDN nodes distributed around the world, collects each node's communication status, and ensures that user requests are never assigned to an unavailable CDN node. In effect, this is global load balancing implemented through DNS.

For ordinary Internet users, each CDN node is equivalent to a web server placed around it. Through the control of global load balancing DNS, the user's request is transparently directed to the node closest to him. The CDN server in the node will respond to the user's request just like the original server of the website. Since it is closer to the user, the response time is bound to be faster.

Each CDN node consists of two parts: a load balancing device and cache servers.

The load balancing device is responsible for balancing the load across the caches within its node to keep the node working efficiently. It also collects information about the node and its surrounding environment and maintains communication with the global load-balancing DNS, achieving load balancing across the whole system.

The cache servers store large amounts of content from customers' websites and respond to local users' access requests just like a Web server located close to the user.

The CDN management system guarantees the normal operation of the whole system. It monitors the subsystems and devices in real time and raises alarms for faults, and it also monitors total traffic and each node's traffic in real time, saving the data in the system database so that network administrators can easily analyze it further. Through the management system, administrators can also modify the system configuration.

In theory, the simplest CDN network needs only a DNS server responsible for global load balancing and one cache per node; the DNS resolves to different IP addresses based on the user's source IP to achieve nearby access. To ensure high availability, the traffic and health of each node must be monitored. When a single cache cannot carry a node's traffic, multiple caches are needed, and when multiple caches work together, a load balancer is required so the cache group can cooperate.
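
The node described here, a balancer in front of several caches with health awareness, can be sketched as follows. The class and cache names are illustrative, and real balancers use many policies beyond simple round robin:

```python
import itertools

class NodeBalancer:
    """Toy load balancer for one CDN node: round-robins requests across
    the node's cache servers, skipping any that are marked unhealthy."""

    def __init__(self, caches):
        self.caches = caches            # e.g. ["cache-1", "cache-2"]
        self.healthy = set(caches)      # health status fed by monitoring
        self._rr = itertools.cycle(caches)

    def mark_down(self, cache):
        """Record that monitoring found this cache unavailable."""
        self.healthy.discard(cache)

    def pick(self):
        """Return the next healthy cache, or fail if none remain."""
        for _ in range(len(self.caches)):
            c = next(self._rr)
            if c in self.healthy:
                return c
        raise RuntimeError("no healthy cache in this node")

lb = NodeBalancer(["cache-1", "cache-2", "cache-3"])
lb.mark_down("cache-2")
print([lb.pick() for _ in range(4)])  # cache-2 is skipped
```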
