A CDN is a group of geographically distributed proxy servers. A proxy server is an intermediary between the client and the origin server. These proxy servers are located at the edge of the network, close to the end user. This placement speeds up content delivery to the end user by reducing latency and saving bandwidth. A CDN also has additional intelligence for optimizing traffic routing and applying rules that protect against DDoS attacks and other abnormal network events. Essentially, a CDN solves two problems:
Let's start with the high-level component design and main workflow.

[Figure: High-level architecture]
Let's go through the main workflow:
Now imagine that you have a single website and need to distribute content to 20 regions, each with 20 proxy servers that must store a copy. 20 regions × 20 replicas means you need to transfer the data to the CDN 400 times, which is very inefficient. To solve this problem, you can use a tree-like replication model.

[Figure: Tree-like content distribution]

Data is sent to a regional edge proxy server and is then replicated to the child proxy servers in the same region over the CDN's internal network. This way, the origin only needs to send the content once per region or geographic area. Depending on the scale, a region can be a specific data center or a larger geographic area, where we have two levels of parent proxy servers.

It is crucial for users to get data from the nearest proxy server, because the goal of a CDN is to reduce latency by bringing data close to the user. There are two routing models that CDN companies usually use. The first one is based on DNS with load balancing, which has been the most popular historically. I think the newer and more effective one is the Anycast model, which delegates routing and load balancing to the Internet's BGP protocol. Let's take a look at them.

In a typical DNS resolution, we use the DNS system to get the IP address that corresponds to a human-readable name. In our case, DNS returns another DNS name to the client. This is called DNS redirection, and content providers use it to send clients to a specific CDN region. For example, if a client tries to resolve company.com, the authoritative DNS server returns another name (such as cdn.us-east.company.com). The client performs a second DNS resolution and obtains the IP address of a suitable CDN proxy server in the US-East region. The DNS response differs depending on the location of the user. So first the client is mapped to the appropriate data center based on the user's location. In the second step, the request reaches one of that region's load balancers, which distributes the load across the proxy servers.
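The two-step DNS redirection described above can be sketched roughly as follows. This is a minimal illustration, not how any particular CDN implements it: the `eu-west` name, the IP addresses (taken from documentation-only ranges), the function names, and the round-robin choice of load balancer are all assumptions made for the example.

```python
# Hypothetical zone data. Only cdn.us-east.company.com comes from the text;
# every other name and address here is illustrative.
REGION_NAMES = {
    "us-east": "cdn.us-east.company.com",
    "eu-west": "cdn.eu-west.company.com",
}
LOAD_BALANCERS = {
    "cdn.us-east.company.com": ["192.0.2.10", "192.0.2.11"],
    "cdn.eu-west.company.com": ["198.51.100.10", "198.51.100.11"],
}


def resolve_region_name(hostname, client_region):
    """Step 1: the authoritative server answers with a region-specific name."""
    if hostname != "company.com":
        raise KeyError(f"unknown hostname: {hostname}")
    return REGION_NAMES[client_region]


def resolve_address(regional_name, request_number):
    """Step 2: pick one of the region's load balancer IPs (round robin)."""
    addresses = LOAD_BALANCERS[regional_name]
    return addresses[request_number % len(addresses)]


def resolve(hostname, client_region, request_number=0):
    """Run both steps and return the IP address the client would connect to."""
    regional_name = resolve_region_name(hostname, client_region)
    return resolve_address(regional_name, request_number)
```

Calling `resolve("company.com", "us-east")` first maps the client to the US-East name and then to one of that region's load balancer addresses; a client located in another region gets a different answer for the same hostname.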
To move clients from one region to another, a DNS change must be made to remove the load balancer IP of the troubled region. For this to work, the DNS TTL must be set as low as possible so that clients pick up the change quickly. Even so, some traffic will keep flowing to the old region until cached records expire, and if that region goes down, this traffic will be impacted. I discuss similar issues in another video about scalable API gateways and edge design.

A more effective approach is the Anycast design. Anycast is a routing method in which all edge servers, located in multiple places, share the same single IP address. It relies on the Border Gateway Protocol (BGP) to route clients based on the natural network flow of the Internet. CDNs use the Anycast routing model to deliver Internet traffic to the nearest data center, which improves response times and prevents any single data center from being overloaded under unusual demand, such as DDoS attacks. When a request is sent to an Anycast IP address, routers direct it to the nearest machine on the network. If an entire data center fails or undergoes maintenance, the Anycast network responds much like a load balancer that splits traffic across multiple servers or regions: traffic moves from the failed location to another data center that is still online and functioning properly.

[Figure: Anycast reliability]

Unicast, as used with DNS and load balancers, means one machine, one IP address. Most of the Internet works via the Unicast routing model, where each node on the network is given a unique IP address. Anycast means many machines, one IP address. While Unicast is the simplest way to run a network, it is not the only way. Using Anycast means the network can be very resilient. Because traffic finds the best path, we can take entire data centers offline and traffic will automatically flow to the next closest data center.
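The failover behavior can be illustrated with a small sketch. It assumes a single client network, hypothetical site names, and made-up path costs; real BGP route selection considers more attributes than one cost number, so treat this only as a model of "nearest advertising site wins".

```python
# Every site announces the same address (documentation range, illustrative only).
ANYCAST_IP = "203.0.113.1"

# Hypothetical path costs from one client's network to each site.
PATH_COST = {"frankfurt": 2, "london": 3, "new-york": 7}


def route(advertising_sites, path_cost):
    """Return the advertising site with the lowest path cost for this client."""
    candidates = [site for site in advertising_sites if site in path_cost]
    if not candidates:
        raise RuntimeError("no site is advertising the anycast prefix")
    return min(candidates, key=lambda site: path_cost[site])


sites = {"frankfurt", "london", "new-york"}
before_failure = route(sites, PATH_COST)   # closest site while all are online

sites.discard("frankfurt")                 # the site withdraws its route
after_failure = route(sites, PATH_COST)    # traffic shifts to the next closest
```

No DNS record changes and no TTL has to expire in this model: once a site stops advertising the shared address, the same destination IP simply leads to the next closest site.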
A final benefit of Anycast is that it can also help mitigate DDoS attacks. In most DDoS attacks, many compromised "zombie" computers are used to form what is known as a botnet. These machines can be spread out across the network and generate so much traffic that they overwhelm a typical Unicast-connected machine. An Anycast network inherently increases the surface area available for absorbing such attacks: because the botnet is distributed, each data center with capacity absorbs only a portion of the denial-of-service traffic.

A real-world example is Cloudflare, which has built a global proxy network with hundreds of locations around the world. It claims that this puts its servers within about 50 milliseconds of 95% of the world's Internet-connected population. Since the network is also built on Anycast IP, it offers a total capacity of over 170 Tbps. This means that it can not only serve a large number of customers, but also handle the largest DDoS attacks by spreading malicious traffic across multiple locations.