For developers, the term CDN is both familiar and unfamiliar. I rarely need to touch it during day-to-day development, yet I hear others mention it all the time. We have all heard that it speeds things up, and we roughly know why. But if we dig deeper: is using a CDN always faster than not using one? That is where things get a little fuzzy. No matter. Today we will look at CDN from a different angle.

What is CDN

For numeric and textual data, such as names and phone numbers, we need a place to store it. We usually use a MySQL database.

[Figure: text data stored in MySQL]

When we need this data, we read it from MySQL. However, because MySQL stores its data on disk, a read throughput of about 5k QPS is already very good for a single instance. That sounds acceptable, but for a slightly larger system it is a bit worrying.

To improve performance, we add an in-memory cache layer in front of MySQL, such as the well-known Redis. Reads go to memory first and fall back to MySQL only when the data is not found there, which greatly reduces the number of reads that reach MySQL. With this combination, read performance can easily reach tens of thousands of QPS.

[Figure: MySQL and Redis]

So far we have covered the scenario we meet most often in everyday development.

But now the data I want to handle is no longer text; it is image data. For example, I have a handsome photo. It's the one below. Every time I hear someone covering Tanya Chua's "Letting Go" on TikTok, I can't help wanting to post this picture with the caption "I still can't forget it".

So here comes the question: where should this image data be stored, and where should it be read from?

If we look back at the MySQL and Redis scenario, it is nothing more than a storage layer plus a cache layer.
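The storage-plus-cache read path described above can be sketched in a few lines. This is only a minimal illustration, not how Redis or MySQL clients are actually called: the class name `CacheAsideStore` is hypothetical, and plain dictionaries stand in for both MySQL (the storage layer) and Redis (the cache layer).

```python
class CacheAsideStore:
    """Look in the in-memory cache first, fall back to the storage layer on a miss."""

    def __init__(self, storage):
        self.storage = storage    # hypothetical stand-in for MySQL (authoritative data)
        self.cache = {}           # hypothetical stand-in for Redis (in-memory cache)
        self.storage_reads = 0    # number of reads that reached the storage layer

    def get(self, key):
        # Cache hit: answer from memory without touching storage.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from storage and remember the value for next time.
        self.storage_reads += 1
        value = self.storage.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = CacheAsideStore({"alice": "555-0100"})
first = store.get("alice")    # miss: read from storage, then cached
second = store.get("alice")   # hit: storage is not read again
```

After the two calls, `store.storage_reads` is 1: only the first read reached the storage layer. The CDN and object storage pairing follows the same shape, just for files instead of rows.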
[Figure: storage layer and cache layer]

For file objects such as images, MySQL is an unlikely choice for the storage layer. Instead, a dedicated object storage service should be used, such as Amazon's S3 (Amazon Simple Storage Service; it is called S3 because the name starts with three S's) or Alibaba Cloud's OSS (Object Storage Service). The rest of this article uses the more commonly seen OSS for its explanations.

As for the cache layer, Redis no longer fits and is replaced with a CDN (Content Delivery Network). A CDN can be understood simply as the cache layer for object storage.

[Figure: CDN and OSS]

Now we can answer the question above: for users, image data is stored in object storage and is read through the CDN when needed.

How CDN works

Now that we have CDN and object storage, let's look at how they work together.

Right-click a picture on a web page and copy its address, and you will find that the image URL looks like this:

https://cdn.xiaobaidebug.top/1667106197000.png

The cdn.xiaobaidebug.top at the front is the CDN domain name, and the 1667106197000.png at the end is the path of the image.

When we enter this URL in the browser, an HTTP GET request is initiated and the following process takes place.

[Figure: CDN query process]

Phase 1: your computer first obtains the IP address for the domain name cdn.xiaobaidebug.top through the DNS protocol.
• Step 1 and step 2: The browser cache is checked first, then the operating system (its cache and the /etc/hosts file). If neither has an answer, the nearest DNS server (such as your home router) is queried. If the nearest DNS server has the record cached, it returns a response.
• Step 3: If the nearest DNS server has no cached record, the root, first-level (top-level), second-level, and third-level domain servers are queried in turn.
• Step 4: The nearest DNS server then gets the alias (CNAME) of the cdn.xiaobaidebug.top domain name, such as cdn.xiaobaidebug.top.w.kunlunaq.com.
• kunlunaq.com is the DNS scheduling system dedicated to Alibaba's CDN.
• Step 5 to step 7: The nearest DNS server now queries kunlunaq.com, which returns the IP address "closest" to you.

Phase 2: this corresponds to step 8 in the figure above. The browser uses this IP to access the CDN node, and the CDN node returns the data.

Phase 1 mentions many new terms, such as CNAME, root domain, and first-level domain. They are described in detail in the previous article "What excellent designs in DNS are worth learning". If they are unfamiliar, take a look there.

We know that the purpose of DNS is to obtain an IP address from a domain name. But that is just one of its functions. DNS has many record types: an A record maps a domain name to its IP address, while a CNAME record maps a domain name to an alias, that is, another domain name.

For ordinary domain names, DNS resolution usually returns the IP address directly (an A record; A stands for Address).

For example, below I use the dig command to issue a DNS request and print the resolution process.

$ dig +trace xiaobaidebug.top

It can be seen that xiaobaidebug.top resolves directly to the IP address 47.102.221.141.

But for the CDN domain name, after a round of queries, the first thing you get is a CNAME record pointing to xx.kunlunaq.com, and only by resolving xx.kunlunaq.com do you get the corresponding IP address.

$ dig +trace cdn.xiaobaidebug.top

Seeing this, another question arises.

Why go to the trouble of adding a CNAME?

What the CNAME points to is actually the CDN-specific DNS server. It is just one small DNS server in the entire DNS system and looks just like any other DNS server.
DNS requests are sent to this server in the normal way. But when a request actually reaches it, its special behavior shows.

When a query reaches an ordinary DNS server, it is enough for that server to return an IP address corresponding to the domain name. The CDN-specific DNS server, however, is expected to return the IP of the server "closest" to the caller.

[Figure: the CDN-specific DNS server returns the IP of the nearest CDN node]

How do I know which server IP is the closest to the caller?

Notice that the word "closest" is enclosed in quotation marks.

The CDN-specific DNS server is operated by the CDN provider. Alibaba Cloud, for example, certainly knows which CDN nodes it has, as well as the current load, response latency, and even weight of each of those servers. It also knows the IP address of the caller. From the caller's IP it can infer the caller's network operator and approximate location, and then pick the most suitable CDN server based on all of these conditions. This is the so-called "closest".

For example, if the CDN server geographically closest to you is handling more traffic and responding slowly, while a server farther away can serve the current request better, then the farther server may well be chosen.

In other words, the selected server may not be the geographically closest one, but it should be the most suitable one at that moment.

What is back-to-source?

The image URL above has the format https://cdn domain name/image address.png. In other words, the picture is obtained by accessing the CDN.

So, can we access object storage directly to fetch and display the image? For example, like this:

https://oss domain name/image address.png

This is like asking whether text data can be read and displayed directly from MySQL without going through Redis.

Of course it can. That is what I used to do for the pictures on my blog. But this is more costly.
The cost here can mean performance cost or billing cost. See the figure below.

You can see that the cost of requesting OSS directly is almost twice that of requesting OSS through the CDN. Since I'm on a tight budget, and also to make the blog load pictures faster, I connected it to a CDN.

But seeing this, another question arises. In the screenshot above, the red box contains the term "Back to Source".

What is back-to-source?

When we visit https://cdn domain name/image address.png, the request lands on a CDN server. But the CDN server is essentially a cache layer, not the data source. Object storage is the data source.

The first time you access the CDN for a picture, the CDN most likely does not have it yet, so it has to go back to the data source, fetch the picture, and store it on the CDN. The next time you access the CDN, as long as the cached copy has not expired, the request hits the cache and is answered directly, with no need to go back to the source.

So the access process becomes as follows.

In what other situations does back-to-source occur?

Besides the case above, where the CDN does not have the data, the cached copy on the CDN may have expired, which also causes a back-to-source.

In addition, even if the data is cached and has not expired, you can actively trigger a back-to-source through the open interface provided by the CDN, although we rarely have occasion to use it.

Users are not aware of back-to-source at all. When they read an image, all they know is whether they got it or not. Either way the data is read; the only difference is whether it came straight from the CDN cache or was returned after the CDN went back to object storage to fetch it.
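The three back-to-source triggers described above (data never cached, cached copy expired, active refresh) can be sketched as a small model. This is an illustrative sketch under assumptions, not any provider's implementation: `CdnNode`, `refresh`, and the `ttl_seconds` parameter are hypothetical names, a dictionary stands in for object storage, and the current time is passed in explicitly so the behavior is easy to follow.

```python
class CdnNode:
    """A cache in front of an origin; a miss or an expired entry goes back to the source."""

    def __init__(self, origin, ttl_seconds):
        self.origin = origin              # hypothetical stand-in for object storage
        self.ttl_seconds = ttl_seconds    # how long a cached copy stays valid
        self.cache = {}                   # path -> (data, expires_at)
        self.back_to_source_count = 0

    def get(self, path, now):
        entry = self.cache.get(path)
        # Cache hit: the entry exists and has not expired yet.
        if entry is not None and now < entry[1]:
            return entry[0], "HIT"
        # Otherwise go back to the source, then store the result on the node.
        self.back_to_source_count += 1
        data = self.origin[path]
        self.cache[path] = (data, now + self.ttl_seconds)
        return data, "MISS"

    def refresh(self, path):
        # Active refresh: drop the cached copy so the next read goes back to the source.
        self.cache.pop(path, None)


node = CdnNode({"/image.png": b"png-bytes"}, ttl_seconds=60)
r1 = node.get("/image.png", now=0)    # first access, nothing cached
r2 = node.get("/image.png", now=10)   # cached and not expired
r3 = node.get("/image.png", now=61)   # cached copy has expired
node.refresh("/image.png")
r4 = node.get("/image.png", now=62)   # cache was cleared by the refresh
```

Here `r2` is the only hit; `r1`, `r3`, and `r4` each go back to the source, so `node.back_to_source_count` ends at 3. The caller receives the same bytes in every case, which matches the point that users cannot tell the difference from the data alone.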
[Figure: direct return on a cache hit versus back-to-source on a cache miss]

So, is there a way to tell whether a back-to-source has occurred? Yes. Let's continue.

How to determine whether back-to-source occurs

Let's take the object storage and CDN of a certain cloud provider as an example. Suppose I want to request the following picture:

https://cdn.xiaobaidebug.top/image/image-20220404094549469.png

To view the HTTP headers of the response more conveniently, we can use Postman. Send a GET request for the image, then switch to the response headers tab.

[Figure: viewing the response headers]

[Figure: back to source]

At this point the X-Cache response header has the value MISS TCP_MISS. This means the cache was not hit, so the CDN went back to the source (OSS), fetched the data, and then returned it.

By now the CDN should have this image cached. Send the GET request again, and the value of X-Cache becomes HIT TCP_MEM_HIT, which means a cache hit.

This is how one particular cloud does it. Others, such as Tencent Cloud, are basically the same: the relevant information can almost always be found in the response headers.

Is using CDN definitely faster than not using it?

Now we can answer the question from the beginning of the article.

If you do not use a CDN and access the source site directly, the process is as follows.

[Figure: direct access to the source site]

However, if a CDN is in place and it has no cached copy of the data, a back-to-source is triggered.

[Figure: going through the CDN and back to the source]

This is equivalent to adding an extra CDN hop to the original process.

That is, when using a CDN, if a cache miss causes a back-to-source, the request is slower than it would be without the CDN.

A cache miss may mean that the data never existed on the CDN, or that it existed but later expired.
Both situations are normal, and most of the time no action is required. But in a few scenarios we may need to optimize.

For example, suppose your source site has a major version update that changes the CDN domain name. The moment it goes live, all users request images through the new CDN domain name, the new CDN nodes go back to the source for practically 100% of requests, and in serious cases this may even overwhelm the object storage. In this situation you may need to identify the hot data in advance and use a tool to pre-request it, so that the CDN loads the hot data into its cache. The CDN of a certain cloud, for example, offers such a "refresh and preheat" function.

[Figure: CDN refresh and preheat]

You can also use a gray (canary) release: let a small number of users try the new version first, let these users warm up the CDN, and then gradually open up the traffic.

The other possibility is that the data once existed but later expired. For hot data, the CDN cache time can be increased appropriately.

When should you not use a CDN?

From the description above, the biggest advantages of a CDN are that it can assign nearby CDN nodes to users all over the world, and that it accelerates repeated requests for the same file through caching. This is perfect for scenarios like web pages and pictures. And because the underlying layer is object storage, any file object, such as a video, can be accelerated through the CDN in the same way. The short videos in the apps we use every day are served this way.

Thinking about it the other way around raises the question: when should you not use a CDN?

If you have a company intranet service, and the images and other files it requests are unlikely to be requested repeatedly, there is no need to use a CDN.

Note the two key points here: the service is on an intranet, and the files are unlikely to be requested repeatedly.
Regarding the second point above, if you need a clear indicator to convince yourself, here is one. As described earlier, the X-Cache field in the HTTP response headers from the CDN shows whether a request triggered a back-to-source. Count the requests that went back to the source and divide by the total number of requests, and you get the back-to-source ratio. If that ratio is as high as 90%, for example, then why connect to a CDN at all?
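The back-to-source ratio described above can be computed from a list of collected X-Cache header values. This is a minimal sketch: the function names are hypothetical, and it assumes that a value starting with MISS means back-to-source, which matches the MISS TCP_MISS and HIT TCP_MEM_HIT values seen earlier. Other providers may format this header differently.

```python
def is_back_to_source(x_cache_value):
    """Assumption: an X-Cache value that starts with MISS means the CDN went back to the source."""
    return x_cache_value.strip().upper().startswith("MISS")


def back_to_source_ratio(x_cache_values):
    """Return the share of responses that went back to the source (0.0 for no samples)."""
    values = list(x_cache_values)
    if not values:
        return 0.0
    misses = sum(1 for value in values if is_back_to_source(value))
    return misses / len(values)


# Sample header values; in practice these would be collected from real responses.
samples = ["MISS TCP_MISS", "HIT TCP_MEM_HIT", "HIT TCP_MEM_HIT", "HIT TCP_MEM_HIT"]
ratio = back_to_source_ratio(samples)   # 1 miss out of 4 responses
```

With these four samples the ratio is 0.25. A value close to 1.0 over a representative period would suggest that the CDN cache is rarely being hit.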