1. Background

Too many CDN domain names fragment requests across connections, which leads to the following problems:

Frequent TCP connection establishment and poor network request performance
The connection pool available for requesting CDN static resources is limited. Each domain name creates its own TCP connections, and these compete for connection pool resources, so TCP connections are frequently interrupted. A later request then has to re-establish a TCP connection, which adds the cost of the connection phase (DNS resolution, TCP handshake, TLS handshake) and increases total request time.

Too many domain names and high daily maintenance cost
Too many domain names increase the complexity of domain name management, performance monitoring, performance optimization, and online changes, and raise labor and operations costs. For example, the Dewu IPv6 upgrade project and the TLS 1.3 upgrade project both had to run the online change process (test regression, change application, change review, change verification, performance monitoring) multiple times, in batches by domain name.

Non-standard domain names at risk of being taken offline
For historical reasons, many domain names do not conform to the current domain name specification (such as xxx.poizon.com, xxx.dewu.com). Domain names that are not owned by Dewu are at risk of being forced offline. For example, the old domain name offline project consumed a great deal of manpower for the transformation.

Frequent DNS resolution and high Alibaba Cloud HttpDNS cost
Each domain name needs an Alibaba Cloud HttpDNS call to resolve its IP address when a TCP connection is established, so more domain names mean more resolution calls. The increased number of resolutions leads to high HttpDNS service costs.

To solve the problems caused by too many CDN domain names, we decided to converge CDN domain names on the client side.

2. CDN domain name convergence

2.1 Convergence Thoughts

Let's first think about a question: how can we converge multiple CDN domain names into one unified domain name on the client side without affecting the business? The following points should be considered:
Let's first look at an example of a client requesting static resources through a CDN service before CDN domain name convergence:

The path of a static resource URL request back to the origin site can be roughly divided into three stages:
The second stage, the public-network segment in the red box in the figure above, is client network library -> CDN service. This segment is a one-to-one relationship. Converging multiple CDN domain names into a unified domain name mainly means reducing this segment to a single one-to-one relationship. To do so, we only need to make the first stage many-to-one and the third stage one-to-many. In short, the question is how the client converges domain names and how the CDN service side distributes requests back to the source stations.

2.2 Client-side convergence domain name

When the client network library is initialized, a network request interceptor is registered in advance. It intercepts network requests and handles them uniformly at the bottom layer, replacing each domain name with the unified domain name. This avoids intruding into business-layer code and makes the convergence transparent to the upper business layer.

Taking the Android version of Dewu as an example, the network library is OkHttp3, which supports custom interceptors, so we can add a CDN domain name convergence interceptor to every OkHttpClient object. Inserting the interceptor through AspectJ instrumentation does not intrude into business code and also ensures that newly created OkHttpClient objects get the interceptor automatically.

    // AspectJ instrumentation
    @Aspect
    public class OkHttpAspect {

When the business layer initiates a network request for a static resource URL, the network library interceptor intercepts it and executes the CDN domain name convergence logic. The domain name is replaced with the unified domain name, and the original domain name information is preserved by inserting a path prefix that represents it (path prefixes map one-to-one to original domain names).

One-to-one mapping table between original domain name and path prefix
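The replacement step can be illustrated with a minimal, self-contained sketch. It is not the Dewu implementation: the class name `CdnHostMerger`, the method `mergeHost`, and the `community.xxx.com` table entry are assumptions made for illustration, and only `image.xxx.com -> /image` and the unified domain `cdn.xxx.com` come from the article's own example. In the app, logic like this would presumably run inside the OkHttp interceptor before the request proceeds.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

final class CdnHostMerger {
    // Unified domain name, as used in the article's example.
    static final String UNIFIED_HOST = "cdn.xxx.com";

    // Hypothetical mapping table: original domain name -> path prefix.
    static final Map<String, String> PREFIX_BY_HOST = new HashMap<>();
    static {
        PREFIX_BY_HOST.put("image.xxx.com", "/image");
        PREFIX_BY_HOST.put("community.xxx.com", "/community"); // assumed entry
    }

    /**
     * Returns the converged URL, or the input unchanged when its host is not
     * in the mapping table. Port and fragment are ignored for brevity.
     */
    static String mergeHost(String url) {
        URI uri;
        try {
            uri = URI.create(url);
        } catch (IllegalArgumentException e) {
            return url; // not parseable: leave the request untouched
        }
        String host = uri.getHost();
        String prefix = (host == null) ? null : PREFIX_BY_HOST.get(host);
        if (prefix == null) {
            return url; // domain is not in the convergence list
        }
        StringBuilder out = new StringBuilder();
        out.append(uri.getScheme()).append("://").append(UNIFIED_HOST)
           .append(prefix).append(uri.getRawPath());
        if (uri.getRawQuery() != null) {
            out.append('?').append(uri.getRawQuery()); // keep query parameters as-is
        }
        return out.toString();
    }
}
```

Returning the input unchanged for unknown hosts keeps the sketch safe by default: a domain that is not in the table is simply not converged.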
Example of generating a new URL after an original URL on the image.xxx.com domain is intercepted

First replace the image.xxx.com domain name with the unified domain name cdn.xxx.com, and then insert the path prefix /image to generate the new URL. The original URL and the new URL compare as follows:

https://image.xxx.com/xxx.jpg is replaced by https://cdn.xxx.com/image/xxx.jpg

The specific implementation code is as follows:

    /**
     * Converge domain name logic and map domain name to path
     *
     * @param urlStr original url
     * @param sourceHost original domain name
     * @param targetHost unified domain name
     * @param pathPrefix path prefix mapped to original domain name
     * @return concatenated new url
     */
    public static String replaceMergeHostUrl(String urlStr, String sourceHost, String targetHost, String pathPrefix)

The network library uses the new URL to request the CDN service. If the CDN service node has no cached resource for the new URL, or the cache has expired, the CDN service-side logic for distributing to the source station is triggered.

2.3 CDN service side distribution source station
Solution 1: CDN edge script redirection

CDN edge script, Alibaba Cloud official document: https://help.aliyun.com/document_detail/126588.html

Write CDN edge scripts and deploy them on CDN service nodes so that requests are restored and forwarded to the source OSS by redirection.

Solution 2: Alibaba Cloud OSS mirror back-to-source

Mirror back-to-source, Alibaba Cloud official document: https://help.aliyun.com/document_detail/409627.html

Configure a mirror back-to-source rule on the unified OSS. When the unified OSS does not have a static resource, the resulting 404 error triggers mirror back-to-source to the source OSS. After the resource is pulled successfully from the source station, a copy is stored in the unified OSS. The next request is served directly from the stored copy and no longer triggers mirror back-to-source.

Schematic diagram of the mirror back-to-source principle (from Alibaba Cloud official documentation)

The two solutions each have advantages and disadvantages. The specific comparison is shown in the following table:
Taking into account both the transformation cost and performance impact, we finally chose "Alibaba Cloud OSS Mirror Back to Source" as the CDN service-side distribution source station solution.
The mirror back-to-source rules configured on the unified OSS match on the path prefix and map each request back to the source OSS that corresponds to its original domain name, so every request is mirrored back to the correct source OSS. The one-to-one mapping table between path prefixes and source OSS is as follows:
For example, the OSS mirror back-to-source configuration for the domain name community.xxx.com:

With client-side convergence and CDN service-side distribution to the source stations in place, let's look again at a client requesting static resources through the CDN service:

The red box on the left shows client-side convergence completing the many-to-one transformation, and the red box on the right shows service-side OSS mirror back-to-source completing the one-to-many transformation. The architecture has basically achieved the goal of CDN domain name convergence. However, we still need to ensure stability during the launch phase and flexibility of domain name convergence after launch.

2.4 Stable and flexible expansion during the launch phase

Ensure stability by supporting overall grayscale release and monitoring logs

An AB experiment switch serves as the client-side grayscale switch for the feature. It supports percentage-based ramp-up, so the share of clients with CDN domain name convergence enabled can be controlled to keep the launch stable.
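As a rough illustration of percentage-based ramp-up, the sketch below maps a stable client identifier to one of 100 buckets and enables the feature for buckets below the configured percentage. The article does not describe how its AB experiment platform assigns clients, so the class `GrayscaleGate`, the hashing scheme, and the `clientId` parameter are all assumptions.

```java
final class GrayscaleGate {
    /** Maps a stable client identifier to a bucket in the range [0, 100). */
    static int bucketOf(String clientId) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(clientId.hashCode(), 100);
    }

    /**
     * Returns true when the client falls inside the rollout percentage.
     * rolloutPercent = 0 disables the feature for everyone, 100 enables it for everyone.
     */
    static boolean isEnabled(String clientId, int rolloutPercent) {
        return bucketOf(clientId) < rolloutPercent;
    }
}
```

Because the bucket depends only on the identifier, a client that is enabled at 10% stays enabled when the percentage is raised to 50%, which keeps the ramp-up monotonic.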
An example of how the client can grayscale the CDN domain name convergence function through AB experiments:

Logs (with sampling supported) are embedded in the key code paths of the CDN domain name convergence function and reported to Alibaba Cloud Log Service (SLS). Monitoring alarms are configured on the SLS platform so that online anomalies are detected and handled promptly.

Definition of code-level monitoring points for CDN domain name convergence function
Ensure flexibility by supporting the separate release of domain names to be converged

The client configuration center delivers the list of domain names to be converged as configuration data. The list can be updated dynamically, and domain names can be ramped up gradually one at a time.

    {

Now, let's look at the flow chart of the client requesting static resources after CDN domain name convergence:

Grayscale release process in the development, testing, and release stages

AB experiment volume expansion process
The process of converging and increasing volume in a single domain name dimension
3. CDN multi-vendor disaster recovery

As an e-commerce platform, Dewu needs to provide users with stable and reliable CDN services, both during major promotions (such as 618, Chinese Valentine's Day, Double Eleven, and Double Twelve) and in daily operation. The Alibaba Cloud CDN service SLA only guarantees 99.9%, which means a risk of about 43 minutes of online service unavailability per month (0.1% of a 30-day month: 0.001 x 43,200 minutes ≈ 43 minutes).

Alibaba Cloud CDN service SLA official document: http://terms.aliyun.com/legal-agreement/terms/suit_bu1_ali_cloud/suit_bu1_ali_cloud201803050950_21147.html?spm=a2c4g.11186623.0.0.7af85fcey4BKBZ

Therefore, after converging multiple CDN domain names into a unified domain name, we decided to add CDN disaster recovery for the unified domain name, both within the same vendor and across multiple vendors.

3.1 Disaster Recovery Idea

The following points are mainly considered:
3.2 Dynamically send domain name list

The client supports configurable CDN domain names. The domain name used to load static resources is selected from the domain name list delivered by the configuration center. Priority follows the order of the list: by default, the first available domain name in the list is used as the current domain name. Other domain names from the same vendor or from other vendors can be configured in the list as backup domain names. Domain name switching priority: primary domain name -> backup domain name of the same vendor -> backup domain name of another vendor.
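A minimal sketch of this selection rule, assuming the list is kept in priority order and each entry carries an availability flag named isDisabled. The classes `CdnDomain` and `CdnDomainSelector` are hypothetical names introduced for illustration, not the article's code.

```java
import java.util.List;
import java.util.Optional;

final class CdnDomain {
    final String host;
    // false = available (default), true = unavailable.
    volatile boolean isDisabled;

    CdnDomain(String host) {
        this.host = host;
    }
}

final class CdnDomainSelector {
    /**
     * Returns the first available domain in list order, or empty when every
     * domain is disabled (the caller then has to fall back on its own).
     */
    static Optional<CdnDomain> currentDomain(List<CdnDomain> domainsInPriorityOrder) {
        for (CdnDomain domain : domainsInPriorityOrder) {
            if (!domain.isDisabled) {
                return Optional.of(domain);
            }
        }
        return Optional.empty();
    }
}
```

Because the list order encodes priority, putting the primary domain first, then same-vendor backups, then other-vendor backups gives the switching order described above without any extra ranking logic.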
3.3 Automatic Disaster Recovery Downgrade of Domain Name

When the client business layer initiates a network request for a static resource URL, the CDN domain name convergence interceptor in the network library replaces the domain name of the original URL with the current domain name selected from the domain name list and then issues the request for the static resource.

The interceptor monitors failure callbacks for requests to the current domain name. If the current domain name is judged unavailable, its availability status field isDisabled is set to true (true means unavailable; false, the default, means available).

The current domain name is judged unavailable when either of the following holds:

- The HTTP status code is 5XX (500 <= code < 600).
- The socket connection times out or fails while the client network is normal (this rules out causes on the client's own network).

To judge the client network status, use the InetAddress.isReachable function (Android) or the ping command (for example, ping -c 1 223.5.5.5) to check whether at least one of the domestic public DNS servers (an IP list delivered through configuration, such as 223.5.5.5 and 180.76.76.76) can be reached. If any ping succeeds, the client network is considered normal; if all fail, the client network is considered abnormal.
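The two conditions can be sketched as a small decision helper. This is an illustrative sketch, not the article's code: the class `CdnFailureDetector`, its method names, and the convention of passing 0 as the status code when no HTTP response was received are assumptions. `InetAddress.isReachable(int)` is the standard Java call; on some devices it may behave differently from the ping command.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.util.List;

final class CdnFailureDetector {
    /** True for HTTP status codes in the range 500 <= code < 600. */
    static boolean isServerError(int httpCode) {
        return httpCode >= 500 && httpCode < 600;
    }

    /** True when at least one probe address (for example a public DNS IP) is reachable. */
    static boolean isClientNetworkOk(List<String> probeIps, int timeoutMs) {
        for (String ip : probeIps) {
            try {
                if (InetAddress.getByName(ip).isReachable(timeoutMs)) {
                    return true;
                }
            } catch (IOException e) {
                // Treat this probe as unreachable and try the next address.
            }
        }
        return false;
    }

    /**
     * Decides whether the current domain should be marked isDisabled = true.
     * httpCode is assumed to be 0 when the socket failed before any response.
     */
    static boolean shouldDisable(int httpCode, boolean socketFailed, boolean clientNetworkOk) {
        if (isServerError(httpCode)) {
            return true;
        }
        // A socket failure only counts against the domain when the client's own network works.
        return socketFailed && clientNetworkOk;
    }
}
```

Keeping the network probe separate from the decision makes the decision itself easy to test without touching the network.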
If the current CDN domain name is unavailable, the client traverses the CDN domain name list, takes the next available CDN domain name as the new current domain name, replaces the URL's domain name with it, and issues the static resource request again. This gives automatic client-side disaster recovery downgrade across domain names of the same vendor and of multiple vendors. If every domain name in the CDN domain name list is unavailable, fallback logic restores the original URL from before CDN domain name convergence and issues the request again.

Flowchart of automatic domain name disaster recovery and downgrade:

3.4 Automatic recovery of domain availability

For CDN domain names in the list whose isDisabled field has been set to true, once the CDN domain name is unblocked or the CDN server failure is resolved, the client needs to notice promptly and restore the domain name's availability automatically, so that request traffic gradually switches back from low-priority backup domain names to the primary domain name. The specific steps for restoring domain name availability are as follows:
Flowchart of automatic recovery of domain name availability:

4. Challenges encountered

4.1 Scenarios where a small number of resources are dynamically updated
The content of static resources (pictures, videos, zip files) requested by the client through the CDN service is generally never updated in place, that is, overwritten while the URL stays the same. However, in some business scenarios the requested resource is a JSON file whose content is updated dynamically. For example, when the client configuration center platform publishes configuration data, the JSON file is overwritten. As described above for OSS mirror back-to-source, once the unified OSS has pulled a resource from the source OSS it stores a copy, and later requests are served from that copy without triggering mirror back-to-source again. The unified OSS would therefore keep returning stale content, so dynamically updated resources need separate compatibility handling.
The fix is to enumerate all business scenarios where resource content is dynamically updated (they are enumerable) and apply an OSS dual-write transformation: whenever such a resource is updated, it is written to both the source OSS and the unified OSS, so the file on the unified OSS is always the latest.

4.2 CDN service console monitoring does not support Path dimension monitoring
After the client completes CDN domain name convergence, multiple original domain names are converged into a unified domain name and distinguished only by the Path prefix. The business side expects to be able to view performance monitoring reports by Path after the convergence, but the Alibaba Cloud CDN service console currently only supports monitoring by domain name. We therefore need to build Path dimension monitoring reports ourselves.
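One way to derive Path dimension figures from per-URL data is to group the counts by the first path segment, which is where the converged URLs carry the original domain's prefix. The sketch below assumes the per-URL counts have already been fetched from some data source; the class `PathDimensionReport` and its methods are hypothetical and do not call any Alibaba Cloud API.

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;

final class PathDimensionReport {
    /** Returns the first path segment (for example "/image"), or "/" when there is none. */
    static String firstSegment(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.length() <= 1) {
            return "/";
        }
        int end = path.indexOf('/', 1);
        // A file directly under the root has no prefix segment.
        return end < 0 ? "/" : path.substring(0, end);
    }

    /** Sums per-URL counts (for example back-to-source counts) by path prefix. */
    static Map<String, Long> sumByPrefix(Map<String, Long> countByUrl) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> entry : countByUrl.entrySet()) {
            totals.merge(firstSegment(entry.getKey()), entry.getValue(), Long::sum);
        }
        return totals;
    }
}
```

The same grouping works for any per-URL metric, such as request counts or traffic, as long as the input map uses full URLs as keys.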
The client network monitoring platform supports Path dimension monitoring indicators, including number of requests, number of back-to-source requests, traffic, bandwidth, and cost. The number of requests and the traffic are queried with the Path dimension monitoring data API provided by Alibaba Cloud CDN. The back-to-source counts of popular URLs are queried with the popular back-to-source URL API provided by Alibaba Cloud CDN and then aggregated by Path.

Example of a Path dimension monitoring report:

4.3 How to check the newly applied CDN domain name and OSS in the future?
Although domain name convergence and mirroring back to the source have been done for the existing multiple domain names and multiple source OSS, we still need to consider how to ensure that no new CDN domain names are added on the Dewu App side.
After communicating with the operations colleagues, an approval step was added to the application process for new domain names and new OSS to act as a checkpoint.

4.4 Configuration data is also sent through the CDN domain name, which may cause unavailability risks.
Since configuration data is also issued through the CDN domain name, when the domain name is unavailable, the client cannot pull the latest configuration data (such as the domain name list newly configured by the configuration platform) through the configuration center SDK, and the client resource request failure cannot be restored in time by adjusting the configuration data.
To ensure the timeliness and reliability of configuration data, we add a dedicated API for obtaining configuration data as a backup channel. When the App cold-starts, or when all domain names in the CDN domain name list are unavailable, the client asynchronously calls the dedicated API to request configuration data.
Because the configuration center SDK and the dedicated API both deliver configuration data from the same source but work independently of each other, the client has to compare the data from the two channels and use the newer one. To make this possible, we add a configVersion field that represents the version of the configuration data. Each time the configuration data is updated, configVersion is incremented by 1, so a larger version number means newer data. The client compares the configVersion of the two copies and uses the one with the larger value.
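The comparison can be sketched in a few lines. Only the configVersion field comes from the article; the classes `ConfigSnapshot` and `ConfigChooser`, the `payload` field, and the tie-breaking choice (prefer the SDK copy when versions are equal) are assumptions for illustration.

```java
final class ConfigSnapshot {
    final long configVersion; // incremented by 1 on every configuration update
    final String payload;     // raw configuration data, for example a JSON string

    ConfigSnapshot(long configVersion, String payload) {
        this.configVersion = configVersion;
        this.payload = payload;
    }
}

final class ConfigChooser {
    /**
     * Returns the newer of the two snapshots. Either argument may be null when
     * that channel has not delivered data yet; returns null when both are null.
     */
    static ConfigSnapshot newest(ConfigSnapshot fromSdk, ConfigSnapshot fromApi) {
        if (fromSdk == null) {
            return fromApi;
        }
        if (fromApi == null) {
            return fromSdk;
        }
        // On equal versions the data is the same, so keeping the SDK copy is an arbitrary choice.
        return fromApi.configVersion > fromSdk.configVersion ? fromApi : fromSdk;
    }
}
```

Handling a missing snapshot explicitly matters here, because the dedicated API is only called on cold start or when all domain names are unavailable, so one channel is often empty.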
5. Effect after going online

We converged 8 CDN domain names into a unified primary domain name and added disaster recovery across CDN domain names of the same vendor and of multiple vendors.

Improved network request performance

Reduced network request exception rate

After the CDN domain name convergence function was launched, the TCP connection reuse rate improved significantly, the number of exceptions caused by DNS resolution failures and TCP connection failures dropped significantly, and the exception rate fell significantly on both client platforms.
Improved stability
It supports multi-vendor disaster recovery across Alibaba Cloud and Tencent Cloud. When the user's network is normal and a resource request to the Alibaba Cloud CDN service fails (HTTP 5xx status code or socket failure), the client automatically retries against the backup domain name and fetches the resource through Tencent Cloud CDN. The SLA improves from 99.9% with a single vendor to 99.99%+.

HttpDNS cost reduction

Alibaba Cloud HttpDNS service costs were reduced by 24%.

6. Pitfalls

During the grayscale ramp-up phase, internal colleagues reported that some identification images on the identification page were blurred.

Cause: a configuration problem in the OSS mirror back-to-source rule, where "Back-to-source parameters: carry request string" was checked. With this option, when the unified OSS mirrors back to the source, the request parameters after "?" are passed on to the source OSS, and the source OSS returns a thumbnail cropped according to those parameters rather than the original image. On later requests, the unified OSS treats the stored thumbnail as the original image and crops it a second time according to the new cropping parameters, so the image returned to the client is very small and looks blurry once the View stretches it.

Note that the closer the cropping size carried by the first network-wide request is to the original image size, the smaller the impact on clarity. The smaller the cropping size of that first request and the larger the cropping size of a later request, the more the image is stretched and the blurrier it looks.
Sample URL of the first network-wide request: https://cdn.xxx.com/image-cdn/app/xxx/identify/du_android_w1160_h2062.jpeg?x-oss-process=image//resize,m_lfit,w_260,h_470

Sample URL of the second request: https://cdn.xxx.com/image-cdn/app/xxx/identify/du_android_w1160_h2062.jpeg?x-oss-process=image//resize,m_lfit,w_760,h_1500

As the sample URLs show, the original image is 1160 wide and 2062 high. Because it has to be displayed in a client View (width 260, height 470), the Alibaba Cloud image cropping parameters "x-oss-process=image//resize,m_lfit,w_260,h_470" are appended. Once a client hits the CDN domain name convergence grayscale, the image is requested network-wide for the first time with the new URL on the unified domain name. The CDN service has no cache for this URL, so it goes back to the unified OSS. The unified OSS triggers mirror back-to-source and carries the request parameters to the source OSS, which returns a thumbnail resized to fit within width 260 and height 470 according to those parameters. The unified OSS stores this thumbnail as if it were a copy of the original image.

On the second request, the unified OSS resizes the stored copy according to the second request's parameters, width 760 and height 1500. But the stored copy is already smaller than the requested size (width 260 < 760, height 470 < 1500), so the client still receives a thumbnail of about 260 x 470. The client View (width 760, height 1500) stretches this thumbnail for display, which produces a blurry image.

During the testing phase we only verified that images were returned and displayed normally. We did not notice that mirror back-to-source with image parameters would cause blurry images.

Solution:

- Turn off the grayscale ramp-up configuration for image.xxx.com.
- Reconfigure the OSS mirror back-to-source rule and uncheck "Back-to-source parameters: carry request string".
- Use a new path prefix for the mapping so that the original image is pulled correctly when mirror back-to-source is triggered again.
- Test and verify, then ramp up again after confirming that the blur problem is gone.
- Delete the incorrect thumbnails under /image-cdn to avoid wasting OSS storage.

7. Summary

Through CDN domain name convergence, we improved CDN network performance and stability and also unified and standardized multiple domain names, which greatly reduces the complexity of later CDN domain name optimization and maintenance. It also gives the primary CDN domain name disaster recovery capability, which helps keep online services stable.