In the information age, users visit mobile applications more and more every day, and DNS resolution, a key link in connecting to the Internet, faces higher requirements as a result. Against this background, the HTTPDNS domain name resolution service has gradually become the mainstream solution in the industry thanks to its anti-hijacking, precise scheduling, and real-time resolution capabilities. We have built the vivo HTTPDNS end-to-end integrated solution. By optimizing the capabilities and architecture of its four major modules (the HTTPDNS SDK, the HTTPDNS server, the unified scheduling gateway, and full-link monitoring), we have significantly improved the access experience of end-side services and supported the efficient and stable development of the business.
1. vivo HTTPDNS technical background
1.1 Why build the vivo HTTPDNS end-to-end integrated solution?
As the business grows rapidly, the number of applications being accessed keeps increasing, and users expect an ever better access experience from mobile applications. However, when accessing Internet services, users may run into several problems: access is hijacked and redirected to illegal resources; the requested resources cannot be displayed normally; resources open slowly and the user waits a long time; or the operator's DNS resolution is inaccurate, which degrades the access experience. The most common solution to these problems in the industry is HTTPDNS. vivo therefore began to explore and use HTTPDNS in 2017, but encountered the following problems during implementation:
1.2 How does HTTPDNS solve the core problems?
We built an integrated HTTPDNS solution to solve the core problems above and support the development of the business. We mainly focus on the following aspects:
Based on these construction ideas, let's look at how we implemented the solution.
2. vivo HTTPDNS platform practice
2.1 vivo HTTPDNS technical architecture
The vivo HTTPDNS platform provides an integrated HTTPDNS solution for the business. The overall architecture consists of four modules: the HTTPDNS SDK, the HTTPDNS server, the unified scheduling gateway, and full-link monitoring.
Based on the capabilities of the HTTPDNS platform, the core process is as follows. The HTTPDNS SDK is integrated into vivo mobile apps. The SDK first requests the HTTPDNS scheduling gateway to obtain the DNS policy and other configuration. With the configuration in hand, it sends the domain name resolution request to the preferred DNS. When the preferred DNS runs into problems such as resolution failure, connection failure, domain name blocking, or domain name hijacking, the SDK sends the resolution request to the alternative DNS to obtain a correct resolution result and then initiates the connection. It also caches the resolution results and optimizes connection establishment.
2.2 HTTPDNS SDK optimization
The SDK carries the core capabilities of our integrated HTTPDNS solution. Architecturally, the bottom layer supports the underlying network protocols: the HTTP/1.x, HTTP/2, and QUIC transport protocols, the TLS 1.1, TLS 1.2, and TLS 1.3 encryption protocols, and TLS-based optimizations such as Session Ticket. The service layer provides DNS resolution, DNS caching, business connection establishment, DNS policy management, and other functions. The application layer provides API request, download, upload, and other capabilities. The SDK also provides full-link network monitoring. Based on the SDK's capabilities and usage scenarios, we identified three key optimization directions for the SDK: domain name resolution, business connection, and unified access.
2.2.1 Domain name resolution optimization
First, we introduce our exploration of resolution strategy optimization. DNS resolution is the first step when a user accesses vivo's Internet services. Traditional DNS resolution relies on the operator's LocalDNS to resolve domain names, and in this process resolution failures and inaccurate resolved addresses occur frequently. To address these two problems, we divided DNS resolution optimization into the following three stages:
First, let's look at the retry resolution strategy. Its core logic is to switch to another DNS solution and retry after a DNS resolution fails. vivo's current preferred DNS is the operator's LocalDNS. If LocalDNS resolution fails or times out, HTTPDNS is used to retry the resolution. This improves the resolution success rate while greatly reducing the cost of using HTTPDNS.
Building on this, we launched an adaptive resolution strategy. On top of the retry resolution strategy, it also switches the DNS solution, re-resolves, and reconnects when connection establishment fails. It supports dynamic configuration of DNS policies as well, so the preferred and alternative DNS can be adjusted dynamically. The adaptive resolution strategy further improves the user's access success rate. After the retry and adaptive resolution strategies were launched, the DNS resolution success rate increased by 2%.
Can the resolution strategy be optimized further? What should happen if resolution or connection through the alternative DNS also fails? Can the DNS resolution process be optimized for scenarios with strict access latency requirements? With these questions in mind, we optimized the resolution strategy again. For cold start scenarios, where service access latency should be as short as possible, we use the IP priority strategy: IP addresses are dynamically delivered to the client in advance so that it can establish connections directly, which streamlines the DNS resolution process and further shortens access latency. For scenarios focused on success rate, we use the IP backup strategy: after alternative DNS resolution fails, dynamically delivered IP addresses are used to establish the connection, which further improves the access success rate.
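The resolution order described above (preferred DNS, then alternative DNS, then backup IPs, switching both on resolution failure and on connection failure) can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: `DnsPolicy`, `resolve_and_connect`, and the `connect` callback are hypothetical names, and the resolvers are plain callables standing in for LocalDNS and HTTPDNS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# A resolver takes a hostname and returns a list of IPs, or raises on failure.
Resolver = Callable[[str], List[str]]


class ResolutionError(Exception):
    """Raised when every configured strategy fails."""


@dataclass
class DnsPolicy:
    """Hypothetical policy object, as a scheduling gateway might deliver it."""
    preferred: Resolver                 # e.g. the operator's LocalDNS
    alternative: Resolver               # e.g. HTTPDNS
    backup_ips: Dict[str, List[str]]    # last-resort addresses per hostname


def resolve_and_connect(host: str, policy: DnsPolicy,
                        connect: Callable[[str], bool]) -> str:
    """Return the first IP that both resolves and connects.

    Order: preferred DNS, then alternative DNS, then backup IPs.
    A strategy that raises, returns nothing, or yields only
    unreachable IPs causes a switch to the next strategy.
    """
    strategies: Sequence[Resolver] = (
        policy.preferred,
        policy.alternative,
        lambda h: policy.backup_ips.get(h, []),
    )
    for strategy in strategies:
        try:
            ips = strategy(host)
        except Exception:
            continue  # resolution failed or timed out: try the next strategy
        for ip in ips:
            if connect(ip):
                return ip
        # every IP failed to connect: fall through to the next strategy
    raise ResolutionError(f"no strategy produced a reachable address for {host}")
```

Keeping the policy as data rather than hard-coding the order is what would let a gateway swap the preferred and alternative DNS without an app release.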
With these optimizations, DNS resolution time in the IP priority scenario was reduced by 80%, and the business success rate in the IP backup scenario improved by a further 0.2% to 0.4%.
Having covered resolution strategy optimization, we now turn to our exploration of cache strategy optimization. Caching is a core strategy for improving user experience, but a cache that is used improperly can have negative effects. We made several attempts at optimizing the cache strategy.
The core idea of the fixed cache strategy is to cache the results of DNS resolution. The next time the client needs to resolve the same domain name, it uses the cached address directly to initiate the connection, which reduces DNS resolution delay. Each cache entry has a fixed expiration time. After an entry expires, DNS resolution is performed again, and if it succeeds, the result is cached again.
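As a rough sketch of the fixed cache strategy just described, assuming a resolver callable and an injectable clock (the class and parameter names here are illustrative, not taken from the SDK):

```python
import time
from typing import Callable, Dict, List, Tuple


class FixedTtlDnsCache:
    """Caches successful DNS results for a fixed number of seconds."""

    def __init__(self, resolver: Callable[[str], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolver = resolver
        self._ttl = ttl_seconds
        self._clock = clock
        # hostname -> (expires_at, ips)
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, host: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(host)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: no DNS round trip
        ips = self._resolver(host)  # miss or expired: resolve again
        if ips:  # only successful results are cached
            self._entries[host] = (now + self._ttl, ips)
        return ips
```

Note that an expired entry forces the caller to wait for a full resolution, which is the latency cost that the later cache strategies aim to reduce.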
To address the problems and pain points of the fixed cache strategy, we implemented a dynamic cache strategy. Its core is to optimize three areas: cross-network caching, cache timeliness, and caching of abnormal IPs.
Based on the above optimization strategies, the overall DNS resolution delay dropped by more than 30%, a significant improvement.
Building on the dynamic cache strategy, we went one step further and introduced an optimistic cache strategy. In most cases, cached results remain usable after they expire. So when a cached resolution result has expired, we optimistically treat it as still valid: we first return the cached result to the client so that it can establish a connection, then asynchronously initiate DNS resolution and refresh the cache. This strategy further reduces the client's DNS resolution latency and improves the access experience.
The above is our exploration and practice of resolution strategy and caching logic in domain name resolution optimization.
2.2.2 Business connection optimization
Next, we introduce the optimization of business connection establishment.
The first capability is network diagnosis. Its core principle is that a network connection quality detection module provides detection of network connectivity, authenticated WiFi, signal strength, system networking policy, DNS, ping, and more. It diagnoses whether the user currently has network access, the network type, the signal strength, whether the network is restricted, and whether the target domain name can be pinged, and it provides this data to upper-layer applications so they can optimize connection access. The main usage scenarios are providing network detection for video playback and browser web page access, diagnosing why a user has a network but cannot connect, and giving users prompts and fixes based on the cause, all of which improve the experience of vivo mobile users.
The second capability of connection optimization is network speed detection. Its core principle is to collect statistics for each request whenever data is read, and to calculate both the global network speed and the speed of each individual request in order to monitor network quality. For example, if the caller wants to know the global speed over the past x seconds at a given point in time, the SDK sorts the collected network quality data in reverse chronological order, sums the data transferred by the requests in that window segment by segment to get the total amount of data transmitted in the x seconds, and divides that amount by the time to obtain the user's global network speed for the period. In the video-on-demand scenario, network speed detection makes it possible to switch the playback resolution intelligently according to the detected speed, which keeps playback smooth and reduces the freeze rate.
The third point of connection optimization is the DNS best routing strategy. Its core logic is to select the best address from the IP addresses resolved under multiple DNS policies and use it to initiate the connection. The SDK aggregates the DNS resolution results obtained from the operator's LocalDNS, HTTPDNS, public DNS, direct IP connection, and other strategies; obtains the IP addresses for the current network status and combines them with the data recorded under the same network ID in the historical behavior library; ranks the IPs with an algorithm based on access success rate and access time; and then tries to establish connections to the ranked IP addresses in turn. Using this best routing model improves the success rate of short video playback and browser web page loading.
The fourth point of connection optimization is HTTP/2 long-connection optimization. Connection reuse in HTTP/2 improves network performance and reduces latency, but we also found some shortcomings in practice. For example, if a connection is not used for a long time, there is a certain probability that a device on the network path will discard it, and some devices do not notify the client that the connection has been closed as the protocol standard requires. The client then reuses the connection for its next request, but because an intermediate hop or the server has already dropped it, the request fails with a timeout. To address these issues, we implemented the following optimization strategies:
The core of end-side intelligent prediction is to collect historical data about the user's requests during the current life cycle, including network type, request time, requested domain name, connection idle time, and exception information. Based on this historical request data, the SDK continuously narrows the range in which reuse timeouts are likely to occur. It then cleans the collected data, discards abnormal samples, and extracts features to form a data set. Finally, it makes a comprehensive judgment from that data set to decide whether to reuse the current connection or create a new one.
The fifth point of connection optimization is QUIC connection racing, which the SDK supports. When racing is turned on and a user initiates access, the QUIC connection is used if QUIC wins the race, and the HTTP connection is used if HTTP wins, which improves the success rate of end-side access. In video playback scenarios, QUIC connection racing significantly improved the playback failure rate, the freeze rate, and slow start behavior. In weak network scenarios, the performance advantage of the QUIC protocol is particularly obvious.
The above is vivo's exploration and practice of connection optimization strategy.
2.2.3 Unified access solution
Next, we introduce the optimization of the unified access solution.
The first point is the implementation of the HTTPDNS scheduling gateway. All SDK configuration, including the DNS resolution strategy, cache strategy, and connection strategy, is managed and delivered through the scheduling gateway. Changes to SDK configuration and policy do not require a new client release, so the scheduling gateway greatly improves the flexibility of the SDK. The gateway is accessed through a domain name, which also avoids the situation where its IP is blocked.
The second point is network framework adaptation. vivo mobile applications use a variety of network frameworks, including OkHttp, Volley, HttpURLConnection, and Glide. The vivo HTTPDNS SDK has been adapted to these frameworks to meet the integration requirements of different businesses and reduce their integration cost.
2.3 HTTPDNS server optimization
Next, we introduce the architecture of the vivo HTTPDNS server. The server mainly consists of a high-performance API, a cache library, and a proxy gateway. The high-performance API provides intelligent resolution, authentication, cache query, and other capabilities. The cache library provides multi-level caching, lazy update, and other capabilities. The proxy gateway provides EDNS, intelligent scheduling, IP detection, and other capabilities. Together, these provide vivo users with a highly available, low-latency HTTPDNS resolution service. An HTTPDNS management backend is also provided, supporting DNS management, system management, scheduling strategy management, authentication management, access management, and other capabilities.
The core capabilities of the server are intelligent scheduling and multi-level caching. Intelligent scheduling obtains the resolution results of multiple partners and caches the best result using strategies such as asynchronous IP detection, so the SDK gets the best IP address from the server for business access. Multi-level caching moves cache updates from synchronous refresh to automatic asynchronous refresh of the first-level and second-level caches based on TTL expiration, which greatly improves server performance and also reduces the cost of using HTTPDNS.
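The idea of serving from a first-level and second-level cache and refreshing expired entries asynchronously can be sketched as below. This is only an illustration under stated assumptions: the class name, the `submit` hook, and the use of two in-memory dictionaries are hypothetical, and a real server would likely back the second level with a shared store and pass something like a thread pool's submit function.

```python
import time
from typing import Callable, Dict, List, Optional, Set, Tuple

Entry = Tuple[float, List[str]]  # (expires_at, ips)


class TwoLevelDnsCache:
    """Two cache levels; expired entries are served stale and refreshed later."""

    def __init__(self, upstream: Callable[[str], List[str]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 submit: Optional[Callable[[Callable[[], None]], object]] = None) -> None:
        self._upstream = upstream
        self._ttl = ttl_seconds
        self._clock = clock
        # `submit` schedules a zero-argument task. The default runs it inline,
        # which is deterministic but not actually asynchronous.
        self._submit = submit or (lambda task: task())
        self._l1: Dict[str, Entry] = {}   # first level: in-process
        self._l2: Dict[str, Entry] = {}   # second level: stand-in for a shared cache
        self._inflight: Set[str] = set()  # hosts with a refresh already scheduled

    def _refresh(self, host: str) -> None:
        try:
            ips = self._upstream(host)
            if ips:
                entry = (self._clock() + self._ttl, ips)
                self._l1[host] = entry
                self._l2[host] = entry
        finally:
            self._inflight.discard(host)

    def lookup(self, host: str) -> List[str]:
        entry = self._l1.get(host) or self._l2.get(host)
        if entry is None:
            self._refresh(host)  # cold miss: must resolve synchronously
            fresh = self._l1.get(host)
            return fresh[1] if fresh else []
        self._l1[host] = entry  # promote a second-level hit
        if self._clock() >= entry[0] and host not in self._inflight:
            # Expired: answer with the stale result now, refresh in the background.
            self._inflight.add(host)
            self._submit(lambda: self._refresh(host))
        return entry[1]
```

The same serve-stale-then-refresh pattern is what the optimistic cache strategy on the client side describes; here it only keeps the request path from blocking on the upstream resolver.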
2.4 HTTPDNS visual monitoring
The vivo HTTPDNS platform provides full-link visual monitoring. It can monitor the time taken and the requests made from DNS resolution through to the completion of the entire request, and with this monitoring, anomalies at each stage of a network request can be located efficiently. It also provides regional monitoring and anomaly alerting at the provincial operator level, addressing the previous lack of monitoring for business access links. With regional monitoring at the provincial operator level, targeted optimization plans can be drawn up for the network access environments of different regions, and the alerting capability makes it possible to detect and fix anomalies promptly.
2.5 HTTPDNS business effect
After the optimization practices above, the vivo HTTPDNS platform now covers more than 100 services on vivo mobile phones and handles 1.5 billion HTTPDNS resolutions per day. The client's average resolution delay has dropped from 180ms to 115ms, a decrease of 36%. The server-side resolution success rate has reached 99.5%, providing stable and reliable resolution for the business. The server-side response time is about 4ms, at an industry-leading level. The server-side cache hit rate has reached 90%, which reduces the cost of HTTPDNS while shortening the response time of DNS resolution.
In terms of success rate, the DNS resolution success rate increased from 97% before optimization to 99.85% after optimization, resolving nearly all DNS-related problems, and the client access success rate increased from 97% to 99%. The user experience of vivo terminal applications has improved significantly as a result.
In terms of anti-hijacking, the iMusic domain name was hijacked in a certain region in February 2023.
Monitoring showed that the hijacked domain name was being resolved to a foreign address. vivo HTTPDNS platform monitoring showed that the domain name was resolved normally through HTTPDNS and that its success rate and connectivity rate were normal; CDN monitoring showed normal business traffic with no abnormalities.
In terms of accidental domain name blocking, in April 2023 the .com.cn root domain was mistakenly blocked on the China Telecom and China Mobile networks in a certain region, and DNS resolution returned 127.0.0.1. The vivo browser, short video, and other services connected to the vivo HTTPDNS platform were not affected, which protected the availability, brand image, and reputation of vivo services.
The above is the exploration and practice of the vivo HTTPDNS platform, along with its concrete results after business integration.
3. Summary and outlook of vivo HTTPDNS
3.1 Summary of vivo HTTPDNS construction
Many practices of the vivo HTTPDNS platform are not described in detail here, and we will continue to build on it as business optimization proceeds.
3.2 Future prospects of vivo HTTPDNS
Finally, here is our outlook for the future. We will continue to follow cutting-edge technologies for device-side optimization and, for network acceleration, explore multi-channel acceleration and device-side scheduling optimization. For multi-channel acceleration, we will explore acceleration solutions based on dual mobile networks, dual WiFi, or mobile network plus WiFi. For device-side scheduling, we will use device-side proximity scheduling, intelligent addressing, and dedicated low-latency network acceleration to further improve the user experience on the device side.
Experience optimization is the core embodiment of vivo's user orientation. Improving the user access success rate requires continuous investment and continuous optimization. We will continue to explore new technologies and solutions for experience optimization together with the industry.