Business Background

Bilibili is a video community built on PUGV (professional user-generated video), and the main user scenario is watching videos on the video details page. As the business grows, more and more extended features are added to this "main battlefield", such as topics, video honors, notes, and user decorations.

(Figure 1: All traffic is aggregated to the video details page)

As Figure 1 shows, the functional pages in the app fall into two categories. The first is the list page (ListView Page), such as recommendations, search, the Dynamic feed, and categories. Most of these pages are lists that give users rich content filtering and preview scenarios. The second is the detail page (DetailView Page): when users tap content they are interested in on any list page, they are taken to the detail page to view it.

(Figure 2: The video details page gathers the information and feature entry points related to the video)

As Figure 2 shows, the video details page gathers the attributes and feature entry points related to the video, for example video honors (trending, site-wide rankings, weekly must-sees), video shooting templates, video collections, video soundtracks, and related topics. This information and these entry points help users further explore related content and features.

Current situation and problems

In terms of technical implementation, Bilibili's user-facing application architecture is mainly divided into four layers:
(Figure 3: Application architecture layering)

As Figure 3 shows, the main logic of the video details page is concentrated in the BFF layer. As DAU grows and the business keeps expanding, we face two problems.

Problem 1: As the business expands, the number of fanout reads increases, which brings heavy traffic load and complexity to the BFF itself and to the downstream business services. As Figure 4 shows, in order to display the feature entry for its associated videos, a business service has to carry the traffic of all video detail requests, along with the resulting CPU consumption. It also has to implement a Bloom filter mechanism to avoid the large number of back-to-source queries caused by requests for videos that have no association with it.

(Figure 4: The load is amplified indiscriminately to all services by the BFF's fanout reads, which complicates the implementation of those services)

Problem 2: We might be able to solve Problem 1 by adding machines and accepting a more complex implementation, but as the number of fanout reads keeps increasing, the latency of a single video detail request will keep deteriorating until it becomes unacceptable to users. As Figure 4.a shows [Reference 1], increasing the fanout count greatly increases the probability that the overall request times out. Figure 4.b shows the actual fanout request topology of the Bilibili app video detail BFF. It is already so large that the figure is hard to read, and the fanout count keeps growing with the business.

(Figure 4.a: Correlation between fanout count and timeout rate, excerpted from "The Tail at Scale")

(Figure 4.b: Fanout request topology of the Bilibili app video detail BFF)
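The relationship between fanout count and timeout probability described under Problem 2 can be illustrated with a small calculation. This is a minimal sketch, not a model of Bilibili's actual traffic: it assumes that every downstream call is independently slow with the same probability, and the function name `fanout_slow_probability` and the 1% slow rate are illustrative choices. Under that assumption, the chance that at least one of n calls is slow is 1 - (1 - p)^n, which gives roughly 63% for p = 1% and n = 100, the example used in "The Tail at Scale".

```python
def fanout_slow_probability(per_call_slow_rate: float, fanout: int) -> float:
    """Probability that at least one of `fanout` calls is slow.

    Assumption: calls are independent and share the same slow rate.
    """
    if not 0.0 <= per_call_slow_rate <= 1.0:
        raise ValueError("per_call_slow_rate must be between 0 and 1")
    if fanout < 0:
        raise ValueError("fanout must be non-negative")
    return 1.0 - (1.0 - per_call_slow_rate) ** fanout


if __name__ == "__main__":
    # Illustrative slow rate of 1% per downstream call (an assumption).
    for n in (1, 10, 50, 100):
        print(f"fanout={n:3d}  P(request is slow)={fanout_slow_probability(0.01, n):.3f}")
```

Because the BFF has to wait for the slowest downstream response, every fanout read that can be skipped lowers this probability, which is the motivation for the on-demand access discussed in the following section.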
Analysis and Modeling

As mentioned above, many downstream business services of the video details page cover only some videos. In other words, only some videos have associated data, so a Bloom-filter-like mechanism is often used to filter out requests for unassociated videos. We bucketed and counted the sizes of the downstream responses to video detail BFF requests (using a Prometheus Histogram). The analysis showed that the responses returned by many business services follow the distribution shown in Figure 5.

(Figure 5: Distribution of response sizes returned by a service requested by the BFF)

For many of the service interfaces the BFF calls, more than 90% of the responses are "empty", which means the requested video is not associated with that service. In practice, however, the video detail BFF requests these services every time it fetches video details. The root cause is that the BFF layer does not know which services a video is associated with when it processes the request.

If the BFF layer could know in advance which services are associated with the requested video, we could significantly reduce the number of fanout reads from the BFF and the load on the business services, and achieve on-demand access. To do this, we can build for each video a sparse vector of its associated services, called the video-service index, as shown in Figure 6.

(Figure 6: Index model of video IDs and associated services)

In an actual implementation, the video-service index does not have to store the video-to-service relationship as sparse vectors. An existing KV system can be used instead. For example, we implement it with Redis hashes. Another point to consider is that when the relationship between a service and a video changes, there must be a mechanism to notify the index service of the change, both in full (at the initial stage) and incrementally.
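The index model above can be sketched as follows. This is only an illustration under stated assumptions, not the production implementation: an in-memory dictionary stands in for the Redis hash (one key per video, one field per associated service), and the service names, the class `VideoServiceIndex`, and the function `services_to_call` are all hypothetical. It shows the full import, the incremental update, and how the BFF could use the index to skip services with no data for a video.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical downstream services; the real service list is not given in the article.
ALL_SERVICES: List[str] = ["honor", "topic", "note", "soundtrack", "collection"]


class VideoServiceIndex:
    """In-memory stand-in for a hash-based video-service index."""

    def __init__(self) -> None:
        # Outer key ~ one hash per video ID; inner set ~ the hash fields (service names).
        self._index: Dict[int, Set[str]] = {}

    def full_import(self, service: str, video_ids: Iterable[int]) -> None:
        """Initial stage: replace all associations of one service."""
        for services in self._index.values():
            services.discard(service)
        for video_id in video_ids:
            self._index.setdefault(video_id, set()).add(service)

    def apply_change(self, service: str, video_id: int, associated: bool) -> None:
        """Incremental stage: apply a single association change event."""
        if associated:
            self._index.setdefault(video_id, set()).add(service)
        else:
            self._index.get(video_id, set()).discard(service)

    def services_for(self, video_id: int) -> Set[str]:
        """Return a copy of the services associated with a video."""
        return set(self._index.get(video_id, set()))


def services_to_call(index: VideoServiceIndex, video_id: int) -> List[str]:
    """Keep only the services that actually have data for this video."""
    associated = index.services_for(video_id)
    return [name for name in ALL_SERVICES if name in associated]


if __name__ == "__main__":
    index = VideoServiceIndex()
    index.full_import("honor", [1001, 1002])   # full import for one service
    index.apply_change("topic", 1001, True)    # incremental change event
    print(services_to_call(index, 1001))       # only associated services are called
    print(services_to_call(index, 2002))       # unassociated video: no fanout at all
```

With such an index, a video that has no associated services triggers no downstream calls beyond the index lookup itself, which is where the reduction in fanout reads comes from.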
Implementation

Based on the preceding problem analysis and modeling, we optimized the architecture of the video details BFF as shown in Figure 7.

(Figure 7: Optimized architecture and processing flow)

In the BFF request processing flow, ① a business association index service is introduced, and before the BFF requests the downstream business services, it fetches the index of the businesses associated with the video. ② From this index, the BFF knows in advance which business services this request should access, and filters out requests to unrelated businesses.

The index is implemented with Redis hashes, and the company's internal KV storage is also used for persistence and for degradation when Redis fails. An example of the Redis key layout is as follows:

The index of video-associated services is built by importing both the full and the incremental association data of the downstream services. To help downstream services import heterogeneous data into the index more efficiently, we provide a backend system in which functions for cleaning and importing business change messages can be written online, as shown in Figure 8.

(Figure 8: Backend for business change event processing functions and index update pushes)

Pattern extension

After further investigation, we found that this is not limited to video details. The detail pages for Story (short video), live streaming, Dynamic, and My page all present similar aggregation scenarios, and as Figure 3 shows, these scenarios also appear in the BFFs for multiple clients such as the app, TV, and Web. Can a more standard and general solution solve aggregation problems like the video details one in a unified way?

As Figure 3 shows, the main processing logic of a BFF consists of parameter processing, aggregation logic, and assembly of the returned view objects (VO). We can abstract the complex aggregation logic for video, live streaming, users, and so on into a more general aggregation service and provide it to all BFFs.
To achieve this, the general aggregation service needs to have the following capabilities:
Regarding point 1, common practices in the industry include the following:
The following is our View Enum definition for the video details page:

Regarding point 2, we abstract the aggregation logic into a DAG. We use the DAG model because some business services depend on one another. For example, some video attributes depend on the author information contained in the basic video information, which is obtained by calling the basic video information service. With this model, onboarding a new business only requires three steps:

1. Specify the other nodes it depends on.
2. Write the logic inside the node, including the service call and the business logic processing.
3. Configure which View Enums the node should be used in.

Regarding point 3, the implementation principle was introduced earlier. We only need to extend the index from the video-service index to live-streaming-service and user-service indexes.

In summary, we named the general data aggregation service DAGW (Data Aggregate Gateway). The internal structure of DAGW and its interaction with the BFF layer and the services are shown in Figure 9.

(Figure 9: Introducing the general data aggregation gateway layer DAGW to uniformly meet the needs of aggregation scenarios)

Results

Since the DAGW general data aggregation gateway and the business association index went live, they have supported the aggregation of video, user, and other information. Nearly 30 business services have been connected, and the traffic and load of these business services have been reduced by more than 90% on average. The following are the results for two of them, the video high-energy highlights business and the user fan medal business:

1. For the video high-energy highlights service, traffic from the playback page (app-view) reaches 100k+ QPS at peak. After the service was connected to DAGW, the improvement was very significant: monitoring shows that request QPS dropped by 99%.
2. The fan medal is a wearable honor for hardcore fans, which users earn by watching a streamer's live broadcasts for a long time and taking part in the interaction. Because the threshold for earning it is high and it is displayed only under a specific streamer's content, connecting to DAGW cut the access traffic by more than 85%.

References

1. The Tail at Scale: https://research.google/pubs/pub40801/
2. GraphQL: From Excitement to Deception: https://betterprogramming.pub/graphql-from-excitement-to-deception-f81f7c95b7cf
3. View Enum: https://google.aip.dev/157

Authors of this issue

Huang Shancheng, Senior Development Engineer at Bilibili
Xia Linjuan, Senior Development Engineer at Bilibili
Zhao Dandan, Senior Development Engineer at Bilibili