Basic logical architecture of the crowd selection system

Crowd selection is not a new business. Almost every Internet company does it, because it is a basic marketing scenario: to reach the right people, you need to issue coupons, direct traffic, and run targeted promotions. How is this done? The crowd is divided into groups based on behavioral characteristics, purchasing characteristics, attention characteristics, interest characteristics, and even education level. By segmenting the crowd, a limited marketing budget can be focused on the people with the highest conversion rate or click-through rate.

So what are the technical requirements behind this business need? The core of the typical crowd business is insight analysis. The general scenario is to support tens of thousands of advertisers, who freely choose the crowds they are interested in on the platform. Each advertiser has different demands: some focus on purchasing power, others on collection (favoriting) behavior. Every day, tens of thousands of advertisers send millions of query requests and build tens of thousands of crowd packages, so the computational load on the system is very high.

The core demand is millisecond-level insight, because all queries are expected to be interactive. For every interaction on the interface, every drop-down menu, every selection, and every combination of conditions, the user wants to see an interactive result: whether the crowd is getting bigger or smaller, and whether the target crowd matches expectations. This places very high requirements on performance. A reminder at the same time: the data must be desensitized to protect users' personal privacy, and all analysis is based on compliant data.

The core demands on the crowd selection system's serving engine are as follows.

Interactive analysis performance on large-scale data. Tens of thousands of advertisers submit millions of queries that require millisecond-level responses, so fast queries are a must. "Fast" comes with a qualifier: large-scale data. Millions of rows do not count as large scale; behavior logs easily exceed 10 billion rows, and even at that scale we still expect good interactive analysis with responses within seconds.

Flexible screening capability. Users apply all kinds of filters, such as equality comparisons, value-range comparisons, and time-range comparisons, and these conditions can be combined freely. Expressing such filter results efficiently reflects the capability of the computing engine.

High-throughput update capability. User tags are not static; everything is real-time and online now. Every change in behavior data is expected to be ingested in real time and to feed back into the next system decision. For example, if a product was just added to a user's favorites, can that behavior immediately become part of the online profile? The requirements for real-time, high-throughput updates are therefore very high.

From a computational perspective, the workload can be divided into several modes, as shown in the figure below. Tag filtering includes equal-value filtering, which can be expressed with Equal/In/Between; these filters operate on tens of billions of rows. The resulting sets then require a large amount of intersection and union computation.
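A combination of equal-value, IN, range, and time-range filters might be expressed in SQL roughly as follows. This is a minimal sketch; the table and column names are illustrative assumptions, not taken from the original system:

```sql
-- A minimal sketch of combining equal-value, IN, range, and time-range filters.
-- Table and column names are illustrative assumptions.
SELECT DISTINCT t.user_id
FROM user_tags t
JOIN shop_behavior b ON b.user_id = t.user_id
WHERE t.city = 'Hangzhou'                              -- equal-value filter
  AND t.purchase_level IN (4, 5)                       -- IN filter
  AND t.age BETWEEN 18 AND 35                          -- value-range filter
  AND b.behavior_type = 'favorite'
  AND b.behavior_time >= now() - interval '7 days';    -- time-range filter
```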
Take a common example: a user follows both a competitor's brand and the company's own products, but has not purchased. Such crowds involve union relationships, difference relationships, and intersection relationships, so crowd combinations require a large amount of intersection and union computation. Finally, there is a strong demand for exact deduplication, because the result of the calculation must ultimately become IDs that uniquely identify users, and these IDs are used for ad delivery. (This combination is sketched in SQL a little further below.)

At the engine level, these requirements come down to how efficiently data can be read. If row storage is used, is there an IO amplification problem? Data is stored by row, but the filter applies to a single column, so the IO layer ends up reading entire rows, which amplifies IO. Column storage in turn raises questions about indexing and filtering effectiveness. For join operators, is a Hash Join or a Nested Loop Join used? How effective is exact deduplication? All of this places very high demands on the efficiency of the computing engine. The essential problems to solve are therefore efficient data storage and filtering, the memory/CPU cost of relational (set) operations, and the memory/CPU cost of exact deduplication.

There are many different optimization approaches, such as spending more memory or more CPU, but two general ideas exist in the industry. The first is pre-computation, represented by technologies such as Kylin and Druid. These systems pre-aggregate along predefined dimensions, which essentially reduces the size of the data set. For example, to find users who follow the first product but not the second, each result set can be expressed as a bitmap array, and intersection and union between such arrays are extremely efficient. Pre-computation therefore brings great benefits to exact deduplication and to intersection/union calculations, but its defects are also obvious: the biggest is inflexibility, and full SQL expressiveness is relatively weak. The second idea is MPP distributed database technology, parts of which provide better query performance through column storage, distribution, and indexing. When implementing a crowd screening solution, it is generally not advisable to choose only one of these, because both the pre-computation approach and the MPP approach have essential defects.

So which technologies on the market are suitable for storage and query? The first category is the familiar transactional (TP) database. A transactional database is a row-oriented storage system, and writing a single row into storage is very efficient. But when it is used for queries, filtering, and statistics over more than 10 million rows, resource consumption becomes very large, so TP systems are generally not used directly for analysis. The second category is the AP system, the common OLAP system. This type of system is optimized for large-scale data scans, using distributed technology, column storage, compression, indexing, and so on. Queries are very fast, but the essential defect is that most of these systems are not friendly to updates: because queries need to be fast, the data is kept compact and compressed, which weakens update capability.
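Coming back to the crowd combination itself, the "follows both brands but has not purchased" example maps directly onto SQL set operations plus an exact-deduplication count. This is a minimal sketch; the table and column names are illustrative assumptions:

```sql
-- Crowd combination via set operations, followed by exact deduplication.
-- Table and column names (follow_log, order_log) are illustrative assumptions.
WITH follows_competitor AS (
    SELECT user_id FROM follow_log WHERE brand = 'competitor_brand'
),
follows_ours AS (
    SELECT user_id FROM follow_log WHERE brand = 'our_brand'
),
buyers AS (
    SELECT user_id FROM order_log WHERE brand = 'our_brand'
)
SELECT COUNT(DISTINCT user_id) AS crowd_size      -- exact deduplication
FROM (
    SELECT user_id FROM follows_competitor
    INTERSECT                                     -- intersection: follows both brands
    SELECT user_id FROM follows_ours
    EXCEPT                                        -- difference: remove purchasers
    SELECT user_id FROM buyers
) AS target_crowd;
```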
There is another type of system, also common in big data analysis, which we call a Serving system: a system that supports online business. These systems query fast enough, but they sacrifice query flexibility. For example, document databases and key-value systems are quite limited: they can only be queried by key. This reduces flexibility, but performance can be scaled out almost without limit, because key access is the most efficient access path; update efficiency is also high, since an entire record can be replaced by key.

In the past we had to split data across TP, AP, and Serving systems for different scenarios and move it back and forth between them. This creates dependencies throughout the pipeline, and wherever there is a data dependency there will be data inconsistency. Data inconsistency means data correction, and data development costs go up. As a result, there is a lot of innovation in this area.

The first kind of innovation is building mixed-workload (HTAP) capability across the TP and AP domains, trying to cover both scenarios with one technology that supports both transactions and analysis. Hopefully such systems will one day be implemented really well. They also have limitations, however: to support transactional operations, the various distributed lock overheads are unavoidable, so the cost in concurrency and performance is relatively high and certain performance bottlenecks remain.

Innovation is also possible on the left side of the figure below. The defining trade-off there is that transactions are not supported: transaction capability is deliberately weakened, because these workloads do not need it, and the goal is to query and update fast enough. This is where technical innovation can happen. Such a technology combines good, flexible analysis capability, good data write capability, and complete SQL expressiveness. The technology in the intersection area on the left is therefore very well suited to the three technical requirements mentioned above. This is the product I will share today: Hologres.

Hologres = vectorized SQL engine + flexible multi-dimensional analysis + high-throughput real-time update

Hologres is a one-stop real-time data warehouse that provides both real-time analysis (OLAP) and online serving (point query) capabilities. It is seamlessly connected with MaxCompute, so a single architecture can carry multiple coexisting workloads (OLAP, online serving, interactive analysis), reducing data silos, avoiding data fragmentation, simplifying pipelines, and improving the user experience.

Unified storage: one copy of data supports multiple workloads (OLAP, online serving, MaxCompute interactive analysis), reducing data fragmentation, eliminating data silos and frequent import/export, improving data development efficiency and simplifying pipelines.

Unified interface: the interface is compatible with the open-source Postgres protocol, supports mainstream development and BI tools, requires no application-layer rewriting, and is open to the ecosystem. Multiple scenarios are described with SQL, improving the efficiency of data application development.

Unified data model: the data warehouse model is described through "tables" with consistent semantics.

Real-time and offline integration: supports real-time writing and real-time updates, data is queryable as soon as it is written, and Flink is natively integrated.

High performance: in OLAP scenarios, performance is better than ClickHouse, Impala, and Presto, supporting sub-second responses and high QPS.
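As a small illustration of the "one system, multiple workloads" idea, the same table can in principle serve both an online point query and an analytical aggregation over the Postgres protocol. The user_tags table and its columns here are assumptions made for this sketch:

```sql
-- Hypothetical illustration: two workload types against the same table.
-- The user_tags table and its columns are assumptions for this sketch.

-- Online serving (point query): fetch one user's profile by primary key.
SELECT * FROM user_tags WHERE user_id = 123456;

-- Interactive analysis (OLAP): aggregate across the whole table.
SELECT city, COUNT(*) AS users
FROM user_tags
GROUP BY city
ORDER BY users DESC
LIMIT 10;
```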
Hologres: a one-stop real-time data warehouse

Why can Hologres support high performance and high-throughput writes? There is actually nothing mysterious about it. Hologres builds on advances made across the whole IT industry, and there are many underlying technological improvements. For example, network bandwidth has become wider and latency lower, so operations that used to depend on local resources, such as local disks, can now rely on network disks. The underlying storage of Hologres is multi-replica, highly reliable storage, and all of this state management is handed over to Alibaba Cloud: the bottom layer is the Pangu storage engine, which comes with multiple replicas, compression, caching, and high reliability. This keeps the logic of the compute nodes thin and simple and also makes high availability simpler: after any node goes down, its state can be quickly restored from the distributed network disk, which makes the compute layer stateless. That is the first point.

The second point is how disks are used. In the past, disk rotation speed was a mechanical bottleneck: a mechanical disk spins at a fixed number of revolutions per second, so IO was optimized for scan scenarios, with data updated, read, and written in blocks. As a result, high-update scenarios were hard to achieve in a traditional data warehouse. Hologres is designed for SSDs, which support much better random reads and writes, so the storage data structures no longer have to be designed around scan-only access. Hologres can store data by row or by column to suit different scenarios, and it uses a log-structured merge tree (LSM) design, which supports high-throughput data writing and update scenarios.

The third point is the multi-core CPU. CPU clock frequency will not improve substantially, but if the multiple cores of a CPU can be used in parallel, CPU resources can be fully utilized. This requires a good command of low-level systems programming. Hologres implements the data warehouse in C++, and its underlying operators are rewritten in a vectorized way to maximize multi-core parallelism and extract the available computing power.

As can be seen from the figure below, many advances have been made in networking, storage, computing, and hardware, and Hologres takes full advantage of them to build a system with very different characteristics.

As mentioned before, crowd selection involves both pre-computation scenarios and MPP distributed computing scenarios, and a single technology alone is often not suitable. In an actual implementation, the hope is to use both pre-computation and distributed computing and to integrate the two approaches well.
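As a rough illustration of the row-store versus column-store choice described above, a Hologres table's orientation is declared when the table is created. The sketch below assumes the set_table_property DDL from the Hologres documentation; table and column names are illustrative:

```sql
-- A sketch of declaring row-oriented vs column-oriented tables in Hologres.
-- Assumes the set_table_property DDL; table and column names are illustrative.
BEGIN;
CREATE TABLE user_profile_dim (          -- dimension table for point lookups
    user_id bigint PRIMARY KEY,
    city    text,
    gender  text
);
CALL set_table_property('user_profile_dim', 'orientation', 'row');   -- row store: key lookups, high-throughput upserts
COMMIT;

BEGIN;
CREATE TABLE shop_behavior (             -- fact table for analytical scans
    user_id       bigint,
    behavior_type text,
    behavior_time timestamptz
);
CALL set_table_property('shop_behavior', 'orientation', 'column');   -- column store: scans and aggregation
COMMIT;
```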
On the pre-computation side, dimension filtering scenarios are very well suited to bitmaps, because bitmap indexes can be built on these columns. Attributes such as boolean flags, purchase levels, and product-attention categories, which exist mainly to be filtered on, are good candidates for bitmap indexes, and Hologres supports bitmap indexes. The second aspect is relational operations: the intersections and unions of the various data sets mentioned above are also very well suited to bitmap computation, because bitmap computation boils down to large numbers of AND/OR/difference operations over 0s and 1s that can run in parallel, so efficiency is very high. Exact deduplication is a capability bitmaps are naturally endowed with: when a bitmap is constructed, each ID is uniquely determined by its bit position, so exact deduplication is simply counting the positions set to 1 (the bitmap's cardinality), which can be computed very quickly. This almost turns an O(N) problem into an O(1) one, and the effect is very noticeable. Pre-computation is therefore a very important technique in crowd selection. Hologres supports the RoaringBitmap data type, which implements intersection and union calculations on bitmaps efficiently.

As mentioned above, pre-computation alone is not flexible enough, and distributed computing is needed to maximize computing power; this is where the Hologres vectorized execution engine comes in. It can also directly accelerate MaxCompute data tables, and synchronizing MaxCompute data into Hologres can improve performance by more than 10 times compared with synchronizing MaxCompute to other data stores.

Typical architecture diagram

The typical architecture is as follows. The data sources are mostly tracking (event) data, which is delivered to Flink as quickly as possible through the message middleware Kafka. Flink performs lightweight processing, including data governance and correction, light aggregation, and dimension widening. Dimension association is a very important step: the raw event data records only certain IDs, and these IDs must be converted into dimension information with meaningful attributes. For this dimension widening, Hologres row-oriented tables can be used: dimension lookups are essentially primary-key lookups, and a Hologres row-store table can hold hundreds of millions or even billions of dimension records and update them in real time.

The processed result set is written back to Kafka, because processing is not a single pass; it may run for several cycles, driven by Kafka messages and executed in Flink. In most cases the processing results are double-written: one copy is written to Hologres in real time, and the other is written to MaxCompute in batches.

Correcting the real-time data warehouse from the offline data warehouse is a common and useful pattern: data will inevitably need correction, so the real-time warehouse is frequently corrected from the offline warehouse. Label (tag) processing is another typical case where the offline warehouse supplements the real-time warehouse: some behaviors need to be processed in the offline warehouse and then synchronized to the real-time warehouse, while other attributes are relevant to the current decision and can be written directly to the real-time data warehouse, Hologres.
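Returning to the pre-computation path, a crowd package can be materialized as a RoaringBitmap and combined with others at query time. The sketch below assumes the roaringbitmap extension and its rb_build_agg, rb_and, rb_andnot, and rb_cardinality functions are available, and that user IDs have already been mapped to dense 32-bit integers; table names are illustrative:

```sql
-- Pre-computing crowd packages as RoaringBitmaps and combining them at query time.
-- Assumes the roaringbitmap extension; tables and the int-typed uid are illustrative.
CREATE EXTENSION IF NOT EXISTS roaringbitmap;

CREATE TABLE crowd_package (
    tag_name  text,
    tag_value text,
    uid_set   roaringbitmap          -- one pre-computed bitmap per tag value
);

-- Build one bitmap per followed brand; uid is assumed to be a dense 32-bit ID.
INSERT INTO crowd_package
SELECT 'follows_brand', brand, rb_build_agg(uid)
FROM follow_log
GROUP BY brand;
-- A 'purchased_brand' package would be built the same way from order data.

-- Follows the competitor AND our brand, but did not purchase: exact count via bitmap ops.
SELECT rb_cardinality(
         rb_andnot(
           rb_and(a.uid_set, b.uid_set),   -- intersection: follows both brands
           c.uid_set)) AS crowd_size       -- difference: minus purchasers
FROM crowd_package a, crowd_package b, crowd_package c
WHERE a.tag_name = 'follows_brand'   AND a.tag_value = 'competitor_brand'
  AND b.tag_name = 'follows_brand'   AND b.tag_value = 'our_brand'
  AND c.tag_name = 'purchased_brand' AND c.tag_value = 'our_brand';
```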
Labels can therefore be divided into an offline part and a real-time part: the real-time part is written to Hologres directly, and the offline part is processed by MaxCompute and then synchronized to Hologres.

There are several ways to expose data services externally. The recommended way is to put a gateway in front of the service: the gateway handles rate limiting, circuit breaking, and so on, which greatly helps the stability of the data service. For internal interactive analysis, you can connect to Hologres directly over JDBC; for online applications, it is recommended to reach Hologres through an API gateway.

Data structure layer

The offline data warehouse produces two tables. One is the user basic attribute table, which records user attributes such as gender, city, and age. The other is the transaction detail table, which records which users bought, viewed, or favorited a given product on a given day. After the offline warehouse processes them, the data is imported into Hologres. The table and column descriptions are made human-readable through configuration, and the relevant attribute tags are configured on top of them. Once the tags are online, advertisers configure and filter them through the interactive interface. Behind the scenes, those filters are translated into SQL statements and pushed down to the underlying engine. So how should the underlying engine model its tables?

Wide table mode

Each row describes one user's combination of tags: each tag key is a column, and the row stores that user's value for each tag. More than 300 columns is not recommended, because too many columns reduce real-time write performance. Tags are divided into hot and non-hot tags. Hot tags are independent columns with explicit data types that can be indexed in a targeted way, making them query-friendly. Non-hot tags are carried by array or JSON columns, which suit dynamic updates and scale well, but indexing on them is not optimal.

Applicable scenarios: a small number of dimension attributes; frequent real-time writes; updates in units of users.

Advantages: simple development and quick launch.

Scheme description: user data goes into a wide table such as user_tags; behavioral data goes into a fact table such as shop_behavior; when updating, different columns can be updated in real time or in batches.

Case
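A minimal sketch of what such a wide table might look like, with hot tags as typed columns and non-hot tags folded into a JSON column. The names and types, and the bitmap_columns property used here, are assumptions for illustration:

```sql
-- Wide table sketch: one row per user, hot tags as typed columns,
-- non-hot tags in a JSON column. Names, types, and properties are illustrative.
BEGIN;
CREATE TABLE user_tags (
    user_id        bigint PRIMARY KEY,
    city           text,            -- hot tag: typed column, index-friendly
    gender         text,            -- hot tag
    purchase_level int,             -- hot tag
    ext_tags       json             -- non-hot tags: flexible, but weaker indexing
);
CALL set_table_property('user_tags', 'orientation', 'column');
CALL set_table_property('user_tags', 'bitmap_columns', 'city,gender');  -- assumed bitmap-index property
COMMIT;

-- Crowd selection directly on the wide table.
SELECT COUNT(DISTINCT user_id) AS crowd_size
FROM user_tags
WHERE city = 'Hangzhou'
  AND purchase_level >= 4
  AND ext_tags->>'new_device' = 'true';   -- filtering a non-hot tag inside JSON
```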
Narrow table mode

The user_tag table is converted into a narrow table, with one row per tag of each user: one column holds the tag name and one column holds the tag value. All data types degenerate into strings, which suits non-fixed, sparse labels. Some performance is sacrificed in exchange for flexibility in label definition, and label counts ranging from tens up to hundreds of thousands can be supported.

Applicable scenarios: a large number of dimension attributes; updates in units of labels.

Advantages: simple development and quick launch.

Case
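A minimal sketch of the narrow-table layout and of selecting a crowd that must satisfy several tags at once; the names are illustrative, and the GROUP BY/HAVING intersection is only one of several possible formulations:

```sql
-- Narrow table sketch: one row per (user, tag), everything stored as strings.
-- Table and column names are illustrative.
CREATE TABLE user_tag_narrow (
    user_id   bigint,
    tag_name  text,
    tag_value text
);

-- Users who satisfy ALL of the listed tag conditions (intersection via GROUP BY/HAVING).
SELECT user_id
FROM user_tag_narrow
WHERE (tag_name = 'city'           AND tag_value = 'Hangzhou')
   OR (tag_name = 'purchase_level' AND tag_value = '5')
   OR (tag_name = 'follows_brand'  AND tag_value = 'our_brand')
GROUP BY user_id
HAVING COUNT(DISTINCT tag_name) = 3;   -- must match all three tag conditions
```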