[51CTO.com original article] The Global Software and Operation Technology Summit (WOT2018), hosted by 51CTO, was held in Beijing on May 18-19, 2018. The summit focused on 12 core topics, including artificial intelligence, big data, the Internet of Things, and blockchain, and brought together 60 front-line experts from China and abroad. It offered a high-end technology program and a networking platform that IT professionals would not want to miss. At the "High Concurrency and Real-time Processing" session on the afternoon of May 19, Liu Jianhui, a big data architect at 51 Credit Card, delivered a keynote speech titled "The Way to Advance Big Data Application Products", covering conventional big data architecture, the needs of big data users, and the construction of data products. After the session, 51CTO reporters compiled this summary of his speech.
Common big data architecture

Liu Jianhui pointed out that big data architectures are broadly similar from company to company and can be divided into five layers: collection, storage, scheduling, computation, and data presentation.

To put big data to work, you first have to understand what it is. Some people see big data as the development of an underlying platform, while others see it as a tool for business development. Liu Jianhui himself once thought that big data meant writing a few lines of framework code, and that it made no essential difference whether the code was written on Spark (a fast, general-purpose engine for large-scale data processing) or Flink (a distributed processing engine for streaming and batch data). Experience showed that none of these views is complete. A better way to define big data is by what it does: it discovers problems and it solves problems.

Within a company, three groups of people work closely with big data: data analysts (data warehouse, BI); algorithm strategy staff; and operations staff, growth teams, product developers, and designers. Product developers at enterprises undergoing digital transformation benefit in particular. They often lack experience with big data and design products mostly according to traditional models, and big data lets them make product decisions from the data, which is more effective.

So what do people need from big data analysis? Liu Jianhui summarized three points. The first is ad hoc query: a technician who wants a particular result can write it to MySQL (a relational database management system) and see it immediately. The second is task scheduling: the technician wants a data report produced at a fixed time every day.
The third is report output, which must be fast and reliable.

Once the actual needs are understood, how do you choose the big data products and solutions that suit you? Liu Jianhui offered three suggestions. First, a plan that can be executed today is better than the best plan tomorrow. Second, the product must always serve the users' business scenarios. Third, the product must be easy to use, stable, and reliable.

It turns out that big data algorithms have so many tricks!

In his speech, Liu Jianhui also shared the problems 51 Credit Card ran into in its big data practice, focusing on algorithm strategy.

He said that 51 Credit Card lacks a unified platform for model training and model publishing. Large companies do better here: after long accumulation, the coordination between their algorithm and engineering teams is relatively mature. In small and medium-sized companies, each technician uses a different mix of algorithm models, and evaluation metrics vary just as widely ("a hundred flowers blooming"). This has serious drawbacks. Any company should unify the evaluation metrics for its business and build a unified model training and publishing platform.

The inconsistency between online and offline variables is another problem. At present, most model training is done offline. Once the offline computation is complete, the offline variables are converted into online variables, and the online model is called in real time. In this process, some technicians overlook the fact that an algorithm has a limited shelf life. A very good algorithm may stop working after a month because the business has changed. Typically, a front-end redesign changes the product's target population, so online variables that were completely fine one month are invalid the next.
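The failure mode described above, where variables that worked one month stop working the next because the user population has shifted, can be detected by comparing a variable's recent distribution with its distribution at training time. The sketch below uses the population stability index (PSI), a metric widely used in credit scoring. The talk does not say which metric 51 Credit Card uses, so the choice of PSI, the function name `psi`, the equal-width binning, and the parameters are all assumptions made for illustration.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population stability index of `actual` against the baseline `expected`.

    Hypothetical helper: bins are equal-width over the baseline's range,
    and empty bins are floored at `eps` so the logarithm stays defined.
    """
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread")
    width = (hi - lo) / n_bins

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        total = len(values)
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    # Each term is non-negative, so PSI is zero only when the shares match.
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    training_sample = [i / 100 for i in range(100)]        # variable at training time
    after_redesign = [v + 0.5 for v in training_sample]    # population has shifted
    print(psi(training_sample, training_sample))
    print(psi(training_sample, after_redesign))
```

A commonly quoted rule of thumb treats a PSI above roughly 0.25 as a significant shift that deserves investigation. That threshold is a convention, not something stated in the talk, and should be tuned to the business.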
Variable failure of this kind can cause severe losses for a financial company.

Another issue is monitoring and alerting after a model goes live. The technical team wants to know the full state of the model's variables and its stability ahead of time, so that it can assess how well the model supports the existing business, rather than discovering two months later that losses have occurred and only then realizing that the model variables have failed.

Liu Jianhui also described a reasonable algorithm development process in five steps: feature mining, model training, real-time variable development, model launch, and finally model monitoring and evaluation. 51 Credit Card met challenges throughout this process, and as a "veteran" he shared his experience.

First, online variables and offline variables must rely on the same data sources. Some algorithm developers reported discovering new features that raised the anti-fraud success rate by several percentage points, yet the effect online was poor. The reason was that the data sources behind the offline variables were completely different from those behind the online variables.

Second, online variables should also be computed with SQL wherever possible to avoid logic errors. Liu Jianhui said that when the data volume is not especially large, the extra cost is worth paying compared with the problems and losses that might otherwise occur, so he suggested using the same SQL for both online and offline computation.

Third, monitoring should support custom metrics drawn from Hive data sources. In the course of business, the operations team runs promotions and revises products, so a problem with business metrics is not necessarily caused by the algorithm.
However, the algorithm team still needs to know in real time what has changed in the business, make its monitoring metrics more complete, and carry out analysis as soon as possible.

Fourth, the model evaluation functions and the monitoring metrics should be unified.

How to use big data to guide business?

At the end of his speech, Liu Jianhui turned to how big data can help product operations. For most companies the operations process is similar, consisting of customer acquisition, registration, conversion, revenue, and dissemination. So how can a channel's conversion rate be improved? There are two approaches. One is to strengthen core features through product design. The other is to run suitable operations campaigns, such as giving new users red envelopes when they register.

When the conversion rate falls, big data analysis can also help shape a more effective operations strategy. First, draw a user map of the entire product to see whether the new version meets user needs. Then use event-tracking (buried-point) data to compute the funnel conversion rate along the product path, and analyze it to find the source of the problem: a product design flaw that drives users away, a front-end bug that prevents customer information from being saved, or a problem with the H5 page. Once the analysis results are in, you can run more targeted promotions for precisely defined groups of users.

The above content was compiled by a 51CTO reporter from an interview with Liu Jianhui, big data architect at 51 Credit Card, at the WOT2018 Global Software and Operation Technology Summit. [51CTO original article. Partner sites reprinting it should credit the original author and 51CTO.com as the source.]
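The funnel analysis described in the section on guiding business can be sketched as follows. This is a minimal illustration, not 51 Credit Card's implementation: the event format (pairs of user id and step name), the step names in `FUNNEL`, and the function `funnel_report` are all hypothetical. The step names simply mirror the stages named in the talk.

```python
from collections import defaultdict

# Hypothetical step names, mirroring the stages listed in the talk.
FUNNEL = ["acquisition", "registration", "conversion", "revenue", "dissemination"]


def funnel_report(events, steps=FUNNEL):
    """Summarize a funnel from tracking events.

    `events` is an iterable of (user_id, step_name) pairs. Returns a list of
    (step, users_reached, rate_from_previous_step) tuples. A user counts for a
    step only if they also reached every earlier step.
    """
    users_by_step = defaultdict(set)
    for user_id, step in events:
        users_by_step[step].add(user_id)

    report = []
    reached = None  # users who have completed every step so far
    for step in steps:
        if reached is None:
            current = set(users_by_step[step])
            rate = 1.0
        else:
            current = reached & users_by_step[step]
            rate = len(current) / len(reached) if reached else 0.0
        report.append((step, len(current), rate))
        reached = current
    return report


if __name__ == "__main__":
    sample_events = [
        (1, "acquisition"), (2, "acquisition"), (3, "acquisition"), (4, "acquisition"),
        (1, "registration"), (2, "registration"),
        (1, "conversion"),
    ]
    for step, users, rate in funnel_report(sample_events):
        print(f"{step:15s} users={users} rate_from_previous={rate:.2f}")
```

The step with the sharpest drop in `rate_from_previous` is where to look first for a design flaw, a front-end bug, or a page problem of the kind the talk mentions.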