Tongcheng-Elong's Wang Xiaobo: Manage your cache this way to handle high-concurrency scenarios with ease!


[51CTO.com original article] The Global Software and Operation Technology Summit, hosted by 51CTO, was held in Beijing on May 18-19, 2018. The summit focused on 12 core topics, including artificial intelligence, big data, the Internet of Things, and blockchain, and brought together 60 front-line experts from China and abroad. It was a premier technology event and a platform for learning and networking that top IT professionals would not want to miss.

At the "High Concurrency and Real-time Processing" session on the afternoon of the 19th, Wang Xiaobo, CTO of the Tongcheng-Elong Air Ticket Business Group, delivered a keynote speech titled "Cache Governance in High Concurrency Scenarios". He covered how to make cache better suited to high concurrency, how to use cache correctly, and how to resolve cache problems through governance. After the session, 51CTO reporters compiled the content of Wang Xiaobo's speech at the WOT2018 Global Software and Operation Technology Summit.

Wang Xiaobo noted in his speech that in high-concurrency scenarios, many people regard cache as a cure-all that can keep a struggling system alive. Wherever there is high-concurrency pressure, a cache is put in front of it to solve the problem. Yet sometimes the system still stalls and crashes even with a cache in place. Is that because the cache technology is poor? No. It is because cache governance has not been done well.

Take a look at the pitfalls that Tongcheng has encountered

Wang Xiaobo gave a relatively systematic introduction to the “pitfalls” that Tongcheng had encountered.

To relieve the pressure of high concurrency, Tongcheng initially chose memcache (a distributed cache system) and later switched to a Redis architecture (a data structure server that can be used as a database cache), deploying nearly 200 servers. But the situation did not improve. The system often crashed, the scripts called from applications were messy, resources across multi-instance deployments were unbalanced, and the data was fragile and easily lost.

To manage these servers, Tongcheng adopted a master-slave + keepalived mode (keepalived is software that works like a layer 3, 4, and 5 switching mechanism) and chose to upgrade gradually from single-machine Redis to clustered Redis. They soon found that once clusters were deployed in large numbers, the operations team had no practical way to operate and maintain them. Although the clusters could be driven uniformly through scripts, they were effectively uncontrollable, and many operations procedures were prone to bringing down the high-concurrency system, which directly affected the business as a whole. "The system could crash at any time, and the operations team was about to collapse," Wang Xiaobo recalled.

What is the crux of all these problems? Wang Xiaobo concluded that the biggest problem lies in the standards by which technical staff use the cache. People often forget its shortcomings and remember only its one advantage: speed. He gave an example. In a system failure report, a technician wrote that he had not expected Redis, which had only about 30,000 lines of code in its initial version, to deliver such magical capabilities. This mindset leaves programmers holding a hammer and seeing nails everywhere. In other words, they want to use cache to meet every requirement they come across. Misled in this way, people began to build cache-based log collectors, cache-based countdown timers, cache-based counters, and cache-based order systems. Once these functions appeared, people were intoxicated by the speed and neglected the question of how to keep them running reliably.

What actually goes wrong with cache? Wang Xiaobo summarized it in four points. First, over-reliance, which is the most prominent: sometimes no cache is needed, but the technicians insist on using one. Second, persisting data to disk. Third, excessive capacity. Fourth, cache avalanche. Why do these failures occur? Wang Xiaobo believes the biggest cause is abuse, misuse, and laziness on the part of users. In addition, thousands of cache servers are operated without any usage rules, operations staff do not understand development, and developers do not understand operations. The result is that cache is used without design or control, and large amounts of server resources are wasted. These are all common phenomena.
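As an illustration of the fourth point, a cache avalanche typically happens when many keys expire at the same moment and all requests fall through to the backend at once. One common mitigation, which the speech did not describe and which is shown here only as a minimal sketch, is to add random jitter to each key's time-to-live so that expirations spread out. The function name `jittered_ttl` and its parameters are hypothetical.

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: int, spread: float = 0.1,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_ttl shifted by a random amount within +/- spread.

    base_ttl is in seconds; spread=0.1 means up to 10% earlier or later.
    Hypothetical helper for illustration, not part of any library.
    """
    source = rng if rng is not None else random
    delta = base_ttl * spread
    # Never return less than one second, so the entry is still cached.
    return max(1, int(base_ttl + source.uniform(-delta, delta)))


# Keys written in the same batch get different expiry times.
rng = random.Random(42)
ttls = [jittered_ttl(600, spread=0.1, rng=rng) for _ in range(5)]
print(ttls)
```

With a real cache client, the returned value would be passed as the expiry when writing the key, so that a batch loaded together does not expire together.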

What kind of cache do people need, and what kind of cache governance? Wang Xiaobo believes that, from the perspective of development philosophy, what people really want is a "magic box" that can meet all kinds of high-concurrency requirements. Put simply, developers should not have to care whether the cache is large or small, good or bad. Because developers have limited knowledge of cache technology, the greatest danger is that they use it indiscriminately. It is worth noting that many developers overlook the fact that much of the data in a cache is not always hot data, and they do not estimate load adequately before high concurrency arrives, so bottlenecks are discovered too late, when the application is already in use.

Tongcheng's "Phoenix Nirvana"

To let cache truly play its role in coping with high concurrency, the Tongcheng technical team eventually developed the Phoenix solution. In the initial design, they wanted the architecture to offer a simple SDK on the application side for developers to use. A developer only has to declare the project and the relevant data scenario to receive a key. With this key, the SDK assigns the developer a new cache store on which Redis can run, and the whole scheduling platform can call it very quickly. In addition, Phoenix can perform end-to-end monitoring starting from the client call. More importantly, it can prevent cache collapse and support dynamic scaling up and down.
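The declare-then-use flow described above can be sketched roughly as follows. This is not Phoenix's actual SDK, whose interface the speech did not show. Every name here (`CacheRegistry`, `CacheClient`, `declare`) is hypothetical, and a plain dictionary stands in for the Redis-backed cache store so that the example runs on its own.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class CacheRegistry:
    """Hypothetical stand-in for the scheduling platform."""
    stores: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    stats: Dict[str, Dict[str, int]] = field(default_factory=dict)

    def declare(self, project: str, scenario: str) -> str:
        """Register a project and data scenario, and return its key."""
        digest = hashlib.sha256(f"{project}:{scenario}".encode()).hexdigest()
        key = digest[:16]
        # A dict stands in for the cache store assigned to this key.
        self.stores.setdefault(key, {})
        self.stats.setdefault(key, {"get": 0, "set": 0, "miss": 0})
        return key


class CacheClient:
    """Application-side client that counts every call for monitoring."""

    def __init__(self, registry: CacheRegistry, key: str) -> None:
        if key not in registry.stores:
            raise KeyError("unknown key; declare the project first")
        self._store = registry.stores[key]
        self._stats = registry.stats[key]

    def set(self, name: str, value: Any) -> None:
        self._stats["set"] += 1
        self._store[name] = value

    def get(self, name: str, default: Any = None) -> Any:
        self._stats["get"] += 1
        if name not in self._store:
            self._stats["miss"] += 1
            return default
        return self._store[name]


registry = CacheRegistry()
key = registry.declare("air-ticket", "fare-lookup")
client = CacheClient(registry, key)
client.set("PEK-SHA", 820)
print(client.get("PEK-SHA"), registry.stats[key])
```

Because the application only ever holds a key, the platform side remains free to move or resize the store behind it, which is the property the paragraph above attributes to the design.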

Later, a proxy layer was added to the Phoenix solution. The time cost of developing clients in multiple languages was too high, and upgrading the client inside applications was a major problem. Wang Xiaobo revealed that upgrading middleware embedded in applications is almost always a huge headache: after every upgrade, the system has to be retested, and it can easily crash again. It is therefore better to exercise control through a local cache, and disk can serve as cache for data that is accessed infrequently.
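The idea of placing a local cache in front of a slower tier can be sketched as a small read-through cache with a time-to-live. This is a generic illustration under stated assumptions, not Phoenix's implementation. The class name `LocalCache` and the `loader` callback are hypothetical, and the loader stands in for whatever sits behind the local tier, such as the proxy layer or a disk-backed store.

```python
import time
from typing import Any, Callable, Dict, Tuple


class LocalCache:
    """Read-through in-process cache with a per-entry time-to-live."""

    def __init__(self, loader: Callable[[str], Any],
                 ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader      # called on a miss or an expired entry
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: serve from local memory
        value = self._loader(key)  # fall through to the slower tier
        self._entries[key] = (now + self._ttl, value)
        return value


def slow_lookup(key: str) -> str:
    """Hypothetical stand-in for a call to the remote cache tier."""
    return key.upper()


cache = LocalCache(slow_lookup, ttl_seconds=5.0)
print(cache.get("fare"), cache.get("fare"))
```

A short time-to-live bounds how stale the local copy can become, while repeated reads of the same key within that window never leave the process.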

Finally, containers were added to the Phoenix solution. With container deployment in place, and by combining multiple small clusters with single nodes, dividing clusters by scenario, and rebalancing data through real-time scheduling, Tongcheng's overall monitoring, data migration, and scaling became more flexible and easier to operate. Taking data migration as an example, Tongcheng has built a complete migration system that covers traffic expansion and data expansion, both vertical and horizontal, and handles all of them in a largely automated way.

The above content is compiled by 51CTO reporter based on the interview with Wang Xiaobo, CTO of Tongcheng-Elong Air Ticket Business Group, at the WOT2018 Global Software and Operation Technology Summit. For more information about WOT, please visit .com.

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]

