This article is reprinted from the WeChat public account "Coffee Latte"; the author is Coffee Latte. Please contact the "Coffee Latte" public account for permission to reprint.

Background

With the popularity of microservices, stability between services has become more and more important, and we often spend a great deal of time maintaining it. Rate limiting and circuit breaking with degradation are the two most commonly used techniques. Some time ago, a few friends in my group had questions about how to use rate limiting, and my company also did some rate-limiting work for the recent sales promotion, so I would like to summarize my views on the subject here.

Rate limiting is one of the means we use to ensure service stability, but it does not guarantee stability in every scenario. As its name suggests, it only helps when traffic is heavy or bursty. For example, suppose our system supports at most 100 QPS and a burst of 1000 QPS suddenly arrives. The system may crash outright and fail to serve any subsequent requests. With rate limiting in place, however, no matter how large the traffic is, we process only 100 QPS and reject the rest. Although we turn away the other 900 QPS, the system stays up and keeps serving subsequent requests, which is exactly what we want.

Some readers may object that now that we run in the cloud, dynamic scaling should be simple: when traffic spikes, just add machines automatically until the target QPS is supported, and rate limiting becomes unnecessary. Quite a few people think this way, some of them misled by boastful articles. It works in the ideal case, but in reality it runs into the following problems:
Therefore, simply scaling out cannot solve the problem, and rate limiting remains a skill we must master.

Principles

To master rate limiting, you first need to understand its basic algorithms. There are essentially three: the counter, the leaky bucket, and the token bucket; the others are evolutions of these.

Counter algorithm

Let's start with the counter algorithm, which is simple, even crude. We keep a single accumulating variable, reset it every second, and on each request check whether it has exceeded our maximum QPS.
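As a concrete illustration, here is a minimal fixed-window counter in Java. This is my own sketch (the article's original code did not survive the reprint); the clock is passed in as a parameter purely to keep the example deterministic, where a real caller would pass System.currentTimeMillis().

```java
// Minimal fixed-window counter limiter (an illustrative sketch).
public class CounterLimiter {
    private final long maxQps;   // maximum requests allowed per 1-second window
    private long windowStart;    // start of the current window, in ms
    private long count;          // requests seen in the current window

    public CounterLimiter(long maxQps, long nowMillis) {
        this.maxQps = maxQps;
        this.windowStart = nowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        if (nowMillis - windowStart >= 1000) {  // refresh: a new second has begun
            windowStart = nowMillis;
            count = 0;
        }
        if (count < maxQps) {
            count++;
            return true;
        }
        return false;
    }
}
```

The boundary flaw discussed next is easy to reproduce with this sketch: with maxQps = 2, two requests at t = 990 ms and two more at t = 1001 ms all pass, four requests in about 11 ms.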
The code for this is straightforward: we hold the current QPS count, the time of the last refresh, the maximum QPS, and a lock. On every check we first decide whether a refresh is needed; if so, we reset both the timestamp and the count, and only then do the accumulate-and-compare. Because the algorithm is so simple, its flaw is particularly obvious. Suppose our maximum is 100 QPS: 100 requests arrive at 0.99 seconds and another 100 arrive at 1.01 seconds. Both batches pass the check, yet we have actually let 200 requests through within 0.02 seconds, which is certainly not what we intended; those 200 requests may well crash the machine outright.

Sliding window counter

To solve the critical-boundary problem above, we can use a sliding window. As shown in the figure above, we divide the 1-second window of the plain counter into five 200 ms sub-windows, and the current QPS is the sum over the five most recent sub-windows. Returning to the earlier example, 0.99 seconds and 1.01 seconds both fall within the five most recent sub-windows, so the boundary spike no longer slips through. Looked at another way, the plain counter is just a sliding window with a single sub-window. The more sub-windows we divide the second into, the more accurate the statistics become, but the cost of maintaining the windows grows accordingly. When we introduce Sentinel later, we will explain in detail how it implements sliding-window counting.

Leaky bucket algorithm

The boundary-spike problem of the counter can also be solved with the leaky bucket algorithm, shown in the following figure. The two things to notice are the bucket itself and the uniform outflow: no matter how large the incoming traffic is, it first enters the bucket and then flows out at a constant rate.
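The sliding window variant can be sketched as follows; again this is my own illustrative Java (with the clock injected for determinism), not the article's original code. It splits one second into five 200 ms buckets and sums only the buckets that fall within the last second.

```java
// Minimal sliding-window counter limiter (an illustrative sketch).
public class SlidingWindowLimiter {
    private final long maxQps;
    private final int buckets;       // number of sub-windows, e.g. 5 x 200 ms
    private final long bucketMillis; // width of one sub-window
    private final long[] counts;     // request count per sub-window
    private final long[] starts;     // start timestamp each sub-window currently covers

    public SlidingWindowLimiter(long maxQps, int buckets) {
        this.maxQps = maxQps;
        this.buckets = buckets;
        this.bucketMillis = 1000L / buckets;
        this.counts = new long[buckets];
        this.starts = new long[buckets];
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        int idx = (int) ((nowMillis / bucketMillis) % buckets);
        long bucketStart = nowMillis - nowMillis % bucketMillis;
        if (starts[idx] != bucketStart) {  // bucket left over from an old cycle: reset it
            starts[idx] = bucketStart;
            counts[idx] = 0;
        }
        long total = 0;
        for (int i = 0; i < buckets; i++) {
            if (nowMillis - starts[i] < 1000) {  // count only buckets inside the last second
                total += counts[i];
            }
        }
        if (total < maxQps) {
            counts[idx]++;
            return true;
        }
        return false;
    }
}
```

With maxQps = 2, the two requests at 990 ms now cause a third request at 1001 ms to be rejected, because both timestamps fall inside the same five-bucket window.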
How do we achieve this uniform rate in code? If we want a rate of 100 QPS, then one unit of traffic must flow out every 10 ms, which is essentially a queue: every 10 ms we take one request from the head of the queue and release it. The queue is our leaky bucket, and when the traffic exceeds the queue's length we reject the excess. The leaky bucket algorithm has its own drawback: it cannot handle burst traffic (this is different from the boundary spike above; don't confuse the two). For example, if 100 requests arrive at once, the leaky bucket processes them strictly one by one, and by the time the last one leaves, a full second has passed. The leaky bucket is therefore better suited to scenarios where requests arrive evenly and the request rate must be strictly controlled.

Token bucket algorithm

To handle burst traffic, we can use the token bucket algorithm, shown in the following figure. There are three stages to focus on in this diagram:
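The queue-plus-drain idea above can be sketched like this (my own illustration; the clock is a parameter so the behavior is deterministic). Instead of a real queue and a timer thread, it tracks the time at which the next request would be allowed to flow out:

```java
// Minimal leaky bucket limiter (an illustrative sketch).
public class LeakyBucketLimiter {
    private final long intervalMillis;  // gap between two outflows, e.g. 10 ms for 100 QPS
    private final int capacity;         // how many requests may wait in the bucket
    private long nextFreeMillis = 0;    // earliest time the next request may flow out

    public LeakyBucketLimiter(int qps, int capacity) {
        this.intervalMillis = 1000L / qps;
        this.capacity = capacity;
    }

    // Returns the wait in ms before the caller may proceed, or -1 if the bucket overflows.
    public synchronized long tryAcquire(long nowMillis) {
        if (nextFreeMillis < nowMillis) {
            nextFreeMillis = nowMillis;  // the bucket has fully drained
        }
        long queued = (nextFreeMillis - nowMillis) / intervalMillis;  // requests already waiting
        if (queued >= capacity) {
            return -1;                   // bucket is full: reject the excess
        }
        long wait = nextFreeMillis - nowMillis;
        nextFreeMillis += intervalMillis;  // uniform outflow: one slot per interval
        return wait;
    }
}
```

A burst of 100 simultaneous requests at qps = 100 would be smeared across a full second here, which is exactly the burst-handling weakness noted above.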
Single-machine rate limiting

We have introduced the basic algorithms above. When we apply them to distributed services, rate limiting splits into two kinds: single-machine and cluster. Single-machine rate limiting means each machine limits its own traffic independently, without affecting the others. Let's first see how to implement the single-machine variety.

Guava

Guava is Google's open-source Java core library, which includes collections, caches, concurrency utilities, and other useful tools. It also provides the rate-limiting tool we need here; the core class is RateLimiter.
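Minimal Guava usage looks roughly like the following sketch. It assumes the Guava dependency is on the classpath; RateLimiter.create and tryAcquire are Guava's real API, while the surrounding class is invented for the example.

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class GuavaRateLimitDemo {
    public static void main(String[] args) {
        // Ordinary token bucket: at most 100 permits per second.
        RateLimiter limiter = RateLimiter.create(100);

        // Warm-up variant: ramps up to 100 permits/s over a 3-second warm-up period.
        RateLimiter warmLimiter = RateLimiter.create(100, 3, TimeUnit.SECONDS);

        if (limiter.tryAcquire()) {
            // handle the request
        } else {
            // rejected: the rate limit was hit
        }
    }
}
```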
Usage is simple. As the code shows, we only need to build a RateLimiter and call its tryAcquire method: if it returns true the request passes; otherwise it is rejected. Guava's RateLimiter comes in two flavors. One is an implementation of the ordinary token bucket; the other is a RateLimiter with warm-up, which ramps the token-release rate up gradually until it reaches the maximum. Sentinel offers warm-up as well. It is useful for cold systems, for example while a database connection pool has not yet been fully filled and is still initializing. Here I will only briefly describe how Guava's token bucket is implemented: the ordinary token bucket creates a SmoothBursty instance, which is the key to the limiting. The work done inside tryAcquire breaks down into four steps:
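To make the idea concrete, here is a minimal token bucket of my own, only loosely in the spirit of Guava's SmoothBursty and not its actual source; the clock is injected for determinism. Tokens are refilled lazily on each call rather than by a background thread, which is also how Guava does it.

```java
// Minimal token bucket (an illustrative sketch; not Guava's SmoothBursty source).
public class TokenBucketLimiter {
    private final double refillPerMilli;  // tokens added per millisecond
    private final double capacity;        // bucket size, i.e. the largest allowed burst
    private double tokens;                // tokens currently stored
    private long lastRefillMillis;        // last time we topped the bucket up

    public TokenBucketLimiter(double qps, double capacity, long nowMillis) {
        this.refillPerMilli = qps / 1000.0;
        this.capacity = capacity;
        this.tokens = capacity;           // start full so an initial burst is allowed
        this.lastRefillMillis = nowMillis;
    }

    public synchronized boolean tryAcquire(long nowMillis) {
        // Lazily top the bucket up according to the elapsed time, capped at capacity.
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens >= 1.0) {              // take one token if available
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Unlike the leaky bucket, a burst up to `capacity` passes immediately, which is the whole point of the token bucket.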
The token-deduction logic lives in the reserveEarliestAvailable method. Although the flow looks involved, if we only call tryAcquire() we need to pay attention to just the two steps highlighted in the red boxes:
Guava currently provides these two kinds of rate limiting. Many middleware projects and business services use Guava's limiter as their tool of choice, but it is fairly limited: it supports neither dynamic rule changes nor richer, policy-based limiting. That brings us to Sentinel.

Sentinel

Sentinel is a lightweight flow-control framework open-sourced by Alibaba. It has carried the core traffic of Alibaba's Double 11 promotions for the past ten years. Its core is flow control, but it is not limited to that: it also supports circuit breaking, degradation, monitoring, and more. Sentinel's rate limiting is slightly more involved than Guava's. Here is the simplest usage:
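The snippet here would look roughly like the following sketch, using Sentinel's public API (FlowRule, FlowRuleManager, SphU.entry); the class and package names come from the com.alibaba.csp.sentinel artifact, and the resource name is invented for the example.

```java
import java.util.Collections;
import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

public class SentinelDemo {
    public static void main(String[] args) {
        // Define a rule: the resource "myResource" may see at most 100 QPS.
        FlowRule rule = new FlowRule();
        rule.setResource("myResource");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        rule.setCount(100);
        FlowRuleManager.loadRules(Collections.singletonList(rule));

        // Guard the business logic with the resource entry.
        try (Entry entry = SphU.entry("myResource")) {
            // business logic for the protected resource
        } catch (BlockException e) {
            // the request was rate-limited
        }
    }
}
```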
Although Sentinel is more complicated to use than Guava, it offers a wider choice of algorithms.

Based on the number of concurrent threads

Everything introduced so far is based on QPS. Sentinel also provides a strategy based on the number of concurrent threads, which behaves much like semaphore isolation. Use this mode when you need to stop slow calls from exhausting a business thread pool. Generally, all HTTP endpoints of one service share a single thread pool; with Tomcat as the web server, for instance, there is one Tomcat business thread pool. Suppose the service has two endpoints, A and B, where B is fast and A is slow. If A is called heavily, its slowness keeps threads from being released, which can exhaust the pool so that B cannot get a thread at all. We ran into exactly this scenario once, and it caused the entire service to reject every incoming request. Some will say that limiting A's QPS is enough, but note that QPS is measured per second: if a call to A takes longer than one second, the next second's quota is granted afresh while the earlier calls still occupy threads, so a QPS limit does not cap concurrency. Hence the concurrency-based mode: set the rule's Grade to FLOW_GRADE_THREAD to enable it.

Based on QPS

For QPS-based limiting, Sentinel provides four control behaviors: fail fast (the default), warm up, uniform queueing, and warm up combined with uniform queueing.
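Concretely, switching a rule to the concurrency mode should just be a matter of the grade. This is a configuration fragment using Sentinel's FlowRule and RuleConstant classes; the resource name is invented for the example.

```java
FlowRule rule = new FlowRule();
rule.setResource("slowMethodA");                 // the slow endpoint A from the example
rule.setGrade(RuleConstant.FLOW_GRADE_THREAD);   // limit concurrent threads, not QPS
rule.setCount(10);                               // at most 10 threads inside A at once
FlowRuleManager.loadRules(java.util.Collections.singletonList(rule));
```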
Based on call relationships

Sentinel also provides more sophisticated limiting that keys off the calling relationship, which is more flexible:
Associated flow control: Sentinel supports this too. Two associated resources can constrain each other's traffic. For example, if two interfaces use the same resource and one interface matters more than the other, we can set a rule so that when the important interface is being accessed heavily, the less important one is throttled, preventing a sudden surge on the minor interface from hurting the major one.

Some issues with Sentinel

Although Sentinel provides this many algorithms, it also has some problems:
These problems mostly show up in comparison with Guava's limiter; after all, Sentinel does much more, and that carries a higher cost.

Cluster rate limiting

Everything so far has been single-machine limiting, but today we run microservice clusters, where one service usually has many machines. Suppose an order service runs on 10 machines and we want to cap the whole cluster at 500 QPS. The obvious answer is to limit each machine to 50, since 50 × 10 = 500. In a real environment, though, load is rarely perfectly balanced. Service calls go through various load-balancing algorithms, such as same-data-center priority, round robin, and random, and these can skew the load so that the cluster's effective ceiling falls below 500 QPS, perhaps getting limited at around 400. We have hit exactly this in real scenarios. Since single-machine limiting falls short here, we need a proper cluster-level design.

Redis

This approach does not depend on any rate-limiting framework: the whole cluster shares one Redis, and we wrap the limiting logic ourselves. Using the simplest counter design, we take the current system time in seconds as the key (with an expiry set for cleanup), use Redis's atomic integer increment to add 1 per request, and then check whether the value exceeds our limit. The Redis approach is simple to implement, but it depends heavily on the system clock: if the clocks on different machines drift apart, the limiting becomes inaccurate.

Sentinel

Sentinel offers a cluster solution, which is rather unusual among rate-limiting frameworks. It provides two modes: an embedded mode, in which one instance of the service cluster doubles as the token server, and a standalone mode, in which the token server runs as an independent process.
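The Redis counter described above might be sketched as follows, assuming the Jedis client; the key prefix and two-second expiry are my choices, while incr and expire are standard Redis commands. Note that incr and expire are two separate round trips here, so a freshly created key could in theory be left without an expiry if the process dies between them; in production you would wrap the pair in a Lua script so they execute atomically.

```java
import redis.clients.jedis.Jedis;

// Cluster-wide counter limiter backed by a shared Redis (an illustrative sketch).
public class RedisCounterLimiter {
    private final Jedis jedis;
    private final long maxQps;

    public RedisCounterLimiter(Jedis jedis, long maxQps) {
        this.jedis = jedis;
        this.maxQps = maxQps;
    }

    public boolean tryAcquire() {
        // One key per wall-clock second, shared by every machine in the cluster.
        String key = "rate:limit:" + (System.currentTimeMillis() / 1000);
        long current = jedis.incr(key);  // atomic increment across the cluster
        if (current == 1) {
            jedis.expire(key, 2);        // clean the key up shortly after its second passes
        }
        return current <= maxQps;
    }
}
```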
Of course, Sentinel also has fallback strategies: if the token server fails, we can fall back to single-machine limiting, so normal service is not affected.

Practice

We have covered many rate-limiting tools, but many readers remain unsure how to actually apply them. When limiting a scenario or a resource, several points need to be pinned down: where to limit, how much to limit, and which tool to use.
Where to limit?

This is a complicated question, and different companies and teams handle it differently. When I was at Meituan, during a wave of SOA adoption, I remember every service and every interface was required to have a limit, and each team had to estimate a reasonable QPS ceiling for its interfaces. In theory this is right: every interface should have a ceiling to keep the system as a whole from being dragged down. But the cost of doing so is very high, so most companies limit selectively instead.

First, identify the core interfaces, such as ordering and payment in an e-commerce system, where excessive traffic would break the critical path. Edge interfaces such as reconciliation, which do not affect the critical path, can go without limits. Second, limiting does not have to happen only at the interface layer: we often limit directly at the gateway to keep excess traffic from penetrating further into the core system. The front end can take part too: when it catches a rate-limit error code, it can show a "please wait" prompt, which is effectively part of the limiting. In fact, the further downstream a limit is triggered, the more resources are wasted, because the upstream has already done a lot of work before the downstream rejects the request; that work is thrown away, and any rollback it entails adds to the burden. So, the further upstream a limit triggers, the better.

How much to limit?

Most of the time the limit is an empirical, historical number. We can look at the daily QPS monitoring chart and add some headroom on top of that baseline; the result is a reasonable limit.
One scenario deserves special attention: a big promotion (this refers to e-commerce, but any system facing surge traffic is similar), when traffic jumps well beyond the daily QPS. In that case we usually run full-link load tests on the system before the promotion to find a realistic ceiling, and then set the limit based on it.

How to choose tools

Larger Internet companies generally have their own unified rate-limiting tooling, which you can simply use. Otherwise, if you need neither cluster limiting nor circuit breaking, I personally think Guava's RateLimiter is a good choice: it is simple to use with essentially no learning curve. For anything beyond that, I would choose Sentinel. As for Hystrix, I do not recommend it, because it is no longer maintained.

Summary

"Rate limiting" is a short term, but truly understanding it and doing it well is not easy. For me, this article is just some surface-level insight. If you have better ideas, you can follow my official account and leave a message for discussion.