If a scenic spot can accommodate 10,000 people and 30,000 have entered, the crowd is packed shoulder to shoulder and accidents become likely; everyone has a bad experience. And if an accident does occur, the spot may have to close, making it unavailable to anyone. The idea behind limiting the flow of visitors is to admit as many people as possible while keeping the spot usable, with the rest queueing outside, so that the 10,000 people inside can enjoy themselves normally.

The same principle applies on the Internet. When a celebrity announces a relationship, for example, traffic may jump from the usual 500,000 visits to 5 million while the system can support at most 2 million; the same goes for Double Eleven flash sales and the 12306 ticket rush. We then need rate limiting rules to keep the system available, so that the server does not crash and take every request down with it.

Rate limiting ideas

There are several common patterns for limiting the flow into a system service:

① Circuit breaking

The system builds circuit breaking in from the start of its design. When a problem occurs that cannot be repaired in a short time, the system judges this automatically, opens the breaker, and denies traffic, so that heavy request volume does not overload the backend. The system should also monitor the backend's recovery; once the program is stable again, the breaker can be closed and normal service restored. Common circuit breaker components include Hystrix and Alibaba's Sentinel; each has its own strengths and weaknesses, so choose based on your actual business situation.

② Service degradation

All of the system's functions are graded by importance. When a problem occurs and emergency flow control is needed, less important functions can be degraded and stopped, freeing more resources for the core functions. For example, when traffic suddenly surges on an e-commerce platform, non-core functions such as product reviews and loyalty points can be temporarily degraded and stopped, releasing machines and CPU so that users can still place orders normally; once the whole system recovers, these services are restarted to catch up on order processing and compensation. Besides degrading whole functions, a temporary fallback is to stop operating on the database directly and serve from read and write caches instead.

③ Delayed processing

This pattern sets up a traffic buffer pool in front of the system: every request is buffered into the pool rather than processed immediately, and the real backend business logic then takes requests from the pool and handles them one by one, most commonly via a queue. This amounts to using asynchrony to reduce the processing pressure on the backend, but when traffic is heavy and the backend's capacity limited, buffered requests may not be handled in time, introducing some delay. The leaky bucket and token bucket algorithms described later are built on this idea.
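As a rough illustration of the buffering idea, here is a minimal sketch using a bounded queue; the pool size of 10,000 and the Runnable request type are placeholder choices:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BufferedProcessor {
    // Bounded buffer pool: requests beyond this capacity are turned away.
    private final BlockingQueue<Runnable> buffer = new ArrayBlockingQueue<>(10_000);

    // Front end: enqueue the request instead of processing it immediately.
    // offer() returns false when the pool is full, so callers can fail fast.
    public boolean submit(Runnable request) {
        return buffer.offer(request);
    }

    // Back end: a worker drains the pool one request at a time, at its own pace.
    public void startWorker() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    buffer.take().run(); // blocks until a request is available
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true);
        worker.start();
    }
}
```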
④ Privilege processing

This pattern classifies users. Based on preset classes, the system gives priority to the user groups that must be served reliably; requests from other groups are delayed or simply not processed.
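A minimal sketch of the privilege idea, using a priority queue so that higher-privilege requests are always served first; the tier numbering and class names are invented for illustration (Java 16+ for the record syntax):

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

public class PriorityGate {
    // A request tagged with its user tier; a smaller tier means more privilege.
    record TaggedRequest(int tier, Runnable action) {}

    private final PriorityBlockingQueue<TaggedRequest> queue =
            new PriorityBlockingQueue<>(1024, Comparator.comparingInt(TaggedRequest::tier));

    public void submit(int tier, Runnable action) {
        queue.put(new TaggedRequest(tier, action));
    }

    // The worker always takes the highest-privilege request waiting in the queue.
    public void serveOne() throws InterruptedException {
        queue.take().action().run();
    }
}
```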
The differences between caching, degradation, and rate limiting are as follows: caching aims to speed up access and increase the capacity the system can handle; degradation temporarily shields services when they fail or threaten the core flow, reopening them after the peak or the problem has passed; rate limiting covers the scenarios where caching and degradation do not help. When a threshold is reached, the frequency of interface calls, the number of visits, inventory operations, and so on are limited, and the service is degraded before it becomes unavailable, so that at least a portion of users are served well.

Rate limiting algorithms

There are many rate limiting algorithms; the three most common are the counter algorithm, the leaky bucket algorithm, and the token bucket algorithm. They are explained one by one below, with minimal sketches of the first two after this section.

① Counter algorithm

Simple and crude: specifying the thread pool size, the database connection pool size, the number of Nginx connections, and so on are all counter-style limits. The counter algorithm is the simplest rate limiting algorithm and the easiest to implement. Suppose we stipulate that interface A may be called at most 100 times per minute. We set a counter and increment it for every incoming request. If the counter exceeds 100 while this request is still within one minute of the first, the request volume is too high and access is denied. If more than one minute has passed since the first request and the counter is still within the limit, the counter is reset. It is that simple and crude.

② Leaky bucket algorithm

The idea of the leaky bucket algorithm is very simple: water (requests) enters the leaky bucket first, and the bucket discharges water at a fixed rate. When water flows in faster than it leaks out and exceeds the bucket's capacity, the excess overflows directly. The leaky bucket algorithm therefore forcibly caps the data transmission rate.
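First, the fixed-window counter promised above, as a minimal sketch; a limit of 100 requests per 60,000 ms reproduces the interface A example:

```java
public class FixedWindowCounter {
    private final int limit;         // e.g. 100 requests per window
    private final long windowMillis; // e.g. 60_000 ms for one minute
    private int count = 0;
    private long windowStart = System.currentTimeMillis();

    public FixedWindowCounter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart > windowMillis) {
            windowStart = now; // a new window starts with this request
            count = 0;
        }
        count++;
        return count <= limit; // deny once the window's quota is used up
    }
}
```

A limiter for the example above would be `new FixedWindowCounter(100, 60_000)`.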
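And a sketch of the leaky bucket; the capacity and leak rate are illustrative, and time is used to "leak" lazily on each call instead of running a background thread:

```java
public class LeakyBucket {
    private final long capacity;       // how much the bucket holds
    private final double leakPerMilli; // constant outflow rate
    private double water = 0;          // requests currently queued
    private long lastLeak = System.currentTimeMillis();

    public LeakyBucket(long capacity, double leakPerSecond) {
        this.capacity = capacity;
        this.leakPerMilli = leakPerSecond / 1000.0;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // Drain the water that has leaked out since the last check.
        water = Math.max(0, water - (now - lastLeak) * leakPerMilli);
        lastLeak = now;
        if (water + 1 > capacity) {
            return false; // bucket is full: this request overflows
        }
        water += 1;
        return true;
    }
}
```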
③ Token bucket algorithm

The token bucket is similar to the leaky bucket, except that tokens are placed in the bucket, and a request is served only after it obtains a token. For example, when we go to the cafeteria, we usually queue in front of a window and are served at a fixed rate, which is just like the leaky bucket. If too many people pour in and the cafeteria cannot hold them, some end up standing outside and getting no service at all; that is overflow. Those who overflow can keep requesting, that is, keep queueing, so what is the problem? Suppose an emergency arises, say volunteers in a hurry or students about to sit the college entrance exam. With the leaky bucket algorithm they would still have to queue slowly, which does not meet the need. Many scenarios require not only capping the average transmission rate but also allowing a certain degree of burst. The leaky bucket may not be suitable there, and the token bucket fits better. The principle of the token bucket algorithm is that the system puts tokens into the bucket at a constant rate; a request must first take a token from the bucket before it can be processed, and when no tokens are left in the bucket, service is denied.

Concurrency limits

Simply put, this sets a threshold on the system's total QPS or concurrency, and it is quite common. Take Tomcat as an example: many of its parameters exist for exactly this reason. acceptCount sets the length of the queue for pending connections, maxConnections the maximum number of simultaneous connections, and maxThreads the maximum number of worker threads. Most frameworks and components expose similar concurrency parameters.
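The idea behind all of these parameters can be sketched with a counting semaphore; the threshold of 200 and the Supplier-based handler are illustrative choices:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class ConcurrencyLimiter {
    // At most 200 requests may be in flight at once (value is illustrative).
    private final Semaphore permits = new Semaphore(200);

    public String handle(Supplier<String> business) {
        if (!permits.tryAcquire()) {
            return "503 Service Busy"; // over the threshold: fail fast
        }
        try {
            return business.get();
        } finally {
            permits.release(); // always give the permit back
        }
    }
}
```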
With a concurrency limit in place, there is an extra protection mechanism under high load: there is no need to worry about instantaneous traffic crashing the system or triggering an avalanche, and in the worst case the service is degraded rather than withdrawn entirely. However, rate limits must be evaluated carefully and not applied indiscriminately, or normal traffic will run into strange problems, hurting the user experience and driving users away.

Interface rate limiting

Interface rate limiting has two parts: limiting the number of calls to an interface within a period of time, which follows the counter algorithm above, and the sliding time window algorithm.

① Total interface calls

Controlling the total number of calls to an interface within a period of time follows the counter algorithm described earlier, so we will not repeat it.

② Interface time window

The problem with the fixed time window algorithm (that is, the counter algorithm above) is that the statistical interval is too coarse, so the limit is imprecise, and each interval ignores its relationship with the previous one: the second half of the first interval plus the first half of the second interval also add up to one minute. To solve this boundary problem, we divide each statistical interval into smaller sub-intervals for more precise counting. In the example above, suppose the acceptable rate is 100 queries per second: access is very low during the first 40 seconds of the first minute, surges over the last 20 seconds, and stays high until it starts to drop at second 40 of the second minute. Counted the old way, the first interval's QPS is 94 and the second's is 92, neither exceeding the configured limit; but in the boundary region the QPS reaches 142, clearly exceeding the number of requests we allow. So the fixed window counter is unreliable and a sliding window counter is needed. The counter algorithm is really a fixed window algorithm that never subdivides its time window, so it has only one grid. The more grids the sliding window is divided into, down to milliseconds or even nanoseconds, the more smoothly the window slides and the more accurate the limit becomes; the trade-off is that more space is consumed.
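A minimal sliding-window sketch; the grid is an array of per-bucket counters, and the bucket count and window size passed to the constructor are up to the caller:

```java
public class SlidingWindowCounter {
    private final int limit;      // max requests allowed across the whole window
    private final long[] buckets; // one counter per grid cell
    private final long bucketMillis;
    private long lastTick;        // time index of the most recent bucket seen

    public SlidingWindowCounter(int limit, int bucketCount, long windowMillis) {
        this.limit = limit;
        this.buckets = new long[bucketCount];
        this.bucketMillis = windowMillis / bucketCount;
        this.lastTick = System.currentTimeMillis() / bucketMillis;
    }

    public synchronized boolean tryAcquire() {
        long tick = System.currentTimeMillis() / bucketMillis;
        // Zero every bucket the window has slid past since the last call.
        for (long t = Math.max(lastTick + 1, tick - buckets.length + 1); t <= tick; t++) {
            buckets[(int) (t % buckets.length)] = 0;
        }
        lastTick = tick;
        long total = 0;
        for (long b : buckets) total += b;
        if (total >= limit) return false;         // window quota exhausted
        buckets[(int) (tick % buckets.length)]++; // count this request
        return true;
    }
}
```

For the 100 QPS example, `new SlidingWindowCounter(100, 10, 1000)` gives ten 100 ms grids per one-second window.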
Rate limiting implementation

This part is the concrete implementation of rate limiting; I will keep it brief, since no one wants to read long code.

① Guava implementation

Import the package, then the core code:
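A minimal sketch of the core usage, assuming the standard com.google.guava:guava dependency; the rate of 10 permits per second is illustrative:

```java
// Assumed Maven dependency: com.google.guava:guava (any recent version).
import com.google.common.util.concurrent.RateLimiter;

public class GuavaDemo {
    // A limiter that hands out 10 permits per second.
    private static final RateLimiter LIMITER = RateLimiter.create(10.0);

    public static void main(String[] args) {
        for (int i = 0; i < 5; i++) {
            double waited = LIMITER.acquire(); // blocks until a permit is free
            System.out.println("request " + i + " waited " + waited + "s");
        }
    }
}
```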
② Token bucket implementation

Stable mode (SmoothBursty: tokens are generated at a constant rate):
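A sketch of the stable mode; the sleep is there only to let unused tokens accumulate, and the printed wait times are approximate:

```java
import com.google.common.util.concurrent.RateLimiter;

public class BurstyDemo {
    public static void main(String[] args) throws InterruptedException {
        // SmoothBursty: 2 tokens are generated per second, and tokens left
        // unused can be stored (by default about one second's worth) for a burst.
        RateLimiter limiter = RateLimiter.create(2);

        limiter.acquire();  // the first permit is handed out almost immediately
        Thread.sleep(2000); // idle: unused tokens pile up in the bucket

        // Stored tokens are consumed in a burst with almost no waiting, after
        // which acquire() settles back to roughly 0.5s per permit.
        for (int i = 0; i < 4; i++) {
            System.out.printf("waited %.2fs%n", limiter.acquire());
        }
    }
}
```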
RateLimiter.create(2) sets both the rate and, implicitly, the burst capacity: the token bucket algorithm allows tokens that have not been consumed for a while to be stored in the bucket for burst consumption. Progressive mode (SmoothWarmingUp: the token generation rate increases slowly until it reaches the stable value):
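A sketch of the warm-up mode; 5 permits per second with a 3-second warm-up period are illustrative values:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class WarmUpDemo {
    public static void main(String[] args) {
        // SmoothWarmingUp: the stable rate is 5 permits/second, but during the
        // 3-second warm-up permits are released more slowly than that.
        RateLimiter limiter = RateLimiter.create(5, 3, TimeUnit.SECONDS);

        // Early acquires wait longer; the interval shrinks toward the stable
        // 0.2s per permit as the warm-up completes.
        for (int i = 0; i < 10; i++) {
            System.out.printf("waited %.2fs%n", limiter.acquire());
        }
    }
}
```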
Timeout:
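A sketch of acquiring with a timeout; the 100 ms budget is illustrative:

```java
import com.google.common.util.concurrent.RateLimiter;
import java.util.concurrent.TimeUnit;

public class TimeoutDemo {
    public static void main(String[] args) {
        RateLimiter limiter = RateLimiter.create(2); // 2 permits per second

        for (int i = 0; i < 5; i++) {
            // Wait at most 100 ms for a token instead of blocking indefinitely.
            if (limiter.tryAcquire(100, TimeUnit.MILLISECONDS)) {
                System.out.println("request " + i + " served");
            } else {
                System.out.println("request " + i + " rejected (timed out)");
            }
        }
    }
}
```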
tryAcquire checks whether a token can be obtained within the timeout; if it cannot, the call returns false immediately instead of blocking.

③ Distributed rate limiting (implemented with Nginx + Lua)

You can use resty.lock to keep read-modify-write sequences atomic; the lock is not reentrant across requests: https://github.com/openresty/lua-resty-lock

Use lua_shared_dict to store the data:
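A minimal sketch of a shared-dict counter, assuming a reasonably recent lua-nginx-module; the zone name my_limit and the threshold of 100 requests per second are placeholders. For this simple counter, incr with an init value is itself atomic within the dict, so resty.lock is only needed for more complex sequences:

```nginx
# nginx.conf, http block: declare the shared memory zone.
lua_shared_dict my_limit 10m;
```

```lua
-- access_by_lua_block: a fixed-window counter kept in shared memory.
local limit = 100                       -- placeholder threshold (req/s)
local dict  = ngx.shared.my_limit
local key   = "req:" .. ngx.var.remote_addr .. ":" .. ngx.time()

-- incr with an init value is atomic; stale per-second keys are evicted
-- by the dict's LRU policy as the zone fills up.
local count, err = dict:incr(key, 1, 0)
if not count then
    ngx.log(ngx.ERR, "incr failed: ", err)
    return
end

if count > limit then
    return ngx.exit(429) -- Too Many Requests
end
```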
Author: The harmonica that can't wait
Editor: Tao Jialong
Source: https://urlify.cn/fuAB7j