Our company’s “Double 11” flow control plan, come and copy our homework!


If a scenic spot can accommodate 10,000 people but 30,000 have entered, it is bound to be packed shoulder to shoulder, accidents may happen, and everyone ends up with a bad experience.

If an accident does occur, the scenic spot may have to close entirely, making it unavailable to anyone, which makes the experience even worse.

The idea behind limiting the flow of visitors is to admit as many people as possible while keeping the place available: everyone else queues outside, so that the 10,000 people inside can enjoy themselves normally.

Back on the Internet, the same principle applies. For example, when a celebrity announces a relationship, traffic may jump from the usual 500,000 visits to 5 million while the system can support at most 2 million; Double Eleven flash sales and the 12306 ticket rush create the same kind of spike.

In these situations we need rate limiting rules to keep the system available, so that the servers do not crash and take every request down with them.

Rate Limiting Approaches

There are generally several modes for limiting the flow into system services:

① Circuit breaking

The system takes circuit breaking into account from the design stage. When a problem occurs that cannot be fixed in a short time, the system makes the judgment automatically, opens the circuit breaker, and rejects traffic so that heavy request volumes do not overload the backend.

The system should also dynamically monitor the recovery of the backend program; once it is stable again, the circuit breaker can be closed and normal service restored.

Common circuit breaker components include Hystrix and Alibaba's Sentinel. Both have their own advantages and disadvantages, and you can choose one based on the actual business situation.
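The core of the pattern is a small state machine: count failures, open the breaker when a threshold is hit, and let a trial request through after a cool-down. Below is a minimal sketch of that idea, not of how Hystrix or Sentinel implement it internally; the class name, thresholds, and half-open handling are simplified assumptions.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal circuit-breaker sketch: opens after a number of consecutive
// failures and lets a trial request through once the cool-down has passed.
public class SimpleCircuitBreaker {
    private final int failureThreshold;           // failures before opening
    private final long openMillis;                // cool-down while open
    private final AtomicInteger failures = new AtomicInteger();
    private volatile long openedAt = 0;           // 0 means the breaker is closed

    public SimpleCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    public boolean allowRequest() {
        if (openedAt == 0) {
            return true;                          // closed: let traffic through
        }
        // open: only allow a trial request after the cool-down (half-open)
        return System.currentTimeMillis() - openedAt >= openMillis;
    }

    public void recordSuccess() {
        failures.set(0);
        openedAt = 0;                             // backend recovered: close the breaker
    }

    public void recordFailure() {
        if (failures.incrementAndGet() >= failureThreshold) {
            openedAt = System.currentTimeMillis();    // too many failures: open the breaker
        }
    }
}
```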

② Service degradation

All functional services of the system are graded. When a problem occurs in the system and emergency flow control is required, less important functions can be downgraded and services can be stopped. This will free up more resources for the core functions.

For example, in an e-commerce platform, if there is a sudden surge in traffic, non-core functions such as product reviews and points can be temporarily downgraded, and these services can be stopped to release resources such as machines and CPUs to ensure that users can place orders normally.

These downgraded services can be brought back after the whole system returns to normal, to work through the backlog of orders or issue compensation.

Besides degrading functions, another temporary degradation approach is to stop operating on the database directly and serve traffic from a read cache and a write cache instead.
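In practice, degradation is often just a switch checked in front of a non-core call. The sketch below assumes a hypothetical ReviewFacade and a flag flipped by operations or a config center; it is only meant to show the shape of the idea.

```java
import java.util.Collections;
import java.util.List;

// Minimal degradation-switch sketch: while degraded, a non-core feature
// (product reviews here) returns a fallback instead of calling its service.
public class ReviewFacade {
    private static volatile boolean degraded = false;   // flipped during traffic spikes

    public static void setDegraded(boolean value) {
        degraded = value;
    }

    public List<String> listReviews(long productId) {
        if (degraded) {
            return Collections.emptyList();              // degraded: skip the review service
        }
        return loadReviewsFromService(productId);        // normal path
    }

    private List<String> loadReviewsFromService(long productId) {
        // placeholder for the real call to the review service
        return List.of("sample review for product " + productId);
    }
}
```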

③ Delayed processing

This mode sets up a traffic buffer pool in front of the system: all requests are buffered into this pool rather than processed immediately.

The real business logic at the backend then takes requests from the pool and processes them one by one; the common way to implement this is a queue (a minimal sketch follows below).

This is essentially an asynchronous way of reducing pressure on the backend. However, when traffic is heavy the backend's processing capacity is limited, and requests in the buffer pool may not be handled in time, which introduces some delay. The leaky bucket and token bucket algorithms are both based on this idea.
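A minimal sketch of the buffer pool idea, assuming a generic Runnable stands in for a request: the front end only enqueues, a single backend worker drains the queue at its own pace, and submissions are rejected once the buffer is full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Buffer pool in front of the backend: requests are queued, not processed
// immediately, and a worker thread consumes them one by one.
public class BufferedProcessor {
    private final BlockingQueue<Runnable> buffer = new ArrayBlockingQueue<>(10_000);

    public BufferedProcessor() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    buffer.take().run();          // backend processes at its own pace
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called by the front end; returns false when the buffer pool is full.
    public boolean submit(Runnable request) {
        return buffer.offer(request);
    }
}
```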

④ Privilege processing

This mode classifies users in advance. Based on the preset classification, the system gives priority to the user groups whose service must be guaranteed, while requests from other user groups are delayed or simply not processed.
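One simple way to realize this, sketched below under the assumption that each request carries a pre-assigned numeric tier (the record syntax needs Java 16+), is to dispatch from a priority queue so that higher-tier requests are always served first.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Privilege processing sketch: requests carry a tier, and the dispatcher
// always serves the highest-priority (lowest tier value) request first.
public class PriorityDispatcher {
    record TieredRequest(int tier, Runnable task) {}     // 0 = VIP, 1 = ordinary, ...

    private final PriorityBlockingQueue<TieredRequest> queue =
            new PriorityBlockingQueue<>(1024, Comparator.comparingInt(TieredRequest::tier));

    public void submit(int tier, Runnable task) {
        queue.put(new TieredRequest(tier, task));
    }

    public void runOne() throws InterruptedException {
        queue.take().task().run();                        // VIP requests jump the line
    }
}
```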

The differences between caching, degradation, and rate limiting are as follows:

  • Caching is used to increase system throughput, improve access speed, and support high concurrency.
  • Degradation means that when some service components are unavailable, traffic surges, or resources are exhausted, the problematic services are temporarily cut off while a degraded service continues: users are given prompts that are as friendly as possible, backup data is returned, the overall business flow is not affected, and the service is brought back online once the problem is resolved.
  • Rate limiting covers the scenarios that caching and degradation cannot handle: once a threshold is reached, the frequency of interface calls, the number of visits, the number of inventory items, and so on are restricted, and the service is degraded in advance of becoming unavailable, so that at least a portion of users are served well.

Rate Limiting Algorithms

There are many rate limiting algorithms; the three most common are the counter algorithm, the leaky bucket algorithm, and the token bucket algorithm. They are explained one by one below.

① Counter algorithm

Simple and crude: specifying a thread pool size, a database connection pool size, the number of Nginx connections, and so on all belong to the counter algorithm.

The counter algorithm is the simplest rate limiting algorithm and the easiest to implement. For example, we stipulate that interface A may not be called more than 100 times in one minute.

Then we can do this: set up a counter, and every time a request arrives, increase the counter by 1.

If the counter value exceeds 100 and the interval between this request and the first request is still within one minute, there are too many requests and access is denied.

If the interval between this request and the first request is greater than one minute, and the counter value is still within the limit, the counter is reset. It is that simple and crude.
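A minimal sketch of that logic in Java, assuming the 100-requests-per-minute rule above and measuring the window from the first request it sees:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Counter algorithm: at most LIMIT requests per window, counted from the
// first request of the current window; the counter resets when the window ends.
public class CounterLimiter {
    private static final int LIMIT = 100;
    private static final long WINDOW_MILLIS = 60_000;

    private final AtomicInteger counter = new AtomicInteger();
    private volatile long windowStart = System.currentTimeMillis();

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart > WINDOW_MILLIS) {
            windowStart = now;        // more than a minute since the first request
            counter.set(0);           // start a new window with a fresh counter
        }
        return counter.incrementAndGet() <= LIMIT;   // deny once the limit is exceeded
    }
}
```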

② Leaky bucket algorithm

The idea of the leaky bucket algorithm is very simple. Water (requests) first enters the leaky bucket, and the bucket discharges water at a fixed rate. When the inflow is too fast and exceeds the bucket's capacity, the excess overflows and is discarded. The leaky bucket algorithm can therefore forcibly cap the data transmission rate. A minimal sketch follows the list below.

  • Peak shaving: when a large amount of traffic arrives, the excess overflows, so the rate limit protects service availability.
  • Buffering: requests do not hit the server directly, which relieves pressure on it; the consumption rate is fixed because the computing capacity is fixed.
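The sketch below is one simple way to express the leaky bucket in Java: water drains continuously at a fixed rate, and a request is rejected when adding it would overflow the bucket. The capacity and rate are illustrative assumptions.

```java
// Leaky bucket sketch: requests add water, the bucket leaks at a fixed rate,
// and anything beyond the remaining capacity overflows and is rejected.
public class LeakyBucket {
    private final double capacity;        // bucket size
    private final double leakPerMillis;   // fixed outflow rate
    private double water = 0;
    private long lastLeak = System.currentTimeMillis();

    public LeakyBucket(double capacity, double leakPerSecond) {
        this.capacity = capacity;
        this.leakPerMillis = leakPerSecond / 1000.0;
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // drain the water that has leaked since the last request
        water = Math.max(0, water - (now - lastLeak) * leakPerMillis);
        lastLeak = now;
        if (water + 1 > capacity) {
            return false;                 // bucket full: overflow, reject the request
        }
        water += 1;                       // accept the request
        return true;
    }
}
```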

③ Token bucket algorithm

The token bucket is similar to the leaky bucket, with one difference: tokens are placed into the token bucket, and a request can only be served after it obtains a token.

For example, when we go to the cafeteria to eat, we usually line up at the serving window. This is like the leaky bucket algorithm: a large number of people queue in front of the window and are served at a fixed rate.

If too many people pour in and the canteen cannot hold them, some have to stand outside and get no service at all; that is the overflow. Those who overflow can keep requesting, that is, keep queuing. So what is the problem with this?

Now suppose there are special cases, for example volunteers who are in a hurry, or senior students about to sit the college entrance exam; these are urgent situations.

With the leaky bucket algorithm they would still have to queue slowly, which does not meet the need. Many application scenarios require not only limiting the average transmission rate but also allowing a certain degree of burst.

In such cases the leaky bucket algorithm is not a good fit, and the token bucket algorithm is more suitable.

The principle of the token bucket algorithm is that the system puts tokens into the bucket at a constant rate. A request must first obtain a token from the bucket before it is processed; when no token is available, service is denied.
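A minimal token-bucket sketch of that principle, with illustrative capacity and refill rate; unused tokens accumulate up to the capacity, which is what allows a burst to be absorbed:

```java
// Token bucket sketch: tokens are added at a constant rate up to the bucket
// capacity; a request consumes one token and is denied when none is available.
public class TokenBucket {
    private final double capacity;           // maximum number of stored tokens
    private final double refillPerMillis;    // constant token generation rate
    private double tokens;
    private long lastRefill = System.currentTimeMillis();

    public TokenBucket(double capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerMillis = tokensPerSecond / 1000.0;
        this.tokens = capacity;              // start full so an initial burst is allowed
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        // add the tokens generated since the last call, capped at the capacity
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerMillis);
        lastRefill = now;
        if (tokens < 1) {
            return false;                    // no token available: deny service
        }
        tokens -= 1;                         // consume one token
        return true;
    }
}
```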

Concurrency Limiting

Simply put, this means setting a threshold on the system's total concurrency or QPS. It is quite common; taking Tomcat as an example, many of its parameters are based on this consideration.

For example, acceptCount sets the length of the queue for connections waiting to be accepted, maxConnections sets the maximum number of connections handled at the same time, and maxThreads sets the maximum number of worker threads.

In each framework or component, concurrency limiting is reflected in the following aspects:

  • Limit the total number of concurrent connections (such as database connection pools, thread pools)
  • Limit the number of instantaneous concurrent connections (limit_conn module of nginx is used to limit the number of instantaneous concurrent connections)
  • Limit the average rate within the time window (such as Guava's RateLimiter and nginx's limit_req module, which limit the average rate per second)
  • Others include limiting the remote interface call rate and limiting the MQ consumption rate.
  • In addition, you can also limit the flow based on the number of network connections, network traffic, CPU or memory load, etc.

With concurrency limiting in place, there is an extra layer of protection under high concurrency: instantaneous traffic is far less likely to crash the system or trigger an avalanche, and in the worst case the service is degraded rather than unavailable.

However, rate limits need to be evaluated carefully and not applied indiscriminately; otherwise normal traffic may run into strange problems, leading to a poor user experience and user churn.

Interface rate limiting

Interface rate limiting has two parts: one is limiting the total number of calls to an interface within a period of time, which is the counter algorithm from the section on rate limiting algorithms; the other is the sliding time window algorithm.

① Total number of interface calls

To control the total number of calls to an interface within a period of time, refer to the counter algorithm above; it is not repeated here.

② Interface time window

The problem with the fixed time window algorithm (that is, the counter algorithm described above) is that the statistical interval is too coarse, so the limiting is not precise, and each interval ignores its relationship with the previous one: the second half of the first interval plus the first half of the second interval together also span one minute, but that span is never checked.

To solve this boundary problem, we divide each statistical interval into smaller sub-intervals so that the counting becomes more precise.

Continuing the example, suppose the acceptable rate is 100 queries per second. Traffic is very low for most of the first window, surges near its end, and stays high until partway through the second window, when it drops again.

Counted per fixed window, the first window's QPS is 94 and the second window's is 92, so neither exceeds the configured limit.

But in the span straddling the boundary, the QPS reaches 142, which clearly exceeds the number of requests we allow. The fixed window counter is therefore unreliable, and a sliding window counter is needed.

The counter algorithm is in fact a fixed window algorithm that does not subdivide the time window at all, so it has only one grid.

It follows that the more grids the sliding window is divided into (down to milliseconds or even nanoseconds), the more smoothly the window slides and the more accurate the rate limiting statistics become. The trade-off is that more memory is consumed.
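A minimal sliding-window counter sketch along those lines; the window length, bucket count, and limit are illustrative, and the window is approximated by summing the most recent buckets:

```java
// Sliding window counter: the window is split into smaller buckets and the
// limit is enforced over the sum of the buckets still inside the window.
public class SlidingWindowLimiter {
    private final int limit;            // max requests per window
    private final int bucketCount;      // how finely the window is divided
    private final long bucketMillis;    // width of each bucket
    private final long[] counts;        // per-slot request counts
    private final long[] bucketIds;     // which bucket currently occupies each slot

    public SlidingWindowLimiter(int limit, long windowMillis, int bucketCount) {
        this.limit = limit;
        this.bucketCount = bucketCount;
        this.bucketMillis = windowMillis / bucketCount;
        this.counts = new long[bucketCount];
        this.bucketIds = new long[bucketCount];
    }

    public synchronized boolean tryAcquire() {
        long currentBucket = System.currentTimeMillis() / bucketMillis;
        int idx = (int) (currentBucket % bucketCount);
        if (bucketIds[idx] != currentBucket) {
            bucketIds[idx] = currentBucket;   // slot held an expired bucket: reuse it
            counts[idx] = 0;
        }
        long total = 0;
        for (int i = 0; i < bucketCount; i++) {
            if (currentBucket - bucketIds[i] < bucketCount) {
                total += counts[i];           // bucket is still inside the window
            }
        }
        if (total >= limit) {
            return false;                     // window total reached the limit
        }
        counts[idx]++;
        return true;
    }
}
```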

Rate Limiting Implementations

This part covers concrete implementations. I will keep it brief; after all, nobody wants to wade through long stretches of code.

① Guava implementation

Add the Maven dependency:

```xml
<!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>28.1-jre</version>
</dependency>
```

Core code:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// One AtomicLong counter per second; an entry expires 2 seconds after it is written.
LoadingCache<Long, AtomicLong> counter = CacheBuilder.newBuilder()
        .expireAfterWrite(2, TimeUnit.SECONDS)
        .build(new CacheLoader<Long, AtomicLong>() {
            @Override
            public AtomicLong load(Long second) throws Exception {
                return new AtomicLong(0);
            }
        });

// Increment the counter for the current second.
counter.get(System.currentTimeMillis() / 1000).incrementAndGet();
```

② Token bucket implementation (Guava RateLimiter)

Stable mode (SmoothBursty: tokens are generated at a constant rate):

```java
import com.google.common.util.concurrent.RateLimiter;

public class SmoothBurstyDemo {
    public static void main(String[] args) throws InterruptedException {
        // RateLimiter.create(2): two tokens are generated per second
        RateLimiter limiter = RateLimiter.create(2);
        // acquire() blocks until a token is available and returns the
        // time spent waiting, in seconds
        System.out.println(limiter.acquire());
        Thread.sleep(2000);
        System.out.println(limiter.acquire());
        System.out.println(limiter.acquire());
        System.out.println(limiter.acquire());
        System.out.println(limiter.acquire());

        System.out.println(limiter.acquire());
        System.out.println(limiter.acquire());
    }
}
```

RateLimiter.create(2) determines both the rate and the burst capacity: the token bucket algorithm allows tokens that have not been consumed for a while to accumulate in the bucket, so they can be consumed as a burst.

Warm-up mode (SmoothWarmingUp: the token generation rate increases gradually until it reaches the stable value):

```java
// Smooth warm-up: the extra arguments give the warm-up period, i.e. the time
// it takes for the rate to climb from the cold-start rate to the stable rate
RateLimiter limiter = RateLimiter.create(2, 1000L, TimeUnit.MILLISECONDS);
System.out.println(limiter.acquire());
try {
    Thread.sleep(2000);
} catch (InterruptedException e) {
    e.printStackTrace();
}
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
System.out.println(limiter.acquire());

System.out.println(limiter.acquire());
System.out.println(limiter.acquire());
```
Timeout acquisition:

```java
// returns false immediately if the token could not be obtained within the timeout
boolean acquired = limiter.tryAcquire(Duration.ofMillis(11));
```

tryAcquire reports whether a token could be obtained within the given timeout; unlike acquire, it does not block beyond that period.

③ Distributed rate limiting (implemented with Nginx + Lua)

resty.lock can be used to keep the check-and-increment sequence atomic, so that concurrent requests do not interleave their updates:

https://github.com/openresty/lua-resty-lock

A lua_shared_dict is used to store the counter data:

```lua
local locks = require "resty.lock"

local function acquire()
    local lock = locks:new("locks")
    local elapsed, err = lock:lock("limit_key")      -- mutex lock to ensure atomicity
    local limit_counter = ngx.shared.limit_counter   -- shared-memory counter

    local key = "ip:" .. os.time()
    local limit = 5                                  -- rate limit threshold
    local current = limit_counter:get(key)

    if current ~= nil and current + 1 > limit then   -- already over the limit
        lock:unlock()
        return 0
    end
    if current == nil then
        -- first request in this second: set the key to 1 with a 1-second expiry
        limit_counter:set(key, 1, 1)
    else
        limit_counter:incr(key, 1)                   -- subsequent requests just add 1
    end
    lock:unlock()
    return 1
end

ngx.print(acquire())
```
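Note that for this snippet to run, the two shared dictionaries it references need to be declared in nginx.conf, along the lines of `lua_shared_dict locks 10m;` and `lua_shared_dict limit_counter 10m;` (the sizes are illustrative), and the function is typically hooked into a location block via something like access_by_lua_file or content_by_lua_block.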

Author: The harmonica that can't wait

Editor: Tao Jialong

Source: https://urlify.cn/fuAB7j
