I finally figured out service rate limiting.

Introduction

As microservices and distributed systems have developed, the calls between services have grown more and more complex. To keep our own services stable and highly available, we need flow-control measures for requests that exceed what the service can handle. Think of tourist attractions during the May Day and National Day holidays: once a site is full, the inflow of visitors is restricted. Likewise, our services need to limit traffic in high-concurrency, high-volume scenarios such as flash sales and big promotions like 618 and Double Eleven, as well as under malicious attacks, crawlers, and so on.

Intercepting requests that exceed a service's processing capacity and capping the traffic it accepts is called service rate limiting (throttling). Let's talk about how it works.

Two rate-limiting methods

Common rate-limiting methods fall into two categories: request-based rate limiting and resource-based rate limiting.

  1. Request-based rate limiting

Request-based rate limiting approaches the problem from the perspective of external requests. There are two common forms.

The first is to cap the total amount, that is, to set a cumulative upper limit on some metric. The most common example is capping the total number of users the system serves. For instance, a live-stream room may be capped at 1 million viewers; once that number is reached, new viewers cannot enter. Or a flash sale with only 100 items may cap participation at 10,000 users, rejecting everyone beyond that outright.

The second is to cap the amount per unit of time, that is, to set an upper limit on some metric within a time window: for example, at most 10,000 users per minute, or a peak of 100,000 requests per second.

Advantages:

  • Simple to implement

Disadvantages:

  • The main practical difficulty is choosing a suitable threshold. For example, the limit may be set at 10,000 users per minute when the system actually buckles at 6,000; or the system may be under little pressure at 10,000 users per minute while excess requests are already being dropped. Hardware must also be taken into account: a 32-core machine and a 64-core machine differ greatly in processing power, so their thresholds differ too.

Typical uses:

  • Suitable for systems with relatively simple business logic, such as load balancers, gateways, and flash-sale systems.

  2. Resource-based rate limiting

Request-based rate limiting views the system from the outside; resource-based rate limiting views it from the inside: identify the key internal resources that constrain performance and limit their usage. Common internal resources include connection counts, file handles, thread counts, and request queues. For example, reject new requests when CPU usage exceeds 80%, as in the sketch below.
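A minimal sketch of this idea, assuming the third-party psutil package is available; the 80% threshold and the handle_request / RejectedError names are illustrative:

```python
# Resource-based rate limiting sketch: reject new requests when CPU
# usage exceeds a threshold. Requires the third-party psutil package;
# handle_request and RejectedError are illustrative names.
import psutil

CPU_THRESHOLD = 80.0  # percent; tune per machine


class RejectedError(Exception):
    pass


def handle_request(request):
    # cpu_percent(interval=None) reports usage since the previous call,
    # so it is cheap enough to check on every request.
    if psutil.cpu_percent(interval=None) > CPU_THRESHOLD:
        raise RejectedError("system overloaded, request rejected")
    return f"processed {request}"  # stand-in for the real business logic
```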

Advantages:

  • Directly reflects the system's current pressure, so the limiting is more accurate

Disadvantages:

  • It is hard to identify which resources are the key ones
  • It is hard to set thresholds for those key resources; they usually have to be tuned step by step in production and observed continuously until a suitable value is found

Typical uses:

  • Suitable for a specific service, such as an order system or a product system

Four rate-limiting algorithms

There are four common rate-limiting algorithms, each with its own implementation principle and trade-offs. In an actual design, choose based on the business scenario.

  1. Fixed time window

The fixed time window algorithm counts requests (or resource consumption) within fixed, consecutive time periods; once the count exceeds the limit, further requests are rejected until the next window begins.
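A minimal sketch of a fixed-window counter, assuming a single-threaded setting; the 100-requests-per-60-seconds limit is illustrative:

```python
# Fixed-time-window limiter sketch (not thread-safe; limits illustrative).
import time


class FixedWindowLimiter:
    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.window:
            # A new window begins: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # window quota exhausted: reject
```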

Advantages:

  • Simple to implement

Disadvantages:

  • It has a boundary (critical point) problem. Suppose the limit is 100 requests per minute, and 100 requests arrive in the last 10 seconds of one window followed by another 100 in the first 10 seconds of the next. Two hundred requests arrive within about 20 seconds, double the intended limit, yet neither window on its own exceeds 100, so no limiting is triggered and the system may crash under the pressure.

  2. Sliding time window

To solve the boundary problem, the sliding time window algorithm was introduced. Its statistical periods overlap, so two request spikes that are close together in time can no longer fall into different, independently counted windows.
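A minimal sketch using a log of request timestamps, which slides continuously; memory grows with the limit, and the numbers are illustrative:

```python
# Sliding-window limiter sketch based on a timestamp log
# (not thread-safe; limits illustrative).
import time
from collections import deque


class SlidingWindowLimiter:
    def __init__(self, limit=100, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self) -> bool:
        now = time.monotonic()
        # Evict timestamps that have slid out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False  # window already holds `limit` requests: reject
```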

Advantages:

  • No boundary problem

Disadvantages:

  • More complex than the fixed window

  3. Leaky Bucket Algorithm

The leaky bucket algorithm puts incoming requests into a "bucket" (a message queue, for example), and the business processing units (threads, processes, applications, etc.) take requests out of the bucket to process them. When the bucket is full, new requests are discarded.
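A minimal sketch with a bounded queue as the bucket and a worker thread draining it at a fixed rate; the queue size and drain rate are illustrative:

```python
# Leaky bucket sketch: producers drop requests into a bounded queue;
# a worker drains it at a fixed rate (sizes and rates illustrative).
import queue
import threading
import time

bucket = queue.Queue(maxsize=1000)  # the "bucket"; when full, discard


def submit(request) -> bool:
    try:
        bucket.put_nowait(request)
        return True
    except queue.Full:
        return False  # bucket full: request discarded


def worker(rate_per_second=100):
    interval = 1.0 / rate_per_second
    while True:
        request = bucket.get()         # take a request from the bucket
        print("processing", request)   # stand-in for business logic
        time.sleep(interval)           # drain at a roughly fixed rate


threading.Thread(target=worker, daemon=True).start()
for i in range(5):
    submit(f"request-{i}")
time.sleep(0.1)  # demo only: give the worker a moment to drain
```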

Advantages:

  • Under a traffic burst, fewer requests are discarded, because the bucket itself buffers requests

Disadvantages:

  • It smooths traffic, but it cannot fully absorb a sudden surge: once the bucket fills, requests are still dropped
  • The bucket size is hard to adjust dynamically; finding the size that fits the business takes repeated trial and error
  • The outflow rate, i.e. the business processing speed, cannot be precisely controlled

The leaky bucket algorithm mainly suits scenarios with instantaneous bursts of highly concurrent traffic, such as midnight check-ins or on-the-hour flash sales. When a flood of requests pours in within a few minutes, user requests should be discarded as little as possible, even if processing is slow, for the sake of business results and user experience.

  4. Token Bucket Algorithm

The token bucket algorithm differs from the leaky bucket in that what goes into the bucket is not requests but "tokens": the permits required before a request may be processed. When the system receives a request, it must first take a token from the token bucket; only with a token can processing continue. If no token is available, the request is discarded.

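A minimal sketch of the idea, refilling tokens lazily based on elapsed time; the rate and capacity are illustrative:

```python
# Token bucket sketch with lazy refill (not thread-safe;
# rate and capacity are illustrative).
import time


class TokenBucket:
    def __init__(self, rate=100.0, capacity=100):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket holds
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1  # take a token: the request may proceed
            return True
        return False          # no token available: discard the request
```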

Advantages:

  • By controlling the rate at which tokens are added, the processing rate can be adjusted dynamically, giving greater flexibility
  • It can smooth traffic and tolerate bursts at the same time: tokens accumulate in the bucket while traffic is light, and when a burst arrives those accumulated tokens let the processing rate temporarily exceed the token refill rate

Disadvantages:

  • Under a very large burst, many requests may still be discarded, because the bucket cannot accumulate more tokens than its capacity
  • The implementation is relatively complex

The token bucket algorithm mainly suits two typical scenarios. One is controlling the rate of calls to a third-party service so the downstream is not overwhelmed; for example, Alipay must control the rate at which it calls bank interfaces. The other is controlling one's own processing rate to prevent overload; for example, if stress tests show the system's maximum TPS is 100, a token bucket can cap processing at that rate.

Five rate-limiting strategies

  1. Rejecting requests

When traffic reaches the rate-limiting threshold, excess requests are rejected outright.

The design can reject requests by source: specific domain names, IP addresses, clients, applications, or users. A sketch follows.
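A minimal sketch of per-IP rejection using a fixed-window counter per source; the window and quota are illustrative:

```python
# Per-source rejection sketch: a fixed-window counter per IP
# (not thread-safe; window and quota are illustrative).
import time
from collections import defaultdict

WINDOW = 60          # seconds
LIMIT_PER_IP = 100   # requests allowed per IP per window

counters = defaultdict(lambda: [0.0, 0])  # ip -> [window_start, count]


def allow(ip: str) -> bool:
    now = time.monotonic()
    window_start, count = counters[ip]
    if count == 0 or now - window_start >= WINDOW:
        counters[ip] = [now, 1]  # first request of a fresh window
        return True
    if count < LIMIT_PER_IP:
        counters[ip][1] = count + 1
        return True
    return False  # this IP exceeded its quota: reject
```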

  2. Delayed processing

Excess requests are placed into a buffer or delay queue, which absorbs short-term traffic surges; the backlog is then processed gradually once the peak has passed.

  3. Request classification (prioritization)

Assign priorities to requests from different sources and process higher-priority requests first, e.g., VIP customers or critical business applications (a trading service outranks a logging service, for instance).
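A minimal sketch of priority-based dispatch using a heap; the priority classes and names are illustrative:

```python
# Priority dispatch sketch: lower number = higher priority
# (priority classes and names are illustrative).
import heapq
import itertools

PRIORITY = {"trading": 0, "vip": 1, "normal": 2, "logging": 3}
_tiebreak = itertools.count()  # keeps FIFO order within a class
pending = []


def enqueue(source: str, request) -> None:
    heapq.heappush(pending,
                   (PRIORITY.get(source, 2), next(_tiebreak), request))


def next_request():
    # Pops the highest-priority (lowest-numbered) request first.
    return heapq.heappop(pending)[2] if pending else None


enqueue("logging", "write log entry")
enqueue("trading", "place order")
print(next_request())  # -> "place order"
```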

  4. Dynamic rate limiting

Monitor system metrics, assess the system's pressure, and adjust the rate-limiting threshold dynamically through a registry, configuration center, or similar.

  5. Monitoring, alerting, and dynamic scaling

Given a good service monitoring system and an automated deployment pipeline, the monitoring system can track the service's running state and, when pressure spikes or traffic rises sharply in a short time, raise alerts via email, SMS, and so on.

When preset conditions are met, the related services can then be deployed and released automatically, achieving dynamic scale-out.

Three places to apply rate limiting

  1. Access-layer rate limiting

Nginx (e.g., its limit_req module), API gateways, and similar components can rate-limit by domain name or IP and intercept illegitimate requests.

  2. Application-level rate limiting

Each service can apply its own single-machine or cluster-level rate limiting, or call a third-party rate-limiting service, such as Alibaba's Sentinel framework.

  3. Base-service rate limiting

Rate limiting can also be applied at the base-service layer:

  • Databases: cap connection counts and read/write rates
  • Message queues: cap the consumption rate (batch size, consumer threads)

Summary

This article surveyed service rate limiting from a macro perspective: two rate-limiting methods, four common algorithms, five strategies, and three places to apply them. One final note: configuring limits sensibly requires understanding the system's throughput, so rate limiting is generally combined with capacity planning and stress testing. When external requests approach or reach the system's maximum threshold, rate limiting kicks in, alongside other degradation measures, to keep the system from being overwhelmed.

