If user traffic is like surging waves, the gateway is the dam that absorbs the impact. In large-scale Internet projects, the gateway is indispensable and is currently our best line of defense. Through the gateway, we can route large volumes of traffic to the appropriate services. With the capabilities of a Lua script engine, we can also greatly reduce coupling and performance overhead in the system and lower our costs.

Generally speaking, gateways are divided into external network gateways and internal network gateways. The external network gateway is mainly responsible for rate limiting, intrusion prevention, request forwarding and similar tasks, and a common approach is Nginx + Lua. For internal network gateways, recent years have seen various customized approaches emerge, such as ServiceMesh and SideCar, as well as Kong, Nginx Unit, etc. Although their uses differ, their main functions are still load balancing, traffic management and scheduling, and intrusion prevention.

External network gateway function

Let's start with the external network gateway. I will share two practical designs with you. They help us prevent intrusion and decouple business dependencies.

Spider sniffing identification

High-traffic websites commonly face two security issues: illegal references (other sites linking directly to our resources) and robot crawling. To prevent them, we can implement rate limiting and intrusion detection at the gateway.

Illegal reference prevention: Illegal references often lead to abuse of our website resources. A common prevention method is to check the Referer field of the request. If the Referer is not a domain name of this site, the request is rejected. This effectively reduces the risk of unauthorized access to resources.
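The Referer check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the host names in `ALLOWED_HOSTS`, the function names, and the choice to accept requests with no Referer are all hypothetical, and a production gateway would more likely do this in Nginx or Lua configuration than in Python.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; a real gateway would load this from configuration.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_allowed_referer(referer, allow_empty=True):
    """Return True if the Referer header points at one of our own hosts."""
    if not referer:
        # Browsers and privacy tools often omit Referer, so accepting an
        # empty value is a policy decision (assumed here), not a fixed rule.
        return allow_empty
    host = urlparse(referer).hostname
    return host is not None and host.lower() in ALLOWED_HOSTS


def check_request(headers):
    """Return the status the gateway would answer with: 200 or 403."""
    return 200 if is_allowed_referer(headers.get("Referer")) else 403
```

The comparison is made on the parsed host name rather than with a substring match, so a Referer such as `https://example.com.evil.test/` is not mistaken for this site.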
Robot crawling prevention: Robot crawling is another common problem. To identify and prevent robot crawling, we can take the following approaches:
Through these measures, we can deal effectively with illegal references and robot crawling, protect the website's resources and data, and at the same time remain search engine friendly so that SEO keeps working normally.

Gateway authentication and user center decoupling

Previously, we discussed how to use the gateway to block illegitimate users. Besides defending against attacks and preventing resources from being consumed maliciously, the gateway can also help us remove some business dependencies. For example, in the design of user login, each business does not need to rely directly on the user center to verify that a user is legitimate.

User authentication is usually kept consistent by integrating the user center SDK into each sub-business. While this approach is convenient, it also creates new problems: SDK synchronization dependencies and upgrade maintenance. Basic public components generally offer an SDK to make business development easier, because if services are provided only through an API, some special operations may have to be implemented repeatedly. However, once an SDK is released, we need to be prepared to maintain multiple SDK versions online at the same time to ensure compatibility and functional stability.

The following figure compares authenticating a token with the SDK and authenticating directly through the user center interface:

[Figure]

With the SDK integrated, each business can verify the user's identity by itself without frequently requesting the user center. However, this solution also brings challenges. The SDK is a component embedded in each project, and projects usually avoid upgrading it frequently in order to stay stable. As a result, later upgrades of the user center meet resistance, because all dependent businesses must be taken into account during the upgrade.
Each large-scale upgrade of basic services requires a lot of manpower to synchronize the SDK, which increases the complexity of maintenance.

To solve this coupling problem, we can consider another design: put user login authentication in the gateway layer. The business system then no longer depends directly on the user center SDK, and identity authentication and permission verification are completed by the gateway. The gateway authenticates the user as soon as it receives the request, and only verified requests are forwarded to the specific business service. This removes the direct dependency between the user center and the various business systems. The following figure shows the request flow under this design:

[Figure]

Combined with the figure above, let's look at how this design works. When a user requests a business interface, the gateway first authenticates the identity of the requesting user. If authentication succeeds, the user information is put into the request header and passed to the backend service. The business API does not need to care about the implementation details of the user center and can read the user information directly from the header to continue working.

If the business requires users to log in before using it, the business logic can check whether the request header contains uid. If uid is missing, a unified error code is returned to the front end to prompt the user to log in first. This design effectively decouples the business modules from the user center. Even if the user center logic changes, the business modules do not need to be upgraded in step.
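The flow above can be sketched as follows. The token format (`<uid>.<HMAC signature>`), the `Authorization` and `X-Uid` header names, the shared secret, and the error code 401 are all assumptions made for illustration; the article does not specify how the gateway verifies identity or which header carries the uid.

```python
import hashlib
import hmac

# Hypothetical secret shared by the gateway and the user center.
SECRET = b"demo-secret"


def sign(uid):
    """Issue a token of the form '<uid>.<hex signature>' (illustrative)."""
    sig = hmac.new(SECRET, uid.encode(), hashlib.sha256).hexdigest()
    return f"{uid}.{sig}"


def verify(token):
    """Return the uid if the token signature is valid, otherwise None."""
    uid, sep, sig = token.rpartition(".")
    if not sep or not uid:
        return None
    expected = hmac.new(SECRET, uid.encode(), hashlib.sha256).hexdigest()
    ok = hmac.compare_digest(sig.encode(), expected.encode())
    return uid if ok else None


def gateway_forward(headers):
    """Build the headers the gateway passes on to the backend service."""
    # Drop any client-supplied uid header so it cannot be spoofed.
    forwarded = {k: v for k, v in headers.items() if k.lower() != "x-uid"}
    uid = verify(headers.get("Authorization", ""))
    if uid is not None:
        forwarded["X-Uid"] = uid
    return forwarded


def business_handler(headers):
    """A business API that requires login only inspects the uid header."""
    uid = headers.get("X-Uid")
    if uid is None:
        return {"code": 401, "msg": "please log in first"}
    return {"code": 0, "msg": f"hello, user {uid}"}
```

In this sketch the business handler never imports anything from the user center. Changing how tokens are signed only touches `verify` in the gateway, which is the decoupling the design aims for.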
In addition to basic login authentication, this design also allows more flexible permission management at the gateway layer. For example, role-based access control (RBAC) or attribute-based access control (ABAC) can be enabled for certain domain names to tailor permission policies to different business scenarios. Through the gateway, we can also grant different permissions to different users and support advanced features such as grayscale testing, which improves the flexibility and security of the system.

Intranet Gateway Service

Now that we have seen two uses of the external network gateway, let's look at the internal network gateway. It can provide retry on failure and a smooth restart mechanism. Let's look at them separately.

Retry on failure

When a project is being released and upgraded, or a service fails and restarts, the system may be temporarily unavailable. If a user request arrives during this period, a 504 error may be returned because the backend does not respond, which hurts the user experience.

To improve the user experience, we can use the automatic retry function of the intranet gateway. When a request reaches the backend but the service returns an error such as 500, 403, or 504, the gateway does not have to return the error immediately. Instead, it can hold the request for a while and try again later, or directly return previously cached content. In this way, the business can achieve smooth hot updates and the service appears more stable, so users will not noticeably perceive fluctuations during an online upgrade.

Smooth restart

During a service upgrade, the smooth restart mechanism prevents the service process from exiting immediately after it receives the kill signal. The specific approach is to first stop the service from receiving new requests and wait for the requests currently being processed to complete.
If request processing times out (for example, takes more than 10 seconds), the service is forced to exit. With this mechanism, requests in flight are not interrupted, so the business transaction being processed can complete. Otherwise, business transactions are likely to end up inconsistent or only half finished.

With retry and smooth restart in place, we can upgrade and release our code online at any time. However, once these functions are turned on, some online failures may be masked. We therefore need to pair them with monitoring on the gateway service to detect the real state of the system.

Comprehensive application of internal and external gateways

First, let's look at the gateway interface cache function, which uses the gateway to cache the content returned by some interfaces. It is suitable for service degradation scenarios, to temporarily absorb the impact of user traffic or to reduce intranet traffic. The specific implementation is shown in the following figure:

[Figure]

From the figure above, we can see that the gateway's cache mechanism is usually implemented with a temporary cache and a TTL (time to live). When a user requests a service interface, if the response has been cached and the cache has not expired, the gateway returns the cached data directly to the client. This significantly reduces the load on the backend data service. However, it is a trade-off: strong data consistency is sacrificed in exchange for performance. The cache itself also has high performance requirements, because the gateway cache must be able to handle the high QPS (queries per second) of external traffic.
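A minimal sketch of the cache lookup with a TTL might look like this. The class and method names are hypothetical, the store is a plain in-process dictionary, and a real gateway would more likely use a shared cache service; the injectable `clock` parameter exists only so that expiry can be exercised without sleeping.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after a fixed time to live."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested
        self.store = {}      # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value, or call fetch() on a miss and cache it."""
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # valid cache hit
        value = fetch()                  # miss or expired: ask the backend
        self.store[key] = (now + self.ttl, value)
        return value
```

A call such as `cache.get_or_fetch("/api/list", call_backend)` (where `call_backend` is whatever function requests the backend) would return the cached body until the TTL expires, after which the next request goes to the backend again. During that window clients may see stale data, which is the consistency trade-off mentioned above.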
To avoid too much traffic penetrating the cache and reaching the backend, the cached data can be refreshed regularly by scripts. When the gateway finds a valid cache entry, it returns it directly; on a cache miss, it requests the backend service and caches the result. This implementation is more flexible than relying on the cache alone and can improve data consistency, but it also makes development and maintenance more complex, requiring additional code and operations work to keep the system stable and the data consistent.

[Figure]

It is recommended that each cached entry not exceed 5 KB, because overly long data slows down the response speed of the cache service. At 100,000 QPS, 100,000 × 5 KB ≈ 488 MB/s.

Service Monitoring

Finally, let's discuss using the gateway for service monitoring. Without link tracing, most system monitoring relies on the gateway's logs. By analyzing the HTTP status codes in the gateway's access log, we can determine whether the service is running normally. Combined with request response times, this gives us basic system monitoring.

Specifically, the gateway's access log records the HTTP status code (such as 200, 500, 404, etc.) and the response time of each request. This information helps us monitor the health of the service, for example by spotting abnormal error codes (such as 500 errors) or request timeouts, so that potential problems are discovered in time. The following diagram shows how to monitor the running status of the service through the gateway:

[Figure]

To make it easier to judge the status of online services, we can first collect statistics. The specific method is to regularly aggregate the errors in the access logs and summarize the number of request errors for each interface.
For example, after aggregation we might get data like this: "Within 30 seconds, 500 errors occurred 20 times, 504 errors occurred 15 times, and the response time of one domain name's interface exceeded 1 second 40 times." These statistics help us quickly assess the health of the service.

Unlike other monitoring methods, gateway monitoring can cover all businesses. Although the monitoring granularity is coarse, it is still an effective solution. If combined with Trace, we can record the Trace ID in the access log and use these IDs to further troubleshoot the specific cause of a problem. This approach has been used in some companies (such as Good Future and Geek Time) and makes troubleshooting more convenient.
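The aggregation step can be sketched as below. The log line layout (`<timestamp> <path> <status> <response time in seconds>`) and the function name are assumptions made for illustration; a real access log format depends on how the gateway is configured. The caller is expected to pass in the lines that fall inside one statistics window, for example the last 30 seconds.

```python
from collections import Counter


def aggregate(lines, slow_threshold=1.0):
    """Count 5xx responses per (path, status) and slow requests per path."""
    errors = Counter()
    slow = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip lines that do not match the assumed layout
        _ts, path, status, elapsed = parts
        try:
            status_code, seconds = int(status), float(elapsed)
        except ValueError:
            continue  # skip lines with non-numeric status or time
        if status_code >= 500:
            errors[(path, status_code)] += 1
        if seconds > slow_threshold:
            slow[path] += 1
    return errors, slow
```

Comparing the returned counters against alert thresholds turns the coarse access log into a basic health signal for every business behind the gateway.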