Network | How to design a billion-level API gateway?

The API gateway can be seen as the system's entry point to the outside world. Non-business logic such as permission verification, monitoring, caching, and request routing can be handled in the gateway.

Why do we need an API Gateway?

There are several reasons:

  • Convert RPC protocol to HTTP. Internal services are developed on an RPC protocol (Thrift or Dubbo) and exposed to other internal services. When external callers need the same interface, the RPC protocol usually has to be converted to HTTP.

  • Request routing. In our system, since the same interface is used by both the old and new systems, we need to route the request to the corresponding interface based on the request context.

  • Unified authentication. Because the authentication operation does not involve business logic, it can be processed at the gateway layer without going down to the business logic layer.

  • Unified monitoring. Since the gateway is the entry point for external traffic, we can collect the data we care about here, such as request and response parameters and end-to-end latency.

  • Flow control, circuit breaking and degradation. These are also non-business logic and can be moved into the gateway layer.

Many businesses implement their own gateway layer in front of their services, but that is not enough for the company as a whole.

Unified API Gateway

The unified API gateway not only has all the features of the API gateway, but also has the following benefits:

  • Unified technical component upgrades. Without a unified gateway, upgrading a shared technical component means coordinating with every business line, which usually takes several months.

    For example, if a major security risk is found in entry-point authentication and an upgrade is required, that pace is unacceptable. With a unified gateway, the upgrade can be rolled out quickly.

  • Unified service access. Getting business lines to adopt a service is also difficult. For example, if a company has built a relatively stable service component and is promoting it heavily, adoption will still take a long time. With a unified gateway, the component only needs to be integrated once, at the gateway.

  • Save resources. If every business and department builds its own gateway layer as described above, the cost adds up. If a company has 100 such businesses and each one runs its gateway on 4 machines, 400 machines are needed.

    Each business's engineers also have to develop and maintain their own gateway layer, which costs manpower. With a unified gateway layer, perhaps only 50 machines are needed to serve all 100 businesses, and business engineers no longer need to deal with gateway development and releases.

Design of unified gateway

Asynchronous Requests

When a business builds a gateway layer only for itself, the throughput requirement is low, so calls are usually made synchronously.

A unified gateway layer has to serve many more services with a small number of machines, so it needs asynchronous processing to increase throughput.

There are generally two strategies for asynchronization:

  • Tomcat/Jetty + NIO + Servlet 3. This strategy is widely used; JD.com, Youzan, and Zuul all use it. It suits HTTP well, and asynchronous processing can be enabled in Servlet 3.

  • Netty + NIO. Netty was built for high concurrency. Vipshop's gateway currently uses this strategy. According to Vipshop's technical articles, under the same conditions Netty reaches a throughput of 300,000+ per second and Tomcat 130,000+ per second.

    There is a clear gap between them, but with Netty you have to handle the HTTP protocol yourself, which is more work.

If most of the gateway's traffic is HTTP, a Servlet container is a reasonable choice, since its HTTP handling is more mature. If throughput matters more, use Netty.

Full link asynchronous

We have already used asynchrony for incoming requests. In order to achieve full-link asynchrony, we need to process outgoing requests asynchronously as well. For outgoing requests, we can use RPC's asynchronous support to make asynchronous requests.

The resulting flow is shown in the following figure:

First, enable Servlet asynchrony in the Web container, then enter the gateway's business thread pool for business processing, then make RPC asynchronous calls and register the business that needs callback, and finally perform callback processing in the callback thread pool.
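The flow described above can be sketched with the JDK's `CompletableFuture`. This is a minimal illustration under stated assumptions, not any particular gateway's code: `AsyncGatewaySketch`, `rpcCallAsync`, the pool sizes, and the `"200 "` prefix are all hypothetical, and the asynchronous RPC client is simulated with a future.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of a fully asynchronous gateway call path.
class AsyncGatewaySketch {
    // Pool for gateway business processing (filters, parameter checks).
    private final ExecutorService businessPool = Executors.newFixedThreadPool(4);
    // Separate pool that runs callbacks once the backend has replied.
    private final ExecutorService callbackPool = Executors.newFixedThreadPool(4);

    // Stand-in for an asynchronous RPC client: returns a future immediately.
    CompletableFuture<String> rpcCallAsync(String request) {
        return CompletableFuture.supplyAsync(() -> "backend(" + request + ")");
    }

    // In a Servlet 3 container this would run after request.startAsync(),
    // and the last stage would write the response and complete the AsyncContext.
    CompletableFuture<String> handle(String rawRequest) {
        return CompletableFuture
                .supplyAsync(() -> rawRequest.trim(), businessPool)      // business processing
                .thenCompose(this::rpcCallAsync)                          // asynchronous RPC call
                .thenApplyAsync(reply -> "200 " + reply, callbackPool);   // callback processing
    }

    void shutdown() {
        businessPool.shutdown();
        callbackPool.shutdown();
    }
}
```

No thread blocks while waiting for the backend: the business thread is released as soon as the RPC future is created, and the callback pool only does work once the reply arrives.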

Chain Processing  

Design patterns include the chain of responsibility pattern. It avoids coupling the sender of a request to its receiver by giving multiple objects a chance to handle the request: the objects are linked into a chain, and the request is passed along the chain until one of them handles it.

This pattern decouples the request sender from the request handler and appears in many frameworks, such as Filter in Servlet and Interceptor in Spring MVC.

This pattern is also applied in Netflix Zuul, as shown in the following figure:

We can learn from this model in the design of our own gateway:

  • preFilters: Pre-filters, used for common concerns such as unified authentication, unified rate limiting, circuit breaking and degradation, and cache processing, and to provide business extension points.
  • routingFilters: Routing filters, used for generic calls, mainly protocol conversion and request routing.
  • postFilters: Post-filters, mainly used for result processing, logging, time recording, etc.
  • errorFilters: Error filters, used to handle call exceptions.

This design is also used in Youzan’s gateway.
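The four filter stages can be sketched as a small pipeline. This is an illustrative sketch only: `GatewayContext`, `GatewayFilter`, and `FilterPipeline` are hypothetical names, not Zuul's or Youzan's API. It assumes the ordering pre, then routing, with error filters on failure and post filters always running last.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request context passed along the chain.
class GatewayContext {
    final Map<String, String> attributes = new HashMap<>();
    String response;
}

// One link in the chain; may throw to signal a failed request.
interface GatewayFilter {
    void apply(GatewayContext ctx) throws Exception;
}

class FilterPipeline {
    final List<GatewayFilter> preFilters = new ArrayList<>();
    final List<GatewayFilter> routingFilters = new ArrayList<>();
    final List<GatewayFilter> postFilters = new ArrayList<>();
    final List<GatewayFilter> errorFilters = new ArrayList<>();

    // Runs pre -> routing, hands failures to the error filters,
    // and always finishes with the post filters.
    void process(GatewayContext ctx) {
        try {
            for (GatewayFilter f : preFilters) f.apply(ctx);
            for (GatewayFilter f : routingFilters) f.apply(ctx);
        } catch (Exception e) {
            ctx.attributes.put("error", String.valueOf(e.getMessage()));
            runQuietly(errorFilters, ctx);
        } finally {
            runQuietly(postFilters, ctx);
        }
    }

    private static void runQuietly(List<GatewayFilter> filters, GatewayContext ctx) {
        for (GatewayFilter f : filters) {
            try {
                f.apply(ctx);
            } catch (Exception ignored) {
                // A failing error/post filter must not stop the remaining ones.
            }
        }
    }
}
```

A business team would extend the gateway by adding its own filter to one of the lists, without touching the pipeline itself.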

Business Isolation  

With full-link asynchrony, different businesses have little impact on each other. However, if a custom Filter makes synchronous calls and those calls time out frequently, other businesses will be affected. Therefore, we need isolation techniques to reduce the mutual impact between businesses.

Semaphore Isolation

Semaphore isolation only limits the total number of concurrent calls, and the service is still called synchronously by the main thread. If the remote call times out, this isolation will still affect the main thread, thereby affecting other services.

Therefore, if you just want to limit the total concurrent calls of a service or the called service does not involve remote calls, you can use lightweight semaphores to achieve this. Since Youzan's gateway does not have a custom filter, it chooses semaphore isolation.
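A minimal sketch of semaphore isolation using `java.util.concurrent.Semaphore` follows. The class name `SemaphoreIsolation` and the fallback-on-rejection behaviour are assumptions made for illustration. Note that the call still runs on the caller's thread, which is exactly the limitation described above.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical per-service concurrency guard.
class SemaphoreIsolation {
    private final Semaphore permits;

    SemaphoreIsolation(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call on the caller's thread if a permit is free;
    // otherwise returns the fallback immediately instead of queueing.
    <T> T call(Supplier<T> service, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;
        }
        try {
            return service.get();
        } finally {
            permits.release();
        }
    }
}
```

Because no extra thread is involved, a slow `service.get()` still holds the calling thread until it returns; the semaphore only caps how many callers can be stuck there at once.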

Thread pool isolation

The simplest way is to isolate different businesses through different thread pools. Even if there is a problem with the business interface, it will not affect other businesses because the thread pool has been isolated.

JD.com's gateway implementation uses thread pool isolation. More important businesses, such as products or orders, are processed separately through thread pools.
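As a rough sketch of this idea, the dispatcher below gives selected businesses a dedicated pool and sends everything else to a shared pool. `PoolIsolation`, the business names, and the pool sizes are hypothetical; the source does not describe JD.com's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical dispatcher: important businesses get a dedicated pool,
// everything else shares a default pool.
class PoolIsolation {
    private final Map<String, ExecutorService> dedicated = new ConcurrentHashMap<>();
    private final ExecutorService shared =
            Executors.newFixedThreadPool(8, r -> new Thread(r, "shared-worker"));

    // Registers a dedicated pool whose threads are named after the business.
    void isolate(String business, int threads) {
        dedicated.put(business,
                Executors.newFixedThreadPool(threads, r -> new Thread(r, business + "-worker")));
    }

    // Runs the task on the business's own pool if it has one, else on the shared pool.
    <T> CompletableFuture<T> submit(String business, Supplier<T> task) {
        ExecutorService pool = dedicated.getOrDefault(business, shared);
        return CompletableFuture.supplyAsync(task, pool);
    }

    void shutdown() {
        shared.shutdown();
        dedicated.values().forEach(ExecutorService::shutdown);
    }
}
```

If tasks for one isolated business hang, only that business's threads are exhausted; requests for other businesses keep running on their own pools.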

However, on a unified gateway platform with many business lines, every team tends to consider its own business important, so each will want its own isolated thread pool.

In Java, threads are a heavyweight and relatively limited resource, so isolating a large number of thread pools is not a good fit.

If the gateway is written in another language such as Go, where goroutines are lightweight, this kind of per-business isolation is more practical.

Cluster Isolation

What if some businesses require isolation but the unified gateway does not offer thread pool isolation?

Then you can use cluster isolation. If some of your businesses are really important, you can request one or more dedicated clusters for them and isolate them at the machine level.

Request rate limiting

Rate limiting can be implemented with many open source tools, such as Alibaba's recently open-sourced Sentinel and the more mature Hystrix.

Rate limiting is generally divided into cluster-level and single-machine rate limiting:

  • Cluster rate limiting: Use shared storage to hold the current traffic counters. Redis is a common choice, at the cost of some performance overhead.

  • Single-machine rate limiting: To limit each machine separately, we can use Guava's token bucket directly. Since there is no remote call, the performance overhead is relatively small.
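To show what a single-machine token bucket does, here is a hand-written sketch that uses only the JDK. It is a simplified stand-in for a library limiter, not Guava's implementation: the class `TokenBucket`, its constructor parameters, and the explicit clock argument are assumptions chosen to keep the example deterministic.

```java
// Hand-written token bucket using only the JDK; a hypothetical stand-in
// for a library rate limiter.
class TokenBucket {
    private final long capacity;        // maximum burst size
    private final long nanosPerToken;   // time needed to earn one token
    private long tokens;
    private long lastRefillNanos;

    // permitsPerSecond is assumed to be between 1 and 1,000,000,000.
    TokenBucket(long capacity, long permitsPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.nanosPerToken = 1_000_000_000L / permitsPerSecond;
        this.tokens = capacity;         // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    // The clock is passed in so behaviour is deterministic;
    // production code would pass System.nanoTime().
    synchronized boolean tryAcquire(long nowNanos) {
        long newTokens = (nowNanos - lastRefillNanos) / nanosPerToken;
        if (newTokens > 0) {
            tokens = Math.min(capacity, tokens + newTokens);
            lastRefillNanos += newTokens * nanosPerToken;
        }
        if (tokens > 0) {
            tokens--;
            return true;                // request admitted
        }
        return false;                   // request should be rejected or degraded
    }
}
```

A gateway filter would call `tryAcquire(System.nanoTime())` per request and reject or degrade the request when it returns `false`.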

Circuit Breaker

You can also refer to the open source implementations of Sentinel and Hystrix for this area, but I won’t go into details here as it is not the focus.

Generic call

A generic call converts between communication protocols, for example from HTTP to Thrift. Some open source gateways such as Zuul do not implement this, because each company uses different protocols for internal service communication.

For example, Vipshop supports HTTP1, HTTP2, and binary protocols, which are then converted into internal protocols.

Taobao supports HTTPS, HTTP1, and HTTP2, and these can be converted into HTTP, HSF, Dubbo and other protocols.

How are generic calls implemented? Since protocols are difficult to convert automatically, a mapping has to be provided for each interface under each protocol.

Simply put, both protocols can be converted into a common language so that they can be converted to each other, as shown in the following figure:

Generally speaking, there are three ways to specify a common language:

json: The JSON data format is relatively simple, fast to parse, and lightweight. In the Dubbo ecosystem, there is an HTTP-to-Dubbo project that converts HTTP to JsonRpc and then to Dubbo.

For example, a GET request to www.baidu.com/id=1 can be mapped to json:

xml: XML data is heavyweight and harder to parse, so we won't discuss it in detail here.

Custom description language: This is generally the most expensive option, since you need to define your own language and write the code to describe and parse it, but its extensibility and customizability are excellent. For example, Spring defines its own expression language, SpEL.

If you want to design generic calls yourself, JSON can basically meet your needs. If you have many custom requirements, you can define your own language.
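As an illustration of the JSON approach, the sketch below turns the query string of a GET request into a JSON call description that a backend adapter could then translate into an RPC call. The JSON shape (`method` and `params`), the class `HttpToJsonSketch`, and the method name `getById` are all hypothetical; the original article's own JSON example is not available here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical mapping of an HTTP GET to a JSON call description.
class HttpToJsonSketch {
    // Parses "a=1&b=2" into an ordered map; values are kept as strings.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Builds a JSON object naming the target method and its arguments.
    // No escaping is done, so this is only safe for plain alphanumeric input.
    static String toJson(String method, String query) {
        StringJoiner args = new StringJoiner(",", "{", "}");
        parseQuery(query).forEach((k, v) -> args.add("\"" + k + "\":\"" + v + "\""));
        return "{\"method\":\"" + method + "\",\"params\":" + args + "}";
    }
}
```

For instance, `HttpToJsonSketch.toJson("getById", "id=1")` produces `{"method":"getById","params":{"id":"1"}}`. A real gateway would use a JSON library with proper escaping and would look up the target method from the mapping configured for each interface.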

Management Platform

The above are the key technical points of implementing a gateway. Here we introduce a key business point.

Once a gateway exists, a management platform is needed to configure the key technologies described above, including but not limited to:

  • Rate limiting

  • Circuit breaker

  • Cache

  • Log

  • Custom filter

  • Generic call

Summary

A reasonable standard gateway should be implemented as follows:

References:

  • JD.com: http://www.yunweipai.com/archives/23653.html
  • Youzan Gateway: https://tech.youzan.com/api-gateway-in-practice/
  • Vipshop: https://mp.weixin.qq.com/s/gREMe-G7nqNJJLzbZ3ed3A
  • Zuul: http://www.scienjus.com/api-gateway-and-netflix-zuul/
