20 billion daily traffic, Ctrip gateway architecture design


About the author: Butters is a software technology expert at Ctrip, specializing in network architecture, API gateways, load balancing, Service Mesh, and related fields.

1. Overview

Like many companies, Ctrip introduced its API gateway as a piece of infrastructure accompanying the move to a microservice architecture; the first version was released in 2014. As service-oriented development advanced rapidly within the company, the gateway gradually became the standard solution for exposing applications to the external network. Through subsequent projects such as "ALL IN Wireless", internationalization, and multi-site active-active, the gateway has continued to evolve alongside the company's shared business services and infrastructure. As of July 2021, more than 3,000 services had been onboarded, with an average of 20 billion requests processed per day.

In terms of technical approach, the company's early microservice development was deeply influenced by NetflixOSS, and the gateway was likewise first built with reference to Zuul 1.0. Its core can be summarized in the following four points:

  • Server side: Tomcat NIO + AsyncServlet
  • Business process: independent thread pool, phased chain of responsibility model
  • Client: Apache HttpClient, synchronous call
  • Core components: Archaius (dynamic configuration client), Hystrix (circuit breaking and rate limiting), Groovy (hot-update support)

Synchronous calls block threads, so system throughput is heavily constrained by IO.

As an industry leader, Zuul took this into account in its design: by introducing Hystrix, it achieves resource isolation and rate limiting, confining failures (slow IO) to a bounded scope; combined with the circuit-breaker strategy, thread resources can be released early; together these ensure that local anomalies do not affect the system as a whole.

However, as the company's business continued to grow, the effectiveness of this strategy gradually weakened, for two main reasons:

  • Business going international: the gateway serves as the overseas access layer, and part of the traffic must be routed back to China, so slow IO became the norm
  • Growth in service scale: local anomalies became the norm, and combined with the way anomalies spread in a microservice architecture, thread pools could stay in a sub-healthy state for long periods


Full asynchronization has been a core task of Ctrip's API gateway in recent years. This article revolves around that effort, covering our work and practical experience with the gateway.

The key topics include performance optimization, the business model, the technical architecture, and governance experience.

2. High-performance gateway core design

2.1. Asynchronous process design

Full asynchrony = server-side asynchrony + business process asynchrony + client-side asynchrony

For the server and client sides we use the Netty framework, whose NIO/epoll + EventLoop design is essentially event-driven.

The core of our transformation was making the business process asynchronous. Common asynchronous scenarios include:

  • Business IO events: such as request verification and identity authentication, which involve remote calls
  • The gateway's own IO events: for example, when the first xx bytes of a message have been read
  • Request forwarding: including TCP connection setup and HTTP requests

In our experience, asynchronous programming is somewhat harder to design, read, and write than synchronous programming, mainly in these areas:

  • Process Design & State Transition
  • Exception handling, including general exceptions and timeouts
  • Context transfer, including business context and trace log
  • Thread Scheduling
  • Flow Control

In the Netty context especially, careless design of the ByteBuf lifecycle easily causes memory leaks.
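To make the risk concrete, here is a minimal hypothetical sketch of the reference-counting discipline that prevents such leaks. This is not Ctrip's code, and `RefCountedBuf` is a stand-in for Netty's `ByteBuf`: every acquired buffer must be released exactly once on every code path, including exceptional ones.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for Netty's reference-counted ByteBuf (assumption: real code uses io.netty.buffer.ByteBuf)
class RefCountedBuf {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() { return refCnt.get(); }

    void release() {
        if (refCnt.decrementAndGet() < 0) {
            throw new IllegalStateException("buffer over-released");
        }
    }
}

class BufDemo {
    // The discipline: release in finally, so exceptional paths do not leak the buffer
    static int processSafely(RefCountedBuf buf, boolean fail) {
        try {
            if (fail) throw new RuntimeException("simulated processing error");
            return 42;
        } finally {
            buf.release(); // refCnt drops to 0 regardless of outcome
        }
    }
}
```

The same `try/finally` (or Netty's `ReferenceCountUtil.safeRelease`) pattern applies wherever a filter takes ownership of a buffer.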

To address these issues, we designed a supporting framework that smooths over the synchronous/asynchronous differences in business code as much as possible to ease development, while providing default protection and fault tolerance to keep the program safe overall.

In terms of tools, we used RxJava, and its main process is shown in the figure below.

[figure]


  • Maybe
    • RxJava's built-in container type, representing one of three outcomes: normal completion with no value, exactly one returned object, or an error
    • Reactive, making the overall state machine easy to design, with built-in encapsulation of exception handling, timeouts, thread scheduling, etc.
    • Maybe.empty()/Maybe.just(T) are suitable for synchronous scenarios
    • The utility class RxJavaPlugins is convenient for encapsulating aspect-style logic
  • Filter
    • Represents an independent piece of business logic: a unified interface for synchronous and asynchronous business that returns Maybe
    • Asynchronous scenarios (such as remote calls) are encapsulated uniformly; where thread switching is involved, the flow switches back via maybe.observeOn(eventloop)
    • Asynchronous filters get a timeout by default and are treated as weak dependencies, with errors ignored
```java
public interface Processor<T> {
    ProcessorType getType();
    int getOrder();
    boolean shouldProcess(RequestContext context);
    // Uniformly exposed to callers as Maybe
    Maybe<T> process(RequestContext context) throws Exception;
}
```
```java
public abstract class AbstractProcessor<T> implements Processor<T> {
    // Synchronous & no response: override this method
    // Scenario: regular business processing
    protected void processSync(RequestContext context) throws Exception {}

    // Synchronous & with response: override this method
    // Scenario: health checks, static responses when validation fails
    protected T processSyncAndGetResponse(RequestContext context) throws Exception {
        processSync(context);
        return null;
    }

    // Asynchronous: override this method
    // Scenario: authentication, authorization, and other modules involving remote calls
    protected Maybe<T> processAsync(RequestContext context) throws Exception {
        T response = processSyncAndGetResponse(context);
        if (response == null) {
            return Maybe.empty();
        } else {
            return Maybe.just(response);
        }
    }

    @Override
    public Maybe<T> process(RequestContext context) throws Exception {
        Maybe<T> maybe = processAsync(context);
        if (maybe instanceof ScalarCallable) {
            // Marks a synchronous method; no extra wrapping needed
            return maybe;
        } else {
            // Uniformly add a timeout; errors are ignored by default
            return maybe.timeout(getAsyncTimeout(context), TimeUnit.MILLISECONDS,
                    Schedulers.from(context.getEventloop()), timeoutFallback(context));
        }
    }

    protected long getAsyncTimeout(RequestContext context) { return 2000; }

    protected Maybe<T> timeoutFallback(RequestContext context) { return Maybe.empty(); }
}
```
  • Overall process
    • Following the chain-of-responsibility design, the flow is divided into four stages: inbound, outbound, error, and log
    • Each stage consists of one or more filters
    • Filters execute sequentially, and the chain is interrupted when an exception occurs; during inbound, any filter that returns a response also triggers an interrupt
```java
public class RxUtil {
    // Combine the filters (i.e. Callable<Maybe<T>>) within one stage (e.g. Inbound)
    public static <T> Maybe<T> concat(Iterable<? extends Callable<Maybe<T>>> iterable) {
        Iterator<? extends Callable<Maybe<T>>> sources = iterable.iterator();
        while (sources.hasNext()) {
            Maybe<T> maybe;
            try {
                maybe = sources.next().call();
            } catch (Exception e) {
                return Maybe.error(e);
            }
            if (maybe != null) {
                if (maybe instanceof ScalarCallable) {
                    // Synchronous method
                    T response = ((ScalarCallable<T>) maybe).call();
                    if (response != null) {
                        // A response is present: interrupt the chain
                        return maybe;
                    }
                } else {
                    // Asynchronous method
                    if (sources.hasNext()) {
                        // Pass the remaining sources into the callback; subsequent filters repeat this logic
                        return new ConcattedMaybe<>(maybe, sources);
                    } else {
                        return maybe;
                    }
                }
            }
        }
        return Maybe.empty();
    }
}
```
```java
public class ProcessEngine {
    // For each stage, add default timeouts and error handling
    private void process(RequestContext context) {
        List<Callable<Maybe<Response>>> inboundTask = get(ProcessorType.INBOUND, context);
        List<Callable<Maybe<Void>>> outboundTask = get(ProcessorType.OUTBOUND, context);
        List<Callable<Maybe<Response>>> errorTask = get(ProcessorType.ERROR, context);
        List<Callable<Maybe<Void>>> logTask = get(ProcessorType.LOG, context);

        RxUtil.concat(inboundTask)                        // inbound stage
            .toSingle()                                   // obtain the response
            .flatMapMaybe(response -> {
                context.setOriginResponse(response);
                return RxUtil.concat(outboundTask);
            })                                            // enter outbound
            .onErrorResumeNext(e -> {
                context.setThrowable(e);
                return RxUtil.concat(errorTask).flatMap(response -> {
                    context.resetResponse(response);
                    return RxUtil.concat(outboundTask);
                });
            })                                            // on exception, enter error, then re-enter outbound
            .flatMap(response -> RxUtil.concat(logTask))  // log stage
            .timeout(asyncTimeout.get(), TimeUnit.MILLISECONDS,
                    Schedulers.from(context.getEventloop()),
                    Maybe.error(new ServerException(500, "Async-Timeout-Processing"))
            )                                             // global fallback timeout
            .subscribe(                                   // release resources
                unused -> {
                    logger.error("this should not happen, " + context);
                    context.release();
                },
                e -> {
                    logger.error("this should not happen, " + context, e);
                    context.release();
                },
                () -> context.release()
            );
    }
}
```


2.2. Streaming forwarding & single thread

Taking HTTP as an example, a message can be divided into three parts: the start line, the headers, and the body.


At Ctrip, business processing at the gateway layer does not involve the request body, so there is no need to buffer the full payload: a request can enter the business process as soon as its headers are parsed.

Meanwhile, when request body data arrives:

① if the request has already been forwarded upstream, the data is forwarded directly;

② otherwise, it is stored temporarily and sent together with the start line/headers after the business process completes;

③ upstream responses are handled the same way.
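The branching above can be sketched as a small state machine. This is an illustrative reconstruction, not Ctrip's code, and the class and method names are invented: body chunks are forwarded immediately once the headers have gone upstream, and buffered otherwise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the body-forwarding decision during streaming proxying
class StreamingForwarder {
    private final List<String> upstream = new ArrayList<>(); // stand-in for the upstream connection
    private final List<String> pending = new ArrayList<>();  // temporarily stored body chunks
    private boolean headersSent = false;

    // Called when the business process completes and the start line/headers go upstream
    void sendHeaders(String startLineAndHeaders) {
        upstream.add(startLineAndHeaders);
        headersSent = true;
        upstream.addAll(pending);   // flush chunks that arrived before forwarding started
        pending.clear();
    }

    // Called whenever a body chunk is received from the client
    void onBodyChunk(String chunk) {
        if (headersSent) {
            upstream.add(chunk);    // ① already forwarding: pass it straight through
        } else {
            pending.add(chunk);     // ② not yet: buffer until the business process finishes
        }
    }

    List<String> upstreamView() { return upstream; }
}
```

The same structure, mirrored, handles case ③ for the upstream response.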

Compared with fully parsing HTTP messages, this approach has two benefits:

  • Entering the business process earlier means the upstream receives the request earlier, which effectively reduces the latency introduced by the gateway layer
  • The body's lifecycle is compressed, which reduces the gateway's own memory overhead

While this improves performance, streaming also greatly increases the complexity of the overall flow.


In the non-streaming scenario, Netty Server encoding/decoding, inbound business logic, Netty Client encoding/decoding, and outbound business logic are independent sub-processes, each handling a complete HTTP object. With streaming, however, a request may be in several sub-processes at the same time, which brings three challenges:

  • Thread safety: if different sub-processes run on different threads, the context may be modified concurrently;
  • Multi-stage coordination: for example, if the Netty Server connection drops while a request is still being received and the upstream connection is already established, the upstream protocol exchange cannot be completed and that connection must be closed;
  • Edge cases: for example, if the upstream returns 404/413 before the request has been fully sent, should we keep sending to complete the protocol exchange and allow the connection to be reused, or terminate early to save resources at the cost of discarding the connection? Or, if the upstream has received the request but not yet responded when the Netty Server connection suddenly drops, should the Netty Client connection be dropped too? And so on.

To address these challenges, we adopted a single-threaded approach. The core design points are:

  • The request context is bound to an EventLoop: Netty Server, the business process, and Netty Client all execute on the same eventloop;
  • If an asynchronous filter has to use an independent thread pool because of its IO library, it must switch back to the eventloop in post-processing;
  • Necessary thread isolation is provided for shared resources in the flow (such as connection pools);

The single-threaded approach avoids concurrency issues. When handling multi-stage coordination and edge cases, the system is always in a well-defined state, which effectively reduces development difficulty and risk. Reducing thread switching also improves performance to some extent. However, because the number of worker threads is small (generally equal to the number of CPU cores), blocking IO must be kept strictly out of the eventloop, or system throughput suffers badly.
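As a simplified stand-in for the idea (using a plain single-threaded executor rather than a Netty EventLoop; all names here are invented for the example), pinning every stage of one request to the same thread makes per-request state safe to mutate without locks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one request's stages all run on the same "eventloop" thread
class RequestPipeline {
    private final ExecutorService eventloop = Executors.newSingleThreadExecutor();
    final List<String> stageThreads = new ArrayList<>(); // safe: only touched on the eventloop

    void runStages() throws InterruptedException {
        Runnable record = () -> stageThreads.add(Thread.currentThread().getName());
        eventloop.execute(record); // server decode
        eventloop.execute(record); // inbound business logic
        eventloop.execute(record); // client encode / forward
        eventloop.shutdown();
        eventloop.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Because every stage observes the same thread, the context list needs no synchronization; the trade-off, as noted above, is that nothing submitted to this thread may block.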

2.3 Other Optimizations

  • Lazy loading of internal variables

For fields such as the request's cookies and query parameters, do not parse the strings ahead of time unless they are actually needed.
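A minimal sketch of the idea (illustrative names, not the gateway's actual classes): the raw query string is stored as-is and split into key/value pairs only on first access.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of lazy query-string parsing
class LazyRequest {
    private final String rawQuery;
    private Map<String, String> parsed; // null until first access
    int parseCount = 0;                 // instrumentation for the example

    LazyRequest(String rawQuery) { this.rawQuery = rawQuery; }

    Map<String, String> query() {
        if (parsed == null) {           // parse on demand, at most once
            parseCount++;
            parsed = new LinkedHashMap<>();
            for (String pair : rawQuery.split("&")) {
                int i = pair.indexOf('=');
                if (i > 0) parsed.put(pair.substring(0, i), pair.substring(i + 1));
            }
        }
        return parsed;
    }
}
```

Requests whose business logic never touches the query string pay no parsing cost at all.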

  • Off-heap memory & zero copy

Combined with the previous streaming forwarding design, the system memory usage can be further reduced.

  • ZGC

Because the project's upgrade to TLSv1.3 brought in JDK 11 (JDK 8 gained support later, in 8u261, released 2020/7/14), we also tried the new generation of garbage collectors, and their real-world performance lived up to expectations: although CPU usage rose, overall GC pause time dropped significantly.


  • Custom HTTP codec

Because the HTTP protocol has a long history and is open-ended, many "bad practices" have accumulated around it; these can hurt request success rates and even threaten site security.

  • Traffic Management

For problems such as an over-large request body (413), an over-long URI (414), or non-ASCII characters (400), most web servers simply reject the request and return the corresponding status code. Because such requests skip the business process, they complicate statistics, service location, and troubleshooting. By extending the codec, problematic requests can still complete the routing flow, which helps govern non-standard traffic.

  • Request Filtering

For example, request smuggling (fixed in Netty 4.1.61.Final, released on March 30, 2021). By extending the codec and adding custom validation logic, security patches can be applied faster.

3. Gateway Service Model

As an independent and unified inbound traffic entry point, the value of the gateway to the enterprise is mainly reflected in the following three aspects:

  • Decoupling different network environments: typical scenarios include intranet & extranet, production environment & office area, different security zones within an IDC, dedicated lines, etc.
  • A natural aspect for shared business concerns: including security, authentication, anti-crawling, routing, gray release, rate limiting, circuit breaking, degradation, monitoring, alerting, troubleshooting, etc.


  • Efficient and flexible flow control

Here are a few detailed scenarios:

  • Private Protocol

Inside the closed client (the APP), the framework layer intercepts HTTP requests initiated by the user and transmits them to the server over a private protocol (SOTP).

For server selection: ① IPs are allocated by the server side to prevent DNS hijacking; ② connections are pre-warmed; ③ a customized selection strategy switches automatically based on network conditions, environment, and other factors.

For the interaction mode: ① a lighter protocol body is used; ② encryption, compression, and multiplexing are applied uniformly; ③ the gateway converts the protocol at the entry point without affecting business services.

  • Link Optimization

The key is introducing an access layer so that remote users connect nearby, solving the problem of excessive handshake overhead. And because both the access layer and the IDC are under our control, there is more room for optimization in network link selection, protocol interaction mode, and so on.

  • Multi-site active-active

Unlike proportional distribution or nearest-access strategies, in multi-site active-active mode the gateway (access layer) must divert traffic based on a business-dimension shardingKey (such as userId) to prevent data conflicts at the storage layer.
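A minimal sketch of shardingKey-based diversion (hypothetical, not Ctrip's implementation; site names are invented): the same userId must deterministically map to the same site, so writes for one user never land in two data centers.

```java
// Hypothetical sketch of shardingKey-based traffic diversion for multi-site active-active
class ShardRouter {
    private final String[] sites;

    ShardRouter(String... sites) { this.sites = sites; }

    // Deterministic: the same userId always routes to the same site
    String route(String userId) {
        int h = userId.hashCode();
        int idx = Math.floorMod(h, sites.length); // floorMod keeps the index non-negative
        return sites[idx];
    }
}
```

A production system would typically add a configurable userId-to-site mapping table on top of the hash, so traffic can be re-diverted during failover.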


4. Gateway Governance

The diagram below summarizes how the online gateways operate. The vertical axis corresponds to the business flow: traffic from various channels (such as APP, H5, mini-programs, suppliers) and protocols (such as HTTP, SOTP) is distributed to the gateways via load balancing, passes through a series of business logic, and is finally forwarded to backend services. After the improvements in Chapter 2, this vertical business flow gained significantly in performance and stability.

[figure]

On the other hand, because multiple channels/protocols exist, the online gateways are deployed as independent clusters per business. In the early days, business differences (such as routing data and functional modules) were managed through independent code branches, but as branches multiplied, overall operation and maintenance complexity kept rising. In system design, complexity usually means risk. How to manage multi-protocol, multi-role gateways uniformly, and how to quickly build customized gateways for new business at low cost, therefore became the focus of our next phase of work.

The solution is presented intuitively in the figure: first, make the protocols compatible so that all online code runs on one framework; second, introduce a control plane to uniformly manage the differing characteristics of the online gateways.

[figure]

4.1 Multi-protocol compatibility

The approach to multi-protocol compatibility is not new; Tomcat's abstraction over HTTP/1.0, HTTP/1.1, and HTTP/2.0 is a good reference. Although each HTTP version added many new features, we usually cannot perceive those changes when developing business logic; the key lies in the abstraction of the HttpServletRequest interface.

At Ctrip, the online gateways handle stateless protocols in request-response mode, and their messages can likewise be divided into three parts: metadata, extension headers, and the business payload. A similar abstraction was therefore straightforward, and the work can be summarized in two points:

  • Protocol adaptation layer: shields the differences in encoding/decoding, interaction modes, TCP connection handling, etc. across protocols
  • Common intermediate models and interfaces: programming against the intermediate model and interfaces lets business code focus on the business attributes the protocol carries
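The two points above can be sketched as follows (an illustrative reduction; the interface, class names, and sample values are invented): each protocol adapter converts its wire format into one common intermediate request model, and business code is written only against that model.

```java
import java.util.Map;

// Common intermediate model (hypothetical sketch)
record GatewayRequest(String service, Map<String, String> extensionHeaders, byte[] payload) {}

// Protocol adaptation layer: each protocol implements the same interface
interface ProtocolAdapter {
    GatewayRequest decode(byte[] wire);
}

// Trivial stand-in adapters; real HTTP/SOTP codecs would live here
class HttpAdapter implements ProtocolAdapter {
    public GatewayRequest decode(byte[] wire) {
        return new GatewayRequest("/hotel/order", Map.of("proto", "http"), wire);
    }
}

class SotpAdapter implements ProtocolAdapter {
    public GatewayRequest decode(byte[] wire) {
        return new GatewayRequest("hotel.order.service", Map.of("proto", "sotp"), wire);
    }
}
```

Once decoded, downstream filters see only `GatewayRequest` and stay protocol-agnostic, mirroring the HttpServletRequest analogy above.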


4.2 Routing Module

The routing module is one of the two main components of the control plane. Besides managing the mapping between gateways and services, a service itself can be described by the following model:

```json
{
  // Matching method
  // HTTP defaults to URI prefix matching, addressed internally via a tree structure;
  // the private protocol (SOTP) locates a service by its unique identifier.
  "type": "uri",
  "value": "/hotel/order",
  "matcherType": "prefix",

  // Tags and properties
  // Used for permission management on the portal side, aspect logic at runtime (e.g. core/non-core), etc.
  "tags": ["owner_admin", "org_framework", "appId_123456"],
  "properties": { "core": "true" },

  // Endpoint information
  "routes": [{
    // "condition" is used for secondary routing, e.g. splitting by app version or redistributing by query
    "condition": "true",
    "conditionParam": {},
    "zone": "PRO",
    // Concrete service addresses; weights are used for gray-release scenarios
    "targets": [{ "url": "http://test.ctrip.com/hotel", "weight": 100 }]
  }]
}
```
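The URI prefix matching described in the model can be sketched with a simple longest-prefix lookup (illustrative only; as the model notes, the real gateway addresses routes through a tree structure):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of URI prefix routing; production code would use a trie
class PrefixRouter {
    private final TreeMap<String, String> routes = new TreeMap<>();

    void register(String prefix, String target) { routes.put(prefix, target); }

    // Longest registered prefix wins: among prefixes of the URI, longer ones sort later,
    // so scanning downward from floorEntry finds the longest match first
    String match(String uri) {
        for (Map.Entry<String, String> e = routes.floorEntry(uri); e != null;
             e = routes.lowerEntry(e.getKey())) {
            if (uri.startsWith(e.getKey())) return e.getValue();
        }
        return null; // no route registered for this URI
    }
}
```

A trie makes the same lookup O(length of URI) instead of scanning intervening keys, which matters at 20 billion requests per day.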


4.3 Module Orchestration

Module orchestration is the other key component of the control plane. We defined multiple stages in the gateway's processing flow (shown in pink in the figure). Besides common functions such as circuit breaking, rate limiting, and logging, the business functions each gateway needs to execute are assigned at runtime by the control plane. These functions exist as independent code modules inside the gateway, while the control plane additionally defines their execution conditions, parameters, gray-release ratios, and error-handling behavior. This orchestration approach also helps keep the modules decoupled.

[figure]

```json
{
  // Module name, corresponding to a concrete module inside the gateway
  "name": "addResponseHeader",
  // Execution stage
  "stage": "PRE_RESPONSE",
  // Execution order
  "ruleOrder": 0,
  // Gray-release ratio
  "grayRatio": 100,
  // Execution condition
  "condition": "true",
  "conditionParam": {},
  // Execution parameters
  // Many built-in ${} templates are available for fetching runtime data
  "actionParam": {
    "connection": "keep-alive",
    "x-service-call": "${request.func.remoteCost}",
    "Access-Control-Expose-Headers": "x-service-call",
    "x-gate-root-id": "${func.catRootMessageId}"
  },
  // Exception handling: the error can be thrown or ignored
  "exceptionHandle": "return"
}
```
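The ${...} templates in actionParam can be sketched as a simple runtime substitution (a hypothetical reconstruction; the lookup map and variable names are invented for the example):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of resolving ${...} templates against runtime data
class TemplateResolver {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)}");

    static String resolve(String template, Map<String, String> runtime) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Unknown variables resolve to the empty string in this sketch
            String value = runtime.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

At runtime the control-plane rule's actionParam values would be resolved this way against per-request data before the module executes.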


5. Conclusion

Gateways are a perennial topic on technical exchange platforms, with many mature solutions: the easy-to-use, early Zuul 1.0, the high-performance Nginx, the well-integrated Spring Cloud Gateway, the increasingly popular Istio, and so on.

The final choice ultimately depends on each company's business background and technical ecosystem, and at Ctrip we chose the path of in-house development.

Technology keeps evolving, and we continue to explore: the relationship between shared gateways and dedicated business gateways, the application of new protocols (such as HTTP/3), integration with Service Mesh, and more.
