Background

Under a microservice architecture, building a complete test system to verify new business functions before launch is time-consuming and labor-intensive, and the difficulty grows as the number of microservices increases. The machine cost of such a test system is rarely low, and to keep pre-launch correctness verification efficient, the system must be maintained separately. When the business becomes large and complex, multiple sets are often needed. This is a cost and efficiency challenge common across the industry. If functional verification of a new version could be completed in the same production system before launch, the savings in manpower and money would be considerable.

Beyond functional verification in the development phase, introducing grayscale release in the production environment gives better control over the risk and blast radius of a new software version. Grayscale release allocates production traffic with certain characteristics, or a certain proportion of it, to the service version under verification, in order to observe whether the new version behaves as expected after launch.

Alibaba Cloud ASM Pro (see the end of the article for related links) is a full-link grayscale solution built on Service Mesh that can help solve the problems in both scenarios.

ASM Pro product functional architecture diagram:

The core capabilities used are the extended traffic labeling, label-based routing, and traffic fallback capabilities shown in the preceding figure. They are described in detail below.

Scenario Description

Common scenarios for full-link grayscale release are as follows: taking Bookinfo as an example, the ingress traffic carries the expected tag group.
The sidecar obtains the expected tag from the request context (Header or Context) and distributes the traffic to the corresponding tag group. If that tag group does not exist, the traffic falls back to the base group by default; the fallback strategy is configurable. The implementation details are described below.

Ingress traffic is generally labeled at the gateway level by a tagging plug-in. For example, userids within a certain range are marked with a grayscale tag. Because gateway selection and implementation vary widely in real environments, the gateway implementation is outside the scope of this article. The rest of the article focuses on how to achieve full-link traffic labeling and full-link grayscale based on ASM Pro.

Implementation principle

Inbound refers to request traffic sent to the App, and Outbound refers to requests that the App sends out.

The figure above shows a typical traffic path of a business application after the mesh is enabled: the business App receives an external request p1 and then calls the interface of another service it depends on. The traffic path of the request is p1->p2->p3->p4, where p2 is the forwarding of p1 by Sidecar, and p4 is the forwarding of p3 by Sidecar. To achieve full-link grayscale, both p3 and p4 need the traffic label that arrived with p1, so that the request is routed to the backend service instance corresponding to the label, and p3 and p4 must carry the same label themselves.

The key is making the transfer of labels completely imperceptible to the application, that is, achieving transparent full-link label transmission. This is the core technology of full-link grayscale. ASM Pro implements it on top of the traceId used in distributed tracing (such as OpenTracing or OpenTelemetry).
In distributed tracing, a traceId uniquely identifies a complete call chain. Every fan-out call that an application issues on the chain carries the source traceId through the distributed tracing SDK. The ASM Pro full-link grayscale solution builds on this widely adopted practice.

In the above figure, the inbound and outbound traffic that Sidecar sees are originally completely independent. Sidecar cannot perceive the relationship between them, nor can it tell whether one inbound request leads to multiple outbound requests. In other words, Sidecar does not know whether requests p1 and p3 in the figure are related.

In the ASM Pro full-link grayscale solution, p1 and p3 are associated through the traceId, specifically relying on the x-request-id trace header in the Sidecar. Sidecar maintains a mapping table that records the correspondence between traceId and label. When Sidecar receives p1, it stores the traceId and label from the request in this table. When it receives p3, it looks up the label for the traceId in the table and adds that label to p4, thereby achieving full-link labeling and label-based routing. The following figure roughly illustrates this principle.

In other words, the full-link grayscale function of ASM Pro requires the application to use distributed tracing. An application that does not yet use it will need some modification. For Java applications, one option is a Java Agent using AOP, which lets the business pass the traceId transparently between inbound and outbound without code changes.
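The traceId-to-label mapping table described above can be illustrated with a small Python model. This is only a sketch of the idea, not ASM Pro's implementation: the class name `SidecarLabelTable` and the label header `x-asm-traffic-tag` are hypothetical, and only `x-request-id` comes from the article.

```python
from typing import Dict

TRACE_HEADER = "x-request-id"       # trace header named in the article
LABEL_HEADER = "x-asm-traffic-tag"  # hypothetical header name for the label


class SidecarLabelTable:
    """Toy model of the traceId -> label table kept by one sidecar."""

    def __init__(self) -> None:
        self._labels: Dict[str, str] = {}

    def on_inbound(self, headers: Dict[str, str]) -> None:
        """p1: remember the label that arrived with this traceId."""
        trace_id = headers.get(TRACE_HEADER)
        label = headers.get(LABEL_HEADER)
        if trace_id and label:
            self._labels[trace_id] = label

    def on_outbound(self, headers: Dict[str, str]) -> Dict[str, str]:
        """p3 -> p4: copy the request headers and attach the remembered label."""
        forwarded = dict(headers)
        label = self._labels.get(headers.get(TRACE_HEADER, ""))
        if label is not None and LABEL_HEADER not in forwarded:
            forwarded[LABEL_HEADER] = label
        return forwarded


table = SidecarLabelTable()
table.on_inbound({TRACE_HEADER: "trace-1", LABEL_HEADER: "gray"})
p4_headers = table.on_outbound({TRACE_HEADER: "trace-1"})
```

The application only has to propagate `x-request-id` from inbound to outbound; the label itself never passes through application code. A real table would also need eviction of old traceIds, which this sketch omits.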
Implementing traffic labeling

ASM Pro introduces a new TrafficLabel CRD that defines where Sidecar obtains the traffic label it needs to pass through. The YAML file listed below defines the source of the traffic label and specifies that the label is stored in OpenTracing (specifically the x-trace header). The traffic label is named trafficLabel, and its value is obtained from $getContext(x-request-id) first and finally from $(localLabel) in the local environment.
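Separately from the CR syntax, the lookup order for the label value can be sketched in Python. This is an illustration under assumptions rather than the product's code: it assumes an explicitly carried label is checked first, then the sidecar's traceId table, then the local label. The function name and the header name `x-asm-traffic-tag` are hypothetical.

```python
from typing import Mapping


def resolve_traffic_label(
    headers: Mapping[str, str],
    trace_table: Mapping[str, str],
    local_label: str,
    label_header: str = "x-asm-traffic-tag",  # hypothetical header name
    trace_header: str = "x-request-id",
) -> str:
    """Return the traffic label for one request."""
    # 1. A label carried explicitly by the request wins.
    explicit = headers.get(label_header)
    if explicit:
        return explicit
    # 2. Otherwise look the traceId up in the sidecar's local table.
    trace_id = headers.get(trace_header)
    if trace_id is not None and trace_id in trace_table:
        return trace_table[trace_id]
    # 3. Finally fall back to the label of the local deployment.
    return local_label


label = resolve_traffic_label({"x-request-id": "trace-1"}, {"trace-1": "dev-x"}, "base")
```

Here `label` is resolved from the traceId table because the request carries no explicit label; with an empty table the same call would return the local label instead.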
The CR definition consists of two parts: label acquisition and label storage.

Acquisition logic: Sidecar first tries to obtain the traffic label from the configured field in the protocol context or header. If the field is absent, Sidecar looks up the traceId in its locally recorded map, which stores the mapping from traceId to traffic label. If a mapping is found, the traffic is marked with that label. Otherwise, the traffic label is set to the localLabel of the local deployment environment. The localLabel corresponds to the label attached to the local deployment, whose label name is ASM_TRAFFIC_TAG. In practice, setting this label can be tied into the CI/CD system.

Storage logic: attachTo specifies the field in the protocol context where the label is stored, such as a Header field for HTTP or the RPC context for Dubbo. The specific field is configurable.

The TrafficLabel definition tells us how to label traffic and pass the label along, but that alone is not enough for full-link grayscale. We also need to route based on the trafficLabel, that is, "routing by label", together with routing fallback logic, so that traffic can degrade gracefully when the routing destination does not exist.

Routing by traffic label

This feature is implemented by extending Istio's VirtualService and DestinationRule.

Defining Subsets in DestinationRule

The custom group subset corresponds to the value of trafficLabel
Subset supports two specification forms: Labels are used to match nodes (endpoints) with specific tags in the application;

Based on subset in VirtualService

1) Global default configuration

The route part can specify multiple destinations in order, and traffic is distributed among them in proportion to their weight values. The global default mode corresponds to a swimlane, in which traffic stays closed within a single environment, and the fallback strategy is specified at the environment level. The custom group subset corresponds to the value of trafficLabel. The configuration sample is as follows:
2) Personal development environment customization

Traffic is routed to the daily environment first. When the daily environment has no service resources, it is routed to the main environment.
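The ordered fallback described here can be modeled as "take the first subset that has endpoints". The Python sketch below is illustrative only; `pick_subset` and the endpoint map are hypothetical names, and the real behavior is determined by the mesh configuration rather than by application code.

```python
from typing import Dict, List, Optional, Sequence


def pick_subset(
    preferred: Sequence[str],
    endpoints_by_subset: Dict[str, List[str]],
) -> Optional[str]:
    """Return the first subset in `preferred` that has endpoints, else None."""
    for subset in preferred:
        if endpoints_by_subset.get(subset):
            return subset
    return None


# Daily is tried first; main is used only when daily has no endpoints.
order = ["daily", "main"]
with_daily = pick_subset(order, {"daily": ["10.0.0.1"], "main": ["10.0.0.2"]})
without_daily = pick_subset(order, {"daily": [], "main": ["10.0.0.2"]})
```

In this example `with_daily` resolves to the daily subset and `without_daily` falls back to main, mirroring the two cases in the text.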
3) Support weight configuration

For traffic labeled with the backbone environment whose local environment is dev-x, 80% is sent to the backbone environment and 20% to the daily environment. When the backbone environment has no available service resources, the traffic is sent to the daily environment. sourceLabels is the label of the local workload.
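The weighted case can be sketched the same way: destinations without available endpoints are dropped, and the remaining weights decide the split. This Python model is an assumption-laden illustration (the function names are hypothetical, and how ASM Pro redistributes weights internally is not stated in the article); it only reproduces the 80/20 example and its fallback.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Route = Tuple[str, int]  # (subset name, weight)


def effective_weights(
    destinations: Sequence[Route],
    endpoints_by_subset: Dict[str, List[str]],
) -> List[Route]:
    """Keep only destinations whose subset currently has endpoints."""
    return [(s, w) for s, w in destinations if w > 0 and endpoints_by_subset.get(s)]


def choose_destination(
    destinations: Sequence[Route],
    endpoints_by_subset: Dict[str, List[str]],
    rng: random.Random,
) -> Optional[str]:
    """Pick one live destination at random, in proportion to its weight."""
    live = effective_weights(destinations, endpoints_by_subset)
    if not live:
        return None
    subsets = [s for s, _ in live]
    weights = [w for _, w in live]
    return rng.choices(subsets, weights=weights, k=1)[0]


routes = [("backbone", 80), ("daily", 20)]
rng = random.Random(0)
# Backbone has no endpoints, so every request goes to daily.
fallback_pick = choose_destination(routes, {"backbone": [], "daily": ["10.0.0.2"]}, rng)
```

When both subsets have endpoints, repeated calls split traffic roughly 80/20; when the backbone subset is empty, only the daily destination remains eligible.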
Routing by (environment) label

This solution relies on the business deploying applications with the relevant labels (ASM_TRAFFIC_TAG: xxx in the example), which are usually environment labels. The labels can be understood as metadata about the service deployment, and setting them depends on integration with the upstream CI/CD deployment system. The schematic diagram is as follows:

In the K8s scenario, the corresponding environment/group label can be added automatically during business deployment, that is, K8s itself serves as the metadata management center.

Note: ASM Pro has developed its own ServiceDirectory component (see the ASM Pro product functional architecture diagram), which connects to multiple registration centers and dynamically obtains deployment metadata.

Application scenario extension

The following is a typical example of governing multiple development environments based on traffic labeling and label-based routing. Each developer's Dev X environment only needs to deploy the services whose versions have changed. If joint debugging with other developers is required, service requests can be forwarded to the corresponding development environment by configuring fallback. As shown in the figure below, B of Dev Y environment -> C of Dev X environment.

Similarly, the Dev X environment can be treated as the online grayscale version environment, which solves the problem of full-link grayscale release in the online environment.

Summary

The "traffic tagging" and "routing by tag" capabilities introduced in this article form a general solution for problems such as test environment management and online full-link grayscale release. Because it is based on service mesh technology, it is independent of the development language. The solution also applies to different Layer 7 protocols and currently supports HTTP/gRPC and Dubbo.
Other vendors also have some solutions for full-link grayscale. Compared with other solutions, the advantages of ASM Pro are:
The capabilities of "traffic tagging" and "routing by tag" can also be used in other related scenarios: