What is Service Mesh?

Since around 2010, SOA architecture has been popular among medium and large Internet companies, and Alibaba open-sourced Dubbo in 2012. Microservice architecture took off afterwards, with many Internet and traditional companies investing heavily in microservices; in China, two major microservice camps gradually formed around Dubbo and Spring Cloud. In 2016, a more cutting-edge microservice solution, better aligned with containers and Kubernetes, was incubating in this field: Service Mesh. Today the concept is widely known, and many companies have put Service Mesh into production.

Service Mesh Definition

Service Mesh is an infrastructure layer focused on inter-service communication. The service topology of today's cloud-native applications is very complex, and Service Mesh provides reliable request delivery across that complex topology. It runs in sidecar mode: an independent Service Mesh process runs next to the application and handles communication with remote services. A military three-wheeled motorcycle is a good analogy: one soldier drives while the other operates the gun, just as the application focuses on business logic while the sidecar handles communication.

Pain points solved by Service Mesh

Most traditional microservice architectures are built on an RPC framework whose SDK provides service registration/discovery, service routing, load balancing, and full-link tracing.
The application's business logic and the RPC SDK live in the same process, which brings several challenges to the traditional microservice architecture: middleware code invades the business code, creating tight coupling, and pushing RPC SDK upgrades is so costly that SDK versions fragment badly. This model also demands a lot from application developers, who need solid service-governance and operations skills plus background knowledge of the middleware, raising the barrier to entry. By sinking some RPC capabilities into a Service Mesh, we achieve separation of concerns and clear responsibility boundaries. With the development of container and Kubernetes technology, Service Mesh has become part of the cloud-native infrastructure.

Introduction to Istio

In the Service Mesh field, Istio is undoubtedly the king. Istio consists of a control plane and a data plane; services communicate with each other through proxy sidecars. Istio's core function is traffic management, coordinated between the data plane and the control plane. Istio was initiated by Google, IBM, and Lyft, has the purest pedigree in the CNCF ecosystem, and is expected to become the de facto Service Mesh standard. By default Istio's data plane is Envoy, the community's default and best data plane, and the protocol between the data plane and the control plane is xDS.

Summary of Service Mesh

To summarize: Service Mesh is positioned as infrastructure for inter-service communication, and the community mainly supports RPC and HTTP. It is deployed in sidecar mode, on Kubernetes or on virtual machines. Because Service Mesh forwards the original protocol unchanged, it is also called a network proxy.
It is precisely this approach that achieves zero intrusion into the application.

What is Dapr?

Challenges of Service Mesh

Businesses deploy on the cloud mainly as general applications or as FaaS workloads. In the FaaS scenario, cost and R&D efficiency are the main attractions: cost is reduced through on-demand allocation and extreme elasticity, and developers expect FaaS's multi-language programming environment to improve R&D efficiency, including startup time, release time, and development speed. The essence of Service Mesh is original-protocol forwarding, which gives it zero intrusion, but that same design brings problems. The middleware SDK on the application side still has to implement serialization and encoding/decoding, so supporting many languages carries a real cost. And as open-source technology keeps iterating, migrating, say, from Spring Cloud to Dubbo means either the application developer switches SDKs, or the Service Mesh performs protocol conversion, which is expensive. Service Mesh also focuses almost entirely on service-to-service communication, with very little support for other Mesh forms: Envoy, for example, succeeded in RPC but its attempts in the Redis and messaging fields have borne little fruit, while Ant Group's MOSN integrates RPC and messaging. Demand for multiple Mesh forms clearly exists, but each Mesh product evolves independently, lacking abstraction and standards. Should these Mesh forms share one process? If so, should they share one port? Many such questions have no answers. As for the control plane, functionally most of it revolves around traffic.
Reading the xDS protocol, its core is service discovery and routing; other kinds of distributed capabilities are barely touched by the Service Mesh control plane, let alone abstracted into xDS-like protocols of their own. Meanwhile, for reasons of cost and R&D efficiency, more and more customers choose FaaS, which places higher demands on multi-language support and programming-API friendliness, two areas where Service Mesh still brings customers no additional value.

Requirements for distributed applications

Bilgin Ibryam, author of Kubernetes Patterns, chief middleware architect at Red Hat, and an active member of the Apache community, published an article abstracting the current difficulties of distributed applications. He divides their requirements into four categories: lifecycle, networking, state, and binding, each with sub-capabilities such as point-to-point, pub/sub, and caching, the classic middleware capabilities. Applications need this many distributed capabilities, and Service Mesh obviously cannot meet them all. Bilgin Ibryam therefore proposed the concept of Multiple Runtime in the article to resolve the dilemma of Service Mesh.

Derivation of the Multiple Runtime concept

In the traditional middleware model, the application and its distributed capabilities are integrated into one process via SDKs. As infrastructure matured, distributed capabilities moved out of the application: Kubernetes handles lifecycle-related requirements, while Istio, Knative, and others take on some distributed capabilities. But if every one of these capabilities became its own independent runtime, the result would be unacceptable both operationally and in resource terms.
So these runtimes must be consolidated, ideally into one. This model is called Mecha. Like the mecha suits in Japanese anime, each part of the mecha is a distributed capability, and the person inside corresponds to the main application, also called the Micrologic Runtime. The two runtimes can pair one-to-one in Sidecar mode, well suited to traditional applications, or many-to-one in Node mode, suited to edge scenarios or a node-management model. The goal of having a Mecha runtime integrate various distributed capabilities is not itself controversial; the real question is how to integrate them. What does Mecha require? Its component capabilities must be abstract, so that any open-source product can be quickly extended and integrated.

Introduction to Dapr

The Multiple Runtime discussion above is rather abstract, so let's revisit it through Dapr. Dapr is a faithful practitioner of Multiple Runtime: it coexists with the application, in either Sidecar or Node mode. The name Dapr is not invented; it is the initials of Distributed Application Runtime. The Dapr icon looks like a hat, in fact a waiter's hat, meaning that Dapr serves applications. Dapr was open-sourced by Microsoft, with Alibaba deeply involved as a partner. Version 1.1 has been released, and Dapr is now close to production-ready. Since Dapr is the best practitioner of Multiple Runtime, its operating mechanism is built on that concept: Dapr abstracts distributed capabilities into a set of APIs built on HTTP and gRPC. In Dapr this abstraction is called a Building Block.
To let open-source and commercial products alike extend Dapr's distributed capabilities, Dapr has an internal SPI extension mechanism called Components. With Dapr, application developers simply program against the distributed-capability APIs without caring about the concrete implementation, and the corresponding components can be freely activated through YAML files.

Dapr Features

Application developers can use the distributed capabilities directly through Dapr SDKs in various languages, or simply over HTTP and gRPC. Dapr runs in most environments: your own machine, any Kubernetes environment, edge computing scenarios, or cloud vendors such as Alibaba Cloud, AWS, and GCP. The Dapr community has integrated more than 70 component implementations that developers can pick up quickly, and components with similar capabilities can be swapped through Dapr without the application noticing.

Dapr core module

Looking at Dapr's product modules shows why it is a good practice of Multiple Runtime. The Component mechanism guarantees fast extensibility; the community now has 70+ component implementations covering both open-source products and commercial cloud products. Building Blocks, which represent the distributed capabilities, currently number only seven, and more will be needed in the future. Building Blocks are exposed over HTTP and gRPC, two very widely adopted open protocols, and which concrete component a Building Block activates depends on the YAML file. Because Dapr exposes capabilities via HTTP and gRPC, supporting standard multi-language API programming interfaces on the application side becomes easy.
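As a concrete illustration of the activation mechanism, this is the shape of a Dapr component YAML file that activates a Redis-backed State component (a minimal sketch; the component name `statestore` and the local `redisHost` value are assumptions for a development setup):

```yaml
apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore          # the name the application refers to in API calls
spec:
  type: state.redis         # which component implementation to activate
  version: v1
  metadata:
  - name: redisHost
    value: localhost:6379
  - name: redisPassword
    value: ""
```

Swapping Redis for another state store means changing only `spec.type` and its metadata; the application's API calls stay the same.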
Dapr Core: Component & Building Block

Dapr Component is the core of Dapr's plug-in extension, its SPI. Currently supported Components include Bindings, Pub/Sub, Middleware, Service Discovery, Secret Stores, and State. Some extension points are functional, such as Bindings, Pub/Sub, and State; others are horizontal, such as Middleware. To integrate Redis into Dapr, for example, you only need to implement Dapr's State Component. A Dapr Building Block is a capability Dapr provides, supporting both gRPC and HTTP; currently supported capabilities include Service Invocation, State, and Pub/Sub. A Building Block consists of one or more Components; the Bindings Building Block, for instance, contains the Bindings and Middleware Components.

Dapr overall architecture

Like Istio, Dapr has a data plane and a control plane. The control plane includes Actor Placement, Sidecar Injector, Sentry, and Operator. Actor Placement serves Actors, Sentry handles security and certificates, and Sidecar Injector injects the Dapr sidecar. Component implementations are activated through YAML files, supplied in one of two ways: as local runtime parameters, or through the control-plane Operator, in which case the activation file is stored as a Kubernetes CRD and distributed to Dapr's sidecars. Two of the control-plane components depend on Kubernetes to run. The Dapr Dashboard is still very weak and is not a near-term focus for enhancement; after components are integrated, their operation and maintenance still happen in their original consoles, and the Dapr control plane does not participate in operating the concrete component implementations.
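To make the Building Block idea concrete, the sketch below shows how an application addresses the capabilities over HTTP: every Building Block is just a local endpoint on the Dapr sidecar. The paths follow the Dapr v1.0 HTTP API, and 3500 is Dapr's default HTTP port; the store, app, and topic names are placeholders.

```python
# Each Building Block is a local HTTP endpoint on the Dapr sidecar.
# Paths follow the Dapr v1.0 HTTP API; 3500 is Dapr's default HTTP port.

DAPR_PORT = 3500
BASE = f"http://localhost:{DAPR_PORT}/v1.0"

def state_url(store: str, key: str = "") -> str:
    """State Building Block: save/get state via a named state store component."""
    return f"{BASE}/state/{store}/{key}" if key else f"{BASE}/state/{store}"

def invoke_url(app_id: str, method: str) -> str:
    """Service Invocation Building Block: call a method on another Dapr app."""
    return f"{BASE}/invoke/{app_id}/method/{method}"

def publish_url(pubsub: str, topic: str) -> str:
    """Pub/Sub Building Block: publish to a topic via a pubsub component."""
    return f"{BASE}/publish/{pubsub}/{topic}"

# Usage with any HTTP client, e.g.:
# requests.post(state_url("statestore"),
#               json=[{"key": "order-1", "value": {"qty": 2}}])
```

The application never names Redis, Kafka, or any other backend here; only the component YAML decides what sits behind each endpoint.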
Dapr's standard running mode is in the same Pod as the application, but in two separate containers. The rest of Dapr has been covered sufficiently above.

Dapr landing scenarios at Microsoft

Dapr has been developing for about two years. How is it used within Microsoft? There are two relevant projects on Dapr's GitHub: Workflows and Azure Functions Dapr extensions. Azure Logic Apps is Microsoft's cloud-based automated workflow platform, and Workflows integrates Azure Logic Apps with Dapr. Among the key concepts in Azure Logic Apps, Trigger and Connector fit Dapr very well: a Trigger can be implemented with Dapr's Input Binding, whose many component implementations expand the types of traffic entry points, and a Connector maps naturally onto Dapr's Output Binding or Service Invocation, giving quick access to external resources. Azure Functions Dapr extensions add Dapr support to the Azure Functions extension model, letting a function quickly use Dapr's Building Blocks while giving function developers a relatively simple, consistent programming experience across languages. Azure API Management Service is a different scenario: given that applications already talk to each other through Dapr sidecars and expose their services through Dapr, non-Kubernetes or cross-cluster applications that want to reach those services need a gateway. The gateway directly exposes Dapr's capabilities and adds security and access control. Three Building Blocks are currently supported there: Service Invocation, Pub/Sub, and Resource Bindings.
Dapr Summary

Dapr's capability-oriented APIs give developers a consistent, multi-language programming experience, and the SDKs for these APIs are relatively lightweight, which suits FaaS scenarios very well. As Dapr's integration ecosystem matures, the advantage of capability-oriented programming will only grow. Dapr also makes it easy to swap a component implementation without any code changes, provided the old and new components implement the same type of distributed capability.

Differences from Service Mesh:

Capabilities provided: Service Mesh focuses on service calls; Dapr provides a wider range of distributed capabilities, covering a variety of distributed primitives.
Working principle: Service Mesh achieves zero intrusion via original-protocol forwarding; Dapr uses multi-language SDKs plus standard APIs plus various distributed capabilities.
Target domain: Service Mesh excels at non-invasive upgrades of traditional microservices; Dapr offers application developers a friendlier programming experience.

Alibaba's exploration of Dapr

Alibaba's development path with Dapr: In October 2019, Microsoft open-sourced Dapr and released version 0.1.0. Alibaba, which had just begun cooperating with Microsoft on OAM, learned of the project and began evaluating it. In early 2020, the two companies discussed Dapr offline at Alibaba, covering Microsoft's views on Dapr, its investment, and its roadmap; by then Alibaba had concluded the project had great value. Work around Dapr began in mid-2020, and by October Dapr began gray-releasing some features online in the Function Compute scenario.
To date, the gray release of all Dapr-related features in Function Compute has basically been completed and public beta has begun. In February 2021, version 1.0 was finally released.

Alibaba Cloud Function Compute integrates Dapr

Beyond extreme elasticity and other operational benefits, what distinguishes Function Compute from ordinary applications is its focus on giving developers a better R&D experience and improving overall R&D efficiency. The value Dapr brings to Function Compute is a unified, capability-oriented programming interface across languages, so developers need not target specific products. For example, to use Alibaba Cloud OSS from Java you normally introduce a Maven dependency and write OSS-specific code; with Dapr you only call the binding method of the Dapr SDK. Besides being easier to program against, the deployable package carries no redundant dependencies and stays under control. Function Compute (FC for short) comprises many systems; the ones developers touch are mainly the FC Gateway and the environment in which functions run. The FC Gateway accepts traffic and scales function instances up or down based on the traffic volume and current CPU and memory usage. The Function Compute runtime is deployed in a Pod, with the function instance in the main container and Dapr in a sidecar container. When external traffic reaches a Function Compute service, it first hits the Gateway, which forwards it to the function instance serving that content.
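The OSS example above can be sketched against Dapr's Bindings HTTP API: rather than bundling the OSS SDK, the function posts one request to `POST /v1.0/bindings/<name>` on its local sidecar. The binding name `oss`, the `create` operation, and the `key` metadata field are assumptions tied to whatever the component YAML defines.

```python
import json

# Sketch: invoke an output binding through the local Dapr sidecar
# (Dapr v1.0 HTTP Bindings API). The binding name "oss" and the
# metadata key below are assumptions matching a hypothetical
# component YAML; the real fields depend on the binding component.

def binding_request(name: str, operation: str, data, metadata=None):
    """Build the URL and JSON body for POST /v1.0/bindings/<name>."""
    url = f"http://localhost:3500/v1.0/bindings/{name}"
    body = {"operation": operation, "data": data}
    if metadata:
        body["metadata"] = metadata
    return url, json.dumps(body)

url, body = binding_request("oss", "create", "hello",
                            metadata={"key": "greeting.txt"})
# e.g. requests.post(url, data=body)
```

The function code names only the binding and the operation; the component YAML decides whether the bytes land in OSS, S3, or a local file system.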
After receiving the request, if the function instance needs to access external resources, it calls Dapr's multi-language SDK, which issues a gRPC request to the Dapr instance; Dapr then selects the appropriate capability and component implementation based on the request's type and body, and calls the external resource. In the Service Mesh scenario, the mesh exists as a sidecar deployed in a second container of the application's Pod, which fits Service Mesh's needs very well.
In the Function Compute scenario, however, Dapr as an independent container consumes too many resources, and multiple function instances are packed into one Pod to save cost and achieve second-level elasticity. So here the function instance and the Dapr process are deployed in the same container, as two processes. Function Compute also lets you set a number of reserved instances, the minimum instance count for a function. If reserved instances receive no traffic for a long time, they enter a paused/hibernated state, consistent with AWS's approach, and the processes and threads inside a hibernating instance must stop running. To support this, an Extension mechanism was added to the function runtime to drive Dapr's lifecycle: when the function instance hibernates, the Extension tells Dapr to hibernate; when the instance resumes, the Extension tells Dapr to restore its previous running state. Component implementations inside Dapr must support this kind of lifecycle management. Take Dubbo as an example: clients of Dubbo's registry Nacos must send regular heartbeats to the Nacos server to stay registered, and the Dubbo consumer integrated in Dapr also heartbeats to the Dubbo provider. On entering the paused state these heartbeats must stop; on resuming, the full running state must be restored. The Function Compute and Dapr combination described so far is driven by external traffic; what about other inbound traffic? Can message traffic flow directly into Dapr without passing through the Gateway? For that, the Dapr sidecar must report performance data to the Gateway in time, so the Gateway can still drive resource elasticity.
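The hibernate/resume flow above can be illustrated with a small sketch. The class and method names here are hypothetical (the real FC runtime internals are not public); the point is the shape of the mechanism: lifecycle signals from the runtime Extension fan out to Dapr's components, such as a Dubbo consumer stopping its heartbeats while hibernated.

```python
# Illustrative sketch of the runtime Extension driving Dapr's lifecycle.
# All names here are hypothetical stand-ins for the mechanism described
# in the text, not the actual Function Compute or Dapr interfaces.

class HeartbeatComponent:
    """Stand-in for a component (e.g. a Dubbo consumer registered in
    Nacos) that must stop heartbeats when paused."""
    def __init__(self):
        self.beating = False
    def pause(self):
        self.beating = False   # stop heartbeats to registry/provider
    def resume(self):
        self.beating = True    # restore the previous running state

class DaprLifecycleExtension:
    """Hypothetical hook the function runtime calls on state changes."""
    def __init__(self, components):
        self.components = components
    def on_instance_hibernate(self):
        for c in self.components:
            c.pause()
    def on_instance_resume(self):
        for c in self.components:
            c.resume()

dubbo = HeartbeatComponent()
dubbo.resume()                       # instance starts running
ext = DaprLifecycleExtension([dubbo])
ext.on_instance_hibernate()          # no traffic for a while: pause
ext.on_instance_resume()             # traffic returns: heartbeats resume
```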
SaaS services on the cloud

As Alibaba incubates more and more SaaS businesses internally, the demand to serve external customers, and in particular to deploy across clouds, is very strong: customers expect a SaaS product to run on Alibaba Cloud's public cloud or on a Huawei private cloud, and they expect the underlying technology to be open source or standard commercial cloud products. Take one Alibaba SaaS business as an example, with the original internal system on the left and the transformed system on the right. The goal of the transformation was to switch Alibaba's internal stack to open-source software: Ali RPC to Dubbo, and Alibaba's internal Cache, Message, and Config to Redis, RocketMQ, and Nacos respectively, with Dapr expected to make the switch at the lowest cost. The simplest, crudest approach would be to make the application depend on the Dapr SDK directly, but that transformation cost is too high, so instead the original API was kept and its underlying implementation adapted to the Dapr SDK. The application thus reaches Dapr through its original API, needing only a version bump of the dependent JAR. After the transformation, developers still program against the original SDK while the underlying layer has been replaced with Dapr's capability-oriented programming, so during migration the application keeps one codebase rather than maintaining branches per cloud environment or technology. Inside the group, rpc.yaml, cache.yaml, msg.yaml, and config.yaml activate the component implementations, while on the public cloud, dubbo.yaml, redis.yaml, rocketmq.yaml, and nacos.yaml activate implementations suited to the Alibaba Cloud environment.
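The "keep the old API, swap the implementation" approach can be sketched as follows. `LegacyCacheClient` mimics a pre-existing cache SDK surface (the class and method names are hypothetical); underneath, it now targets the Dapr State API endpoint, so moving from the in-house cache to Redis becomes a component-YAML change rather than an application change.

```python
# Sketch of adapting an original SDK's API onto Dapr underneath.
# Names are hypothetical; the injectable `transport` dict stands in
# for a real HTTP client talking to the local Dapr sidecar, so the
# routing logic can be exercised without a running sidecar.

class LegacyCacheClient:
    DAPR_STATE = "http://localhost:3500/v1.0/state/{store}"

    def __init__(self, store="cache", transport=None):
        self.store = store
        self.transport = transport if transport is not None else {}

    def put(self, key, value):
        """Original SDK method name kept; now a Dapr state save."""
        url = self.DAPR_STATE.format(store=self.store)
        self.transport[(url, key)] = value

    def get(self, key):
        """Original SDK method name kept; now a Dapr state get."""
        url = self.DAPR_STATE.format(store=self.store)
        return self.transport.get((url, key))
```

Application code keeps calling `put`/`get` exactly as before; only the JAR (here, the class body) changes, which is the low-cost migration the text describes.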
Shielding component implementations by activating different components through different YAML files has made multi-cloud deployment of SaaS services far more convenient. DingTalk, an important Dapr partner and promoter, has worked with the cloud-native team to land Dapr inside DingTalk, sinking some middleware capabilities into the Dapr sidecar to shield the underlying middleware implementations of similar capabilities. DingTalk has its own business pain point too: its common business components are strongly bound to specific businesses and need per-business customization, so reusability is low. DingTalk therefore hopes to sink some business-component capabilities into Dapr as well, so that different businesses share the same programming experience while component maintainers only maintain the Component implementations.

Dapr Outlook

Infrastructure sinking becomes a trend in software development

The history of software architecture is fascinating, and the evolution of Alibaba's system architecture mirrors the development of software architecture in China and beyond. Taobao began as a monolith. As business grew, the system was first scaled up with better hardware, but that approach soon hit various problems, so in 2008 a microservice solution was introduced. SOA solutions are distributed, so for stability and observability, high-availability mechanisms such as circuit breaking, isolation, and full-link monitoring had to be introduced. The next problem was achieving better than 99.99% availability at the machine-room and IDC level, which brought solutions such as dual machine rooms in the same city and multi-site active-active across regions.
With the continuous development of cloud technology, Alibaba embraced and helped guide cloud-native technology, actively upgrading its stack around Kubernetes. This history shows a pattern: new architectural demands kept arriving that the underlying infrastructure could not meet, so they were pushed into application-side SDKs; once Kubernetes and containers became standards, microservice and other distributed capabilities returned to the infrastructure. The future trend is to keep sinking distributed capabilities, represented by Service Mesh and Dapr, to release the dividends of cloud and cloud-native technology.

Demands of application developers in cloud-native scenarios

Future application developers should expect a capability-oriented development experience, unrelated to and unbound from specific cloud vendors and technologies, while still enjoying the extreme elasticity and cost advantages the cloud brings. I believe this ideal can be achieved one day; from where we stand now, the way to get there is for the Multiple Runtime concept to be truly implemented and to keep developing.