Linkerd provides many features, including automatic mTLS, automatic proxy injection, distributed tracing, fault injection, high availability, HTTP/2 and gRPC proxying, load balancing, multi-cluster communication, retries and timeouts, telemetry and monitoring, and traffic splitting (canary and blue/green deployments). The Linkerd 2.10 Chinese manual is being continuously revised and updated:
Linkerd 2.10 Series
Functional Overview
HTTP, HTTP/2, and gRPC Proxying
Linkerd can proxy all TCP connections and will automatically enable advanced features (including metrics, load balancing, retries, and more) for HTTP, HTTP/2, and gRPC connections.
TCP Proxying and Protocol Detection
Linkerd can proxy all TCP traffic, including TLS connections, WebSockets, and HTTP tunnels. Most of the time, Linkerd does this without configuration. To do so, Linkerd performs protocol detection to determine whether the traffic is HTTP or HTTP/2 (including gRPC). If Linkerd detects that a connection is HTTP or HTTP/2, it automatically provides HTTP-level metrics and routing. If Linkerd cannot determine whether a connection is using HTTP or HTTP/2, it proxies the connection as a plain TCP connection, applying mTLS and providing byte-level metrics as usual. Client-initiated HTTPS is treated as TCP, not HTTP, because Linkerd cannot observe the HTTP transactions on such a connection.

Configuring Protocol Detection
In some cases, Linkerd's protocol detection cannot work because it is not provided with enough client data. This can cause a 10-second delay in establishing a connection while the protocol detection code waits for more data. It is most commonly encountered with "server-speaks-first" protocols — protocols where the server sends data before the client does — and can be avoided by giving Linkerd some additional configuration. Regardless of the underlying protocol, client-initiated TLS connections do not require any additional configuration, because TLS itself is a client-speaks-first protocol.

There are two basic mechanisms for configuring protocol detection: opaque ports and skip ports. Marking a port as opaque instructs Linkerd to proxy the connection as a TCP stream instead of attempting protocol detection. Marking a port as skip bypasses the proxy entirely. Opaque ports are generally preferred (because Linkerd can still provide mTLS, TCP-level metrics, and so on), but opaque ports can only be used for services internal to the cluster.

By default, Linkerd automatically marks some ports as opaque, including the default ports for SMTP, MySQL, PostgreSQL, and Memcached. Services that speak these protocols, use the default ports, and are internal to the cluster require no further configuration. The following table summarizes some common server-speaks-first protocols and the configuration required to handle them. The "on-cluster config" column refers to the configuration when the destination is in the same cluster; the "off-cluster config" column refers to the configuration when the destination is outside the cluster. * If you use a standard port, no configuration is required. If you use a non-standard port, you must mark the port as opaque.

Marking a Port as Opaque
You can use the config.linkerd.io/opaque-ports annotation to mark a port as opaque. This instructs Linkerd to skip protocol detection for that port. The annotation can be set on a workload, a service, or a namespace. Setting it on a workload tells meshed clients of that workload to skip protocol detection for connections established to the workload, and tells Linkerd to skip protocol detection when reverse-proxying incoming connections. Setting it on a service tells meshed clients to skip protocol detection when proxying connections to the service. Setting it on a namespace applies this behavior to all services and workloads in that namespace. Because this annotation informs the behavior of meshed clients, it can be applied to services that use a server-speaks-first protocol even if the service itself is not meshed. You can also set the opaque-ports annotation by using the --opaque-ports flag when running linkerd inject.
For example, to use the non-standard port 4406 for a MySQL database running on your cluster, you could use the following command:
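A minimal sketch of that command, assuming the workload manifest lives in a hypothetical deployment.yml (the port 4406 comes from the example above):

```bash
# Mark port 4406 as opaque so Linkerd skips protocol detection for it,
# then apply the injected manifest to the cluster.
linkerd inject --opaque-ports=4406 deployment.yml | kubectl apply -f -
```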
Multiple ports may be provided as a comma-separated string. The values you provide replace, rather than augment, the default list of opaque ports.

Skipping the Proxy
Sometimes it is desirable to bypass the proxy entirely. For example, when connecting to a server-speaks-first destination outside the cluster, there is no Service resource on which to set the config.linkerd.io/opaque-ports annotation. In this case, you can use the --skip-outbound-ports flag when running linkerd inject to configure the resource to bypass the proxy entirely when sending to those ports. (Similarly, the --skip-inbound-ports flag configures the resource to bypass the proxy for incoming connections to those ports.) Skipping the proxy is useful for these situations and for diagnosing problems, but is rarely necessary beyond that. As with opaque ports, multiple skip ports may be provided as a comma-delimited string.

Retries and Timeouts
Automatic retries are one of the most powerful and useful mechanisms a service mesh has for gracefully handling partial or transient application failures. If implemented incorrectly, however, retries can amplify small errors into system-wide outages. For this reason, Linkerd ensures that they are implemented in a way that improves the reliability of the system while limiting risk. Timeouts are closely related to retries: once a request has been retried a certain number of times, it becomes important to limit the total time a client waits before giving up entirely — imagine a series of retries forcing a client to wait 10 seconds. Service profiles can mark certain routes as retryable or specify timeouts for routes. This causes the Linkerd proxy to perform the appropriate retries or timeouts when calling that service. Retries and timeouts are always performed on the outbound (client) side. Note that if you are using headless services, service profiles cannot be retrieved: Linkerd reads service discovery information based on the target IP address, and if that happens to be a pod IP address, it cannot tell which service the pod belongs to.

How Retries Can Go Wrong
Traditionally, when implementing retries, you had to specify a maximum number of retry attempts before giving up. Unfortunately, there are two major problems with configuring retries this way. First, choosing the maximum number of retries is a guessing game: you need to pick a number high enough to make a difference, and if your service is not very reliable you may want to allow many retries; on the other hand, allowing too many retry attempts generates a lot of extra requests and extra load on the system, and many retries can also seriously increase the latency of requests that need to be retried. In practice, you usually pick a maximum retry count out of a hat (three?) and then tune it through trial and error until the system behaves roughly the way you want. Second, systems configured this way are vulnerable to retry storms. A retry storm begins when one service starts (for any reason) to experience a higher-than-normal failure rate. This causes its clients to retry the failed requests. The extra load from the retries slows the service down further and causes more requests to fail, which triggers still more retries. If each client is configured to retry up to three times, the number of requests sent can quadruple!
Worse, if the clients of those clients also have retries configured, the number of retries compounds multiplicatively and can turn a small number of errors into a self-inflicted denial-of-service attack.

Retry Budgets to the Rescue
To avoid the problems of retry storms and arbitrary retry counts, retries are configured using retry budgets. Rather than specifying a fixed maximum number of retries per request, Linkerd tracks the ratio between regular requests and retries and keeps this number below a configurable limit. For example, you can specify that you want retries to add at most 20% more requests; Linkerd will then retry as much as it can while maintaining that ratio. Configuring retries is always a trade-off between improving success rates and not adding too much extra load to the system. Retry budgets make that trade-off explicit by letting you specify how much extra load your system is willing to accept from retries.

Automatic mTLS
By default, Linkerd automatically enables mutual Transport Layer Security (mTLS) for most TCP traffic between meshed pods by establishing and authenticating secure, private TLS connections between Linkerd proxies. This means Linkerd can add authenticated, encrypted communication to your applications with very little work on your part. And because the Linkerd control plane also runs on the data plane, communication between Linkerd control plane components is automatically secured with mTLS as well.

How Does It Work?
In short, Linkerd's control plane issues TLS certificates to the proxies. These certificates are scoped to the Kubernetes ServiceAccount of the containing pod and rotated automatically every 24 hours. The proxies use them to encrypt and authenticate TCP traffic to other proxies. To do this, Linkerd maintains a set of credentials in the cluster: a trust anchor, and an issuer certificate and private key. These credentials are generated by Linkerd itself at install time, or provided by an external source such as Vault or cert-manager. The issuer certificate and private key are placed in a Kubernetes Secret. By default, the Secret is placed in the linkerd namespace and can only be read by the service account used by the identity component of the Linkerd control plane.

On the data plane side, each proxy is passed the trust anchor in an environment variable. At startup, the proxy generates a private key, stored in a tmpfs emptyDir that stays in memory and never leaves the pod. The proxy connects to the identity component of the control plane, validates the connection to identity using the trust anchor, and issues a certificate signing request (CSR). The CSR contains an initial certificate whose identity is set to the pod's Kubernetes ServiceAccount, together with the actual ServiceAccount token, so that identity can verify that the CSR is valid. After validation, the signed trust bundle is returned to the proxy, which can use it as both a client and a server certificate. These certificates are valid for 24 hours and are dynamically refreshed using the same mechanism.

When a proxy injected into a pod receives an outbound connection from an application container, it performs service discovery by looking up the target IP address with the Linkerd control plane. When the destination is in the Kubernetes cluster, the control plane provides the proxy with the destination's endpoint addresses along with metadata. When an identity name is included in this metadata, it indicates to the proxy that it can initiate mutual TLS.
When the proxy connects to the destination, it initiates a TLS handshake, verifying that the destination proxy's certificate is signed for the expected identity name.

Maintenance
The trust anchor generated by linkerd install expires after 365 days and must be rotated manually. Alternatively, you can provide your own trust anchor and control its expiration date. By default, the issuer certificate and key are not rotated automatically, but you can set up automatic rotation with cert-manager.

Notes and Future Work
There are some known gaps in Linkerd's ability to automatically encrypt and authenticate all communication within a cluster. These gaps will be addressed in future releases:
Linkerd does not automatically enforce mTLS for any requests from inside or outside the mesh. This will be addressed in a future Linkerd version, likely as an opt-in behavior, as it may break some existing applications.
The ServiceAccount token that Linkerd uses is shared with other potential uses of that token. In a future Kubernetes version, Kubernetes will support audience- and time-bound ServiceAccount tokens, and Linkerd will use them.

Ingress
For simplicity, Linkerd does not provide its own ingress controller. Instead, it is designed to work with the ingress controller of your choice.

Telemetry and Monitoring
One of Linkerd's most powerful features is its extensive set of observability tools — tools for measuring and reporting the observed behavior of your meshed applications. While Linkerd does not have direct insight into the internals of your service code, it has tremendous insight into its external behavior. To access Linkerd's observability features, you only need to install the Viz extension:
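A minimal sketch of that installation step, piping the rendered manifests into kubectl:

```bash
# Install the Linkerd Viz extension into the cluster
linkerd viz install | kubectl apply -f -
```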
Linkerd's telemetry and monitoring features work automatically, without any work from the developer. These features include:
This data can be used in a variety of ways:
For example, you can use linkerd viz stat and linkerd viz routes, as sketched below.
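A quick sketch of those commands; the deployment name (web) and namespace (emojivoto) are hypothetical placeholders:

```bash
# Golden metrics for all deployments in a (hypothetical) namespace
linkerd viz stat deploy -n emojivoto

# Per-route metrics for a (hypothetical) deployment; -o wide adds the
# EFFECTIVE_* and ACTUAL_* columns discussed below
linkerd viz routes deploy/web -n emojivoto -o wide
```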
Golden Metrics
Success rate: the percentage of successful requests within a time window (1 minute by default). In the output of linkerd viz routes -o wide, this metric is split into EFFECTIVE_SUCCESS and ACTUAL_SUCCESS. For routes with retries configured, the former is the success percentage after retries (what clients perceive), while the latter is the success percentage before retries (which may reveal underlying problems with the service).

Traffic (requests per second): an overview of demand on the service/route. As with success rate, linkerd viz routes -o wide splits this metric into EFFECTIVE_RPS and ACTUAL_RPS, corresponding to the rates after and before retries, respectively.

Latency: the time spent serving requests for each service/route, broken down into 50th, 95th, and 99th percentiles. The lower percentiles give you a rough idea of the system's average performance, while the tail percentiles help catch anomalous behavior.

Linkerd Metrics Lifecycle
Linkerd is not designed as a long-term historical metrics store. While Linkerd's Viz extension does include a Prometheus instance, that instance expires metrics after a short, fixed interval (currently 6 hours). Instead, Linkerd is designed to supplement your existing metrics store: if Linkerd's metrics are valuable, you should export them into your existing historical metrics store.

Load Balancing
For HTTP, HTTP/2, and gRPC connections, Linkerd automatically load balances requests across all destination endpoints, without any configuration required. (For TCP connections, Linkerd balances the connections themselves.) Linkerd uses an algorithm called EWMA (exponentially weighted moving average) to automatically send requests to the fastest endpoints. This load balancing can improve end-to-end latency.

Service Discovery
For destinations that are not in Kubernetes, Linkerd balances across the endpoints provided by DNS. For destinations in Kubernetes, Linkerd looks up the IP address in the Kubernetes API. If the IP address corresponds to a Service, Linkerd load balances across that Service's endpoints and applies any policies from the Service's service profile. If the IP address corresponds to a Pod, on the other hand, Linkerd does not perform any load balancing or apply any service profiles. If you use headless services, there is no way to retrieve the Service's endpoints, so Linkerd does not load balance and simply routes to the target IP address.

gRPC Load Balancing
Linkerd's load balancing is particularly useful for gRPC (or HTTP/2) services in Kubernetes, for which Kubernetes' default load balancing is ineffective.

Automatic Proxy Injection
When the linkerd.io/inject: enabled annotation is present on a namespace or on any workload (such as a deployment or pod), Linkerd automatically adds the data plane proxy to the pod. This is called "proxy injection". Proxy injection is also where proxy configuration happens: although rarely needed, you can configure proxy settings by setting additional Kubernetes annotations on the resource before injection. Note that proxy injection is implemented as a Kubernetes admission webhook. This means the proxy is added to pods within the Kubernetes cluster itself, regardless of whether the pods were created by kubectl, a CI/CD system, or any other system. For each pod, two containers are injected: linkerd-init, an init container that configures iptables so that traffic is routed through the proxy, and linkerd-proxy, the Linkerd data plane proxy itself.
Note that simply adding the annotation to a resource with pre-existing pods does not automatically inject those pods. You need to update the pods (for example with kubectl rollout restart) for them to be injected, because Kubernetes does not call the webhook until the underlying resources are updated.

Overriding Injection
Automatic injection can be disabled for a pod or deployment that would otherwise have it enabled by adding the linkerd.io/inject: disabled annotation.

Manual Injection
The linkerd inject CLI command is a text transform that, by default, simply adds the inject annotation to a given Kubernetes manifest. Alternatively, the command can be used with the --manual flag to perform the full injection on the client side. This was the default behavior prior to Linkerd 2.4; however, doing the injection on the cluster side makes it easier to ensure that the data plane is always present and correctly configured, regardless of how pods are deployed.

Container Network Interface Plugin
Linkerd can be configured to run a CNI plugin that automatically rewrites each pod's iptables rules. Rewriting iptables is required to route network traffic through the pod's linkerd-proxy container. When the CNI plugin is enabled, individual pods no longer need to include an init container that requires the NET_ADMIN capability to perform the rewrite. This is useful in clusters where the cluster administrator has restricted that capability.

Installation
To use the Linkerd CNI plugin, you must first install the linkerd-cni DaemonSet on the cluster, and then install the Linkerd control plane.

Using the CLI
To install the linkerd-cni DaemonSet, run:
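A minimal sketch of that step:

```bash
# Render the linkerd-cni DaemonSet manifests and apply them
linkerd install-cni | kubectl apply -f -
```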
Once the DaemonSet is up and running, all subsequent installations that include the linkerd-proxy container (including the Linkerd control plane) no longer need to include the linkerd-init container. The omission of the init container is controlled by the --linkerd-cni-enabled flag when the control plane is installed. Install the Linkerd control plane using:
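For example, a sketch using the flag named above:

```bash
# Install the control plane with CNI enabled, so injected pods omit linkerd-init
linkerd install --linkerd-cni-enabled | kubectl apply -f -
```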
This sets the cniEnabled flag in the linkerd-config ConfigMap. All subsequent proxy injections read this field and omit the init container.

Using Helm
First, make sure your local Helm repository cache is up to date:
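A sketch, assuming the standard Linkerd Helm repository at helm.linkerd.io (the repo alias linkerd is a local choice):

```bash
# Add the Linkerd Helm repo (if not already added) and refresh the local cache
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update
```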
Run the following command to install the CNI DaemonSet:
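A sketch of the Helm install, using the repo alias from the previous step; the release and chart name linkerd2-cni follow the upstream chart, but verify them against your Helm repo:

```bash
# Install the linkerd2-cni chart as a release named "linkerd2-cni"
helm install linkerd2-cni linkerd/linkerd2-cni
```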
For Helm versions earlier than v3, you must pass the --name flag explicitly. In Helm v3 it has been deprecated and is instead the first positional argument, as shown above. At this point, you're ready to install Linkerd with CNI enabled; you can do so by following "Install Linkerd with Helm".

Additional Configuration
The linkerd install-cni command includes additional flags that can be used to customize the installation; see linkerd install-cni --help for more information. Note that many of these flags are similar to the flags used to configure the proxy when running linkerd inject. If you change the defaults when running linkerd install-cni, you will need to make the corresponding changes when running linkerd inject. The most important flags are:
Upgrading the CNI Plugin
Because the CNI plugin is essentially stateless, no separate upgrade command is required. If you installed the CNI plugin using the CLI, you can upgrade it as follows:
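A sketch of one way to do this — re-rendering the manifests at the new CLI version and re-applying them (because the plugin is stateless, re-applying is sufficient):

```bash
# Re-render the linkerd-cni manifests and apply them over the existing resources
linkerd install-cni | kubectl apply -f -
```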
Keep in mind that if you are upgrading the plugin from an experimental version, you will need to uninstall and reinstall it.

Dashboard and Grafana
In addition to its command-line interface, Linkerd provides a web dashboard and preconfigured Grafana dashboards. To access this feature, you need to install the Viz extension:
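As earlier, a sketch of the Viz install:

```bash
linkerd viz install | kubectl apply -f -
```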
Linkerd Dashboard
The Linkerd dashboard provides a high-level view of what is happening with your services in real time. It can be used to view "golden" metrics (success rate, requests per second, and latency), visualize service dependencies, and understand the health of specific service routes. One way to pull it up is to run linkerd viz dashboard from the command line.

Grafana
As a component of the control plane, Grafana provides actionable dashboards for your services out of the box. You can view high-level metrics and drill down into the details, even down to individual pods. The out-of-the-box dashboards include: Top Line Metrics, Deployment Details, Pod Details, and Linkerd Health.

Distributed Tracing
Tracing can be an invaluable tool for debugging the performance of distributed systems, especially for identifying bottlenecks and understanding the latency cost of each component in the system. Linkerd can be configured to emit trace spans from its proxies, allowing you to see exactly how much time requests and responses spend inside them. Unlike most Linkerd features, distributed tracing requires both code and configuration changes. In addition, Linkerd provides many of the features commonly associated with distributed tracing without requiring configuration or application changes, including:
For example, Linkerd can display a live topology of all of a service's incoming and outgoing dependencies, without requiring distributed tracing or any other application modification:

[Figure: Linkerd dashboard showing an automatically generated topology graph]

Likewise, Linkerd can provide golden metrics for each service and each route, again without distributed tracing or any other application modification:

[Figure: Linkerd dashboard showing automatically generated route metrics]

Using Distributed Tracing
That said, distributed tracing certainly has its uses, and Linkerd makes it as easy as possible. Linkerd's role in distributed tracing is actually very simple: when a Linkerd data plane proxy sees a tracing header in a proxied HTTP request, Linkerd emits a trace span for that request. The span includes information about the exact amount of time spent in the Linkerd proxy. When used together with software to collect, store, and analyze this information, this can provide significant insight into the behavior of the mesh. To use this feature, you also need to introduce several additional components into the system: an ingress layer that initiates tracing for specific requests, a client library for your application (or a mechanism to propagate tracing headers), a trace collector that collects span data and turns it into traces, and a trace backend that stores the trace data and lets users view and query it.

Fault Injection
Fault injection is a form of chaos engineering that artificially increases the error rate of certain services to observe the impact on the system as a whole. Traditionally, this requires modifying the service's code to add a fault injection library that does the actual work. Linkerd can do this without any changes to service code, requiring only a little configuration.

High Availability
For production workloads, Linkerd's control plane can be run in high-availability (HA) mode.
Among other things, this mode runs multiple replicas of critical control plane components with pod anti-affinity policies to ensure that, where possible, they are scheduled on separate nodes and in separate zones by default.

Enabling HA
You can enable HA mode at control plane installation time using the --ha flag:
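A sketch:

```bash
linkerd install --ha | kubectl apply -f -
```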
Note that the Viz extension also supports a --ha flag with similar characteristics:
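A sketch:

```bash
linkerd viz install --ha | kubectl apply -f -
```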
You can override certain aspects of HA behavior at installation time by passing additional flags to the install command. For example, you can override the number of replicas for key components using the --controller-replicas flag:
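A sketch (the replica count of 2 is just an example value):

```bash
linkerd install --ha --controller-replicas=2 | kubectl apply -f -
```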
See the complete install CLI documentation for reference. The linkerd upgrade command can be used to enable HA mode on an existing control plane:
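A sketch:

```bash
linkerd upgrade --ha | kubectl apply -f -
```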
Proxy Injector Failure Policy
The HA proxy injector is deployed with a stricter failure policy to enforce automatic proxy injection. This setting ensures that annotated workloads are not accidentally scheduled to run on your cluster without the Linkerd proxy (which can happen when the proxy injector is down). If the proxy injection process fails during the admission phase due to unrecognized errors or timeouts, the workload's admission is rejected by the Kubernetes API server and the deployment fails. It is therefore very important to always have at least one healthy replica of the proxy injector running on the cluster. If you cannot guarantee the number of healthy proxy injectors on your cluster, you can relax the webhook failure policy by setting its value to Ignore, as shown in the Linkerd Helm chart. For more information on admission webhook failure policies, see the Kubernetes documentation.

Excluding the kube-system Namespace
As recommended by the Kubernetes documentation, the proxy injector should be disabled for the kube-system namespace. This can be done by labeling the kube-system namespace with the following label:
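A sketch of that labeling step, using the label that disables Linkerd's admission webhooks for a namespace:

```bash
kubectl label namespace kube-system config.linkerd.io/admission-webhooks=disabled
```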
The Kubernetes API server does not call the proxy injector during the admission phase for workloads in namespaces with this label.

Pod Anti-Affinity Rules
All critical control plane components are deployed with pod anti-affinity rules to ensure redundancy. Linkerd uses a requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule to ensure that the Kubernetes scheduler does not collocate replicas of critical components on the same node. A preferredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule is also added to try to schedule replicas in different zones where possible. To satisfy these anti-affinity rules, HA mode assumes that there are always at least three nodes in the Kubernetes cluster. If this assumption is violated (for example, the cluster shrinks to two or fewer nodes), the system may end up in a non-functional state. Note that these anti-affinity rules do not apply to add-ons such as Prometheus and Grafana.

Scaling Prometheus
The Linkerd Viz extension provides a preconfigured Prometheus pod, but for production workloads we recommend setting up your own Prometheus instance. When planning the amount of memory needed to store Linkerd time-series data, a general guideline is 5 MB per meshed pod. If your Prometheus is experiencing regular OOMKilled events due to the volume of data coming from the data plane, the two key parameters that can be tuned are:
Using the Cluster Autoscaler
The Linkerd proxy stores its mTLS private key in a tmpfs emptyDir volume to ensure that this information never leaves the pod. As a result, the default settings of the Cluster Autoscaler cannot scale down nodes that host replicas of injected workloads. The workaround is to add the cluster-autoscaler.kubernetes.io/safe-to-evict: "true" annotation to injected workloads. If you have full control over the Cluster Autoscaler configuration, you can instead start the Cluster Autoscaler with the --skip-nodes-with-local-storage=false option.

Multi-Cluster Communication
Linkerd can connect Kubernetes services across cluster boundaries in a way that is secure, completely transparent to applications, and independent of network topology. This multi-cluster capability is designed to provide:
Just like in-cluster connections, Linkerd's cross-cluster connections are transparent to application code. Whether the communication happens within a cluster, between clusters within a datacenter or VPC, or over the public internet, Linkerd establishes a connection between the clusters that is encrypted and authenticated on both sides with mTLS.

How It Works
Linkerd's multi-cluster support works by "mirroring" service information between clusters. Because remote services are represented as Kubernetes services, Linkerd's full observability, security, and routing features apply uniformly to in-cluster and cross-cluster calls, and applications do not need to distinguish between the two. Linkerd's multi-cluster functionality is implemented by two components: a service mirror and a gateway. The service mirror component watches the target cluster for service updates and mirrors those updates locally in the source cluster. This provides visibility into the target cluster's service names so that applications can address them directly. The multi-cluster gateway component provides a way for the target cluster to receive requests from the source cluster. (This allows Linkerd to support hierarchical networks.) Once these components are installed, Kubernetes Service resources that match a label selector can be exported to other clusters.

Service Profiles
A service profile is a custom Kubernetes resource (CRD) that provides Linkerd with additional information about a service. In particular, it allows you to define a list of routes for the service; each route uses a regular expression to define which paths match it. Defining a service profile enables Linkerd to report per-route metrics and also allows you to enable per-route features such as retries and timeouts. If you use headless services, service profiles cannot be retrieved: Linkerd reads service discovery information based on the target IP address, and if that happens to be a pod IP address, it cannot tell which service the pod belongs to.

Traffic Splitting (Canary, Blue/Green Deployments)
Linkerd's traffic splitting feature lets you dynamically shift an arbitrary portion of the traffic destined for a Kubernetes service to a different destination service. This can be used to implement sophisticated rollout strategies such as canary and blue/green deployments, for example by slowly shifting traffic from an old version of a service to a new one. If you use headless services, traffic splits cannot be retrieved: Linkerd reads service discovery information based on the target IP address, and if that happens to be a pod IP address, it cannot tell which service the pod belongs to. Linkerd exposes this functionality through the Service Mesh Interface (SMI) TrafficSplit API. To use it, you create a Kubernetes resource as described in the TrafficSplit specification, and Linkerd takes care of the rest. By combining traffic splitting with Linkerd's metrics, you can build even more powerful deployment techniques that automatically take into account the success rate and latency of the old and new versions. For an example of this, see the Flagger project.
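A minimal sketch of a TrafficSplit resource, assuming two hypothetical services web-svc (stable) and web-svc-canary in a hypothetical test namespace; the apiVersion and weight format follow the SMI v1alpha1 TrafficSplit specification and should be checked against the SMI version supported by your Linkerd release:

```bash
# Send roughly 10% of traffic addressed to web-svc to web-svc-canary
kubectl apply -f - <<EOF
apiVersion: split.smi-spec.io/v1alpha1
kind: TrafficSplit
metadata:
  name: web-svc-split
  namespace: test
spec:
  service: web-svc          # apex service that clients address
  backends:
  - service: web-svc        # stable backend keeps ~90% of traffic
    weight: 900m
  - service: web-svc-canary # canary backend receives ~10% of traffic
    weight: 100m
EOF
```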