Server-Speaks-First is a bit of a pain: protocol detection and opaque ports in Linkerd 2.10

This article is reprinted from the WeChat public account "Hacker Afternoon Tea"; the author is Shao. To reprint this article, please contact the "Hacker Afternoon Tea" public account.

Protocol detection, as the name implies, allows Linkerd to automatically detect the protocol used in a TCP connection. One of Linkerd's design principles is "just works", and protocol detection is an important part of how Linkerd achieves this goal.

What is protocol detection?

In short, protocol detection is the ability to determine the protocol used on a TCP connection by inspecting the traffic on the connection.

Linkerd uses protocol detection to avoid requiring the user to specify the protocol. Instead of asking the user to configure the protocol used by each port, Linkerd's proxy simply performs protocol detection to answer that question.

Linkerd's protocol detection works by looking at the first few bytes of a client connection to gain information about the traffic. This implementation has some consequences, which we'll cover below.
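The idea of classifying a connection from the client's first bytes can be sketched in a few lines of Python. This is a minimal illustration, not Linkerd's code (the real proxy, linkerd2-proxy, is written in Rust): the function name detect_protocol, the four result labels, and the short HTTP/1 method list are hypothetical simplifications, and the sketch assumes enough bytes have already been buffered. The HTTP/2 client preface (RFC 7540) and the TLS handshake record type 0x16 are standard values; recognizing a TLS ClientHello is the kind of check that lets a proxy avoid encrypting already-encrypted traffic a second time.

```python
# Hypothetical sketch: classify a connection from the first bytes the client sends.
# Not Linkerd's implementation; labels and the method list are simplifying assumptions.

HTTP2_PREFACE = b"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"  # fixed HTTP/2 client preface (RFC 7540)
HTTP1_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ",
                 b"CONNECT ", b"OPTIONS ", b"TRACE ", b"PATCH ")
TLS_HANDSHAKE = 0x16  # TLS record content type for handshake messages (e.g. ClientHello)


def detect_protocol(first_bytes: bytes) -> str:
    """Return 'http2', 'http1', 'tls', or 'tcp' based on the client's opening bytes."""
    if first_bytes.startswith(HTTP2_PREFACE):
        return "http2"
    if first_bytes.startswith(HTTP1_METHODS):
        return "http1"
    # A TLS record begins with content type 0x16 followed by major version byte 0x03.
    if len(first_bytes) >= 3 and first_bytes[0] == TLS_HANDSHAKE and first_bytes[1] == 0x03:
        return "tls"
    # Anything else is treated as opaque TCP: only byte counts can be reported.
    return "tcp"
```

For example, detect_protocol(b"GET / HTTP/1.1\r\n") classifies the connection as HTTP/1, while a client that has sent nothing yet (b"") falls through to "tcp", which is exactly the situation server-speaks-first protocols create.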

But first, let's answer the question of why Linkerd cares about any protocol in the first place.

Observability, reliability, and security

We generally group Linkerd's broad functionality into three categories: observability, reliability, and security. Understanding the protocol used over the connection is fundamental to each category.

Observability

At the core of Linkerd's observability capabilities is traffic instrumentation. This instrumentation depends on knowing the protocol being used, because that knowledge is what makes rich metrics possible. For example, knowing that a connection is using HTTP allows Linkerd to parse requests, responses, and response codes, and report metrics like response latency, request volume, and error rates. These metrics are so valuable that they're part of what Google's SRE book calls the "golden signals." On the other hand, if Linkerd only knows that the connection is TCP, it's limited to reporting very basic information, such as the number of bytes read and written, without the ability to interpret those bytes further.

Security

Mutual TLS (mTLS) is a core feature of Linkerd. Starting in Linkerd 2.9, all TCP traffic between meshed endpoints is proxied by Linkerd over mTLS by default. (There are some caveats - see the section about skip-ports below.)

Here again, it's crucial to understand the protocol of the connection. For example, if the connection is already TLS (e.g., encrypted by the application itself), there's no reason for Linkerd to wrap it in TLS again. (Strictly speaking, TLS is a transport layer protocol, not an application layer protocol like HTTP, but for the purposes of this article, the distinction isn't important.)

Reliability

Finally, knowing the protocol of the underlying connection allows Linkerd to provide sophisticated reliability features. Load balancing is one example. Without knowing the protocol, Linkerd is limited to balancing connections: once a TCP connection is established with a server, it cannot do anything further with the traffic on that connection.

However, if Linkerd knows that the connection is HTTP, it can move from connection balancing to request balancing. Linkerd will establish a pool of connections across endpoints, and balance requests across this pool. Since it now has access to both requests and responses, Linkerd can be very sophisticated in how it balances requests; in fact, it balances requests based on the recent performance of each possible endpoint (using a metric called the "exponentially weighted moving average," or EWMA), to avoid incurring tail latency from slow endpoints.

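(This is also why Linkerd is a simple way to load balance gRPC in Kubernetes: gRPC sends many requests over a single long-lived HTTP/2 connection, so connection-level balancing alone cannot spread those requests across endpoints.)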
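The core of latency-aware request balancing can be sketched with a plain EWMA per endpoint. This is a hedged illustration, not Linkerd's balancer: the class name EwmaBalancer, the smoothing factor ALPHA = 0.3, the millisecond units, and the endpoint names are all hypothetical choices made for the example.

```python
# Hypothetical sketch: track per-endpoint latency with an exponentially weighted
# moving average (EWMA) and send each request to the endpoint with the lowest score.
# ALPHA and the endpoint names are assumptions, not Linkerd settings.

ALPHA = 0.3  # weight of the newest sample; higher values react faster to change


class EwmaBalancer:
    def __init__(self, endpoints, initial_ms=0.0):
        # Every endpoint starts with the same score so none is penalized up front.
        self.scores = {ep: initial_ms for ep in endpoints}

    def record(self, endpoint, latency_ms):
        # EWMA update: new = alpha * sample + (1 - alpha) * old
        old = self.scores[endpoint]
        self.scores[endpoint] = ALPHA * latency_ms + (1 - ALPHA) * old

    def pick(self):
        # Choose the endpoint whose recent latency is lowest.
        return min(self.scores, key=self.scores.get)


lb = EwmaBalancer(["pod-a", "pod-b"])
lb.record("pod-a", 200.0)  # pod-a score becomes 60.0
lb.record("pod-b", 20.0)   # pod-b score becomes 6.0
print(lb.pick())           # pod-b, the endpoint that has recently been faster
```

Always choosing the global minimum can pile traffic onto a single endpoint until its score rises; production balancers commonly compare only two randomly sampled endpoints (the "power of two choices" technique) to avoid that herding.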

When protocol detection fails

While protocol detection is designed to let Linkerd "just work", there are cases where it can't: the infamous server-speaks-first protocols. In these protocols (which include MySQL and SMTP), the client establishes a connection and then waits for the server to send the first bytes, such as a greeting. From a TCP perspective this is perfectly legal behavior, but it means Linkerd can't detect the protocol, because the identifying information comes from the server, not the client.

(Why not simply use the server's bytes to detect the protocol? Because at the time of protocol detection, Linkerd hasn't even established a connection to the server yet. Choosing which server to talk to is a function of the load balancer, and which load balancer to use is a function of the protocol. It's a delicious, TCP-flavored chicken-and-egg problem.)

To work around this, Linkerd introduced the skip-inbound-ports and skip-outbound-ports configuration options. These options instruct Linkerd to bypass the proxy entirely for certain ports by modifying the iptables rules that Linkerd uses to route a pod's traffic through its sidecar proxy. For example, adding the annotation config.linkerd.io/skip-outbound-ports: 3306 to a workload's pod template instructs Linkerd to create an iptables rule ensuring that the Linkerd proxy never handles any traffic to port 3306 (the MySQL port). Similarly, the annotation config.linkerd.io/skip-inbound-ports: 3306 writes an iptables rule so that the proxy never handles MySQL traffic sent to the pod.

[Figure: skip-ports configuration]

These options provide a workaround for protocol detection's inability to handle server-speaks-first protocols. However, they have a significant drawback: because they bypass the Linkerd proxy entirely, Linkerd cannot apply mTLS or capture any metrics for these ports.

Opaque ports and improved protocol detection in Linkerd 2.10

To address the shortcomings of skip-ports, in version 2.10, Linkerd will add the concept of opaque ports (and a corresponding opaque-ports annotation). Opaque ports are ports that Linkerd will proxy without performing protocol detection. While this approach still requires configuration, marking a port as opaque allows Linkerd to apply mTLS and report TCP-level metrics - a big improvement over skipping it entirely.
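The practical difference between skipping a port, marking it opaque, and leaving it alone can be summarized as a small decision function. This is a hypothetical model, not Linkerd code: the function names are invented, only outbound traffic is modeled, both endpoints are assumed to be meshed, and the parser only handles comma-separated single ports. The skip key comes from the text above; config.linkerd.io/opaque-ports is assumed here to be the opaque-ports annotation key, following the same naming pattern.

```python
# Hypothetical model of the three port treatments (outbound side only).
# Assumes both endpoints are meshed; parsing is simplified to "3306,5432" style values.

SKIP_OUTBOUND = "config.linkerd.io/skip-outbound-ports"
OPAQUE = "config.linkerd.io/opaque-ports"  # assumed key for the opaque-ports annotation


def parse_ports(value: str) -> set[int]:
    # Split a comma-separated list; int() tolerates surrounding spaces.
    return {int(p) for p in value.split(",") if p.strip()}


def outbound_handling(annotations: dict[str, str], port: int) -> dict:
    if port in parse_ports(annotations.get(SKIP_OUTBOUND, "")):
        # iptables routes the traffic around the proxy: no mTLS, no metrics.
        return {"proxied": False, "mtls": False, "metrics": None}
    if port in parse_ports(annotations.get(OPAQUE, "")):
        # Proxied without protocol detection: mTLS plus TCP-level metrics.
        return {"proxied": True, "mtls": True, "metrics": "tcp"}
    # Default: proxied, with protocol detection deciding how rich the metrics are.
    return {"proxied": True, "mtls": True, "metrics": "detected"}


pod_annotations = {SKIP_OUTBOUND: "3306", OPAQUE: "25, 5432"}
print(outbound_handling(pod_annotations, 3306))  # bypassed entirely
print(outbound_handling(pod_annotations, 5432))  # opaque: mTLS + TCP metrics
```

The model makes the trade-off explicit: an opaque port gives up only protocol-aware metrics, while a skipped port gives up everything the proxy provides.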

[Figure: opaque-ports configuration]

Linkerd 2.10 will also improve protocol detection by making it "fail open": if the detection code sees no bytes from the client within 10 seconds, it will treat the connection as a plain TCP connection and continue, rather than failing it as in 2.9. This means that the worst case for a server-speaks-first port that has not been marked with opaque-ports (or skip-ports) is a 10-second delay when the connection is established, rather than a connection failure.
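The fail-open behavior amounts to a read with a deadline. The asyncio sketch below is a hedged illustration, not Linkerd's implementation: detect_or_fail_open is a hypothetical name, the 64-byte read size is arbitrary, and classification is reduced to a crude HTTP/1 check; only the 10-second default comes from the text.

```python
import asyncio

DETECT_TIMEOUT_S = 10.0  # the detection window described for Linkerd 2.10


async def detect_or_fail_open(reader: asyncio.StreamReader,
                              timeout: float = DETECT_TIMEOUT_S) -> tuple[str, bytes]:
    """Wait for the client's first bytes; if none arrive in time, fall back to TCP."""
    try:
        first = await asyncio.wait_for(reader.read(64), timeout)
    except asyncio.TimeoutError:
        # Silent client (server-speaks-first): treat the connection as opaque TCP
        # instead of failing it, which was the 2.9 behavior.
        return "tcp", b""
    # Bytes arrived: classify them (simplified here to a few HTTP/1 methods).
    kind = "http" if first.startswith((b"GET ", b"POST ", b"PUT ", b"HEAD ")) else "tcp"
    # The bytes read during detection must still be forwarded to the server.
    return kind, first
```

Returning the bytes alongside the verdict reflects a real constraint of this design: reading the client's opening bytes consumes them, so a proxy has to replay them to the server once it has chosen a destination. The timeout is also why the worst case is a delay rather than a failure.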

Summary

Protocol detection is one of Linkerd's most powerful features and is fundamental to its "just works" principle. While protocol detection is not a panacea, the opaque-ports feature introduced in Linkerd 2.10 should address most of the shortcomings of the earlier skip-ports feature and allow Linkerd users to extend mTLS across their entire Kubernetes environment, regardless of protocol.

Refs

  • Protocol Detection and Opaque Ports in Linkerd: https://linkerd.io/2021/02/23/protocol-detection-and-opaque-ports-in-linkerd