A conscientious work explaining "service call"!

This article briefly summarizes, as far as I know it, the history of the technological evolution around "service calls". It is purely a popular-science piece.


This article focuses on the Why and What of each step in the evolution, and tries not to go too deep into the technical details (the How).

Three Elements of a Service

Generally speaking, a network service includes the following three elements:

  • Address: the caller reaches the network interface through the address, which comprises the IP address, the service port, and the service protocol (TCP, UDP, etc.).
  • Protocol format: the fields of the protocol, agreed between the interface provider and the caller.
  • Protocol name: also called the protocol type. A single listening port may serve several interfaces at once, so the protocol type (name) is needed to tell the different network interfaces apart (see the sketch after this list).
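Put into code, the three elements might be grouped like this. This is only a sketch to fix the vocabulary; every name below is made up for illustration, not any real library's API:

```go
package main

import "fmt"

// ServiceAddress covers the first element: where to connect.
type ServiceAddress struct {
	IP       string // locates the machine on the network
	Port     int    // locates the process on that machine
	Protocol string // transport: "tcp", "udp", ...
}

// ServiceCall adds the other two elements: which interface, and a
// body in the agreed protocol format.
type ServiceCall struct {
	Addr         ServiceAddress
	ProtocolName string // which interface on this port is being called
	Body         []byte // payload in the negotiated protocol format
}

func main() {
	call := ServiceCall{
		Addr:         ServiceAddress{IP: "10.0.0.1", Port: 11001, Protocol: "tcp"},
		ProtocolName: "hello.Greet",
		Body:         []byte(`{"name":"codedump"}`),
	}
	fmt.Printf("%+v\n", call)
}
```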

A few points worth noting about the service address:

  • The IP address is the credential for locating the machine on the network.
  • The protocol and service port are the credentials for locating, on that machine, the process that provides the service.

These are all basics of the TCP/IP protocol stack and will not be covered in detail here. A few service-related terms also need explaining:

  • Service instance: shorthand for an IP address and port on which the service runs. To access a service, you first need to find the address and port of one of its running instances before you can establish a connection.
  • Service registration: a service instance declares which services it provides, that is, which service interfaces a given IP address + port offers.
  • Service discovery: the caller finds the service provider in some way, that is, learns the IP address and port on which the service is running.

Calling by IP Address

The earliest network services were exposed to callers as bare IP addresses. This approach has the following problems:

  • IP addresses are hard to remember and carry no meaning.
  • Moreover, as the three elements above show, the IP address is a very low-level concept that maps directly to a network interface on a machine. If services are addressed by raw IP, replacing the machine becomes very troublesome.

"Try not to use overly low-level concepts to provide services" is an important principle in this evolutionary process. For example, today it is rare to see code written directly in assembly language.

Instead, there are more and more abstractions. This article shows the evolution of the service call field in this process.

Today, except perhaps during a testing phase, services are essentially never provided directly in the form of IP addresses.

Domain Name System

As mentioned above, an IP address is a numeric identifier used to route to a host, and it is not easy to remember.

Hence the domain name system was created. Compared with bare IP addresses, the domain name system identifies services with meaningful names, which are easier to remember. In addition, the IP address behind a domain name can be changed, which makes replacing machines convenient.

With domain names in place, when a caller needs to access a network service, it first asks the domain name service to resolve the name into an IP address according to the DNS protocol, and then accesses the service at the returned address.

This adds an extra step: querying the domain name service for the mapped IP address. To reduce its impact, the caller caches the resolved result for a period of time, saving the cost of the query on subsequent accesses.
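A minimal Go sketch of this resolve-then-cache pattern. A fixed TTL is assumed here for simplicity; real resolvers honor the TTL carried in the DNS answer:

```go
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

type cachedEntry struct {
	addrs   []string
	expires time.Time
}

// CachingResolver is a toy resolver cache with a fixed TTL.
type CachingResolver struct {
	mu    sync.Mutex
	ttl   time.Duration
	cache map[string]cachedEntry
}

func NewCachingResolver(ttl time.Duration) *CachingResolver {
	return &CachingResolver{ttl: ttl, cache: make(map[string]cachedEntry)}
}

func (r *CachingResolver) Lookup(host string) ([]string, error) {
	r.mu.Lock()
	if e, ok := r.cache[host]; ok && time.Now().Before(e.expires) {
		r.mu.Unlock()
		return e.addrs, nil // served from cache, no DNS query
	}
	r.mu.Unlock()

	addrs, err := net.LookupHost(host) // the actual DNS resolution
	if err != nil {
		return nil, err
	}
	r.mu.Lock()
	r.cache[host] = cachedEntry{addrs: addrs, expires: time.Now().Add(r.ttl)}
	r.mu.Unlock()
	return addrs, nil
}

func main() {
	r := NewCachingResolver(30 * time.Second)
	addrs, err := r.Lookup("example.com")
	fmt.Println(addrs, err)
}
```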

Receiving and Parsing the Protocol

The problem of hard-to-remember service IP addresses has been solved by the domain name system. Now let's look at how parsing of the protocol format evolved.

Generally speaking, a network protocol consists of two parts:

  • Protocol header: where the protocol's meta-information lives, which may include the protocol type, the body length, the protocol format, etc.

It should be noted that the header is generally of fixed size, or has a clear boundary (such as the \r\n terminators in the HTTP protocol); otherwise it is impossible to know where the header ends.

  • Protocol body: the actual protocol content.

Whether it is the HTTP protocol or a custom binary network protocol, it generally consists of these two parts.

Since the client's protocol data often cannot all be received in one read, a state machine is generally used to drive the receiving:
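For example, here is a minimal Go sketch of such a state machine, assuming a hypothetical wire format of a fixed 4-byte big-endian length header followed by the body:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

type state int

const (
	readingHeader state = iota
	readingBody
)

// readMessage drives a two-state machine over a stream that may
// deliver data in arbitrarily small pieces; io.ReadFull keeps
// reading until each part is complete.
func readMessage(r io.Reader) ([]byte, error) {
	st := readingHeader
	header := make([]byte, 4)
	var body []byte
	for {
		switch st {
		case readingHeader:
			if _, err := io.ReadFull(r, header); err != nil {
				return nil, err
			}
			body = make([]byte, binary.BigEndian.Uint32(header))
			st = readingBody
		case readingBody:
			if _, err := io.ReadFull(r, body); err != nil {
				return nil, err
			}
			return body, nil
		}
	}
}

func main() {
	// Simulate a wire message: 4-byte length, then the body.
	var buf bytes.Buffer
	payload := []byte("hello")
	binary.Write(&buf, binary.BigEndian, uint32(len(payload)))
	buf.Write(payload)

	msg, err := readMessage(&buf)
	fmt.Printf("%s %v\n", msg, err)
}
```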

Once the network data has been received, it has to be parsed, and for a long time protocol parsing made little progress. A protocol has multiple fields of different types: simple raw types (such as integers and strings) are easy to handle, but complex types such as dictionaries and arrays are more troublesome.

The common methods at that time were as follows:

  • Use data formats such as JSON or XML. The advantage is strong visibility, which makes it convenient to express the complex types above. The disadvantages are that the plain-text data is easy for others to read and crack, and the transmitted data is large.
  • Custom binary protocols. As a company grows, several near-identical wheels in this area inevitably get reinvented. The most typical one I have seen is the so-called TLV format (Type-Length-Value); a toy version is sketched after this list.
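A toy Go version of such a TLV codec might look like this; the layout (1-byte type, 4-byte big-endian length, then the value) and the type numbers are made up for illustration:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeTLV appends one Type-Length-Value record to the buffer.
func writeTLV(buf *bytes.Buffer, typ byte, value []byte) {
	buf.WriteByte(typ)
	binary.Write(buf, binary.BigEndian, uint32(len(value)))
	buf.Write(value)
}

// readTLV consumes one record from the buffer.
func readTLV(buf *bytes.Buffer) (byte, []byte, error) {
	typ, err := buf.ReadByte()
	if err != nil {
		return 0, nil, err
	}
	var n uint32
	if err := binary.Read(buf, binary.BigEndian, &n); err != nil {
		return 0, nil, err
	}
	value := make([]byte, n)
	if _, err := io.ReadFull(buf, value); err != nil {
		return 0, nil, err
	}
	return typ, value, nil
}

func main() {
	var buf bytes.Buffer
	writeTLV(&buf, 1, []byte("codedump")) // type 1: name (made up)
	writeTLV(&buf, 2, []byte{0, 42})      // type 2: id (made up)
	for buf.Len() > 0 {
		t, v, _ := readTLV(&buf)
		fmt.Printf("type=%d len=%d value=% x\n", t, len(v), v)
	}
}
```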

The biggest problem with custom binary formats shows up during protocol debugging and negotiation: because visibility is poor, one side may be missing a field while the other has an extra one, which makes debugging painful.

These problems saw no great improvement until Google's Protocol Buffers (PB) appeared.

After PB emerged, many similar technologies appeared, such as Thrift and MsgPack; they will not be elaborated here, and "PB" below stands for this whole class of technology.

Compared with the previous two methods, PB has the following advantages:

  • The protocol format is defined in a proto file. The proto file is a typical DSL (domain-specific language): it describes the exact format of the protocol, the type of each field, and which fields are optional or required.

With the proto file, the client and server sides communicate about the protocol through this file rather than through ad-hoc technical details.

  • From the proto file, PB can generate serialization and deserialization code for many languages, which facilitates cross-language calls.
  • PB itself compresses data of certain types to reduce its size (see the sketch after this list).
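To illustrate the last point: PB encodes integers as varints, so small values occupy fewer bytes than a fixed-width field would. The same idea, sketched with Go's standard library rather than PB itself:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

func main() {
	// A fixed-width encoding always spends 8 bytes on a uint64.
	fixed := make([]byte, 8)
	binary.BigEndian.PutUint64(fixed, 300)
	fmt.Println("fixed-width bytes:", len(fixed)) // 8

	// A varint spends only as many bytes as the value needs.
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, 300)
	fmt.Println("varint bytes:", n) // 2
}
```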

Service Gateway

With the previous evolution, it is not difficult to write a simple stand-alone server. However, as the number of visits increases, one machine is no longer enough to support all requests. At this time, it is necessary to expand horizontally and add more business servers.

However, the previous architecture of accessing services through domain names runs into a problem: if multiple service instances provide the same service, the domain name has to be bound to multiple addresses in DNS resolution.

Such a solution has the following problems:

  • How do we check the health of these instances, and add or remove instance addresses when problems are found? This is the so-called service high-availability problem.
  • Does exposing all these instance addresses to the external network raise security issues? Even if the security problem can be solved, security policies would have to be implemented on every machine.
  • Because of how the DNS protocol behaves, adding and removing service instances is not real-time, which sometimes hurts the business.

To solve these problems, the reverse proxy gateway component was introduced. It provides the following functions:

  • Load balancing: dispatches requests to service instances according to some algorithm (a minimal sketch follows this list).
  • Management functions: operations administrators can add or remove service instances.
  • Because it sits on the path of all request traffic, it can take on further functions: gray-release traffic splitting, security and attack protection (such as access blacklists and whitelists, and SSL termination), and so on.
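As promised above, here is a minimal Go sketch of the load-balancing idea, using plain round-robin; real gateways layer health checks, weights, and other strategies on top, and the instance addresses below are made up:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// RoundRobin hands out instances in rotation.
type RoundRobin struct {
	instances []string
	next      uint64
}

// Pick returns the next instance; the atomic counter makes it safe
// to call from many goroutines at once.
func (rr *RoundRobin) Pick() string {
	n := atomic.AddUint64(&rr.next, 1) - 1
	return rr.instances[n%uint64(len(rr.instances))]
}

func main() {
	rr := &RoundRobin{instances: []string{"10.0.0.1:11001", "10.0.0.2:11001"}}
	for i := 0; i < 4; i++ {
		fmt.Println(rr.Pick()) // alternates between the two instances
	}
}
```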

Load-balancing software comes in layer-4 and layer-7 varieties. Here we introduce LVS as a layer-4 load balancer and Nginx as a layer-7 one.

In the simplified TCP/IP protocol stack hierarchy, LVS works at layer 4: when a request reaches LVS, layer-4 information is used to decide which service instance the request ultimately goes to.

Nginx works at layer 7 and is mainly used for the HTTP protocol, that is, it determines the direction of the request based on the HTTP protocol itself.

Note that Nginx can also work at layer 4 (see its Stream module), but this is not used in many places.

LVS as a Layer-4 Load Balancer

LVS has several working modes, and I am not familiar with every one of them. The description below covers only the Full NAT mode, and it may contain errors.

LVS has the following components:

  • Director Server (DS for short): the load-balancing server whose front end is exposed to clients.
  • Virtual IP address (VIP for short): the IP address exposed by the DS, used as the address that clients request.
  • Director IP address (DIP for short): the IP address the DS uses to talk to the Real Servers.
  • Real Server (RS for short): the back-end server that does the actual work; it can be scaled horizontally.
  • Real IP address (RIP for short): the address of an RS.
  • Client IP address (CIP for short): the address of the client.

When the client makes a request, the process is as follows:

  • The client accesses the DS using the VIP address. The address tuple is <src: CIP, dst: VIP>.
  • The DS selects an RS according to its load-balancing algorithm and forwards the request. When forwarding, it rewrites the source IP to the DIP, so that the RS sees the DS as the caller. The address tuple is now <src: DIP, dst: RIP>.
  • The RS processes the request and responds. The response's source address is the RS's RIP and its destination is the DIP. The address tuple is <src: RIP, dst: DIP>.
  • After receiving the response packet, the DS sends it back to the client, rewriting the source address to the VIP and the destination to the CIP. The address tuple is <src: VIP, dst: CIP> (the sketch after this list simulates these four hops).
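A toy Go simulation of the Full NAT rewriting above, just printing the address tuple at each hop; all the addresses are made up:

```go
package main

import "fmt"

type tuple struct{ src, dst string }

func main() {
	const (
		cip = "203.0.113.7"  // client IP
		vip = "198.51.100.1" // virtual IP exposed by the DS
		dip = "10.0.0.1"     // DS-side internal IP
		rip = "10.0.0.101"   // real server IP
	)
	hops := []struct {
		step string
		t    tuple
	}{
		{"client -> DS", tuple{cip, vip}},
		{"DS -> RS (src rewritten to DIP)", tuple{dip, rip}},
		{"RS -> DS", tuple{rip, dip}},
		{"DS -> client (src rewritten to VIP)", tuple{vip, cip}},
	}
	for _, h := range hops {
		fmt.Printf("%-35s <src:%s, dst:%s>\n", h.step, h.t.src, h.t.dst)
	}
}
```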

Nginx as a Layer-7 Load Balancer

Before we start the discussion, we need to briefly talk about forward proxy and reverse proxy.

A forward proxy, as I understand it, is a proxy that sits on the client side. For example, the proxy you can configure in a browser to reach certain websites is a forward proxy; in everyday usage it is simply called a "proxy", that is, a proxy is forward by default.

A reverse proxy is a proxy that stands in front of the server. For example, the DS server in the LVS mentioned above is a reverse proxy.

Why do we need a reverse proxy? The general reasons are as follows:

  • Load balancing: we want the reverse proxy to distribute requests evenly across the servers behind it.
  • Security: if you do not want to expose too many server addresses to clients, you can put them all behind the reverse proxy and apply rate limiting, security controls, and so on there.
  • Since all client requests enter through one place, more control strategies can be implemented at this access layer, such as gray-release traffic and weight control.

I don't think there is much difference between a reverse proxy and a so-called gateway; they are different names for much the same thing.

Nginx is probably the most commonly used layer-7 HTTP load balancer. In an Nginx server block you can define a domain name and bind its requests to an upstream, so that requests are forwarded to the servers in that upstream.

For example:

```nginx
upstream hello {
    server A:11001;
    server B:11001;
}

location / {
    root html;
    index index.html index.htm;
    proxy_pass http://hello;
}
```

This is the simplest Nginx reverse proxy configuration. In practice, a production access layer may front many domain names; if the configuration changes often, manually editing each domain name and its upstream becomes very inefficient.

At this point we have to mention a term called DevOps. My understanding is that it refers to engineers who develop various tools that facilitate automated operation and maintenance.

Putting the above together, the general architecture of a service exposing a layer-7 HTTP interface is: domain name resolution in front, a reverse proxy gateway doing load balancing at the access layer, and horizontally scaled business servers behind it.

Service Discovery and RPC

Most of the problems of stand-alone servers providing external services have been solved above. Let's briefly review:

  • The Domain Name System solves the problem of having to remember complex numeric IP addresses.
  • The emergence of PB-style software libraries solves the pain points of protocol definition and parsing.
  • Gateway components solve a series of problems such as client access and server horizontal expansion.

However, a service usually does not work alone: while handling a request it may in turn query other services, such as data services like MySQL and Redis.

Services that are called only from within other services are called internal services; they are usually not directly exposed to the external network.

Public-facing services are generally provided to external callers in the form of domain names. But for internal calls between services, domain names are not enough, because:

  • DNS-based service discovery is too coarse-grained: it only reaches the IP address level, and the service port still has to be maintained by the user.
  • For checking service health, DNS checks are not enough; operations involvement is still required.
  • DNS collects no service status, even though such status should ultimately influence how the service is called.
  • DNS changes require manual intervention and are not intelligent or automated enough.

In summary, service calls inside the intranet usually go through a "service discovery" system, which includes the following components:

  • Service discovery system: provides service addressing and registration capabilities, aggregates statistics on service status, and adjusts how services are called according to their condition.

For example, if a service instance responds slowly, less traffic is allocated to that instance.

Since this system can provide service addressing capabilities, some addressing strategies can be implemented here. For example, certain grayscale traffic can only go to certain specific instances, and the traffic weight of each instance can be configured.

  • A set of RPC libraries used with this service system.

The RPC library provides the following functionality:

  • Service provider: uses the RPC library to register its service with the service discovery system and to report its own status.
  • Service caller: uses the RPC library to address the service and to obtain the latest scheduling policy from the service discovery system in real time.
  • The library provides protocol serialization and deserialization, load-balancing call strategies, and access-protection strategies such as circuit breaking and rate limiting. This part applies to both providers and callers (a minimal registry sketch follows this list).
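A toy in-memory Go sketch of the registry half of this contract; the names are made up, not any particular library's API. A real discovery system is a separate, replicated service that callers watch for changes rather than a local map:

```go
package main

import (
	"fmt"
	"sync"
)

// Registry maps service names to the addresses of live instances.
type Registry struct {
	mu        sync.RWMutex
	instances map[string][]string // service name -> list of "IP:port"
}

func NewRegistry() *Registry {
	return &Registry{instances: make(map[string][]string)}
}

// Register is called by a service provider at startup.
func (r *Registry) Register(service, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.instances[service] = append(r.instances[service], addr)
}

// Discover is called by a service caller before making a request.
func (r *Registry) Discover(service string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.instances[service]
}

func main() {
	reg := NewRegistry()
	reg.Register("hello", "10.0.0.1:11001")
	reg.Register("hello", "10.0.0.2:11001")
	fmt.Println(reg.Discover("hello"))
}
```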

With this service discovery system and the RPC library used in conjunction with it, let's take a look at what the service call looks like now:

  • Whoever writes business logic no longer needs to care about service addresses, protocol parsing, service scheduling, status reporting, and other work not closely related to the business logic itself; they can focus on the business logic.
  • The service discovery system generally has a management backend where the service's policies can be viewed and modified.
  • There is also a matching service monitoring system that collects and processes service data in real time, so service quality can be seen at a glance.
  • Service health checking is fully automated: a service in bad shape is downgraded with little manual intervention, making the whole thing more intelligent and automated.

With these pieces, the service architecture has evolved once more: external traffic still enters through the gateway, while internal services find and call each other through the service discovery system and its RPC library.

Service Mesh

The architecture developed to this point can already solve most problems. In the past couple of years, however, a very popular concept has emerged: Service Mesh. Let's see what problems it solves.

The service discovery system above requires a matching RPC library, which brings the following problems:

  • What should we do if we need to support multiple languages? Should we implement a corresponding RPC library for each language?
  • Upgrading the library is very troublesome. For example, if the RPC library itself has a security vulnerability and needs to be upgraded, it is generally difficult to push the business side to do this upgrade, especially after the system becomes larger.

As you can see, since the RPC library is a component embedded in the process, the above problem is very troublesome, so a solution was devised: split the original process into two processes.

Before meshing, a service caller instance talked to the service provider instance directly through the RPC library inside its own process.

After meshing, a local proxy, the Service Mesh proxy, is deployed on the same machine as the service caller. Service call traffic first goes to this proxy, which then carries out the work the RPC library used to do.

As for how this traffic is hijacked, the answer is iptables: traffic to specific ports is redirected to the proxy, for example with a NAT-table rule along the lines of `iptables -t nat -A PREROUTING -p tcp --dport <service port> -j REDIRECT --to-ports <proxy port>`.

With this split, the business service is separated from the proxy that takes over the RPC library's job, and the two pain points above turn into upgrade and maintenance issues of the Mesh proxy on each physical machine.

Multiple languages are no longer a problem either, because all RPC communication now goes through network calls to the proxy rather than through an in-process RPC library.

However, this solution is not without problems. The biggest one is that the extra hop inevitably adds to the original response time.

As of this writing (June 2019), Service Mesh is still more concept than finished product.

From this evolutionary history we can see the so-called "middle layer theory" at work throughout: "any problem in computer science can be solved by another level of indirection."

For example, the domain name system was introduced to solve the problem of difficult-to-remember IP addresses; the gateway was introduced to solve the load balancing problem, and so on.

However, every middle layer introduced brings costs of its own. For example, Service Mesh adds an extra hop through the proxy. How to strike the balance is another question.

Going back to the original three elements of a service, we can see that the entire evolution has gradually shielded developers from the lower-level pieces:

  • Domain names hide IP addresses.
  • The service discovery system hides protocols and port numbers.
  • PB-style serialization libraries hide manual protocol parsing.

It is clear that this evolution lets business developers focus ever more on business logic. Nor is such evolution unique to today; similar evolutions will happen again in the future.

Author: codedump

Introduction: I have been engaged in Internet server backend development for many years. More articles are available on my blog: https://www.codedump.info/.

