Explain RPC and HTTP in plain language

As enterprise IT services grow, a single server eventually can no longer handle the rising volume of user requests, so multiple servers are combined into a "service cluster" that serves the outside world. At the same time, as product requirements pile up, business services become more and more bloated, and the architecture has to split them. One large service is broken into many small, independent services, each running in its own process and exposing its own interface to the outside world. This is what we call "microservices."

When a user request comes in, it often has to be dispatched to several services for separate processing, and their results then have to be combined before being returned to the user. How these services talk to each other is therefore the core problem to solve, and RPC exists precisely to handle this kind of information exchange between services.
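A minimal sketch of this fan-out-and-merge pattern, assuming three hypothetical sub-services (`user_service`, `order_service`, `recommendation_service`). Here they are plain local functions; in a real system each one would be a remote call to a separate process:

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical sub-services. In a real deployment each would be an RPC or
# HTTP call to a separate process, not a local function.
def user_service(user_id):
    return {"name": f"user-{user_id}"}


def order_service(user_id):
    return {"orders": [101, 102]}


def recommendation_service(user_id):
    return {"recommended": ["book", "lamp"]}


SUB_SERVICES = [user_service, order_service, recommendation_service]


def handle_request(user_id):
    """Fan the request out to every sub-service concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(SUB_SERVICES)) as pool:
        futures = [pool.submit(service, user_id) for service in SUB_SERVICES]
        merged = {}
        for future in futures:
            # result() waits for the sub-service and re-raises its exception, if any
            merged.update(future.result())
    return merged


if __name__ == "__main__":
    print(handle_request(42))
```

The thread pool lets the slow network calls overlap, so the total latency is roughly that of the slowest sub-service rather than the sum of all of them.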

What is RPC?

RPC (Remote Procedure Call) is a common communication method in distributed systems and has a history of more than 40 years. When two physically separated subsystems need to be logically connected, RPC is one of the most common ways to bring them together. Besides RPC, common ways for multiple systems to exchange data include distributed message queues, HTTP calls, databases, and distributed caches.

RPC and HTTP calls do not go through any middleware. They are direct data exchanges between two end systems. An HTTP call can in fact be seen as a special kind of RPC. The difference is that RPC in the traditional sense uses long-lived connections, while HTTP traditionally uses short-lived connections that are opened for a request and closed afterwards.
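To make the idea concrete, here is a minimal RPC sketch in Python over a single long-lived connection. The framing (a 4-byte length prefix followed by JSON) and the message fields `id`, `method`, and `args` are assumptions chosen for illustration, not the wire format of any real framework. `socket.socketpair()` stands in for a network connection between two machines:

```python
import json
import socket
import struct
import threading


def send_frame(sock, payload: dict) -> None:
    """Write one frame: 4-byte big-endian length, then the JSON body."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def recv_exact(sock, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def recv_frame(sock) -> dict:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length))


def serve(sock, handlers: dict) -> None:
    """Server loop: keep answering calls on the same connection until the client leaves."""
    try:
        while True:
            call = recv_frame(sock)
            result = handlers[call["method"]](*call["args"])
            send_frame(sock, {"id": call["id"], "result": result})
    except ConnectionError:
        sock.close()


class Stub:
    """Client stub: makes a remote method look like a local function call."""

    def __init__(self, sock):
        self.sock = sock
        self.next_id = 0

    def __getattr__(self, method):
        def call(*args):
            self.next_id += 1
            send_frame(self.sock, {"id": self.next_id, "method": method, "args": list(args)})
            return recv_frame(self.sock)["result"]
        return call


def demo():
    client_sock, server_sock = socket.socketpair()
    handlers = {"add": lambda a, b: a + b}
    threading.Thread(target=serve, args=(server_sock, handlers), daemon=True).start()
    stub = Stub(client_sock)
    results = [stub.add(1, 2), stub.add(10, 20)]  # both calls reuse one connection
    client_sock.close()
    return results


if __name__ == "__main__":
    print(demo())
```

Both calls travel over the same connection. That is the traditional RPC style, as opposed to opening a fresh connection for every request.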

RPC is present in all kinds of middleware we are familiar with. Heavyweight open source products such as Nginx, Redis, MySQL, Dubbo, Hadoop, Spark, and TensorFlow are all built on RPC technology. Here RPC is meant in the broad sense, that is, the communication technology of distributed systems in general. RPC is like the air around us: it is everywhere, yet many people don't even notice it.

Nginx and RPC

Nginx is one of the most widely used proxy servers at Internet companies. It provides load balancing for distributed backend services by aggregating multiple backend service addresses behind a single address exposed to the outside world. In the figure, the backend is a Django application. Django is one of the most popular web frameworks in the Python technology stack.

In essence, the interaction between Nginx and a backend service can also be understood as an RPC data exchange. You might object that Nginx talks to the backend over HTTP, which uses short connections, so strictly speaking it is not an RPC call.

That is true, but Nginx can also talk to backend services over other protocols, such as uwsgi and FastCGI. Both are binary protocols, which are more compact than text-based HTTP and save traffic. In the figure above, uWSGI is a well-known Python application server that can start a uwsgi-protocol server to serve external requests.
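The following sketch encodes request variables into a uwsgi packet to show what a "binary protocol" means here. The layout is our reading of the uWSGI protocol documentation and should be checked against it: a 4-byte header made of `modifier1`, a little-endian 16-bit `datasize`, and `modifier2`, followed by length-prefixed key/value pairs. The request body that follows the vars block is omitted, and the helper names are hypothetical:

```python
import struct


def encode_uwsgi_request(env: dict, modifier1: int = 0, modifier2: int = 0) -> bytes:
    """Encode request variables as a uwsgi packet (header + vars block).

    Assumed layout:
      header:   modifier1 (u8), datasize (u16 little-endian), modifier2 (u8)
      each var: key_size (u16 LE), key bytes, val_size (u16 LE), value bytes
    """
    body = b""
    for key, value in env.items():
        k = key.encode("latin-1")
        v = value.encode("latin-1")
        body += struct.pack("<H", len(k)) + k + struct.pack("<H", len(v)) + v
    header = struct.pack("<BHB", modifier1, len(body), modifier2)
    return header + body


def decode_uwsgi_request(packet: bytes) -> dict:
    """Parse the vars block back into a dict (inverse of the encoder above)."""
    _modifier1, datasize, _modifier2 = struct.unpack_from("<BHB", packet, 0)
    env, pos, end = {}, 4, 4 + datasize
    while pos < end:
        (klen,) = struct.unpack_from("<H", packet, pos)
        pos += 2
        key = packet[pos:pos + klen].decode("latin-1")
        pos += klen
        (vlen,) = struct.unpack_from("<H", packet, pos)
        pos += 2
        env[key] = packet[pos:pos + vlen].decode("latin-1")
        pos += vlen
    return env


if __name__ == "__main__":
    packet = encode_uwsgi_request({"REQUEST_METHOD": "GET", "PATH_INFO": "/"})
    print(packet)
    print(decode_uwsgi_request(packet))
```

In a real deployment Nginx produces these packets itself through its uwsgi module, and uWSGI parses them. The sketch only shows the byte layout, not a working connection.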

The uwsgi protocol is widely used in the Python ecosystem. When a company builds web services on the Python stack and deploys them to production, the Python application typically talks to Nginx over either HTTP or the uwsgi protocol.

The FastCGI protocol is very common in the PHP ecosystem. Nginx and PHP-FPM processes generally communicate over FastCGI.
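A FastCGI request is likewise a sequence of binary records. This sketch is based on our reading of the FastCGI 1.0 specification and builds the records a web server would send to start a request: an `FCGI_BEGIN_REQUEST` record, the CGI parameters as name-value pairs, and empty `FCGI_PARAMS` and `FCGI_STDIN` records that end those streams. It skips padding, ignores the 65535-byte per-record content limit, and does not read the response. The helper names are illustrative:

```python
import struct

# Constants from the FastCGI 1.0 specification.
FCGI_VERSION_1 = 1
FCGI_BEGIN_REQUEST = 1
FCGI_PARAMS = 4
FCGI_STDIN = 5
FCGI_RESPONDER = 1


def fcgi_record(record_type: int, request_id: int, content: bytes) -> bytes:
    """One record: 8-byte header (version, type, requestId, contentLength,
    paddingLength, reserved), all big-endian, followed by the content."""
    header = struct.pack(">BBHHBB", FCGI_VERSION_1, record_type, request_id, len(content), 0, 0)
    return header + content


def fcgi_length(n: int) -> bytes:
    # Lengths below 128 take 1 byte; longer ones take 4 bytes with the high bit set.
    return struct.pack(">B", n) if n < 128 else struct.pack(">I", n | 0x80000000)


def fcgi_params(params: dict) -> bytes:
    """Encode name-value pairs: nameLength, valueLength, name, value."""
    out = b""
    for name, value in params.items():
        n, v = name.encode(), value.encode()
        out += fcgi_length(len(n)) + fcgi_length(len(v)) + n + v
    return out


def begin_request(request_id: int, keep_conn: bool = False) -> bytes:
    # Body: role (u16), flags (u8, FCGI_KEEP_CONN = 1), 5 reserved bytes.
    body = struct.pack(">HB5x", FCGI_RESPONDER, 1 if keep_conn else 0)
    return fcgi_record(FCGI_BEGIN_REQUEST, request_id, body)


def build_request(request_id: int, params: dict) -> bytes:
    return (
        begin_request(request_id)
        + fcgi_record(FCGI_PARAMS, request_id, fcgi_params(params))
        + fcgi_record(FCGI_PARAMS, request_id, b"")  # empty record ends the params stream
        + fcgi_record(FCGI_STDIN, request_id, b"")   # empty stdin: no request body
    )


if __name__ == "__main__":
    print(build_request(1, {"SCRIPT_FILENAME": "/index.php", "REQUEST_METHOD": "GET"}))
```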

Hadoop and RPC

RPC also holds a very important position in big data. Distributed technologies are used extensively in this field. Distribution means nodes are physically isolated, isolation means they need to communicate, and communication means RPC. Big data systems move far more data than typical business systems, so much more effort goes into optimizing data communication.

For example, the Hadoop Distributed File System (HDFS) generally consists of one NameNode and multiple DataNodes, which communicate through a binary protocol called Hadoop RPC.

TensorFlow and RPC

RPC is also very important in artificial intelligence. When the well-known TensorFlow framework has to process hundreds of millions of records, it relies on the computing power of a distributed cluster. When multiple distributed nodes need to work together, RPC has to be introduced for communication. Distributed TensorFlow clusters communicate through gRPC, an RPC framework developed at Google.

HTTP calls are actually a special type of RPC

Under HTTP/1.0, HTTP calls are by default short-connection calls: the connection is closed after each request. HTTP/1.1 improved on this by making persistent connections (keep-alive) the default, so a connection stays open and multiple consecutive requests can be sent over it. This further narrows the gap between HTTP and RPC.
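The effect of keep-alive can be observed with Python's standard library. This sketch starts a throwaway local server on an ephemeral port, sends three requests through one `http.client.HTTPConnection`, and counts how many TCP connections the server accepted. The `connections` counter attribute is our own addition for the demonstration:

```python
import http.client
import http.server
import threading


class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # answer as HTTP/1.1 so connections stay open

    def setup(self):
        super().setup()
        self.server.connections += 1  # setup() runs once per accepted TCP connection

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where this response ends on the shared connection
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def run(num_requests=3):
    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    server.connections = 0
    threading.Thread(target=server.serve_forever, daemon=True).start()

    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    statuses = []
    for _ in range(num_requests):
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()  # drain the body before sending the next request
        statuses.append(response.status)
    conn.close()  # lets the server's per-connection loop finish

    server.shutdown()
    server.server_close()
    return statuses, server.connections


if __name__ == "__main__":
    print(run())
```

If `protocol_version` is left at its default of `"HTTP/1.0"`, the server closes the connection after each response and `http.client` reconnects automatically, so the count rises to one connection per request.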

When HTTP evolved to version 2.0, Google open-sourced gRPC (commonly read as "Google RPC"), a communication framework built on HTTP/2. At that point there is no longer a clear boundary between HTTP and RPC. For that reason, the rest of this article no longer dwells on the subtle differences between RPC and HTTP calls and simply calls them both RPC.
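Inside its HTTP/2 streams, gRPC prefixes every serialized message with a small header: a 1-byte compressed flag and a 4-byte big-endian length, as described in gRPC's HTTP/2 protocol document. The sketch below shows only this message framing. HTTP/2 itself, request headers, and status trailers are left to a real gRPC library, and the function names are illustrative:

```python
import struct


def frame_grpc_message(message: bytes, compressed: bool = False) -> bytes:
    """Length-prefixed message: 1-byte compressed flag + 4-byte big-endian length + bytes."""
    return struct.pack(">BI", 1 if compressed else 0, len(message)) + message


def split_grpc_messages(stream: bytes):
    """Yield (compressed, message) pairs from a buffer of concatenated framed messages."""
    pos = 0
    while pos < len(stream):
        flag, length = struct.unpack_from(">BI", stream, pos)
        pos += 5  # size of the prefix
        yield bool(flag), stream[pos:pos + length]
        pos += length


if __name__ == "__main__":
    # b"\x08\x96\x01" is a Protobuf-encoded message with field 1 set to 150.
    buffer = frame_grpc_message(b"\x08\x96\x01") + frame_grpc_message(b"hi")
    print(list(split_grpc_messages(buffer)))
```

The length prefix is what lets many request and response messages share one long-lived HTTP/2 stream, which is how gRPC supports streaming calls.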

HTTP VS RPC (Mandarin VS Dialect)

The relationship between HTTP and RPC is like that between Mandarin and dialects. Cross-company service calls usually use HTTP APIs, and that is the Mandarin: it is less efficient, but it is universal and costs little learning effort to communicate with. Within a company, RPC is more efficient. The company uses its own dialect to communicate efficiently, which saves more resources than the general-purpose HTTP protocol. China has many dialects, just as many companies' internal services have their own interaction protocols. Although the country has promoted Mandarin for many years, when you go back to your hometown to visit relatives, you will find that the people around you still speak their dialect.

Going deeper, Mandarin is itself essentially a dialect, just the official and most widely used one. Compared with it, other dialects are minority languages, and among them a few are widely used and hold a larger share. The same holds for open source RPC protocols, where Protobuf and Thrift are probably the two most widely used.
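To see why a "dialect" is cheaper, compare a Protobuf-style encoding with JSON. The sketch implements only Protobuf's base-128 varint and the varint wire type (field key `(field_number << 3) | 0`), and only for non-negative integers. Real code should use the official protobuf library. Field number 1 and the JSON key `"a"` are arbitrary choices for the example:

```python
import json


def encode_varint(value: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, least significant group first,
    high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    # Each field starts with a key: (field_number << 3) | wire_type, wire type 0 = varint.
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


binary = encode_varint_field(1, 150)    # the "dialect": both sides must share the schema
text = json.dumps({"a": 150}).encode()  # the "Mandarin": self-describing text

if __name__ == "__main__":
    print(binary, len(binary))
    print(text, len(text))
```

The binary form is 3 bytes against 10 for the JSON text, but it carries no field name. The receiver only knows that field 1 means `a` because both sides share the same schema, which is exactly the "dialect" both parties must learn.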

RPC and interaction solutions for distributed systems

If the two subsystems are not separated by a network but are two processes running on the same operating system instance, they have more ways to communicate. Besides the distributed solutions mentioned above, there are shared memory, semaphores, file systems, kernel message queues, pipes, and so on. All of these exchange data and messages through operating system kernel mechanisms without going through the network protocol stack.
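As a small illustration of one of these kernel mechanisms, the sketch below has a parent process talk to a child process through pipes. The data moves through kernel pipe buffers and never touches the TCP/IP stack. The child's upper-casing "service" and the newline-delimited message format are hypothetical:

```python
import subprocess
import sys

# Child process: reads one request per line from stdin, writes one reply per line to stdout.
CHILD_CODE = """
import sys
for line in sys.stdin:
    sys.stdout.write(line.strip().upper() + "\\n")
"""


def call_over_pipes(requests):
    """Send newline-delimited requests to a child process through OS pipes."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input="".join(r + "\n" for r in requests),  # written to the child's stdin pipe
        capture_output=True,                        # child's stdout comes back through a pipe
        text=True,
        check=True,
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    print(call_over_pipes(["ping", "hello"]))
```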

However, such single-machine applications are very rare in modern enterprise services, because a single machine is a single point of failure: when it goes down, everything goes down with it. Business subsystems usually need to be isolated across the physical network, so distributed solutions have plenty to offer in enterprise environments that demand high availability and uninterrupted service. This in turn has ushered RPC into its golden age.

Besides RPC, the distributed interaction solutions mentioned above include databases, message queues, and caches. In fact, all three are essentially applications built by combining RPC. The database service can be understood as shown in the following figure:

As the figure shows, the interaction between the subsystems and the database is also carried out through RPC. Here it is a complex combination of message exchanges among three subsystems. Going deeper, the database here is not a stand-alone database but one with master-slave replication, such as MySQL. Internet companies generally use this kind of master-slave, read-write-split setup: one business subsystem writes data to the master, the master synchronizes the data to the slave, and another business subsystem reads the data from the slave. Viewed this way, it becomes an even more complex RPC data exchange among four subsystems.
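The read-write split can be modeled in a few lines. This is an in-memory toy with hypothetical `Primary` and `Replica` classes, where an explicit `sync()` call stands in for replication. Real MySQL replication ships the master's binary log to replicas over the network, asynchronously by default, so the stale read shown here corresponds to a real effect known as replication lag:

```python
class Primary:
    """Accepts writes and appends each one to a replication log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Read-only copy that catches up by replaying the primary's log."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # number of log entries replayed so far

    def sync(self):
        # Replay only the entries written since the last sync.
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def read(self, key):
        return self.data.get(key)


primary = Primary()
replica = Replica(primary)

primary.write("order:1", "paid")  # subsystem A writes to the master
before = replica.read("order:1")  # None: not replicated yet (replication lag)
replica.sync()                    # master -> slave synchronization
after = replica.read("order:1")   # "paid": subsystem B now sees the write

if __name__ == "__main__":
    print(before, after)
```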
