This article is reprinted from the WeChat public account "Kirito's Technology Sharing", author kiritomoe. Please contact the "Kirito's Technology Sharing" public account for permission to reprint.

## Preface

I just found out that WeChat public accounts have a tagging function, so I tagged all my Dubbo-related articles. After counting, this turns out to be my 41st original article on Dubbo; if you want to read the others, you can click the topic tag. This is an article I have wanted to write for a long time. Recently a friend in my group shared an article about Dubbo connections, which reminded me of the topic, so today I want to talk about connection control in Dubbo. The term "connection control" may not ring a bell right away, but the following configuration is probably familiar to you:
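A minimal sketch of the configuration in question (the interface name is hypothetical; the `connections` attribute can also be set on the provider's `<dubbo:protocol>` or `<dubbo:service>`):

```xml
<!-- Consumer side: open 2 long connections to each provider of this service -->
<dubbo:reference interface="com.example.DemoService" connections="2" />
```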
If you are not familiar with Dubbo's connection control, you can refer to the official documentation: https://dubbo.apache.org/zh/docs/advanced/config-connections/. By the way, the Dubbo documentation was recently overhauled, and many once-familiar pages are now almost nowhere to be found. Orz. As you probably know, the dubbo protocol communicates over long-lived connections by default, and the connections configuration determines how many such connections are established between a consumer and a provider. The official documentation, however, only describes how to use the feature; it does not explain when connection control should be configured. That question is the main topic of this article, along with some related knowledge about long connections.

## Usage

Let's start with a simple demo built with Dubbo: a consumer (192.168.4.226) and a provider (192.168.4.224) configured for direct connection. Consumer:
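The consumer configuration might look like the following (a sketch; the interface name is hypothetical, and the `url` attribute establishes the direct connection to the provider address from the text):

```xml
<dubbo:application name="demo-consumer" />
<!-- Direct connection: bypass the registry and dial the provider host -->
<dubbo:reference id="demoService" interface="com.example.DemoService"
                 url="dubbo://192.168.4.224:20880" />
```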
Provider:
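And a matching provider configuration (same caveats; service interface and bean ref are hypothetical):

```xml
<dubbo:application name="demo-provider" />
<!-- Expose the dubbo protocol on its default port -->
<dubbo:protocol name="dubbo" port="20880" />
<dubbo:service interface="com.example.DemoService" ref="demoServiceImpl" />
```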
Persistent connections are invisible and intangible, so we need an observable way to "see" them. After starting the provider and the consumer, you can view the TCP connection status with the following command:
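On Linux, filtering on Dubbo's default port does the job (the same flags the author uses later in the article; exact options vary by platform):

```shell
netstat -ano | grep 20880
```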
Provider:
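On the provider side the output might look like this (addresses are the ones from the demo; state and counters are illustrative):

```
tcp6  0  0  192.168.4.224:20880  192.168.4.226:59110  ESTABLISHED
```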
Consumer:
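The consumer side shows the mirror image of the same connection (illustrative output):

```
tcp6  0  0  192.168.4.226:59110  192.168.4.224:20880  ESTABLISHED
```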
The observations above reveal several facts. The TCP connection exists as soon as the provider and consumer are started; note that I did not trigger any call. In other words, Dubbo's default strategy is to establish connections at address-discovery time, not at call time. You can change this behavior with lazy loading (lazy="true"), which delays connection establishment until the first call.
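Lazy connection is enabled on the reference, e.g. (sketch, hypothetical interface name):

```xml
<!-- Connection is established on the first invocation, not at startup -->
<dubbo:reference interface="com.example.DemoService" lazy="true" />
```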
In addition, you can see that there is only one long connection between the consumer and the provider. 20880 is the default port opened by a Dubbo provider, just as 8080 is Tomcat's default port, while 59110 is an ephemeral port assigned to the consumer. (From past conversations I have found that many people don't realize the consumer occupies a port too.) Today's protagonist, connection control, governs the number of these long connections. For example, we can configure it as follows:
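A sketch of the two-connection setup on the consumer reference (interface name hypothetical):

```xml
<dubbo:reference interface="com.example.DemoService"
                 url="dubbo://192.168.4.224:20880" connections="2" />
```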
Restart the consumer and observe the connections again. Provider:
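With connections=2, the provider side might now show two established connections (the second ephemeral port, 59111, is hypothetical):

```
tcp6  0  0  192.168.4.224:20880  192.168.4.226:59110  ESTABLISHED
tcp6  0  0  192.168.4.224:20880  192.168.4.226:59111  ESTABLISHED
```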
Consumer:
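And the mirror image on the consumer (same illustrative ports):

```
tcp6  0  0  192.168.4.226:59110  192.168.4.224:20880  ESTABLISHED
tcp6  0  0  192.168.4.226:59111  192.168.4.224:20880  ESTABLISHED
```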
As you can see, there are now two long connections.

## When do you need to configure multiple long connections?

Now we know how to control the number of connections, but when should we configure more than one? At this point I could tell you "it depends on your production situation", but if you read this account regularly you know that's not my style. What is my style? Benchmark! Before writing this article I briefly discussed the topic with several colleagues and netizens, and there was no real conclusion beyond "single and multiple connections give different throughput". See the earlier discussion on Dubbo's GitHub, e.g. https://github.com/apache/dubbo/pull/2457, in which I also participated. To be honest, I was skeptical at the time; my view was that multiple connections would not necessarily improve a service's throughput (a fairly conservative stance, not an absolute one). Next, let's settle the question with benchmarks. The test project is an old friend: the dubbo-benchmark project officially provided by Dubbo.
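If you want to reproduce the runs, the project can be fetched and started roughly like this (repository path and script name as I recall them; check the project README for the exact steps):

```shell
git clone https://github.com/apache/dubbo-benchmark.git
cd dubbo-benchmark
./benchmark.sh
```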
The test project was introduced in a previous article, so I won't repeat the details here. The test plan is simple: run two rounds of benchmarks and observe the throughput of the test method with connections=1 and connections=2 respectively. Just do it; skipping a pile of test steps, here are the results. connections=1
connections=2
From the test results, the difference between a single connection and multiple connections looks enormous: almost a factor of two! It seems connection control works wonders, but is that really the case? I didn't quite believe the result after the first run, because I had previously tested multiple connections in other ways, and my experience from the 3rd Middleware Performance Challenge was that a single connection usually delivers the best performance. Even hardware differences should not produce a twofold gap. With this question in mind, I started looking for problems in my test scenario.

## Identifying the problem in the test setup

A discussion with Flash finally helped me pinpoint the problem; I wonder whether you could spot it from our conversation too. The biggest flaw in the previous test plan was poor control of variables. What I had overlooked is that when the number of connections changes, the number of IO threads actually in use changes with it. Dubbo uses Netty for persistent-connection communication, and to understand the relationship between connections and IO threads we need Netty's threading model. In a nutshell, a Netty IO worker thread is bound to channels one-to-many: once a channel is registered, all of its IO operations are handled by that single IO thread. Let's look at how Dubbo sets up the worker thread groups of NettyClient and NettyServer. Client, org.apache.dubbo.remoting.transport.netty4.NettyClient:
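The relevant line in NettyClient looks roughly like this (paraphrased from the Dubbo source; field and helper names vary across versions):

```java
// NettyClient: one shared worker group for all client channels,
// sized by the hard-coded default IO thread count
private static final EventLoopGroup EVENT_LOOP_GROUP =
        eventLoopGroup(Constants.DEFAULT_IO_THREADS, "NettyClientWorker");
```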
Constants.DEFAULT_IO_THREADS is hard-coded in org.apache.dubbo.remoting.Constants:
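The definition amounts to "available processors + 1, capped at 32" (paraphrased; check the Constants class of your Dubbo version). A tiny self-contained check of the arithmetic:

```java
public class DefaultIoThreadsDemo {
    public static void main(String[] args) {
        // Dubbo's default IO thread count: min(cores + 1, 32)
        int cores = 4; // the author's 4c8g machine
        int defaultIoThreads = Math.min(cores + 1, 32);
        System.out.println(defaultIoThreads); // prints 5
    }
}
```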
On my 4c8g machine, the default is therefore 5 (4 cores + 1). Server, org.apache.dubbo.remoting.transport.netty4.NettyServer:
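The server's worker group, in contrast, reads an "iothreads" parameter from the URL, roughly like this (again paraphrased from the source):

```java
// NettyServer: worker-thread count comes from the URL's "iothreads"
// parameter, falling back to the same default as the client
EventLoopGroup workerGroup = eventLoopGroup(
        getUrl().getPositiveParameter("iothreads", Constants.DEFAULT_IO_THREADS),
        "NettyServerWorker");
```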
The server side can be configured, for example, we can control the number of IO threads on the server side through the protocol:
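For example, to pin the server to a single IO thread:

```xml
<dubbo:protocol name="dubbo" port="20880" iothreads="1" />
```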
If iothreads is not set, the logic matches the client's: cores + 1 threads. And here lies the problem. Since I did not configure any IO thread count, both client and server opened 5 IO threads by default. With connections=1, Netty binds channel1 to one IO thread; with connections=2, Netty binds channel1 and channel2 to NettyWorkerThread-1 and NettyWorkerThread-2 in turn, so two IO threads are doing the work, and the comparison is of course unfair. Consider the real world, too: production is mostly a distributed scenario where the number of connections far exceeds the number of IO threads, so a test scenario with fewer channels than IO threads is essentially unrepresentative. The fix is simple: control the variables so that the IO thread count stays constant and only the connection count varies. For the server, configure iothreads=1 at the protocol layer; for the client, since the value is hard-coded, I modified the source and rebuilt it locally so that the client IO thread count can also be specified via a -D parameter. After this change, the test results were as follows. 1 IO thread, 1 connection:
1 IO thread, 2 connections:
It turns out that simply increasing the number of connections does not increase the service's throughput. This result matches my expectations much better.

## Summary

Judging from the tests above, bigger is not always better for some configuration parameters. I have analyzed similar examples before, such as multi-threaded file writing. Only theoretical analysis plus actual testing leads to convincing conclusions, though personal tests can still go wrong by missing key local details. For example, had I not eventually found the implicit coupling between the IO thread count and the connection count, it would have been easy to draw the wrong conclusion that throughput grows in proportion to the number of connections. Nor is this article's conclusion necessarily the final word; it may still be imperfect, and you are welcome to leave comments and suggestions. Finally, back to the original question: when should you configure Dubbo's connection control? In my experience, the number of connections in a production environment is usually large; pick an online host and count them roughly with netstat -ano | grep 20880 | wc -l. It is generally far more than the number of IO threads, so there is no need to configure multiple connections: connection count and throughput do not grow linearly together. The fact that Dubbo has this capability and the fact that you actually need it are two completely different things. I believe most readers are past the stage where projects are driven by technological novelty. If one day you need to control the number of connections for some special purpose, you will sincerely marvel at how powerful Dubbo is, with extension points for everything.
Is Dubbo's connection control completely useless, then? Not entirely. My test scenarios are still quite limited, and different hardware may produce different results; in the 3rd Middleware Performance Challenge, for instance, my best result came from two connections, not one. Finally, if you just use Dubbo to run your microservice architecture, in most cases you don't need to pay attention to the connection control feature. Spend the time moving bricks instead. That's it; I'm off to move bricks too.