Dubbo3.0 Alibaba Large-Scale Practice Analysis—URL Reconstruction


1. Introduction to URL

Before discussing the specific optimizations of address push performance, we first need to understand something closely related to it: the URL.

1. Definition

Even outside Dubbo, most of us are familiar with the concept of a URL. Uniform Resource Locators, defined in RFC 1738 ("Uniform Resource Locators (URL)"), is probably the best-known RFC specification, and its definition is very simple:

The resources available on the Internet can be represented by simple strings, and this document describes the syntax and semantics of such strings, which are called "Uniform Resource Locators" (URLs).

A standard URL format can contain at most the following parts:

protocol://username:password@host:port/path?key=value&key=value

Some typical URLs:

http://www.facebook.com/friends?param1=value1&param2=value2
https://username:password@10.20.130.230:8080/list?version=1.0.0
ftp://username:password@192.168.1.7:21/1/read.txt
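To see how the parts of the standard format map onto a concrete string, the ftp example above can be handed to the JDK's java.net.URI. This is only an illustration of the generic syntax using the JDK parser, not Dubbo's parser. The helper class UrlParts and its describe method are hypothetical.

```java
import java.net.URI;

// Hypothetical helper: illustrates the generic URL syntax with the JDK's java.net.URI.
// This is not how Dubbo parses its own URLs.
final class UrlParts {
    // Splits protocol://username:password@host:port/path?query into its components.
    static String describe(String raw) {
        URI uri = URI.create(raw);
        return "protocol=" + uri.getScheme()
                + ", userInfo=" + uri.getUserInfo()
                + ", host=" + uri.getHost()
                + ", port=" + uri.getPort()   // -1 when no port is given
                + ", path=" + uri.getPath()
                + ", query=" + uri.getQuery(); // null when there is no '?'
    }
}
```

For example, describing "ftp://username:password@192.168.1.7:21/1/read.txt" yields protocol=ftp, userInfo=username:password, host=192.168.1.7, port=21, path=/1/read.txt, query=null.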

Some strings that do not follow this convention are also treated as URLs. Each example below is followed by how it is interpreted:

192.168.1.3:20880
url protocol = null, url host = 192.168.1.3, port = 20880, url path = null

file:///home/user1/router.js?type=script
url protocol = file, url host = null, url path = home/user1/router.js

file://home/user1/router.js?type=script
url protocol = file, url host = home, url path = user1/router.js

file:///D:/1/router.js?type=script
url protocol = file, url host = null, url path = D:/1/router.js

file:/D:/1/router.js?type=script
Same as above: file:///D:/1/router.js?type=script

/home/user1/router.js?type=script
url protocol = null, url host = null, url path = home/user1/router.js

home/user1/router.js?type=script
url protocol = null, url host = home, url path = user1/router.js
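The interpretations listed above follow from a few simple splitting rules applied in order:

1. Strip the query.
2. Take the protocol from "://" or the shorter ":/".
3. Treat everything after the first '/' as the path.
4. Treat a trailing ":digits" as the port.

The parser below is a simplified, hypothetical sketch (the class LooseUrl is made up for illustration). It ignores user names, passwords and IPv6 addresses, and does not parse parameters. It is not Dubbo's URL.valueOf code, but it reproduces the results listed above.

```java
// Hypothetical, simplified parser; reproduces the interpretations listed above.
// Skips credentials, IPv6 and parameter parsing.
final class LooseUrl {
    String protocol; // null when there is no "://" or ":/" prefix
    String host;     // null when nothing is left before the first '/'
    int port;        // 0 when no port is given
    String path;     // leading '/' removed; null when there is no '/'
    String query;    // raw text after '?'

    static LooseUrl parse(String url) {
        LooseUrl u = new LooseUrl();
        // 1. Strip the query string.
        int i = url.indexOf('?');
        if (i >= 0) {
            u.query = url.substring(i + 1);
            url = url.substring(0, i);
        }
        // 2. Protocol: "xxx://" or the shorter "xxx:/" form (e.g. file:/D:/...).
        i = url.indexOf("://");
        if (i > 0) {
            u.protocol = url.substring(0, i);
            url = url.substring(i + 3);
        } else {
            i = url.indexOf(":/");
            if (i > 0) {
                u.protocol = url.substring(0, i);
                url = url.substring(i + 1); // keep the '/', so the host ends up empty
            }
        }
        // 3. Everything after the first '/' is the path; the rest is host[:port].
        i = url.indexOf('/');
        if (i >= 0) {
            u.path = url.substring(i + 1);
            url = url.substring(0, i);
        }
        // 4. A trailing ":digits" is the port (assumed numeric).
        i = url.lastIndexOf(':');
        if (i >= 0 && i < url.length() - 1) {
            u.port = Integer.parseInt(url.substring(i + 1));
            url = url.substring(0, i);
        }
        u.host = url.isEmpty() ? null : url;
        return u;
    }
}
```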

2. URLs in Dubbo

Dubbo uses similar URLs, mainly to pass data between its various extension points. A Dubbo URL object is made up of the following parts:

  1. protocol: the various protocols in Dubbo, such as dubbo, thrift, http, and zk.
  2. Username/password: Username/password.
  3. host/port: host/port.
  4. path: interface name.
  5. parameters: parameter key-value pairs.
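These five parts map naturally onto an immutable value object. The class below is a hypothetical, minimal model and not Dubbo's own URL class. It shows how the parts recombine into the string form used in the examples that follow. Credentials are omitted when absent, and parameters keep their insertion order.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal model of the five URL parts (not Dubbo's URL class).
final class SimpleUrl {
    final String protocol;
    final String username;  // may be null
    final String password;  // may be null
    final String host;
    final int port;         // 0 means "no port"
    final String path;      // the interface name in Dubbo URLs
    final Map<String, String> parameters;

    SimpleUrl(String protocol, String username, String password,
              String host, int port, String path, Map<String, String> parameters) {
        this.protocol = protocol;
        this.username = username;
        this.password = password;
        this.host = host;
        this.port = port;
        this.path = path;
        // Defensive, insertion-ordered, read-only copy of the key-value pairs.
        this.parameters = Collections.unmodifiableMap(new LinkedHashMap<>(parameters));
    }

    // Renders protocol://[username[:password]@]host[:port][/path][?k=v&k=v]
    String toFullString() {
        StringBuilder sb = new StringBuilder(protocol).append("://");
        if (username != null) {
            sb.append(username);
            if (password != null) {
                sb.append(':').append(password);
            }
            sb.append('@');
        }
        sb.append(host);
        if (port > 0) {
            sb.append(':').append(port);
        }
        if (path != null) {
            sb.append('/').append(path);
        }
        String separator = "?";
        for (Map.Entry<String, String> e : parameters.entrySet()) {
            sb.append(separator).append(e.getKey()).append('=').append(e.getValue());
            separator = "&";
        }
        return sb.toString();
    }
}
```

Building it with protocol dubbo, host 192.168.1.6, port 20880, path moe.cnkirito.sample.HelloService and the single parameter timeout=3000 produces "dubbo://192.168.1.6:20880/moe.cnkirito.sample.HelloService?timeout=3000".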

Some typical Dubbo URLs:

dubbo://192.168.1.6:20880/moe.cnkirito.sample.HelloService?timeout=3000
Describes a dubbo protocol service

zookeeper://127.0.0.1:2181/org.apache.dubbo.registry.RegistryService?application=demo-consumer&dubbo=2.0.2&interface=org.apache.dubbo.registry.RegistryService&pid=1214&qos.port=33333&timestamp=1545721981946
Describes a ZooKeeper registry

consumer://30.5.120.217/org.apache.dubbo.demo.DemoService?application=demo-consumer&category=consumers&check=false&dubbo=2.0.2&interface=org.apache.dubbo.demo.DemoService&methods=sayHello&pid=1209&qos.port=33333&side=consumer&timestamp=1545721827784
Describes a consumer

In short, any implementation in any domain can be regarded as a kind of URL. Dubbo uses URLs to describe metadata and configuration information uniformly across the entire framework.

2. Dubbo 2.7

1. URL Structure

In Dubbo 2.7, the URL structure is very simple: a single class covers everything, as shown in the following figure.


2. Address push model

Next, let's look at the address push model in Dubbo 2.7. The main performance issues come from the following process:


The main process in the above figure is:

(1) A Provider instance of DemoService is added or removed (commonly during scaling out or in, network fluctuations, etc.);

(2) ZooKeeper pushes all instances of DemoService to the Consumer side;

(3) The Consumer side regenerates all URLs in full from the data pushed by ZooKeeper.

Under this scheme, the impact on the Consumer side is small when there are few Provider instances. When an interface has a large number of Provider instances, however, a large number of URLs are created unnecessarily.
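The cost of step (3) can be illustrated with a small, hypothetical model of this behavior. The class FullRebuildConsumer and its counter are made up for illustration, and splitting "host:port" stands in for real URL parsing. Every push carries the complete instance list, and every entry is parsed into a new object, even when only one instance actually changed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a consumer that rebuilds every URL on every push.
final class FullRebuildConsumer {
    private List<String[]> currentUrls = new ArrayList<>();
    int parseCount; // total number of URL objects created so far

    // Called with the complete provider list on every registry push.
    void onPush(List<String> fullProviderList) {
        List<String[]> rebuilt = new ArrayList<>(fullProviderList.size());
        for (String raw : fullProviderList) {
            rebuilt.add(parse(raw)); // previously parsed entries are never reused
        }
        currentUrls = rebuilt;
    }

    int size() {
        return currentUrls.size();
    }

    // Stand-in for real URL parsing: split "host:port" into two parts.
    private String[] parse(String raw) {
        parseCount++;
        return raw.split(":", 2);
    }
}
```

With 1,000 providers, adding a single instance triggers another 1,001 parses (2,001 in total across the two pushes), although only one entry is new.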

Dubbo 3.0 has made a series of optimizations mainly for the above push process, which we will explain in detail below.

3. Dubbo 3.0

1. URL Structure

Naturally, optimizing the address push model cannot be separated from optimizing the URL. The following figure shows the new URL structure that Dubbo 3.0 uses as part of this optimization.


The figure above shows that several important attributes of the Dubbo 2.7 URL no longer exist in Dubbo 3.0; the URLAddress and URLParam classes replace them. The original parameters attribute has moved to params in URLParam, and the other attributes have moved to URLAddress and its subclasses.
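A minimal sketch of the split is shown below. It assumes simplified stand-ins named after the article's classes; SketchAddress, SketchParam and SketchURL are hypothetical and much smaller than Dubbo's real URLAddress and URLParam. Because the URL only holds references to its two parts, many provider URLs that differ only in host can point at one shared parameter object.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical address part: what differs from one provider instance to another.
final class SketchAddress {
    final String protocol, host, path;
    final int port;

    SketchAddress(String protocol, String host, int port, String path) {
        this.protocol = protocol;
        this.host = host;
        this.port = port;
        this.path = path;
    }
}

// Hypothetical parameter part: often identical across instances of one interface.
final class SketchParam {
    final Map<String, String> params;

    SketchParam(Map<String, String> params) {
        this.params = Collections.unmodifiableMap(new TreeMap<>(params));
    }
}

// Hypothetical URL that only references its two parts.
final class SketchURL {
    final SketchAddress address;
    final SketchParam param;

    SketchURL(SketchAddress address, SketchParam param) {
        this.address = address;
        this.param = param;
    }

    String getHost() {
        return address.host;
    }

    String getParameter(String key) {
        return param.params.get(key);
    }
}
```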

Next, we introduce the three new subclasses of URL. One of them, InstanceAddressURL, belongs to application-level addresses and is not covered in this chapter.

The main difference between ServiceConfigURL and ServiceAddressURL concerns when each is created. ServiceConfigURL is generated when the program reads configuration files. ServiceAddressURL is generated when the registry pushes information (such as providers).

It is worth explaining why the DubboServiceAddressURL subclass exists. In the current structure, ServiceAddressURL has only this one subclass, so all the properties of both could live in ServiceAddressURL. Why, then, is the subclass needed? Dubbo 3.0 abstracts ServiceAddressURL to stay compatible with the HSF framework: HSF can inherit from it and use an HSFServiceAddressURL. That class does not appear in the current code, so we only mention it briefly here.

So, let’s discuss why Dubbo 3.0 changed to this data structure, and how this structure is related to the optimization of the address push model!

2. Optimization of address push model

  • Optimizing URL structure

The class diagram in the previous section shows that the original attributes have moved to URLAddress and URLParam. Even so, the URL subclasses still carry several additional attributes. These were added for optimization, so here we explain what they do.

ServiceConfigURL: This subclass adds the attribute property, which mainly serves as a redundant copy of the params in URLParam. The only difference is that the value type changes from String to Object, which avoids converting formats every time the code reads a parameter.
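The saving comes from converting a value once instead of on every read. The sketch below is hypothetical: the class TypedParams and its method names are not Dubbo's API. String parameters stay the source of truth, while a parallel Map<String, Object> keeps the already-converted value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a typed "attribute" copy of String params.
// Not Dubbo's ServiceConfigURL, and not thread-safe.
final class TypedParams {
    private final Map<String, String> params;                       // raw String values
    private final Map<String, Object> attributes = new HashMap<>(); // already-converted values
    int conversions; // number of String -> int parses performed (for illustration)

    TypedParams(Map<String, String> params) {
        this.params = params;
    }

    int getIntParameter(String key, int defaultValue) {
        Object cached = attributes.get(key);
        if (cached instanceof Integer) {
            return (Integer) cached; // fast path: no parsing on repeated reads
        }
        String raw = params.get(key);
        if (raw == null) {
            return defaultValue;
        }
        int value = Integer.parseInt(raw); // converted once...
        conversions++;
        attributes.put(key, value);        // ...then stored as an Object
        return value;
    }
}
```

Reading timeout five times parses the string only once; later reads return the stored Integer.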

ServiceAddressURL: This subclass and its subclasses add the overrideURL and consumerURL attributes. consumerURL holds the Consumer-side configuration, and overrideURL holds the values written when configuration is changed dynamically in Dubbo Admin. When getParameter() is called on the URL, the priority is overrideURL > consumerURL > urlParam. In Dubbo 2.7, dynamic configuration attributes replace the attributes in each URL, and this overhead is not negligible when there are a large number of URLs. overrideURL avoids it because all URLs share the same object.
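The lookup order can be sketched as follows. This is a simplified, hypothetical model: plain maps stand in for overrideURL, consumerURL and urlParam, and LayeredLookup is not a Dubbo class. The key point is that one override map is shared by reference across all provider URLs. A dynamic configuration change is therefore written once instead of being copied into every URL.

```java
import java.util.Map;

// Hypothetical model of the overrideURL > consumerURL > urlParam lookup order.
final class LayeredLookup {
    private final Map<String, String> override; // shared by all URLs (dynamic config)
    private final Map<String, String> consumer; // shared by all URLs (consumer config)
    private final Map<String, String> own;      // this provider URL's own parameters

    LayeredLookup(Map<String, String> override, Map<String, String> consumer,
                  Map<String, String> own) {
        this.override = override;
        this.consumer = consumer;
        this.own = own;
    }

    // First non-null value wins: override, then consumer, then own parameters.
    String getParameter(String key) {
        String value = override.get(key);
        if (value != null) {
            return value;
        }
        value = consumer.get(key);
        if (value != null) {
            return value;
        }
        return own.get(key);
    }
}
```

Putting timeout=500 into the shared override map changes the answer for every URL built on it at once, without touching any per-URL map.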

  • Multi-level cache

Caching is the focus of Dubbo 3.0's URL optimizations, and it directly targets the address push model. Next, we introduce the specific implementation of multi-level caching.

Multi-level caching is mainly implemented in the CacheableFailbackRegistry class, which inherits directly from FailbackRegistry. Taking ZooKeeper as an example, let's compare the inheritance structures of Dubbo 2.7 and Dubbo 3.0.


You can see that CacheableFailbackRegistry adds three cache attributes: stringAddress, stringParam, and stringUrls. The following figure describes the specific usage scenarios of these three caches.

In this solution, cached data is used in three dimensions: a URL string cache, a URL address cache, and a URL parameter cache. As a result, cached data can be reused in most cases, reducing the cost of handling repeated ZooKeeper notifications.
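The idea can be sketched in the spirit of the stringUrls, stringAddress and stringParam caches. The sketch assumes a raw provider string of the form address?params. The class ThreeLevelCache and its nested types are hypothetical, not the fields of CacheableFailbackRegistry, and they have no eviction.

- A hit on the whole string skips parsing entirely.
- Otherwise, the address half and the parameter half are looked up separately.
- A new instance whose parameters match an existing one therefore still reuses the parsed parameter object.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical three-level cache: full string, address part, parameter part.
final class ThreeLevelCache {
    static final class Address {
        final String text;
        Address(String text) { this.text = text; }
    }

    static final class Params {
        final String text;
        Params(String text) { this.text = text; }
    }

    static final class Parsed {
        final Address address;
        final Params params;
        Parsed(Address address, Params params) { this.address = address; this.params = params; }
    }

    private final Map<String, Parsed> urlCache = new ConcurrentHashMap<>();      // whole string
    private final Map<String, Address> addressCache = new ConcurrentHashMap<>(); // before '?'
    private final Map<String, Params> paramCache = new ConcurrentHashMap<>();    // after '?'

    Parsed lookup(String raw) {
        // Level 1: an identical string seen before reuses the whole parsed object.
        return urlCache.computeIfAbsent(raw, r -> {
            int q = r.indexOf('?');
            String addr = q >= 0 ? r.substring(0, q) : r;
            String query = q >= 0 ? r.substring(q + 1) : "";
            // Levels 2 and 3: each half is reused independently.
            Address address = addressCache.computeIfAbsent(addr, Address::new);
            Params params = paramCache.computeIfAbsent(query, Params::new);
            return new Parsed(address, params);
        });
    }
}
```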

  • Delayed notification

Besides the optimizations above, there are two other small optimizations.

The first is that a URL can be parsed directly from the bytes of the encoded URL string. In Dubbo 2.7, every encoded URL string must be decoded before it can be parsed into a URL object. Parsing the encoded string directly removes the overhead of this decoding step.
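The idea can be sketched as follows. The sketch assumes the registry stores the full URL with URLEncoder-style percent-encoding, so that '?', '&' and '=' appear as %3F, %26 and %3D. The class EncodedUrlParser is hypothetical and much simpler than Dubbo's own parser. The separators are located in the encoded text directly, and only tokens that still contain escapes are decoded.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical parser that splits an encoded URL without decoding it first.
final class EncodedUrlParser {
    // Returns the decoded address part and puts the query parameters into 'params'.
    static String parse(String encoded, Map<String, String> params) {
        int q = encoded.indexOf("%3F"); // encoded '?'
        String address = q >= 0 ? encoded.substring(0, q) : encoded;
        if (q >= 0) {
            for (String pair : encoded.substring(q + 3).split("%26")) { // encoded '&'
                int eq = pair.indexOf("%3D");                          // encoded '='
                if (eq > 0) {
                    params.put(decodeIfNeeded(pair.substring(0, eq)),
                               decodeIfNeeded(pair.substring(eq + 3)));
                }
            }
        }
        return decodeIfNeeded(address);
    }

    // Tokens without escapes (e.g. "timeout", "3000") skip the decoder entirely.
    private static String decodeIfNeeded(String token) {
        if (token.indexOf('%') < 0 && token.indexOf('+') < 0) {
            return token;
        }
        return URLDecoder.decode(token, StandardCharsets.UTF_8);
    }
}
```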

The second is that a delay is added to the notification mechanism after a URL change. The following figure uses ZooKeeper as an example to explain the implementation details.


In this solution, when the Consumer receives a change notification from ZooKeeper, it actively sleeps for a period of time. When the sleep ends, only the last change is kept. The Consumer uses that last change to update the listening instance, which reduces the overhead of creating a large number of URLs.
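Keeping only the last change after a delay is a form of debouncing. Below is a minimal sketch, assuming a single-threaded ScheduledExecutorService and a fixed delay; a scheduled task stands in for the sleep. The class DelayedNotifier is hypothetical, and the delay value is illustrative rather than Dubbo's. Each incoming change overwrites the pending one, and only the change that opens a window schedules a flush. A burst of pushes therefore results in one update using the latest data.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical debouncer: keeps only the latest change seen during the delay window.
final class DelayedNotifier<T> {
    private final AtomicReference<T> pending = new AtomicReference<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "delayed-notifier");
                t.setDaemon(true); // do not keep the JVM alive
                return t;
            });
    private final long delayMillis;
    private final Consumer<T> listener;

    DelayedNotifier(long delayMillis, Consumer<T> listener) {
        this.delayMillis = delayMillis;
        this.listener = listener;
    }

    // Called for every registry push (data must be non-null); never blocks the caller.
    void onChange(T data) {
        // The push that finds no pending data opens the window and schedules a flush;
        // later pushes in the same window only replace the pending data.
        if (pending.getAndSet(data) == null) {
            scheduler.schedule(this::flush, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Runs once per window: hand only the last change to the listener.
    private void flush() {
        T latest = pending.getAndSet(null);
        if (latest != null) {
            listener.accept(latest);
        }
    }

    void shutdown() {
        scheduler.shutdown();
    }
}
```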

  • String Reuse

In earlier versions, identical strings in different URLs, such as protocol and path, were stored as separate objects on the heap. With a large number of providers, the Consumer's heap holds many duplicate strings, which lowers memory utilization. Another optimization is therefore provided here: string reuse.

Its implementation is very simple; let's look at the corresponding code snippet.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.dubbo.common.utils.LRUCache;

public class URLItemCache {
    // Bounded LRU cache (10,000 entries) for paths, i.e. interface names
    private static final Map<String, String> PATH_CACHE = new LRUCache<>(10000);
    // Unbounded concurrent cache for protocol names
    private static final Map<String, String> PROTOCOL_CACHE = new ConcurrentHashMap<>();

    // Omit irrelevant code snippets

    // Returns the cached instance equal to 'protocol', caching it on first use
    public static String checkProtocol(String protocol) {
        if (protocol == null) {
            return null;
        }
        String cachedProtocol = PROTOCOL_CACHE.putIfAbsent(protocol, protocol);
        return cachedProtocol != null ? cachedProtocol : protocol;
    }

    // Same idea for paths; rarely used entries are evicted by the LRU cache
    public static String checkPath(String path) {
        if (path == null) {
            return null;
        }
        String cachedPath = PATH_CACHE.putIfAbsent(path, path);
        return cachedPath != null ? cachedPath : path;
    }
}

As the snippet above shows, string reuse simply uses a Map to store the cached value. When the same string is used again, the existing object is fetched from the Map and returned to the caller, which reduces the number of duplicate strings in heap memory.
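The effect can be checked without Dubbo on the classpath. In the hypothetical sketch below, an access-ordered LinkedHashMap wrapped by Collections.synchronizedMap replaces Dubbo's LRUCache, and a tiny capacity makes eviction visible. The class StringReuse is made up for illustration. Two equal but distinct String objects come back as the same instance.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical string-reuse cache; a LinkedHashMap stands in for Dubbo's LRUCache.
final class StringReuse {
    private final Map<String, String> cache;

    StringReuse(int capacity) {
        // Access order = true: the least recently used entry is evicted first.
        cache = Collections.synchronizedMap(new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        });
    }

    // Returns the cached instance equal to 's', caching 's' if none exists yet.
    String reuse(String s) {
        if (s == null) {
            return null;
        }
        String cached = cache.putIfAbsent(s, s);
        return cached != null ? cached : s;
    }
}
```

Passing two separately constructed "org.apache.dubbo.demo.DemoService" strings returns the first instance both times, so only one copy stays reachable through the URLs.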

3. Optimization results

Here I quote two figures from the article "Dubbo 3.0 Outlook: Service Discovery Supports Millions of Clusters, Bringing Scalable Microservice Architecture" to illustrate the results. The figure below simulates the load on the Consumer side caused by continuous changes in interface data when there are 2.2 million Provider interfaces. The Consumer side is almost entirely occupied by Full GC, which severely affects performance.

Then let's take a look at the stress test results in the same environment after optimizing the URL in Dubbo 3.0, as shown in the following figure.



We can clearly see that the number of Full GCs has dropped to only 3, which greatly improves performance. The article contains other comparisons that are not quoted here; interested readers can read it themselves.

About the Author:

Wu Zhiguo, active contributor to the Apache Dubbo community
