WOT Xu Dongchen: JVM-Sandbox, a non-intrusive runtime AOP solution based on the JVM

[51CTO.com original article] On May 18-19, 2018, the Global Software and Operation Technology Summit hosted by 51CTO was held in Beijing. Technical experts from companies around the world gathered to discuss the forefront of software technology and explore new boundaries in operations technology. Besides the star-studded main forum, the conference featured 12 distinctive sub-forums. At the "Microservice Architecture Design" sub-forum on the afternoon of the 19th, Xu Dongchen, a test development engineer from the Alibaba Taobao Technology Quality Department, delivered an excellent talk. Serving as both host and speaker, Xu Dongchen opened the session with ease and energy and walked the audience through the background of JVM-Sandbox, its advantages, its application scenarios, its core technologies, and its open-source release.

Background of JVM-Sandbox

As software grows in scale and system functions are split ever more finely, many tool platforms and monitoring systems have to be built to keep Alibaba's overall system stable. What does that mean for development and testing engineers? Xu Dongchen gave examples: rate limiting, flow control, fault simulation, information monitoring, link tracing, problem location, and so on. The biggest concern is whether core business functions are affected after a system architecture upgrade, so testers need automated testing methods to achieve automated business regression.

Test engineers who write interface tests prefer to do business regression by recording traffic online and replaying it offline, which saves a great deal of cost. Such a regression works by monitoring the input parameters and return values of methods, or by monitoring the entire call link for problems. The remaining scenarios are monitoring, link tracing, and precise regression.

Xu Dongchen listed four specific scenarios. A lot has to be done to keep a system stable, but with a simple abstraction, the tool platforms above do just two things:

The first is monitoring and around-control of methods; the second is collecting and aggregating line-level link information. Take method monitoring and around-control as an example. In Java this is usually done with the familiar AOP approach, but AOP has some problems. If we want a unified monitoring platform, the ratio of monitoring code to business code matters a great deal. In the most extreme system she has seen, the ratio of monitoring code to business code was 1:2, meaning that one third of the code was monitoring code. Such monitoring code is also cumbersome, because tracking points have to be added by hand before the data can be reported.

As for line links, which are used to calculate coverage: if we want to keep the system flexible, we cannot rebuild it just to locate a problem or add a line of logging. The supporting tools we develop for a stability platform therefore need three characteristics:

First: Non-intrusive to development code.

Second: Takes effect in real time. When a problem is being investigated, the scene must be preserved, so the tool has to take effect immediately.

Third: Dynamically pluggable.

Achieving this requires a dynamic bytecode enhancement solution. Whether the platform is for fault drills, strong and weak dependency detection, traffic recording and replay, problem location, or monitoring, implementing dynamic bytecode enhancement separately at the bottom layer of each one is expensive and has a steep learning curve. These platforms also all act on the same system, so whether their underlying bytecode enhancements interfere with one another becomes a problem. To solve these problems, hide the high technical threshold of bytecode enhancement, reduce the cost of R&D and operations, and manage multiple upper-layer modules dynamically, we developed JVM-Sandbox.

Advantages of JVM-Sandbox

JVM-Sandbox combines the convenience of a general AOP API with the flexibility of tracking points, and is a real-time, non-intrusive AOP container. Functionally, JVM-Sandbox is built on the JVMTI technical specification and provides a container with plug-and-play module interfaces for observing and changing the results of code execution. It offers a new way to implement AOP: replacing proxies with stubs.

Target users: engineers who use bytecode enhancement technology to develop tools, implement business functions, or run tests.

Core functions: First, a unified API for bytecode enhancement. Second, a non-intrusive container that is isolated from the target application. Third, container management: multiple modules can be mounted on JVM-Sandbox at the same time, each providing its own function, such as link tracing or problem location.

What can be done with Sandbox? In the abstract, it can observe and change input parameters, and it can observe and change return values and thrown exceptions. It can also control the flow of a method: return immediately before execution, construct a new result object before execution, rethrow a different exception after an exception, or return a normal result in place of the exception. Xu Dongchen gave a brief introduction to each:

Core Operation Objects

First, the core operation objects. They are the result of an abstraction. We had already used a number of open-source tools, including problem-location tools and testing tools, and we abstracted what they do into observation before execution, after return, and after an exception, together with the corresponding changes at each point. This gives three core stage events (before, return, and throws), a normal flow and an intervention flow across those three stages, and line events. A line event is a probe inserted at each line of code.
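The observation and intervention points described above can be sketched in plain Java. This is only an illustration under stated assumptions: the names MethodListener, ReturnNow, and Woven are hypothetical and are not the JVM-Sandbox API, and Woven.invoke merely stands in for the code that bytecode weaving would place around a target method.

```java
import java.util.function.Function;

// Hypothetical names throughout; this is not the JVM-Sandbox API.
final class EventSketch {
    public static void main(String[] args) {
        // Listener that doubles the first argument before the method body runs.
        MethodListener doubleInput = new MethodListener() {
            @Override public Object[] before(Object[] in) {
                return new Object[] { (Integer) in[0] * 2 };
            }
        };
        System.out.println(Woven.invoke(doubleInput, a -> (Integer) a[0] + 1, 5));
    }
}

/** Callbacks for the three stages: before execution, after return, after throw. */
interface MethodListener {
    /** May replace the arguments, or throw ReturnNow to skip the body entirely. */
    default Object[] before(Object[] args) { return args; }
    /** May replace the return value. */
    default Object afterReturn(Object result) { return result; }
    /** May rethrow, throw something else, or return a normal result instead. */
    default Object afterThrows(RuntimeException error) { throw error; }
}

/** Thrown from before() to return a substitute result without running the body. */
final class ReturnNow extends RuntimeException {
    final Object value;
    ReturnNow(Object value) {
        super("return before execution");
        this.value = value;
    }
}

/** Stand-in for the code that weaving would place around a target method. */
final class Woven {
    static Object invoke(MethodListener listener, Function<Object[], Object> body, Object... args) {
        Object[] actualArgs;
        try {
            actualArgs = listener.before(args);      // before stage
        } catch (ReturnNow early) {
            return early.value;                      // intervention: skip the body
        }
        Object result;
        try {
            result = body.apply(actualArgs);         // original method body
        } catch (RuntimeException e) {
            return listener.afterThrows(e);          // throws stage
        }
        return listener.afterReturn(result);         // return stage
    }
}
```

A listener that overrides nothing leaves the method untouched, which corresponds to the normal flow; overriding a callback or throwing ReturnNow corresponds to the intervention flow.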

How to isolate and communicate with the target

So how are the Sandbox and the target application kept isolated from each other? The method is simple and can be summarized in one sentence: break the parent delegation mechanism with a custom ClassLoader to achieve class isolation, and inject a Spy class into the Bootstrap ClassLoader to handle communication. First, recall the original parent delegation mechanism.

Under the original parent delegation mechanism, when a class is to be loaded, the current ClassLoader first checks whether it has already loaded the class. If not, it delegates to its parent ClassLoader, which checks the same thing, and the request keeps going upward until it reaches the Bootstrap ClassLoader. Only when a parent cannot load the class does the child try to load it itself.

What does the mechanism look like once it is broken? When a class is to be loaded, the current ClassLoader still first checks whether it has already loaded it. If not, the current ClassLoader tries to load the class itself instead of asking its parent. Only if it cannot load the class does it ask its parent ClassLoader, which checks whether it has already loaded the class and, if not, tries to load it. This completes the isolation between the target application and the Sandbox.

When Sandbox starts, it creates a new ClassLoader for Sandbox itself and a new ClassLoader for each Module mounted on top of it. This isolates Sandbox from the Modules, the Modules from one another, and all of them from the target application.
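A minimal sketch of such a "child-first" ClassLoader follows. The class name ChildFirstClassLoader and the helper loadOrFail are hypothetical, and the sketch is not taken from the JVM-Sandbox source. Classes in java.* are always left to the parent chain, because the JVM does not allow an ordinary ClassLoader to define them. The empty URL arrays in the demo stand in for each module's jar.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class IsolationSketch {
    public static void main(String[] args) {
        ClassLoader app = ClassLoader.getSystemClassLoader();
        // One loader per module; the empty URL arrays stand in for module jars.
        ChildFirstClassLoader moduleA = new ChildFirstClassLoader(new URL[0], app);
        ChildFirstClassLoader moduleB = new ChildFirstClassLoader(new URL[0], app);
        // JDK classes still come from the shared parent chain.
        System.out.println(moduleA.loadOrFail("java.lang.String")
                == moduleB.loadOrFail("java.lang.String"));
    }
}

/** Child-first loader: tries its own URLs before asking the parent. */
final class ChildFirstClassLoader extends URLClassLoader {
    ChildFirstClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);        // 1. already loaded by this loader?
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name);               // 2. try our own URLs first
                } catch (ClassNotFoundException ignored) {
                    // Not found locally; fall through to the parent.
                }
            }
            if (c == null) {
                c = super.loadClass(name, false);      // 3. only then delegate to the parent
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    /** Convenience wrapper that turns the checked exception into an unchecked one. */
    Class<?> loadOrFail(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("cannot load " + name, e);
        }
    }
}
```

If two such loaders were given the same jar, each would define its own copy of the classes in it, and the two copies would be distinct types to the JVM. That is the property that keeps modules from interfering with one another or with the target application.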

For communication, a Spy class is injected into the Bootstrap ClassLoader. Classes loaded by the Bootstrap ClassLoader are visible to every other ClassLoader, so both sides can reach it. This Spy class is responsible for communication between the target application and the Sandbox, which is not particularly intuitive.
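The bridging idea can be sketched as a class with a static hook. The Spy class below is a hypothetical illustration and not the actual JVM-Sandbox class; in a real agent, one standard way to make such a class visible from the bootstrap loader is Instrumentation.appendToBootstrapClassLoaderSearch, which this sketch does not exercise. The handler uses only JDK types so that both sides can share it regardless of which loader defined their own classes.

```java
import java.util.function.BiConsumer;

final class SpyDemo {
    public static void main(String[] args) {
        // Sandbox side: install a handler.
        Spy.install((methodId, callArgs) ->
                System.out.println(methodId + " called with " + callArgs.length + " argument(s)"));
        // Target side: woven code would make this call at method entry.
        Spy.onBefore("OrderService#create", new Object[] { "order-1" });
        Spy.uninstall();
    }
}

/** Hypothetical bridge between woven code in the target and the sandbox. */
final class Spy {
    // Only JDK types appear here, so the field is usable from any ClassLoader.
    private static volatile BiConsumer<String, Object[]> handler;

    private Spy() { }

    /** Called by the sandbox side when it attaches. */
    static void install(BiConsumer<String, Object[]> h) {
        handler = h;
    }

    /** Called by the sandbox side when it detaches. */
    static void uninstall() {
        handler = null;
    }

    /** Called from woven code in the target application. */
    static void onBefore(String methodId, Object[] args) {
        BiConsumer<String, Object[]> h = handler;
        if (h != null) {
            h.accept(methodId, args);   // forwards to the sandbox; no-op when detached
        }
    }
}
```

The target application never references sandbox classes directly. It only calls the bridge, and the bridge forwards to whatever the sandbox has installed, so the two sides can live in isolated ClassLoaders and still exchange events.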

How to achieve dynamic pluggability

On how dynamic pluggability is achieved, Xu Dongchen again summarized it in one sentence: the transform method reshapes the native bytecode, and an event monitoring table manages the modules. Why is this needed? Whenever something is attached to a system, what we care about most is whether the system can be restored afterwards. After Sandbox and its modules are added, how do they act on my system, where does the transformation happen, and can my system be restored?

The figure illustrates this. First, look at where the transformation happens: the classes loaded by the JVM are filtered (the module tells the sandbox which filter to use) to find the classes that need to be transformed. Each such class then passes through a transformation channel. What happens in the channel, and which transformations are applied, is determined by the modules loaded in the sandbox.

This amounts to an event monitoring table that records which Module applies a transformation to which class. What happens when a new Module is added? All classes are filtered again, and the transformation is reapplied with the new Module included. When a Module is removed, the classes specified by that Module are likewise filtered out first and then transformed again. It follows that if all modules on the Sandbox are unloaded, the channel applies no transformation at all: a class goes in as bytecode and comes out as the same bytecode. Nothing is changed, and the entire code is effectively restored.

If you mount only Sandbox, your original code is not affected. If you mount a Module on Sandbox, the Module determines which classes and methods are affected. When you unload a Module, its transformation disappears. This is how dynamic pluggability is achieved.
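The relationship between the module table and the transform step can be sketched with the standard java.lang.instrument.ClassFileTransformer interface. The ModuleTable class and its load and unload methods are hypothetical names, and the weave method is a placeholder: a real implementation would rewrite the bytes with a bytecode library to insert event hooks. In a real agent, classes that are already loaded would also need to be passed through the transformer again, for example with Instrumentation.retransformClasses, whenever a module is loaded or unloaded.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

final class PluggableDemo {
    public static void main(String[] args) {
        ModuleTable table = new ModuleTable();
        byte[] original = { 1, 2, 3 };
        table.load("trace", name -> name.startsWith("com/example/"));
        System.out.println(table.transform(null, "com/example/Order", null, null, original) != null);
        table.unload("trace");
        System.out.println(table.transform(null, "com/example/Order", null, null, original) == null);
    }
}

/** Hypothetical table of loaded modules and the classes each one wants to change. */
final class ModuleTable implements ClassFileTransformer {
    // Module id -> filter over internal class names such as "com/example/Order".
    private final Map<String, Predicate<String>> filters = new ConcurrentHashMap<>();

    void load(String moduleId, Predicate<String> classFilter) {
        filters.put(moduleId, classFilter);
    }

    void unload(String moduleId) {
        filters.remove(moduleId);
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        boolean wanted = className != null
                && filters.values().stream().anyMatch(f -> f.test(className));
        if (!wanted) {
            return null;   // null means "no change": the JVM keeps the original bytecode
        }
        return weave(classfileBuffer);
    }

    /** Placeholder: returns an unchanged copy where real code would insert hooks. */
    private static byte[] weave(byte[] original) {
        return original.clone();
    }
}
```

With no modules in the table, transform returns null for every class, which matches the point made above: once every module is unloaded, nothing is changed and the original code is what runs.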

As shown in the figure, this is the overall architecture of JVM-Sandbox. The bottom layer is built on JVMTI, with a code weaving framework on top of it that can weave around method calls, intervene in method flow, and weave along execution paths. The sandbox distributes events, registers and deregisters event listeners, and handles events. Module management sits at the upper level.

One additional component inside the Sandbox is an HTTP server. Its job is to control modules once the Sandbox has been attached: mounting, unmounting, activating, or starting them. HTTP was the most convenient way to do this at the time, so an HTTP server was added. After you mount the Sandbox, the upper-level modules can be started, loaded, and unloaded through HTTP requests.
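As a rough illustration of controlling modules over HTTP, the sketch below uses the HTTP server that ships with the JDK (com.sun.net.httpserver). The talk does not say which server library Sandbox embeds, and the URL paths, the id query parameter, and the ModuleControl class are all assumptions made for this example, not the real Sandbox endpoints.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

final class ControlDemo {
    public static void main(String[] args) {
        ModuleControl control = new ModuleControl();
        System.out.println(control.handle("/module/load", "id=trace"));
        System.out.println(control.handle("/module/list", null));
    }
}

/** Hypothetical module controller; paths and parameters are illustrative only. */
final class ModuleControl {
    private final Set<String> active = new ConcurrentSkipListSet<>();

    /** Routing logic, kept separate from the server so it can be called directly. */
    String handle(String path, String query) {
        String id = (query != null && query.startsWith("id=")) ? query.substring(3) : "";
        switch (path) {
            case "/module/load":
                if (id.isEmpty()) {
                    return "missing id";
                }
                return active.add(id) ? "loaded " + id : "already loaded " + id;
            case "/module/unload":
                return active.remove(id) ? "unloaded " + id : "not loaded " + id;
            case "/module/list":
                return String.join(",", active);
            default:
                return "unknown command";
        }
    }

    /** Exposes handle() on the loopback interface; port 0 lets the OS pick a free port. */
    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/module", exchange -> {
            byte[] body = handle(exchange.getRequestURI().getPath(),
                    exchange.getRequestURI().getQuery()).getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

In a real container, the load and unload branches would also register or remove the module's class filter and trigger re-transformation of the affected classes, rather than only updating a set.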

Sandbox itself is open source, and all of its source code is available. We hope that more engineers will come up with new application scenarios and open source them for everyone to use.

The speeches of the speakers at this WOT Summit are compiled and edited by 51CTO. If you want to know more, please log in to WWW.51CTO.COM to view them.

[51CTO original article, please indicate the original author and source as 51CTO.com when reprinting on partner sites]
