Explore Java application startup speed optimization


1. Can you have both high performance and fast startup speed?

Among object-oriented programming languages, Java stands out for its performance.

The report "Energy Efficiency across Programming Languages: How Do Energy, Time, and Memory Relate?" investigates the execution efficiency of major programming languages. Although its benchmark scenarios are limited, it still gives us a glimpse of the big picture.

From the table, we can see that Java's execution efficiency is very high, roughly half that of C, the fastest language. Among mainstream programming languages, it is second only to C, Rust, and C++.

Java's excellent performance is due to the excellent JIT compilers in HotSpot. The Server Compiler (C2) is the work of Dr. Cliff Click and uses the Sea-of-Nodes model. Time has proven that this design represents the most advanced level in the industry:

- The famous TurboFan compiler in V8 (the JavaScript engine) uses the same design, implemented in a more modern way;
- When HotSpot uses the Graal JVMCI for JIT, performance is basically on par with C2;
- Azul's commercial product replaces the C2 compiler in HotSpot with LLVM, and its peak performance is on par with C2.
Behind the high performance, Java's poor startup performance is equally striking, and most people's impression of Java as clumsy and slow comes from this. High performance and fast startup seem somewhat contradictory. This article explores with you whether the two can be achieved at the same time.

2. The root cause of slow Java startup

1. Complex framework

Jakarta EE is the new name J2EE received after Oracle donated it to the Eclipse Foundation. The J2EE specification was released in 1999. EJB (Enterprise JavaBeans) defined the security, IoC, AOP, transaction, concurrency, and other capabilities required for enterprise development. The design was extremely complex: even the most basic applications required a large number of configuration files, which made it very inconvenient to use.

With the rise of the Internet, EJB was gradually replaced by the lighter and freer Spring framework, which became the de facto standard for Java enterprise development. Although Spring is positioned as more lightweight, it is still heavily influenced by Jakarta EE: early versions relied on large amounts of XML configuration, and it adopts many Jakarta EE annotations (such as JSR 330 dependency injection) and specifications (such as the JSR 340 Servlet API).

But Spring is still an enterprise-level framework. Let's look at some of the design philosophies of the Spring framework:

- By providing options at every level, Spring lets you defer decisions as long as possible.
- Spring is flexible and does not force a "right" choice on you; it supports a wide range of application requirements from different perspectives.
- Spring maintains strong backward compatibility.
Under the influence of this philosophy, there is inevitably a great deal of configuration and initialization logic, as well as complex design patterns to support this flexibility. Let's see this through an experiment:

We run a spring-boot-web hello world and inspect the loaded classes with -verbose:class:

```shell
$ java -verbose:class -jar myapp-1.0-SNAPSHOT.jar | grep spring | head -n 5
[Loaded org.springframework.boot.loader.Launcher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.ExecutableArchiveLauncher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.JarLauncher from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.archive.Archive from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
[Loaded org.springframework.boot.loader.LaunchedURLClassLoader from file:/Users/yulei/tmp/myapp-1.0-SNAPSHOT.jar]
$ java -verbose:class -jar myapp-1.0-SNAPSHOT.jar | egrep '^\[Loaded' > classes
$ wc classes
    7404   29638 1175552 classes
```

The number of classes reaches an astonishing 7404.

Let's compare with the JavaScript ecosystem by writing a basic application using the popular express framework:

```javascript
const express = require('express')
const app = express()
const port = 3000

app.get('/', (req, res) => {
  res.send('Hello World!')
})

app.listen(port, () => {
  console.log(`Example app listening at http://localhost:${port}`)
})
```

We use Node's debug environment variable analysis:

```shell
$ NODE_DEBUG=module node app.js 2>&1 | head -n 5
MODULE 18614: looking for "/Users/yulei/tmp/myapp/app.js" in ["/Users/yulei/.node_modules","/Users/yulei/.node_libraries","/usr/local/Cellar/node/14.4.0/lib/node"]
MODULE 18614: load "/Users/yulei/tmp/myapp/app.js" for module "."
MODULE 18614: Module._load REQUEST express parent: .
MODULE 18614: looking for "express" in ["/Users/yulei/tmp/myapp/node_modules","/Users/yulei/tmp/node_modules","/Users/yulei/node_modules","/Users/node_modules","/node_modules","/Users/yulei/.node_modules","/Users/yulei/.node_libraries","/usr/local/Cellar/node/14.4.0/lib/node"]
MODULE 18614: load "/Users/yulei/tmp/myapp/node_modules/express/index.js" for module "/Users/yulei/tmp/myapp/node_modules/express/index.js"
$ NODE_DEBUG=module node app.js 2>&1 | grep ': load "' > js
$ wc js
      55     392    8192 js
```

Only 55 JS files are loaded.

Comparing spring-boot with express is admittedly not entirely fair. In the Java world, applications can also be built on lighter frameworks such as Vert.x or Netty, but in practice almost everyone chooses spring-boot without hesitation in order to enjoy the convenience of the Java open source ecosystem.

2. Compile once, run everywhere

Is Java's slow startup due to complex frameworks? Framework complexity is only part of the answer. By combining GraalVM's Native Image capability with spring-native, the startup time of a spring-boot application can be reduced by roughly a factor of ten.

Java's slogan is "Write once, run anywhere" (WORA), and Java has indeed achieved this through bytecode and virtual machine technology.

WORA enables developers to quickly deploy applications developed and debugged on macOS to Linux servers. Cross-platform distribution also makes the Maven central repository easier to maintain, contributing to the prosperity of the Java open source ecosystem.

Let's look at the impact of WORA on Java:

Class Loading
Java organizes source code into classes, which are packed into JAR files for modularization and distribution. A JAR is essentially a ZIP file:

```shell
$ jar tf slf4j-api-1.7.25.jar | head
META-INF/
META-INF/MANIFEST.MF
org/slf4j/
org/slf4j/event/EventConstants.class
org/slf4j/event/EventRecodingLogger.class
org/slf4j/event/Level.class
```

Each JAR package is a relatively independent module in terms of functionality. Developers can rely on JARs with specific functions as needed. These JARs are known to the JVM through the class path and loaded.

According to the JVM specification, class loading is triggered when bytecodes such as new or invokestatic execute. The JVM hands control to the ClassLoader, and the most common implementation, URLClassLoader, traverses the JAR packages to find the corresponding class file:

```java
for (int i = 0; (loader = getNextLoader(cache, i)) != null; i++) {
    Resource res = loader.getResource(name, check);
    if (res != null) {
        return res;
    }
}
```

Therefore, the cost of searching for classes is usually proportional to the number of JAR packages. In large applications, the number can be thousands, resulting in a high overall search time.
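The loop above can be sketched end to end. Below is a minimal model of this linear classpath scan (the class and method names are made up for illustration, not part of any JDK API):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// A minimal model of the linear classpath scan: each lookup walks the JAR
// list in order until one of them contains the requested entry.
class NaiveClassFinder {
    static byte[] find(List<String> classPathJars, String className) throws IOException {
        String entryName = className.replace('.', '/') + ".class";
        for (String jarPath : classPathJars) {            // cost grows with the number of JARs
            try (JarFile jar = new JarFile(jarPath)) {
                JarEntry entry = jar.getJarEntry(entryName);
                if (entry != null) {
                    try (InputStream in = jar.getInputStream(entry)) {
                        return in.readAllBytes();
                    }
                }
            }
        }
        return null;   // a real loader would throw ClassNotFoundException here
    }
}
```

On a classpath of N JARs, a miss probes every archive and a hit probes N/2 on average, which is exactly the cost profile described above.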

After finding the class file, the JVM must verify that it is legal and parse it into an internal data structure, which the JVM calls InstanceKlass. You may have used javap to peek at the information a class file contains:

```shell
$ javap -p SimpleMessage.class
public class org.apache.logging.log4j.message.SimpleMessage implements
    org.apache.logging.log4j.message.Message,
    org.apache.logging.log4j.util.StringBuilderFormattable,
    java.lang.CharSequence {
  private static final long serialVersionUID;
  private java.lang.String message;
  private transient java.lang.CharSequence charSequence;
  public org.apache.logging.log4j.message.SimpleMessage();
  public org.apache.logging.log4j.message.SimpleMessage(java.lang.String);
```

This structure contains interfaces, base classes, static data, object layout, method bytecodes, constant pools, etc. These data structures are necessary for the interpreter to execute bytecodes or JIT compilation.

Class initialize

Once a class is loaded, it must be initialized before objects can actually be created or static methods called. Class initialization can be roughly understood as running the class's static block:

```java
public class A {
    private final static String JAVA_VERSION_STRING = System.getProperty("java.version");
    private final static Set<Integer> idBlackList = new HashSet<>();
    static {
        idBlackList.add(10);
        idBlackList.add(65538);
    }
}
```

The initializer of the first static field, JAVA_VERSION_STRING, also becomes part of the static block (<clinit>) after compilation to bytecode.

Class initialization has the following characteristics:

- It executes only once;
- When multiple threads race to access a class, only one thread performs the initialization, and the JVM ensures the other threads block until initialization completes.
These properties make class initialization well suited to reading configuration or building data structures and caches needed at runtime, so the initialization logic of many classes is fairly involved.
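Both properties can be observed with a small experiment (a sketch; the class names are made up): eight threads race to touch a class, yet its static initializer runs exactly once.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates that a class's static initializer runs exactly once even when
// many threads trigger it concurrently; the JVM blocks the losers (JLS 12.4.2).
public class InitOnce {
    static final AtomicInteger initCount = new AtomicInteger();

    static class Config {
        // Non-constant initializer, so referencing it triggers class init.
        static final String JAVA_VERSION = System.getProperty("java.version");
        static {
            initCount.incrementAndGet();   // count how many times init runs
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            // The first reference to Config triggers (at most one) initialization.
            threads[i] = new Thread(() -> Config.JAVA_VERSION.length());
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("static initializer ran " + initCount.get() + " time(s)");
    }
}
```

However many threads race, the program always reports that the initializer ran once.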

Just In Time compile
After a Java class is initialized, objects can be instantiated and methods called on them. Interpreted execution is essentially a big switch..case loop, with poor performance:

```java
while (true) {
    switch (bytecode[pc]) {
        case AALOAD:
            ...
            break;
        case ATHROW:
            ...
            break;
    }
}
```

We use JMH to run a Hessian serialization Micro Benchmark test:

```shell
$ java -jar benchmarks.jar hessianIO
Benchmark                     Mode  Cnt       Score  Error  Units
SerializeBenchmark.hessianIO  thrpt       118194.452        ops/s
$ java -Xint -jar benchmarks.jar hessianIO
Benchmark                     Mode  Cnt       Score  Error  Units
SerializeBenchmark.hessianIO  thrpt         4535.820        ops/s
```

The -Xint flag in the second run forces the JVM to use only the interpreter. The 26x gap here comes from the difference between executing machine code directly and interpreting bytecode. The exact factor is highly scenario-dependent; in our experience it is often around 50x.

Let's take a closer look at the JIT's behavior:

```shell
$ java -XX:+PrintFlagsFinal -version | grep CompileThreshold
    intx Tier3CompileThreshold = 2000   {product}
    intx Tier4CompileThreshold = 15000  {product}
```

These are two internal JIT parameters of the JDK. We will not introduce the principle of tiered compilation here; you can refer to Stack Overflow. Tier3 can be roughly understood as the client compiler (C1) and Tier4 as C2. When a method has been interpreted about 2,000 times, it is compiled by C1; when the C1-compiled code has executed about 15,000 times, it is recompiled by C2, finally reaching the "half the speed of C" mentioned at the beginning of this article.

When the application is just started, the method has not been completely JIT compiled, so most of the time it remains in interpreted execution, affecting the speed of application startup.

3. How to optimize the startup speed of Java applications

We have spent a lot of time analyzing the main reasons for the slow startup of Java applications. The summary is:

- Influenced by Jakarta EE, common frameworks are designed for reuse and flexibility, and are therefore complex;
- For cross-platform compatibility, code is loaded and compiled dynamically, and this loading and compilation is time-consuming during the startup phase.

The combination of these two factors produces the slow startup of Java applications we see today.

Python and JavaScript also parse and load modules dynamically, and CPython does not even have a JIT. In theory their startup should not be much faster than Java's, but they are not paired with such heavyweight application frameworks, so startup performance does not become an overall problem.

Although we cannot easily change users' usage habits of the framework, we can enhance it at the runtime level to make the startup performance as close to the native image as possible. The OpenJDK official community has also been working hard to solve the startup performance problem. So, as ordinary Java developers, can we use the latest features of OpenJDK to help us improve startup performance?

- Class loading: JarIndex solves the JAR traversal problem, but the technology is too old to use in modern projects that involve Tomcat or fat JARs; AppCDS can address the cost of class file parsing.
- Class initialization: OpenJDK 9 added Heap Archive, which can persist some heap data related to class initialization. However, only a few JDK-internal classes (such as IntegerCache) are accelerated, and there is no open way to use it.
- JIT warm-up: JEP 295 implements AOT compilation, but it has bugs; improper use causes correctness and performance problems. It is poorly tuned, in most cases shows no visible benefit, and can even regress performance.
In response to the problems with the above features of OpenJDK, Alibaba Dragonwell has developed and optimized the above technologies and integrated them with cloud products, so that users can easily optimize the startup time without investing too much effort.

1 AppCDS

CDS (Class Data Sharing) first appeared in Oracle JDK 1.5. AppCDS, introduced in Oracle JDK 8u40, extended it to classes outside the JDK, but as a commercial feature. Oracle later contributed AppCDS to the community; by JDK 10, CDS had been gradually improved and also supported user-defined class loaders (known as AppCDS v2).

Object-oriented languages bind objects (data) and methods (operations on the data) together to provide stronger encapsulation and polymorphism. These features rely on the type information in the object header; this is true for both Java and Python. The layout of a Java object in memory is as follows:

```
+-------------+
|    mark     |
+-------------+
|   Klass*    |
+-------------+
|   fields    |
|             |
+-------------+
```

mark indicates the state of the object, including whether it is locked, GC age, etc. Klass* points to the data structure InstanceKlass that describes the object type:

```
// InstanceKlass layout:
// [C++ vtbl pointer ] Klass
// [java mirror      ] Klass
// [super            ] Klass
// [access_flags     ] Klass
// [name             ] Klass
// [methods          ]
// [fields           ]
...
```

With this structure, expressions such as o instanceof String have enough information to be evaluated. Note that InstanceKlass is a fairly complex structure: it includes all of the class's methods, fields, and so on, and the methods in turn contain bytecodes and other information. The structure is built by parsing the class file at runtime, and for safety the bytecode must also be verified during parsing (method bytecodes not generated by javac can easily crash the JVM).

CDS can store (dump) the data structure generated by the parsing and verification into a file and reuse it in the next run. This dump product is called Shared Archive, with the suffix jsa (Java shared archive).
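A typical AppCDS session looks roughly like this (a sketch using the standard JDK 10+ flags; myapp.jar and com.example.Main are placeholders):

```shell
# 1. Record which classes the application loads:
java -XX:DumpLoadedClassList=myapp.lst -cp myapp.jar com.example.Main
# 2. Dump those classes into a shared archive (the .jsa file):
java -Xshare:dump -XX:SharedClassListFile=myapp.lst \
     -XX:SharedArchiveFile=myapp.jsa -cp myapp.jar
# 3. Start the application with the archive mapped into memory:
java -Xshare:on -XX:SharedArchiveFile=myapp.jsa -cp myapp.jar com.example.Main
```

The archive is only valid for the classpath it was dumped against; change the JARs and the dump step must be rerun.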

To reduce the cost of reading the jsa dump and avoid deserializing the data back into InstanceKlass, the storage layout in the jsa file is exactly the same as the in-memory InstanceKlass. When the jsa data is used, the file only needs to be mapped into memory and the type pointer in the object header pointed at it, which is very efficient.

```
Object:
+-------------+
|    mark     |      +----------------------------+
+-------------+      | classes.jsa file           |
|   Klass* +-------->| java_mirror|super|methods  |
+-------------+      | java_mirror|super|methods  |
|   fields    |      | java_mirror|super|methods  |
|             |      +----------------------------+
+-------------+
```

AppCDS cannot handle custom class loaders

The InstanceKlass stored in the jsa is the product of class file parsing. For the boot class loader (which loads classes from jre/lib/rt.jar) and the system (app) class loader (which loads classes from -classpath), CDS has an internal mechanism that skips reading class files altogether and matches the corresponding structure in the jsa file by class name alone.

Java also lets users define custom class loaders that fully customize how classes are obtained by overriding Classloader.loadClass(): fetching classes over the network or generating them dynamically in code are both possible. To keep AppCDS safe and avoid serving unexpected classes from the archive, AppCDS handles custom class loaders through the following steps:

- Call the user-defined Classloader.loadClass() to obtain the class byte stream;
- Compute the checksum of the byte stream and compare it with the checksum of the same-named entry in the jsa. If they match, return the InstanceKlass from the jsa; otherwise fall back to the slow path and parse the class file.

In many scenarios, the first step accounts for the majority of class loading time, and AppCDS does nothing about it. For example:

```
bar.jar
 +- com/bar/Bar.class
baz.jar
 +- com/baz/Baz.class
foo.jar
 +- com/foo/Foo.class
```

The class path contains these three JARs. When loading the class com.foo.Foo, most ClassLoader implementations (including URLClassLoader and those of Tomcat and spring-boot) choose the simplest strategy ("premature optimization is the root of all evil"): try to extract the entry com/foo/Foo.class from each JAR, one by one, in classpath order.

JAR packages are stored in zip format. Each class load must walk the JARs on the classpath and attempt to extract a single file from each zip until the class is found. With N JARs on the classpath, a single class load probes N/2 zip files on average.

In one of our real-world scenarios, N reaches 2,000. The JAR search cost is then enormous, far exceeding the InstanceKlass parsing cost, and AppCDS cannot help with such scenarios.

JAR Index

According to the JAR file specification, a JAR is a zip archive whose meta information is stored as text under the META-INF directory. The format was designed with the above search scenario in mind; the mechanism is called JAR Index.

Suppose we want to search for a class in the above bar.jar, baz.jar, and foo.jar. If we can immediately infer which jar package it is in through the type com.foo.Foo, we can avoid the above scanning overhead.

```
JarIndex-Version: 1.0

foo.jar
com/foo

bar.jar
com/bar

baz.jar
com/baz
```

With the JAR Index technique, the index file INDEX.LIST shown above can be generated. Loaded into memory, it becomes a HashMap:

```
com/bar --> bar.jar
com/baz --> baz.jar
com/foo --> foo.jar
```

When we see the class name com.foo.Foo, we take the package prefix com/foo, look up the owning JAR foo.jar in the index, and extract the class file directly.
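The lookup can be sketched as follows (a toy model; the real JAR Index also indexes individual files and handles multi-level prefixes, and the class name here is made up):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the in-memory package-prefix index built from INDEX.LIST.
class JarIndexLookup {
    static final Map<String, String> index = new HashMap<>();
    static {
        index.put("com/bar", "bar.jar");
        index.put("com/baz", "baz.jar");
        index.put("com/foo", "foo.jar");
    }

    // Map a class name to the JAR that contains it via its package prefix:
    // an O(1) hash lookup instead of scanning every JAR on the classpath.
    static String jarFor(String className) {
        String path = className.replace('.', '/');
        int slash = path.lastIndexOf('/');
        String pkg = (slash < 0) ? "" : path.substring(0, slash);
        return index.get(pkg);   // null means: fall back to the full scan
    }
}
```

For example, jarFor("com.foo.Foo") returns "foo.jar" without touching any other archive.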

The Jar Index technique seems to solve our problem, but it is very old and difficult to use in modern applications:

- jar -i generates the index from the Class-Path attribute in META-INF/MANIFEST.MF, an attribute modern projects rarely maintain;
- Only URLClassLoader supports JAR Index;
- The indexed JAR must appear as early as possible on the classpath.
Dragonwell uses agent injection so that INDEX.LIST is generated correctly and placed at the appropriate position on the classpath, helping applications improve startup performance.

2 Class early initialization

The execution of code in the static block of a class is called class initialization. After the class is loaded, the initialization code must be executed before it can be used (creating an instance, calling static methods).

The initialization of many classes is essentially just constructing some static fields:

```java
class IntegerCache {
    static final Integer cache[];
    static {
        Integer[] c = new Integer[size];
        int j = low;
        for (int k = 0; k < c.length; k++)
            c[k] = new Integer(j++);
        cache = c;
    }
}
```

The JDK caches a commonly used range of boxed values to avoid repeatedly creating them, so this data must be constructed ahead of use. Since these initializers run only once, they execute purely in the interpreter. If we could persist these static fields and skip the class initializer, we would obtain pre-initialized classes and reduce startup time.
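The effect of this cache is directly observable through autoboxing (assuming default JVM settings; -XX:AutoBoxCacheMax can widen the cached range):

```java
// Values in [-128, 127] are served from IntegerCache, so autoboxing the same
// small value twice yields the same object; larger values are freshly allocated.
public class CacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;   // within the cached range -> same instance
        Integer c = 200, d = 200;   // outside the range -> distinct instances
        System.out.println(a == b); // true
        System.out.println(c == d); // false
    }
}
```

This is also why Integer values must always be compared with equals() rather than ==.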

The most efficient way to load persistent data into memory is memory mapping:

```c
int fd = open("archive_file", O_RDONLY);
struct person *persons = mmap(NULL, 100 * sizeof(struct person),
                              PROT_READ, MAP_PRIVATE, fd, 0);
int age = persons[5].age;
```

C operates on memory almost directly, while a high-level language like Java abstracts memory into objects carrying meta-information such as mark and Klass*, which varies from run to run. More elaborate strategies are therefore needed to persist objects efficiently.

Introduction to Heap Archive

OpenJDK 9 introduced the Heap Archive capability, and it was formally put to use in OpenJDK 12. As the name suggests, Heap Archive persists objects on the heap.

The object graph is built in advance and put into the archive. We call this stage dump, and using the data in the archive is called runtime. Dump and runtime are usually not the same process, but in some scenarios they can be the same process.

Recall the memory layout after AppCDS is used: the Klass* pointer of an object points into the Shared Archive, i.e. AppCDS persists the InstanceKlass metadata. For a persisted object to be reusable, the type pointer in its header must likewise point to persisted metadata; Heap Archive therefore depends on AppCDS.

In order to adapt to various scenarios, OpenJDK's HeapArchive also provides two levels: Open and Closed:

The allowed reference relationships are:

- Closed Archive objects may not reference objects in the Open Archive or the ordinary heap, but may themselves be referenced; Closed Archive objects are read-only.
- Open Archive objects may reference any writable object.

The reason for this design is that read-only structures placed in the Closed Archive incur exactly zero GC overhead.

Why read-only? Imagine an object A in the Closed Archive referencing an object B on the heap: when B is moved, the GC would need to fix the field in A that points to B, which incurs GC overhead.

Use Heap Archive to initialize classes in advance

With this structure in place, once the class is loaded its static variable is simply pointed at the archived object, completing class initialization:

```
class Foo {
    static Object data;
}        |
         +---------> Open Archive Object:
                     +-------------+
                     |    mark     |     +------------------+
                     +-------------+     | classes.jsa file |
                     |   Klass* +------->| ...              |
                     +-------------+     +------------------+
```

3 AOT Compilation

Apart from class loading, a method's first several executions are not JIT-compiled; the bytecode runs in the interpreter. As analyzed in the first half of this article, interpreted execution is many times slower than JIT-compiled code, and this slowness is another major culprit behind slow startup.

Traditional languages such as C/C++ compile directly to native machine code for the target platform. As the startup warm-up problem of interpreter/JIT languages such as Java and JavaScript became widely recognized, compiling bytecode to native code ahead of time (AOT) gradually came into the public eye.

WebAssembly, GraalVM, and OpenJDK all support AOT compilation to varying degrees. Our startup optimizations mainly revolve around the jaotc tool introduced by JEP 295.

Note the terminology used here:

- JEP 295 uses AOT to compile the methods in class files into native code fragments one by one; after a class is loaded, the Java virtual machine replaces the method entries with the AOT code.
- GraalVM's Native Image is a more thorough form of static compilation: using SubstrateVM, a small runtime written in Java, the runtime and application code are statically compiled into a single executable (similar to Go), no longer depending on a JVM. This approach is also AOT, but to keep the terms distinct, "AOT" below refers only to the JEP 295 approach.

AOT feature first experience

With JEP 295, we can quickly try AOT out.
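The session looks roughly like this (a sketch following the JEP 295 workflow; jaotc shipped with JDK 9–15 on Linux/x64, and HelloWorld is a placeholder class):

```shell
# Compile a class, AOT-compile it into a shared library, then run with it.
javac HelloWorld.java
jaotc --output libHelloWorld.so HelloWorld.class
# HotSpot links in the AOT-compiled methods when the class is loaded.
# (Later JDKs additionally require -XX:+UnlockExperimentalVMOptions.)
java -XX:AOTLibrary=./libHelloWorld.so HelloWorld
```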

The jaotc command invokes the Graal compiler to compile the bytecode and produce libHelloWorld.so. The generated .so can easily mislead people into thinking the compiled code is called directly, as with JNI. In fact the ld loading mechanism is not really used to run the code; the .so file is better seen as a container for native code. After loading the AOT library, the HotSpot runtime performs further dynamic linking: once a class is loaded, HotSpot automatically wires up the AOT code entry so that subsequent method calls use the AOT version. AOT-generated code also actively interacts with the HotSpot runtime, jumping between AOT code, the interpreter, and JIT-compiled code.

1) The twists and turns of AOT

JEP 295 appears to implement a complete AOT system, so why has the technology never been used at scale? Among OpenJDK's newer features, AOT has had a troubled fate.

2) Multiple Classloader Issues

JDK-8206963: bug with multiple class loaders

The design did not account for Java's multi-classloader scenario: when classes of the same name loaded by different class loaders use AOT, they share static fields, whereas the Java language requires that data to be separate.

Since there is no quick fix for this problem, OpenJDK simply added the following code:

```cpp
ClassLoaderData* cld = ik->class_loader_data();
if (!cld->is_builtin_class_loader_data()) {
    log_trace(aot, class, load)("skip class %s for custom classloader %s (%p) tid=" INTPTR_FORMAT,
                                ik->internal_name(), cld->loader_name(), cld, p2i(thread));
    return false;
}
```

AOT is simply disallowed for user-defined class loaders. From this we can see that the feature gradually fell out of maintenance in the community.

As a result, although classes on the class path can still use AOT, common frameworks such as spring-boot and Tomcat load application code through custom class loaders. This change effectively cut off a large share of AOT's use cases.

3) Lack of tuning and maintenance, relegated to experimental features

JDK-8227439: Turn off AOT by default

JEP 295 AOT is still experimental, and while it can be useful for startup/warmup when used with custom generated archives tailored for the application, experimental data suggests that generating shared libraries at a module level has overall negative impact to startup, dubious efficacy for warmup and severe static footprint implications.

To enable AOT from now on, you need to add the experimental parameter:

```shell
java -XX:+UnlockExperimentalVMOptions -XX:AOTLibrary=...
```
According to the description of the issue, this feature has a negative impact on startup speed and memory usage when compiling the entire module. We analyzed the reasons as follows:

- The Java language itself is too complex; runtime mechanisms such as dynamic class loading prevent AOT code from running as fast as expected;
- As a phased project, AOT went largely unmaintained after entering Java 9 and lacked the necessary tuning (while AppCDS kept being iterated on and optimized).

4) Deleted in JDK16

JDK-8255616: Disable AOT and Graal in Oracle OpenJDK

On the eve of the release of OpenJDK16, Oracle officially decided not to maintain this technology:

We haven't seen much use of these features, and the effort required to support and enhance them is significant.

The fundamental reason is that this technology lacks necessary optimization and maintenance. As for the future plans related to AOT, we can only speculate from a few words that there are two technical directions for Java AOT in the future:

- Do AOT based on OpenJDK's C2;
- Support full Java language features on GraalVM's native-image, with users who need AOT gradually migrating from OpenJDK to native-image.

Neither direction will make visible progress in the short term, so Dragonwell's technical direction is to make the existing JEP 295 implementation work better and bring users the best possible startup performance.

5) Fast Startup on Dragonwell

Dragonwell's fast startup feature addresses the weaknesses of AppCDS and AOT compilation technologies, and develops a class early initialization feature based on the HeapArchive mechanism. These features almost completely eliminate the time spent on application startup visible to the JVM.

In addition, because the above technologies all conform to the trace-dump-replay usage model, Dragonwell unifies the processes of the above startup acceleration technologies and integrates them into the SAE product.

SAE x Dragonwell: Best Practices for Serverless with Java Startup Acceleration

With good ingredients, you also need matching seasonings and a master chef.

Dragonwell's startup acceleration and Serverless technology, known for its elasticity, complement each other well, and together they cover the full life cycle of microservice applications, shortening end-to-end startup time. Dragonwell therefore chose SAE as the place to land its startup acceleration technology.

SAE (Serverless Application Engine) is the first PaaS platform for Serverless. It can:

- Java package deployment: enjoy microservice capabilities with zero code changes, reducing R&D costs
- Serverless extreme elasticity: maintenance-free resources, rapid scale-out of application instances, and reduced operations and learning costs

1 Difficulty Analysis

Through analysis, we found that users of microservices face some difficulties in application startup:

- Large packages: hundreds of MB, even GB-sized
- Many dependencies: hundreds of dependent packages, thousands of classes
- Long loading time: loading dependencies from disk and then loading classes on demand can account for up to half of total startup time

With Dragonwell's fast startup capability, SAE provides a set of best practices for Serverless Java applications that accelerates startup as much as possible and lets developers focus on business development:

- Java environment + JAR/WAR package deployment: integrates Dragonwell 11 to provide an accelerated startup environment
- Quick JVM setup: one-click configuration to simplify operations
- NAS network disk: supports cross-instance acceleration, speeding up the startup of new instances and batch releases when new packages are deployed

2 Acceleration effect

We selected several typical microservice demos and internal applications with complex dependencies to test the startup effect, and found that startup time generally drops by 5% to 45%. The acceleration is significant when:

- Many classes are loaded (spring-petclinic loads about 12,000+ classes at startup)
- The application depends little on external data

3 Customer Cases

Alibaba search-recommendation Serverless platform

Alibaba's internal search recommendation Serverless platform uses a class loading isolation mechanism to deploy multiple businesses in the same Java virtual machine. The scheduling system will deploy business codes to idle containers on demand, allowing multiple businesses to share the same resource pool, greatly improving deployment density and overall CPU usage.

In order to support a large number of different business development operations, the platform itself needs to provide rich enough functions, such as caching and RPC calls. Therefore, each JVM of the search recommendation Serverless platform needs to pull up a middleware isolation container similar to Pandora Boot, which will load a large number of classes and slow down the startup speed of the platform itself. When sudden demand comes in, the scheduling system needs to pull up more containers for business code deployment, and the startup time of the container itself becomes particularly important.

Based on Dragonwell's fast startup technology, the search recommendation platform will perform optimizations such as AppCDS and Jarindex in the pre-release environment, and embed the generated archive files into the container image, so that each container can enjoy acceleration when it starts, reducing the startup time by about 30%.

A fashion brand handles flash sales with SAE's extreme elasticity

An external customer, using the JAR package deployment and Dragonwell 11 provided by SAE, quickly iterated and launched a trendy shopping-mall app.

When facing big promotions and flash sales, the extreme elasticity of SAE Serverless, driven by application metrics such as QPS and RT, easily meets the demand for more than 10x rapid scale-out. Meanwhile, the one-click Dragonwell-enhanced AppCDS startup acceleration reduces Java application startup time by more than 20%, further speeding up scale-out and keeping the business running smoothly and healthily.

5. Conclusion

Dragonwell's fast startup technology is built entirely on the OpenJDK community's work, with careful optimization and bug fixes across the individual features and a lower barrier to entry. This preserves compatibility with the standard, avoids internal customization, and feeds contributions back to the open source community.

As a piece of basic software, Dragonwell by itself can only generate and consume archive files on disk. With SAE's seamless integration of Dragonwell, JVM configuration and archive file distribution are automated, so customers can easily enjoy the technical benefits of application startup acceleration.
