How to apply code intelligence technology to daily development?

01/ Let’s start with the developers’ worries

When writing code, developers spend a lot of time on low-level, repetitive coding, especially in languages with verbose syntax.

Developers are also jokingly said to practice "search-engine-oriented programming", because we so often turn to general-purpose search engines for documentation. The quality of those results varies widely, so developers spend a lot of time finding and filtering documents. This fragments the development process and makes it hard to concentrate on business logic.

During code review, reviewers have to inspect changes manually, which is time-consuming and laborious. When a change is large, reviewers struggle to finish the review and can only skim it, which defeats the purpose of reviewing. Traditional code detection tools, meanwhile, cannot find deep latent defects or suggest fixes, which leaves hidden risks that can later cause production failures.

Problems in the development process

So how can we solve these problems?

02/ How code intelligence empowers daily development

Using AI technology, the Alibaba Cloud Code Intelligence team has built a number of industry-leading intelligent coding and code detection tools, and it is the first team in China to apply AI capabilities to code review scenarios. We also keep the technology current through in-depth cooperation with academia, which has produced papers and patents. The sections below introduce these capabilities in detail.

Intelligent code completion: capabilities and principles

When writing code, Yunxiao Codeup can provide developers with intelligent coding assistance through WebIDE to quickly complete lightweight coding. It also allows developers to quickly find the required code documents or code examples through language descriptions, reducing the fragmentation of the coding process.

When writing code, you only need to type a few characters, or even a single character, and the intelligent code completion plug-in combines the code context and its semantics to recommend several full-line completion candidates. Candidates marked with an icon in front come from the plug-in. If you type the same character, such as X, on different lines, it recommends line-level completions suited to each location. It can also automatically fill variables or parameters that appear earlier in the code into the appropriate candidates.

Intelligent code completion can help developers reduce repetitive low-level coding and greatly improve coding efficiency. Take the code snippet demonstrated in the video as an example.

With only the IDE's built-in code completion, the snippet takes about 700 keystrokes and 5 minutes to write.

With a leading product of its kind in the industry, the number of keystrokes drops by 33% but the time spent drops by only 6%. Why do keystrokes fall so much while time falls so little? Because that product offers too many completion candidates, many of them wrong, so developers have to spend time choosing among them. More choices are not always better.

With the intelligent code completion plug-in developed in-house by Yunxiao, keystrokes drop by 65% and coding time drops by 57%, so the code can be written in about 2 minutes.

How did we do it? We used multi-model fusion to combine several models with different strengths. One person's judgment may be inaccurate, but several people deciding together are less likely to misjudge. The deep learning model and the semantic model are both context-aware: when the same character is entered at different code locations, different completions are recommended, and variables or parameters that appear earlier in the code are filled into the candidates automatically.

According to usage data from Alibaba's internal developers, intelligent code completion improves coding efficiency by 20% on average compared with the IDE's built-in completion. So how is intelligent code completion implemented?

We parse the code into an abstract syntax tree (AST), preprocess the AST, and train a deep learning model on the processed data. In the first few iterations the model can only generate messy sequences. It compares each generated sequence with the expected sequence, computes the error, and updates its parameters. After N iterations it can generate syntactically correct code sequences. At completion time, we combine several types of models (deep learning, semantic, and statistical) to generate completion candidates, and then apply syntax correction so that syntactically incorrect code is not recommended to developers.
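The fusion and syntax-filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not Yunxiao's actual implementation: the model names, scores, weights, and the `fuse_candidates` helper are all hypothetical, and a simple weighted sum stands in for whatever fusion strategy the plug-in really uses. Python's standard `ast.parse` plays the role of the syntax check.

```python
import ast
from collections import defaultdict


def is_valid_line(line: str) -> bool:
    """Syntax check: keep only candidates that parse as complete Python code."""
    try:
        ast.parse(line)
        return True
    except SyntaxError:
        return False


def fuse_candidates(model_outputs, weights, top_k=3):
    """Merge per-model (candidate, score) lists into one ranked list.

    model_outputs: {model_name: [(candidate_line, score), ...]}  (hypothetical shape)
    weights:       {model_name: weight}; a weighted sum is an assumed fusion rule.
    """
    fused = defaultdict(float)
    for name, candidates in model_outputs.items():
        for line, score in candidates:
            fused[line] += weights.get(name, 1.0) * score
    # Highest fused score first; ties broken alphabetically for determinism.
    ranked = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    # Drop syntactically invalid candidates before showing them to the developer.
    return [line for line, _ in ranked if is_valid_line(line)][:top_k]


# Hypothetical outputs from three kinds of models for the same cursor position.
model_outputs = {
    "deep_learning": [("result = items.get(key)", 0.6), ("result = items.get(", 0.3)],
    "semantic": [("result = items.get(key)", 0.5), ("result = items[key]", 0.4)],
    "statistical": [("result = items[key]", 0.2), ("result = None", 0.1)],
}
weights = {"deep_learning": 0.5, "semantic": 0.3, "statistical": 0.2}

suggestions = fuse_candidates(model_outputs, weights)
print(suggestions)
```

The candidate that two models agree on rises to the top, and the unbalanced `result = items.get(` is filtered out even though one model scored it fairly highly.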

Intelligent Code Review

In code review, when a developer creates a review, Yunxiao Codeup recommends suitable reviewers who are likely to be more familiar with the changed code. Reviewers can also see the estimated time for each review in the review list, which helps them use fragmented time for reviewing. While browsing a review, developers often need to look up the definition or references of an API, and the syntax jump service we provide lets reviewers jump to definitions and references on the web page just as they would in an IDE. In addition, we provide deeper code detection tools that help reviewers find hidden defects sooner and fix them quickly.

Intelligent Code Security Detection

For code detection, we focus here on code content security detection. Many vulnerabilities are discovered every year and exploited by hackers. For example, a vulnerability in the file upload module of the Struts framework in earlier years allowed hackers to execute shell commands remotely. More recently, a zero-day use-after-free vulnerability in Chrome allowed hackers to execute remote code in the renderer process: if a user opened a PDF file in Chrome, the hacker could obtain related user data through remote commands.

To this end, Yunxiao Codeup provides developers with code content security detection tools, including dependency package vulnerability detection and source code vulnerability detection.

Dependency package vulnerability detection

Dependency package vulnerability detection helps developers discover vulnerabilities in third-party packages. Most third-party packages are open source software, which rarely undergoes dedicated security testing. Hackers also prefer to look for vulnerabilities in open source software, because the source is public and flaws are easier to find. Once a security vulnerability is discovered, its impact can be very large: most applications that reference the vulnerable third-party package are exposed to attack.

Yunxiao Codeup's dependency package vulnerability detection tool first compiles and builds the code and collects all of its dependency packages, then uses a vulnerability matching algorithm to look up accurate vulnerability information in the vulnerability database. To make the database more comprehensive, we have integrated multiple external vulnerability databases together with the one built by the Alibaba Group Security Team. The vulnerability information shown to developers includes the recommended version range to upgrade to. To reduce the impact of an upgrade on application stability, we analyze each dependency package version for validity and compatibility and then recommend a specific upgrade version. We also provide a shortcut for fixing dependency package vulnerabilities: creating a code review with one click.
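The matching step can be pictured with a small sketch. Everything here is assumed for illustration: the `Advisory` record, the package names, the `DEMO-` identifiers, and the "introduced inclusive, fixed exclusive" range convention are hypothetical and are not taken from Codeup's vulnerability database. The compatibility analysis mentioned above is not modeled; the sketch simply recommends the highest fixed version among the matching advisories.

```python
from dataclasses import dataclass


def parse_version(text):
    """Turn '1.2.10' into (1, 2, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class Advisory:
    advisory_id: str  # placeholder identifier, not a real CVE
    package: str
    introduced: str   # first affected version (inclusive)
    fixed: str        # first fixed version (exclusive upper bound)


def find_vulnerabilities(dependencies, advisories):
    """Return (package, version, advisory_id, fixed_version) for each match."""
    findings = []
    for package, version in sorted(dependencies.items()):
        current = parse_version(version)
        for adv in advisories:
            if adv.package != package:
                continue
            if parse_version(adv.introduced) <= current < parse_version(adv.fixed):
                findings.append((package, version, adv.advisory_id, adv.fixed))
    return findings


def recommend_upgrades(findings):
    """Pick, per package, the highest fixed version so one upgrade clears all matches."""
    upgrades = {}
    for package, _, _, fixed in findings:
        best = upgrades.get(package)
        if best is None or parse_version(fixed) > parse_version(best):
            upgrades[package] = fixed
    return upgrades


# Hypothetical dependency list and advisory data.
dependencies = {"example-json": "1.2.3", "example-http": "4.0.1", "example-log": "2.17.0"}
advisories = [
    Advisory("DEMO-0001", "example-json", "1.0.0", "1.2.5"),
    Advisory("DEMO-0002", "example-json", "1.2.0", "1.3.0"),
    Advisory("DEMO-0003", "example-log", "2.0.0", "2.15.0"),
]

findings = find_vulnerabilities(dependencies, advisories)
upgrades = recommend_upgrades(findings)
print(findings)
print(upgrades)
```

Here `example-log` 2.17.0 is already past the fixed version and is not reported, while `example-json` matches two advisories and gets a single recommended upgrade.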

Source code vulnerability detection

In the field of code content security testing, in addition to the dependency package vulnerability detection just mentioned, we also provide source code vulnerability detection tools for the code library itself.

Yunxiao Codeup's source code vulnerability detection is based on the Sourcebrella (源伞) detection engine. The engine converts the data flow and control flow of the code into mathematical formulas and then applies theorem proving to them, which lets it derive path conditions more accurately and reduce false positives. It can also analyze the whole program across functions: when there are several layers of function calls and a security risk in a low-level function affects top-level business code, it parses the data flow and call relationships into a graph structure and analyzes that code graph to find potential security risks quickly. In addition, we give a detailed explanation for each detected vulnerability, showing developers step by step how it takes effect in the code.
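To make the cross-function idea concrete, here is a deliberately simplified sketch of only the graph part: a breadth-first search over a call graph that reports how a top-level entry point reaches a risky low-level function. The function names and the `find_risk_paths` helper are hypothetical, and the theorem-proving step that checks path conditions is omitted entirely, so this is plain reachability rather than the engine's actual analysis.

```python
from collections import deque


def find_risk_paths(call_graph, entry_points, risky_functions):
    """For each entry point, return the shortest call chain to a risky function, if any.

    call_graph: {caller: [callee, ...]}  (hypothetical, hand-written here)
    """
    paths = {}
    for entry in entry_points:
        queue = deque([[entry]])
        visited = {entry}
        while queue:
            path = queue.popleft()
            current = path[-1]
            if current in risky_functions:
                paths[entry] = path  # the "impact path" shown to the developer
                break
            for callee in call_graph.get(current, []):
                if callee not in visited:
                    visited.add(callee)
                    queue.append(path + [callee])
    return paths


# Hypothetical call graph: business handlers on top, a risky sink at the bottom.
call_graph = {
    "handle_request": ["parse_params", "render"],
    "parse_params": ["run_query"],
    "run_query": ["exec_sql"],
    "health_check": ["render"],
    "render": [],
}
risk_paths = find_risk_paths(
    call_graph,
    entry_points=["handle_request", "health_check"],
    risky_functions={"exec_sql"},
)
print(risk_paths)
```

Only `handle_request` is reported, together with the chain of calls that leads to the risky function; `health_check` never reaches it and stays clean.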

After we submit the code, Yunxiao Codeup can automatically execute the enabled code detection, such as the dependency package vulnerability detection enabled in the video. We can also manually enable source code vulnerability detection.

Dependency package vulnerability detection can discover many third-party package vulnerabilities and will display suspected CVE vulnerability information in the detailed information; source code vulnerability detection can discover security vulnerabilities such as code injection, remote command execution, buffer overflow, etc., and will display the impact path of the vulnerability in the code on the right side of the detailed information.

We can create a code review with one click in the detailed information of the dependency package vulnerability, which helps us quickly generate code changes and merge requests to fix the specified vulnerability. The review description will provide information about the vulnerability and compatibility analysis of the upgrade package.

03/ Continuous exploration of code intelligence technology

In addition to the intelligent capabilities that have been implemented in Yunxiao mentioned above, we have conducted in-depth cooperation with Zhejiang University, Monash University, Nanyang Technological University and other universities in the fields of code generation, code search, intelligent review, etc., and have produced papers and patents in many fields.

Take the code summary generation project with Nanyang Technological University as an example. Developers often dislike writing comments or do not know how to write them, which makes code harder to maintain. To help developers understand code better, we want to generate summary comments for code snippets automatically by learning the code's logic. We first mine a batch of code snippets and their comments from large-scale code data to build a snippet retrieval library, then retrieve snippets similar to the target code from that library, and parse both the target code and the similar snippets into code property graphs (CPGs). An attention-based fusion algorithm merges the two graphs, and the model then computes static and dynamic graph weights to produce an encoding of the graph information. Finally, the summary comments of the similar snippets are encoded and aggregated with the graph encoding, and a decoder generates the summary comment for the target code. Related paper: "Retrieval-Augmented Generation for Code Summarization via Hybrid GNN".
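The retrieval step alone can be illustrated with a toy example. This sketch assumes token-set Jaccard similarity as the similarity measure, which is a stand-in chosen for simplicity and not necessarily what the paper uses; the tiny retrieval library and the helper names are hypothetical. The graph construction, fusion, and decoding stages are not shown.

```python
import re


def tokenize(code):
    """Split code into a set of identifier and keyword tokens."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a, b):
    """Jaccard similarity of two token sets (assumed measure, for illustration)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_most_similar(target, library):
    """Return the (snippet, comment) pair whose snippet is most similar to target."""
    target_tokens = tokenize(target)
    return max(library, key=lambda entry: jaccard(target_tokens, tokenize(entry[0])))


# Hypothetical retrieval library of (code snippet, summary comment) pairs.
library = [
    ("def add(a, b): return a + b", "Return the sum of two numbers."),
    ("def read_file(path): return open(path).read()", "Read a file and return its contents."),
]
target = "def add_numbers(a, b): total = a + b; return total"

snippet, comment = retrieve_most_similar(target, library)
print(comment)
```

The retrieved comment is what would later be encoded and aggregated with the graph encoding to guide the decoder.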

Besides writing code, developers spend a lot of time debugging it. We hope to help developers troubleshoot through code defect localization. Defect localization first mines valid defective code from code change data as a training set and parses the defective code into an abstract syntax tree. The tree is then split by line of code: each line corresponds to a subtree, and each subtree is encoded as multiple sub-paths running from leaf nodes to the root. Finally, the sub-paths associated with each line are fed into an attention-based deep learning model for training. At inference time, after a developer submits code, we extract the sub-paths of the changed lines from the code change, and model inference yields a defect probability for each line, which helps developers locate defects.
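The path extraction described above can be approximated with Python's standard `ast` module. This is a sketch under assumptions: it uses Python source and node type names as path elements, groups leaf-to-root paths by the line of the leaf, and runs each path up to the module root, whereas the real system's language, path encoding, and subtree boundaries are not specified in this article. The attention-based model that consumes the paths is not shown.

```python
import ast
from collections import defaultdict


def leaf_to_root_paths(source):
    """Map each line number to the leaf-to-root paths (node type names) found on it."""
    tree = ast.parse(source)
    paths_by_line = defaultdict(list)

    def visit(node, ancestors, line):
        # Nodes without their own line number (e.g. operators) inherit the parent's.
        line = getattr(node, "lineno", line)
        chain = ancestors + [type(node).__name__]
        # Skip Load/Store context markers, which carry no structural information.
        children = [
            child
            for child in ast.iter_child_nodes(node)
            if not isinstance(child, ast.expr_context)
        ]
        if not children and line is not None:
            paths_by_line[line].append(chain[::-1])  # leaf first, root last
        for child in children:
            visit(child, chain, line)

    visit(tree, [], None)
    return dict(paths_by_line)


source = "def area(w, h):\n    return w * h\n"
paths = leaf_to_root_paths(source)
for line_number in sorted(paths):
    print(line_number, paths[line_number])
```

For a changed line, the list of paths on that line is what would be passed to the model to estimate a defect probability.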

Beyond the two projects described above, we cooperate extensively with universities in many other areas. We hope that in the near future developers will only need to provide a text description or a requirements document on the Yunxiao intelligent R&D platform, and we will generate most of the basic code and its dependencies for them; intelligent coding assistance will then help them fill in the rest quickly, and code defects will have nowhere to hide. We invite everyone to keep following updates to Yunxiao products.
