How to handle errors and exceptions in large software

【51CTO.com Quick Translation】 "I didn't find any bugs in my testing, so there are no bugs... right?" Not quite. Large software is so complex that no amount of testing will get you to zero bugs. You cannot predict every way users will use your application, so it is important to understand the difference between errors and exceptions in it. You then have to choose the right way to handle them, taking a proactive approach that keeps the application running and does right by both your development team and your end users.

Testing itself is part of the problem

Even the most thorough testing only covers specific situations, and your own biases shape which situations you choose, so the tests are never fully objective.
Imagine thousands of users working with your application at the same time in thousands of different ways; some of them will inevitably hit situations you never tested.

How to properly handle errors in your application

Simply put, bugs cause errors and exceptions. The two terms mean different things in different contexts, depending on which aspect of the problem you care about. The main question, of course, is how to handle errors and exceptions so that they do not have negative consequences.

First, let’s look at some definitions and why these differences are important.

What is the difference between errors and exceptions?

Some programming languages have their own definitions of errors and exceptions, but here I want to define the difference as I use it in this article.

Let's talk about errors first. A programming error usually leaves the program unable to continue or recover, and the programmer has to go into the code and fix it. Many errors can be avoided with simple checks. Where simple checks are not enough, an error can be converted into an exception so that the code can handle it and the application can at least keep running.

Now let's talk about exceptions. How they behave depends on the programming language, but in general an exception can be caught, so the code can handle the situation and recover without putting the application into an "error" state. An exception that is caught and ignored also lets the application keep running. An unhandled exception is effectively an error, but it can still be logged, and it is then up to the developer to deal with it. Let's look at some examples.

Case 1: A user error

Users will enter incorrect data. Often this is harmless, but it can still cause errors and put the program into an unrecoverable state. The code should perform simple checks to keep that from happening: validate on both the front end and the back end, and throw an exception as a last line of defense.
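As a rough illustration of Case 1, here is a minimal Python sketch of a back-end check that throws only as a last line of defense. The names InvalidQuantityError, parse_quantity and handle_form, and the 1 to 1000 range, are hypothetical and chosen purely for the example.

```python
class InvalidQuantityError(ValueError):
    """Hypothetical exception type for a rejected quantity field."""


def parse_quantity(raw: str) -> int:
    """Back-end validation: simple checks first, an exception as the last resort."""
    text = raw.strip()
    # Simple check: only plain ASCII digits are accepted.
    if not (text.isascii() and text.isdigit()):
        raise InvalidQuantityError(f"quantity must be a whole number, got {raw!r}")
    value = int(text)
    # Simple check: the value must fall inside an (assumed) business range.
    if not 1 <= value <= 1000:
        raise InvalidQuantityError(f"quantity must be between 1 and 1000, got {value}")
    return value


def handle_form(raw: str) -> str:
    """Catch the validation exception and turn it into a message for the user."""
    try:
        quantity = parse_quantity(raw)
    except InvalidQuantityError as exc:
        return f"Please correct your input: {exc}"
    return f"Order accepted for {quantity} item(s)"
```

Bad input never reaches the rest of the program here: the user gets a message to act on and the application stays out of an error state.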

Case 2: File cannot be opened/download is abnormal

This is an exceptional situation that should not break your entire application, and your application should be able to handle it. Downloads fail for many reasons, so plan for failure when you design your program.

That is the difference between errors and exceptions as I define them. The rest of this article describes an easy-to-follow process for handling them better.
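For the file that cannot be opened in Case 2, a minimal Python sketch might look like this. The function name load_text_or_default and the logger name are assumptions for illustration; a failed download could be treated the same way by catching the exception type your download library raises.

```python
import logging
from pathlib import Path

logger = logging.getLogger("app.files")  # hypothetical logger name


def load_text_or_default(path: Path, default: str) -> str:
    """Read a text file, falling back to a default if it cannot be opened."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError as exc:
        # OSError covers a missing file, missing permissions, a directory
        # given instead of a file, and similar I/O failures.
        logger.warning("could not read %s (%s); using default", path, exc)
        return default
```

The failure is recorded and the caller still gets a usable value, so one unreadable file does not take the whole application down.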


Pay attention to every exception

"If I catch every exception, my code will be error-free, right?"

As I mentioned earlier, not all errors result in exceptions. The bigger problem with this reasoning is that you no longer know what went wrong: if there is a fault in your code and you catch the exception without doing anything, you lose that information. Don't just catch exceptions and carry on as usual. The purpose of catching an exception is to handle it and return the application to a state in which it can keep operating.
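To make the contrast concrete, here is a small Python sketch with two hypothetical functions. The first swallows every exception and loses the information; the second catches only what it expects, records it, and returns a value the caller can check.

```python
import logging

logger = logging.getLogger("app.pricing")  # hypothetical logger name


def parse_price_silently(raw):
    """Anti-pattern: any failure disappears and a wrong value is returned."""
    try:
        return float(raw)
    except Exception:
        return 0.0


def parse_price(raw, default=None):
    """Catch only the expected failures, log them, and return a clear fallback."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # logger.exception records the message together with the stack trace.
        logger.exception("could not parse price %r", raw)
        return default
```

With the first version a price of "n/a" quietly becomes 0.0 and nobody ever finds out; with the second the failure lands in the log and the caller receives None.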

How to recover the application code

Throwing and catching exceptions is a great way to let your application recover and keep it from running into an error state. If you know which exceptions might be thrown, it is important to know which of them would bring your application to a halt if they were not caught. (We have talked a little about error reporting in software architecture.)

When it comes to specific exception types, you can collect feedback from users so that you know exactly what caused the program to fail and can better handle these situations.

Why is it important to specify the type of exception that is caught?

As your program runs, certain exceptions can go hand in hand with corrupted data or unexpected behavior, either of which can cause your application to fail. If you know exactly which exception occurred, you know which steps to follow to recover or, if recovery is impossible, how to handle the situation gracefully.

So, can the application recover? An exception often carries enough information to tell you what went wrong, and once it is caught it is sometimes possible to recover from the error state, for example by repairing some data, re-fetching data, or asking the user to try again.
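The "try again" kind of recovery can be sketched in Python as follows. TransientError and retry are hypothetical names for this example; in a real program you would catch whichever specific exception type signals a temporary failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical exception type for failures that are worth retrying."""


def retry(operation: Callable[[], T], attempts: int = 3, delay_seconds: float = 0.0) -> T:
    """Call operation, retrying only when it raises TransientError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")
```

Because only TransientError is caught, any other exception, such as one caused by a bug, propagates immediately instead of being retried and hidden.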

You can catch exceptions, but sometimes your program still won't run because the data you depend on has been corrupted in an unrecoverable way, or the exceptions need to be handled in a different way.

Take an array index out of bounds exception: how does the program recover from it? This is an example of turning an error into an exception. Your application expected the data to exist in a certain form, and it did not. Recovery is not always possible, but the application can now avoid the error state and handle the situation gracefully. If the exception is logged, the developer can then prevent it by adding some simple checks before the array is accessed or modified.
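The simple check mentioned above might look like this in Python. The helper name item_at is hypothetical, and the choice to reject negative indexes is an assumption of the sketch (Python itself would treat them as counting from the end).

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def item_at(items: Sequence[T], index: int) -> Optional[T]:
    """Return items[index], or None when the index is outside the sequence."""
    # Check the bounds before touching the sequence so IndexError is never raised.
    if 0 <= index < len(items):
        return items[index]
    return None
```

The caller then has to decide explicitly what a missing item means, which is usually a better place for that decision than an exception handler far from the access.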

How to handle unhandled exceptions

Some exceptions you never expect to happen, such as those caused by bugs in your code. You can log the exceptions that your code does not catch, and many platforms provide a hook for this (for example, ASP.NET's Application_Error and JavaScript's global window.onerror handler). Any unhandled exception surfaces as an error, and the code cannot fix an error by itself. Logging these errors makes it much easier to find their cause and ensures they are not silently ignored, so you can resolve them quickly once they appear.
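Python offers a comparable hook in sys.excepthook. The sketch below shows one way to log uncaught exceptions with it; the logger name and the function names log_unhandled and install_unhandled_logging are assumptions for the example.

```python
import logging
import sys

logger = logging.getLogger("app.unhandled")  # hypothetical logger name


def log_unhandled(exc_type, exc_value, exc_traceback):
    """Log any exception that no code caught, including its stack trace."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl+C behave as usual instead of logging it as a failure.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.critical(
        "unhandled exception",
        exc_info=(exc_type, exc_value, exc_traceback),
    )


def install_unhandled_logging() -> None:
    """Register the hook; call this once at application start-up."""
    sys.excepthook = log_unhandled
```

Note that sys.excepthook only sees exceptions that reach the top of the main thread; exceptions in other threads go through threading.excepthook (Python 3.8 and later).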

Error Log

Error logs help us catch errors. With them you can review the recorded errors and exceptions, which is key to debugging, and you can prioritize which errors to fix and when. You no longer have to rely on screenshots and descriptions sent by users, not least because most users never bother to report errors. Error logs also let your team be proactive: once an error is discovered, you can contact the affected users promptly and limit the damage. Users appreciate being told, which improves your customer relationships. Best of all, of course, is to fix problems before users run into them.

For example, a bug that causes billing errors is much more serious than one that prevents a particular detail page from displaying, even if the latter occurs more often. When your application fails you want to fix it, but only 1% of users will actively report errors, so many more errors are lurking that you do not know about.

Some solutions

Writing some code that captures exceptions and their stack traces, then saves them to a file or sends them by email, will alert you to these errors. Volume alone does not tell you what matters, though. One user may hit many exceptions in a single session, while a hundred users each hit a less frequent error. Which is more important? Unless you know something specific about the errors, the one that affects more users is more important.
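One way to apply that rule of thumb is to rank logged errors by the number of distinct users affected rather than by raw count. The Python sketch below assumes each log entry has already been reduced to an (error key, user ID) pair; that shape and the function name rank_by_affected_users are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_by_affected_users(
    events: Iterable[Tuple[str, str]],
) -> List[Tuple[str, int, int]]:
    """Rank errors by distinct users affected, then by number of occurrences.

    Each event is an (error_key, user_id) pair. Each result row is
    (error_key, distinct_users, occurrences).
    """
    users: Dict[str, Set[str]] = defaultdict(set)
    occurrences: Dict[str, int] = defaultdict(int)
    for error_key, user_id in events:
        users[error_key].add(user_id)
        occurrences[error_key] += 1
    rows = [(key, len(users[key]), occurrences[key]) for key in users]
    # Most users first; ties broken by occurrences, then by name for stability.
    return sorted(rows, key=lambda row: (-row[1], -row[2], row[0]))
```

An error thrown five times for one user then ranks below an error thrown three times for three different users.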

Use the stack trace from the exception to work out where the error is; you should be able to read the code and understand why it went wrong. Sometimes this is not enough and the problem needs further tracing. In that case, add more information to the exception before logging it, including context-specific details (such as an account ID or the state of a specific object) that let you reproduce the error locally. At this point you should be able to find all errors and exceptions and log the unhandled ones.
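Adding context before logging could be sketched like this in Python. BillingError, charge and the invoice fields are hypothetical and exist only to show the account ID and object state travelling with the exception.

```python
class BillingError(Exception):
    """Hypothetical exception that carries context needed to reproduce a failure."""

    def __init__(self, message: str, *, account_id: str, invoice_status: str):
        super().__init__(
            f"{message} (account_id={account_id}, invoice_status={invoice_status})"
        )
        self.account_id = account_id
        self.invoice_status = invoice_status


def charge(account_id: str, invoice: dict) -> float:
    """Compute the amount to charge, attaching context if the data is malformed."""
    try:
        return invoice["amount"] * (1 + invoice["tax_rate"])
    except (KeyError, TypeError) as exc:
        # "from exc" keeps the original exception and its stack trace attached.
        raise BillingError(
            "could not compute charge",
            account_id=account_id,
            invoice_status=invoice.get("status", "unknown"),
        ) from exc
```

When this exception is logged, the message already names the account and invoice state, and the original KeyError or TypeError is still available as its cause.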

Depending on the size of your application, the noise from all these errors can become a problem. You can do some clever things with email filtering to group the errors, but I tried this a few years ago and quickly realized it was only a partial solution that left too many problems unsolved.

The problem was that I still had no idea which errors had the most impact on users. I was focusing on the errors that were thrown most often rather than the ones that hurt the user experience most, so I never had a good idea of which errors were more severe. I had no visual overview of what was happening and had to run manual queries to figure it out, which was quite time-consuming.

Errors and exceptions are a fact of life in large software, and how well a team handles them is a fair measure of that team. Handling them well is also how you get past failures and give users a smooth-running application. Good applications contain code to recover from exceptions where possible. Handling and logging exceptions is essential to the health of your software!

Translated by Liu Nina
Original link: https://dzone.com/articles/how-to-handle-errors-and-exceptions-in-large-scale

[Translated by 51CTO. Please indicate the original translator and source as 51CTO.com when reprinting on partner sites]
