The Timeline of Errors

As software developers, we see errors every day. They manifest as exceptions, segfaults, or error codes, telling us that our code has gotten into a state we didn't expect. Their appearance often portends bugs. Though we groan at an unexpected stack trace, we should see an error as a form of automated feedback. Feedback can be fast or slow, so we can place the discovery of errors on a timeline.

A timeline of errors. From left to right: Compile time, Runtime, Use time, Log time, Ignore time

Errors appear at many points over the lifetime of the code, starting at the moment it is compiled. The further to the right an error appears, the longer the feedback takes to arrive.

Compile time

Compile time is the earliest we can receive feedback about an error. The compiler automatically runs a number of checks to make sure the code makes some semblance of sense. The most powerful of these is the type-checker, which makes sure that only values of the expected type reach the places that expect them. This guards our code from trying to handle, say, a string of text when it expects a number. We can also leverage the type system with small, purpose-built types to verify higher-level logic, for example by giving distances in meters and distances in feet their own types so the two cannot be mixed up. Not all languages have a compiler or support static typing. These languages lose the benefits of a compilation step, but can regain some of them through static analysis, linting, and type hints.
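Here is a minimal Java sketch of what small types can look like. Java is used because the post's own example later uses it. The `Meters` and `Feet` records are hypothetical names chosen for illustration. The only point is that wrapping a raw `double` in a distinct type lets the compiler reject a mix-up that would otherwise compile and run.

```java
// Hypothetical small types: each wraps a raw double so the compiler can
// tell two kinds of distance apart.
record Meters(double value) {
    Meters plus(Meters other) {
        return new Meters(value + other.value());
    }
}

record Feet(double value) {
    // Conversion is an explicit, visible step rather than an accident.
    Meters toMeters() {
        return new Meters(value * 0.3048); // 1 ft = 0.3048 m exactly
    }
}

class Distances {
    // Only accepts Meters; passing a Feet here is a compile time error.
    static Meters total(Meters a, Meters b) {
        return a.plus(b);
    }

    public static void main(String[] args) {
        Meters run = new Meters(400.0);
        Feet swim = new Feet(100.0);

        // Distances.total(run, swim);  // rejected by the compiler: Feet is not Meters
        Meters sum = total(run, swim.toMeters());
        System.out.println(sum.value() + " m");
    }
}
```

With plain `double` parameters, the commented-out call would compile and quietly produce a wrong total. With the wrapper types, it never gets past the compiler.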

Compile time error detection is powerful because of how quickly we get the feedback. The error is caught before the code is even committed, so no customer will ever see it. Not even our teammates will. An IDE that compiles code automatically in the background saves us from running the compiler by hand, giving this feedback right in the editor.

Runtime

Many errors are found when the code runs. I lump automated testing in with runtime, since we see those errors by actually running the code. Good runtime errors tell you exactly what happened and where; bad ones are more mysterious. Runtime errors come from bad code: a case we didn't expect, or a value we didn't handle. Since the code has gotten into a bad state, raising a runtime error is the best way to get out without breaking things further. It is futile to call a method on a null value, for example, and the code shouldn't try to continue.
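One common way to leave a bad state early is to check values at the point where they enter, so the runtime error points at the real cause. This is a minimal sketch built on a hypothetical `Customer` class. `Objects.requireNonNull` is part of the Java standard library: it returns its argument, or throws a `NullPointerException` carrying the given message.

```java
import java.util.Objects;

// Hypothetical class for illustration.
final class Customer {
    private final String name;

    Customer(String name) {
        // Fail fast: reject the null where it enters, rather than storing it
        // and failing later in whichever method happens to read it.
        this.name = Objects.requireNonNull(name, "name must not be null");
    }

    String greeting() {
        return "Hello, " + name.trim() + "!";
    }

    public static void main(String[] args) {
        System.out.println(new Customer(" Ada ").greeting());
        try {
            new Customer(null);
        } catch (NullPointerException e) {
            System.out.println("Stopped early: " + e.getMessage());
        }
    }
}
```

Without the check, the null would be stored silently. The failure would then surface later in `greeting()`, far from where the bad value came in.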

Runtime errors take longer to appear than compile time errors, and they aren't guaranteed to happen at all. An unhandled special case may not occur for years, or may only become possible after a change in another part of the code. When they do happen, however, the feedback is immediate. Because of this certain uncertainty, many frameworks and applications have built-in ways of safely handling and reporting runtime errors.
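One small, framework-independent example of built-in reporting is a JVM feature: an application can install a last-resort handler for exceptions that escape a thread. This sketch assumes the reports are simply collected in a list. The `CrashReporting` class and its `reports` list are hypothetical. A real application would forward each report to whatever error-reporting service it uses.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class CrashReporting {
    // Collected reports; a stand-in for a real reporting destination.
    static final List<String> reports = new CopyOnWriteArrayList<>();

    // Any exception that escapes a thread is recorded instead of vanishing.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
                reports.add(thread.getName() + ": " + error.getMessage()));
    }

    // Starts a worker that hits an unexpected case and waits for it to finish.
    static void runFailingWorker() {
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("unexpected case");
        }, "worker-1");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        install();
        runFailingWorker();
        reports.forEach(System.err::println);
    }
}
```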

Use time

Not all errors present neatly as compile time or runtime errors. Many errors happen at the time of use and are only detected by users as invalid data or broken functionality. These errors happen because the code gets into a bad state but continues anyway. Use time errors can lead to data corruption and loss of functionality. A form that fails to load and shows only a blank screen is a use time error. An update to a shipping speed that never gets saved is a use time error. A transaction that deducts from one account but does not add to the other is a use time error.
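The transfer example can be written so the failure stays on the runtime side of the timeline. First, validate everything before changing anything. Then, check the expected end state afterward. The `Ledger` class below is a hypothetical, single-threaded, in-memory illustration. It is not a real banking design: there is no persistence and no concurrency control.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory ledger, used only to illustrate the idea.
final class Ledger {
    private final Map<String, Long> balances = new HashMap<>();

    void open(String account, long cents) {
        balances.put(account, cents);
    }

    long balance(String account) {
        return balances.get(account);
    }

    // Validate before mutating, so a bad transfer becomes an immediate
    // runtime error instead of a half-applied update.
    void transfer(String from, String to, long cents) {
        if (!balances.containsKey(from) || !balances.containsKey(to)) {
            throw new IllegalArgumentException("unknown account");
        }
        if (cents <= 0 || balances.get(from) < cents) {
            throw new IllegalArgumentException("invalid amount: " + cents);
        }
        long totalBefore = total();
        balances.put(from, balances.get(from) - cents);
        balances.put(to, balances.get(to) + cents);
        // Verify the expected end state: money is moved, never created or lost.
        // If a future change breaks this, we hear about it right away.
        if (total() != totalBefore) {
            throw new IllegalStateException("ledger total changed");
        }
    }

    private long total() {
        return balances.values().stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        Ledger ledger = new Ledger();
        ledger.open("alice", 1_000);
        ledger.open("bob", 0);
        ledger.transfer("alice", "bob", 300);
        try {
            ledger.transfer("alice", "carol", 100); // unknown account
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println(ledger.balance("alice") + " / " + ledger.balance("bob"));
    }
}
```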

Use time errors can be caught by exhaustive manual testing. Just as often, and more embarrassingly, they are caught by end users. Worse, because use time errors have made it all the way to production, it can be difficult to trace them back to the original change that caused the problem. Use time errors are often latent: the error may not appear where the problem lives. Instead, it causes another error, such as a runtime error, in a different, seemingly unrelated part of the code. Because of this, I think of them as little ticking time bombs waiting to explode at just the wrong time.

Log time

Even end users won't detect every error, despite their best efforts. Sometimes we write code that is so good at hiding its faults that we can only detect its indiscretions by reading log messages. I know no developer who spends their weekends gleefully reading application debug logs. I know plenty who have buried important indicators in those logs. If an error is logged at a sufficiently high level, it might get noticed by monitoring software and bubbled up to the development team. Since there is no standard for what constitutes an error versus a warning versus info, we can't rely on this happening.
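This sketch shows how an error ends up at log time, using a hypothetical `ShippingSettings` class. The quiet version catches the failure, records it with `java.util.logging` at `Level.FINE`, and returns normally. The plain version lets the exception reach the caller.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ShippingSettings {
    private static final Logger LOG = Logger.getLogger(ShippingSettings.class.getName());

    // Hypothetical persistence step that can fail (storage omitted in this sketch).
    static void save(String speed) {
        if (speed.isBlank()) {
            throw new IllegalArgumentException("shipping speed is blank");
        }
    }

    // Log time: the failure is recorded at FINE and the caller carries on
    // as if the save succeeded.
    static void updateQuietly(String speed) {
        try {
            save(speed);
        } catch (IllegalArgumentException e) {
            LOG.log(Level.FINE, "could not save shipping speed", e);
        }
    }

    // Runtime: the same failure propagates, so the caller has to face it.
    static void update(String speed) {
        save(speed);
    }

    public static void main(String[] args) {
        updateQuietly(""); // returns normally; nothing visible happens
        try {
            update("");
        } catch (IllegalArgumentException e) {
            System.out.println("Caller sees: " + e.getMessage());
        }
    }
}
```

Under the default `java.util.logging` configuration, the root logger and its console handler are both set to `INFO`. The `FINE` record in `updateQuietly` is therefore discarded rather than printed, and there is no trace of the failure unless someone has lowered the log threshold.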

Unless we are willing to stare at a screen of scrolling log messages every minute of the day, we must rely on automated monitoring to detect log time errors. More often, log time errors are found forensically, once an issue has been discovered by some other route. At that point, we hope the logs are detailed enough to reconstruct the invalid state.
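Automated monitoring can be as simple as a log handler that reacts to records above a chosen severity. In this sketch, `AlertingHandler` is a hypothetical subclass of `java.util.logging.Handler` that only counts `SEVERE` records. In practice, its `publish` method would notify a person or a monitoring service. The sketch also illustrates the caveat about log levels: a failure logged as a warning is never counted.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Hypothetical alerting handler: reacts only to records at SEVERE or above.
class AlertingHandler extends Handler {
    final AtomicInteger alerts = new AtomicInteger();

    AlertingHandler() {
        setLevel(Level.SEVERE);
    }

    @Override
    public void publish(LogRecord record) {
        // isLoggable checks the record against this handler's level and filter.
        if (isLoggable(record)) {
            alerts.incrementAndGet(); // a real setup would raise an alert here
        }
    }

    @Override
    public void flush() {}

    @Override
    public void close() {}

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo.monitoring");
        AlertingHandler handler = new AlertingHandler();
        logger.addHandler(handler);
        logger.warning("slow response");  // not counted
        logger.severe("payment failed");  // counted
        System.out.println("alerts raised: " + handler.alerts.get());
    }
}
```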

Ignore time

Sometimes we hide errors so well that we don't even know they are happening. We have suppressed every indication that something is wrong. This is the most dangerous situation: errors happen, nothing tells us, and we can't figure out what happened after the fact. We should never consciously ignore an error.
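The difference between ignore time and runtime can come down to a single line in a `catch` block. This sketch uses a hypothetical `Config` helper with two methods. `readIgnoring` swallows the `IOException` and returns an empty string that looks like a legitimately empty file. `read` rethrows the failure as an `UncheckedIOException` with the original cause attached.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class Config {
    // Ignore time: no exception, no log line. Callers cannot tell a missing
    // or unreadable file from an empty one.
    static String readIgnoring(Path path) {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            return "";
        }
    }

    // Moved left: the failure surfaces as a runtime error with its cause.
    static String read(Path path) {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            throw new UncheckedIOException("could not read " + path, e);
        }
    }

    public static void main(String[] args) {
        Path missing = Path.of("no-such-config.properties");
        System.out.println("[" + readIgnoring(missing) + "]");
        try {
            read(missing);
        } catch (UncheckedIOException e) {
            System.out.println("Failed loudly: " + e.getMessage());
        }
    }
}
```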

Find errors better by finding them earlier

The longer our feedback cycles, the more costly it is to fix the defect an error exposes. Therefore, we should strive to move our errors as far to the left as we can. This way, we will see them earlier and fix them faster. For example, dereferencing a null pointer is a runtime error. We can move it to compile time by using optional or nullable types, so that the compiler complains if we don't handle the missing value. We can move log time errors earlier by having the user confirm the output, or by automatically verifying the expected end state. In each case, we get feedback that much faster.
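This is a minimal Java sketch of moving a missing value to the left, using a hypothetical nickname lookup. Java's `Optional` does not make null references impossible. However, a method that returns `Optional<String>` cannot be used as a plain `String`: the caller has to decide what happens when the value is absent.

```java
import java.util.Map;
import java.util.Optional;

class Nicknames {
    // Hypothetical data source for the sketch.
    private static final Map<String, String> NICKNAMES = Map.of("Robert", "Bob");

    // "Might be missing" is part of the return type, not a hidden null.
    static Optional<String> find(String name) {
        return Optional.ofNullable(NICKNAMES.get(name));
    }

    static String displayName(String name) {
        // String nickname = find(name);  // rejected by the compiler: Optional<String> is not String
        return find(name).orElse(name);
    }

    public static void main(String[] args) {
        System.out.println(displayName("Robert"));
        System.out.println(displayName("Alice"));
    }
}
```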

While we want to move errors to the left, a natural entropy pushes them back to the right. Taking the example of replacing null values with optionals, we could call optionalValue.get() in Java. The compiler made us deal with the missing value, but get() throws a NoSuchElementException when the Optional is empty, so the way we dealt with it moved the error back into the realm of runtime errors. This tension will always exist, so we must be diligent about keeping errors where we want them on the timeline.
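The pitfall with `optionalValue.get()` can be sketched with two hypothetical helpers, `unsafe` and `safe`. Both compile. `Optional.get()` throws a `NoSuchElementException` when the `Optional` is empty, while `orElse` keeps the missing case handled at the call site.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

class OptionalPitfall {
    // Compiles, but throws NoSuchElementException on an empty Optional:
    // the compile-time check has been turned back into a runtime error.
    static String unsafe(Optional<String> optionalValue) {
        return optionalValue.get();
    }

    // Keeps the missing case handled where the value is used.
    static String safe(Optional<String> optionalValue) {
        return optionalValue.orElse("(none)");
    }

    public static void main(String[] args) {
        Optional<String> empty = Optional.empty();
        System.out.println(safe(empty));
        try {
            System.out.println(unsafe(empty));
        } catch (NoSuchElementException e) {
            System.out.println("Runtime error: " + e.getMessage());
        }
    }
}
```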
