On Error Handling

Introduction

I was chatting with a former colleague who is now working primarily in Go. Inevitably, as we were discussing his experiences, the conversation turned to error-handling methodologies. Go seems to have reignited this debate with its love-it-or-hate-it explicit style. Take recent Hacker News articles for instance.

I wasn’t able to fully articulate my opinion at the time, but I hope to do so here. This might be a little long-winded, but bear with me.

Errors Are A Breach Of Contract #

I’d like to restrict the definition of error for the purposes of this discussion. ¹

An error is a breach in the contract between two components.

If a client tries to withdraw funds when their balance is zero, the withdrawal will certainly fail, but it is unlikely to be considered an error. On the other hand, if the withdrawal failed because we couldn’t read the client’s balance from the database, that’s almost certainly an error (and someone is probably getting paged about it). What’s the difference here?

In the former scenario, the withdrawal logic asked the database for the client’s balance and received it: zero. In the latter, the database logic was unable to provide an answer. In other words, the contract between the withdrawal and database components was broken.

In this case, the withdrawal component must be notified of the failure; otherwise, it might proceed with an uninitialized or incorrect balance value. This notification is the error. It is a message from the database component that it could not fulfill its contract.
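To make the distinction concrete, here is a minimal sketch in Go. The names (BalanceStore, memStore, Withdraw) are hypothetical and exist only for illustration; an unknown account stands in for a failed database read. A declined withdrawal comes back as an ordinary result, while a failed read comes back as an error.

```go
package main

import (
	"errors"
	"fmt"
)

// BalanceStore is a hypothetical stand-in for the database component.
type BalanceStore interface {
	Balance(account string) (int64, error)
}

// memStore answers from a map; an unknown account simulates a failed read.
type memStore map[string]int64

func (m memStore) Balance(account string) (int64, error) {
	b, ok := m[account]
	if !ok {
		return 0, errors.New("balance unavailable")
	}
	return b, nil
}

// Withdraw reports whether the withdrawal was approved. A declined
// withdrawal is an ordinary result; a non-nil error means the store
// could not fulfill its contract.
func Withdraw(s BalanceStore, account string, amount int64) (bool, error) {
	balance, err := s.Balance(account)
	if err != nil {
		return false, fmt.Errorf("withdraw from %q: %w", account, err)
	}
	return balance >= amount, nil
}

func main() {
	store := memStore{"alice": 0}

	// Zero balance: the withdrawal is declined, but nothing went wrong.
	approved, err := Withdraw(store, "alice", 50)
	fmt.Println(approved, err)

	// The balance could not be read: this is an error.
	_, err = Withdraw(store, "bob", 50)
	fmt.Println(err != nil)
}
```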

Error Handling Is Delegation #

Errors therefore inherently represent either-or scenarios; either the contract was fulfilled, or an error occurred. If an error occurred, the component in question is unable to do its job. It can only delegate back to the requester by notifying it of the issue.

The withdrawal logic may choose to retry, or it could abort the operation entirely. If it aborts, the withdrawal component is itself likely in breach of contract. It must delegate to higher-level program logic to decide how to proceed.

This delegation of responsibility is a core trait of error handling. Errors are propagated to higher levels until they can be dealt with accordingly. This applies at all levels — a single instruction, the larger system, across services, and even beyond the digital world. Imagine that the client was notified of the error by the UI layer, but otherwise no action was taken. You might say that the error was delegated to the user, and they must now handle it by choosing a course of action: perhaps trying again, calling customer support, etc.

The mechanism by which errors are communicated and delegated is error handling.

Error handling is the mechanism by which errors are delegated.
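The retry-or-abort choice described above might look something like the following in Go. This is only a sketch: readWithRetry and errUnavailable are made-up names, and a real retry loop would likely add backoff and distinguish transient failures from permanent ones.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable is a hypothetical transient failure from a lower layer.
var errUnavailable = errors.New("database unavailable")

// readWithRetry calls read up to attempts times. If every attempt fails,
// it cannot fulfill its own contract, so it delegates upward by returning
// the last error wrapped with its own context.
func readWithRetry(attempts int, read func() (int64, error)) (int64, error) {
	if attempts < 1 {
		attempts = 1 // always try at least once
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		v, err := read()
		if err == nil {
			return v, nil
		}
		lastErr = err
	}
	return 0, fmt.Errorf("read failed after %d attempts: %w", attempts, lastErr)
}

func main() {
	calls := 0
	flaky := func() (int64, error) {
		calls++
		if calls < 3 {
			return 0, errUnavailable
		}
		return 42, nil
	}

	// Handled locally: the third attempt succeeds.
	v, err := readWithRetry(3, flaky)
	fmt.Println(v, err)

	// Not handled locally: the failure is delegated to the caller.
	_, err = readWithRetry(2, func() (int64, error) { return 0, errUnavailable })
	fmt.Println(err)
}
```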

Delegation Requires Context #

Delegation is the heart of error handling, but it requires cooperation.

Although the failed component cannot fulfill its contract, it still has an obligation to the caller: to provide any relevant information that might allow the caller to better understand and handle the error. Without this information, it might be impossible to make an informed decision. If the client isn’t told that their balance is zero, how would they interpret the failed withdrawal?

This responsibility is shared by every component in the delegation chain. Each layer may need to provide its own context and/or include all the context it received. This is sometimes called propagation or error chaining.

Because error chaining is such an important requirement of successful delegation, and delegation is the core of error handling, in my opinion, error chaining and error handling are nearly synonymous.

Error chaining is the distinguishing feature of the delegation performed during error handling.
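As a rough illustration of context surviving a delegation chain, here is a Go sketch. QueryError, readBalance, withdraw, and failedTable are all hypothetical names; the point is only that each layer adds its own context while keeping what it received, so a higher layer can still recover structured details.

```go
package main

import (
	"errors"
	"fmt"
)

// QueryError is a hypothetical structured error from the database layer.
type QueryError struct {
	Table string
	Err   error
}

func (e *QueryError) Error() string { return "query " + e.Table + ": " + e.Err.Error() }

// Unwrap exposes the underlying cause to errors.Is and errors.As.
func (e *QueryError) Unwrap() error { return e.Err }

// readBalance always fails here, to simulate a broken database contract.
func readBalance(account string) (int64, error) {
	return 0, &QueryError{Table: "balances", Err: errors.New("connection reset")}
}

// withdraw adds its own context and keeps the context it received.
func withdraw(account string) error {
	if _, err := readBalance(account); err != nil {
		return fmt.Errorf("withdraw for %s: %w", account, err)
	}
	return nil
}

// failedTable walks the chain and recovers the structured context, if any.
func failedTable(err error) (string, bool) {
	var qe *QueryError
	if errors.As(err, &qe) {
		return qe.Table, true
	}
	return "", false
}

func main() {
	err := withdraw("alice")
	fmt.Println(err)
	fmt.Println(failedTable(err))
}
```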

Error Handling Implementations #

Oops, we were talking about programming languages though, right? How is error handling (and its core feature, delegation informed by error chaining) performed in a program?

I would place most languages into one of the following categories:

No Chaining #

This is when the language either makes no attempt — by convention, standard library, or otherwise — to provide error chaining as a core language feature, or such features are deficient and not adopted. For example, C provides no standard types, functions, or other support for chaining errors. The standard library itself simply sets the global errno or returns an error code by value.

Note that the absence of error chaining in the language certainly does not mean it cannot be done. While it’s generally considered a deficiency in the modern age, it depends on your needs and perspective.

A major consequence, however, is that the ecosystem tends to become fragmented, with various libraries adopting their own solutions. Error chaining across library or ownership boundaries can be excessively difficult or impossible. This often results in context being discarded, or in stringly typed errors where textual descriptions are used in lieu of structured data.
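To show what gets lost, here is a small Go sketch contrasting a layer that flattens an error into text with one that keeps the chain. ErrTimeout, errConfig, flatten, and wrap are invented for the example. Once the chain is gone, the caller is left matching on strings, which can misfire.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ErrTimeout is a hypothetical sentinel a library might export.
var ErrTimeout = errors.New("timeout")

// errConfig is unrelated to timeouts but happens to mention the word.
var errConfig = errors.New("invalid timeout setting")

// flatten mimics a layer that discards the chain and keeps only text.
func flatten(err error) error {
	return errors.New("fetch failed: " + err.Error())
}

// wrap keeps the original error in the chain.
func wrap(err error) error {
	return fmt.Errorf("fetch failed: %w", err)
}

// isTimeoutByText is the stringly typed check left once the chain is gone.
func isTimeoutByText(err error) bool {
	return strings.Contains(err.Error(), "timeout")
}

// isTimeout relies on the chain being intact.
func isTimeout(err error) bool {
	return errors.Is(err, ErrTimeout)
}

func main() {
	// The flattened error no longer matches the sentinel.
	fmt.Println(isTimeout(flatten(ErrTimeout)))
	// The wrapped error still does.
	fmt.Println(isTimeout(wrap(ErrTimeout)))
	// Text matching also fires on an unrelated error.
	fmt.Println(isTimeoutByText(flatten(errConfig)))
}
```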

Natural Chaining #

The next approach is what I’ll call natural chaining. This is when the language provides support for chaining using only core language types and features.

Go (since version 1.13, at least) is an example. It provides the error interface and an unusual but universally adopted convention of using tuple assignment syntax (basically, multiple return values) to return both the result and error, if any:

// TryGetValue() returns both a ValueType and error
func TryGetValue() (ValueType, error) {
    ...
}

// usage
value, err := TryGetValue()
if err != nil {
    // handle error
}
// by convention, value is valid if (and only if) err == nil

It also provides standard functions to wrap and unwrap errors such as fmt.Errorf and errors.Unwrap. In this example, TryGetValue() wraps any errors returned by TryGetOtherValue(), which could be unwrapped and evaluated by the caller:

func TryGetValue() (ValueType, error) {
    otherValue, err := TryGetOtherValue()
    if err != nil {
        // if TryGetOtherValue() failed, include that error in our own;
        // returning the zero value compiles for any ValueType
        var zero ValueType
        return zero, fmt.Errorf("could not get value: %w", err)
    }
    return otherValue, nil
}
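On the caller's side, unwrapping might look like the sketch below. It uses lowercase stand-ins (tryGetValue, tryGetOtherValue) with a string result and a hypothetical errNotFound root cause so that it runs on its own; chain is a made-up helper that walks the chain with errors.Unwrap.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical root cause.
var errNotFound = errors.New("not found")

func tryGetOtherValue() (string, error) { return "", errNotFound }

func tryGetValue() (string, error) {
	v, err := tryGetOtherValue()
	if err != nil {
		return "", fmt.Errorf("could not get value: %w", err)
	}
	return v, nil
}

// chain returns the message of every error in the chain, outermost first.
func chain(err error) []string {
	var msgs []string
	for err != nil {
		msgs = append(msgs, err.Error())
		err = errors.Unwrap(err)
	}
	return msgs
}

func main() {
	_, err := tryGetValue()
	for _, msg := range chain(err) {
		fmt.Println(msg)
	}
	// errors.Is performs the same walk and compares each link.
	fmt.Println(errors.Is(err, errNotFound))
}
```

In practice, errors.Is and errors.As are usually more convenient than calling errors.Unwrap by hand, since they walk the chain for you.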

Another example is Rust. Borrowing from functional languages, functions that can fail return a monadic Result<T, E> type, where E represents the error and, by convention, implements std::error::Error, whose source() method can expose a chained cause.

The distinguishing feature here is that error handling and chaining use the standard control structures of the language, such as ifs (or pattern matching, in Rust’s case) and early returns. There may be special syntax to facilitate things, such as Rust’s ? operator, but this is syntactic sugar for the same old branching logic.

This is the source of most complaints about Go. Every call that could fail must be handled with an explicit and repetitive if clause.
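For a feel of what that repetition looks like, here is a sketch of a small request pipeline in the explicit style. Everything in it (request, readRequest, authorize, handle, serve) is a hypothetical stub; the shape of serve is what matters.

```go
package main

import (
	"errors"
	"fmt"
)

// request and the steps below are hypothetical stubs. Each step can
// fail, so the caller must check every result.
type request struct{ user, body string }

func readRequest(raw string) (request, error) {
	if raw == "" {
		return request{}, errors.New("empty request")
	}
	return request{user: "alice", body: raw}, nil
}

func authorize(r request) error {
	if r.user == "" {
		return errors.New("anonymous user")
	}
	return nil
}

func handle(r request) (string, error) {
	return "echo: " + r.body, nil
}

// serve runs the steps in order and returns at the first failure,
// wrapping the error with the name of the step that produced it.
func serve(raw string) (string, error) {
	req, err := readRequest(raw)
	if err != nil {
		return "", fmt.Errorf("read: %w", err)
	}
	if err := authorize(req); err != nil {
		return "", fmt.Errorf("authorize: %w", err)
	}
	resp, err := handle(req)
	if err != nil {
		return "", fmt.Errorf("handle: %w", err)
	}
	return resp, nil
}

func main() {
	fmt.Println(serve("ping"))
	fmt.Println(serve(""))
}
```

Three steps, three nearly identical if blocks. On the other hand, each one is a natural place to attach context.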

Exceptions #

Exceptions attempt to address this issue by introducing new language constructs for error handling. Rather than manually propagating each potential error, you can define the delegation hierarchy for entire components.

This seems to make sense, practically speaking. Only a few locations in a program are able to handle truly exceptional circumstances, and so a common pattern is a main loop such as (using Java-ish syntax for illustration):

while(true) {
    try {
        socket.open();
        var request = readRequest(socket);
        authorize(request);
        var response = handle(request);
        reply(socket, response);
    } catch (Exception ex) {
        log(ex);
        pageOnCall(ex);
    } finally {
        socket.close();
    }
}

The delegation expressed here is simple. Failures in any task - socket.open(), readRequest, authorize, handle, and reply - are delegated to the loop’s catch block. Closing the socket will be performed in all cases.

The value proposition is also straightforward:

- There is no explicit error handling in the happy path of the try block.
- Assuming a reasonable table-based implementation, there is also zero runtime cost on the happy path.
- In most languages, exception chaining is built in and includes the stack trace by default, which is especially valuable if the ultimate delegate ends up being a human being who needs to understand the state of the system.

If this were the end of the story, I don’t think there would be much debate. Exceptions would be universally preferred in any environment where the unpredictable cost of the sad path, increased binary size, and other considerations were not deal-breakers.

Down The Rabbit Hole (And Up The Stack) With Exceptions #

The benefits of exceptions are tempting, so they tend to be used liberally in languages that support them. This includes the standard library itself in some cases.

However, we can already observe some effects on program design. Any function can throw, and therefore all resource allocations must be enclosed in a try-finally (or equivalent) construct to ensure they are released, as socket.open() is here. If the delegation chain involves multiple layers, they must be expressed in their own, potentially nested try-catch clauses.

Oh, and let’s not forget that we still have that pesky, standard if-then-else-style business logic to worry about! Functions such as authorize and handle might throw their own exceptions (and they probably do!), but they might also return non-exceptional results that require logic. A more realistic version of the above pseudocode might be:

while (true) {
    try {
        socket.open();
        var request = readRequest(socket);
        // initialized up front: Java requires definite assignment before reply()
        Response response = null;
        try {
            authorize(request);
            try {
                response = handle(request);
                if (response == specialCase) {
                    // handle some unusual case
                }
            } catch (ApplicationException ex) {
                // response = ...
            }
        } catch (AuthorizationException ex) {
            // response = ...
        }
        reply(socket, response);
    } catch (Exception ex) {
        log(ex);
        pageOnCall(ex);
    } finally {
        socket.close();
    }
    ...
}

Now we have business logic embedded in error-handling logic, each with its own control structures and flow. The dependencies expressed here are more complicated than they appear:

- socket.close() has no dependencies; it should always be called.
- authorize depends on readRequest.
- handle depends on authorize.
- reply depends on readRequest and conditionally on both authorize and handle.

That last one might be surprising, but look again: ApplicationException and AuthorizationException are considered user errors and are turned into a response for the client. Any other exception skips reply and lands in the outer catch block.

This is a toy example, and it’s already no fun to look at. Nor do I think it expresses the above relationships in an obvious or concise way. I think most engineers would agree, because in my experience no one writes modern code like this.

Exceptions in Modern Code #

What’s the solution? Well, how about we introduce yet another control structure?

To keep things manageable, modern codebases often follow Rust’s path and introduce functional features. I typically see tasks wrapped in thunks or lambdas, and executions returned as monad-ish containers. How elegant this is depends on the language, but here’s how you might rewrite the above example in Java (using vavr):

Try.withResources(socket::open)
        .of(sock -> Try.of(() -> readRequest(sock))
                .andThen(this::authorize)
                .map(this::handle)
                .recover(ex -> Match(ex).of(
                        Case($(instanceOf(AuthorizationException.class)),
                                t -> new Response(t.getMessage())),
                        Case($(instanceOf(ApplicationException.class)),
                                t -> new Response(t.getMessage()))
                ))
                .andThen(response -> reply(sock, response))
                .onFailure(ex -> {
                    log(ex);
                    pageOnCall(ex);
                }));

Details of this particular library aside, this is considerably easier to follow once you get over any initial hurdles with the functional syntax. Task dependencies are now expressed as a pipeline of operations. Each step in the pipeline returns a monadic type Try<T> containing either a success of type T, or a failure and associated exception. Operations such as andThen and map are no-ops for the failure case, which allows dependent tasks to be skipped.
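The short-circuiting behavior is easy to sketch from scratch. The following Go version is not vavr's implementation, just a minimal Try-like container under my own assumptions: Result, Then, Recover, and the toy steps (half, describe, pipeline) are all hypothetical names. It requires Go 1.18 or later for generics.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Result is a hypothetical Try-like container: a value or an error.
type Result[T any] struct {
	val T
	err error
}

// Then applies f only on success; a failed Result passes through untouched.
// It is a free function because Go methods cannot declare type parameters.
func Then[T, U any](r Result[T], f func(T) (U, error)) Result[U] {
	if r.err != nil {
		return Result[U]{err: r.err}
	}
	v, err := f(r.val)
	return Result[U]{val: v, err: err}
}

// Recover turns a failure back into a value, much like a catch block.
func (r Result[T]) Recover(f func(error) T) Result[T] {
	if r.err == nil {
		return r
	}
	return Result[T]{val: f(r.err)}
}

// Get unpacks the Result into Go's conventional (value, error) pair.
func (r Result[T]) Get() (T, error) { return r.val, r.err }

var errOdd = errors.New("odd number")

func half(n int) (int, error) {
	if n%2 != 0 {
		return 0, errOdd
	}
	return n / 2, nil
}

func describe(n int) (string, error) {
	return "half is " + strconv.Itoa(n), nil
}

// pipeline chains the steps; any failure skips the steps after it.
func pipeline(s string) Result[string] {
	n, err := strconv.Atoi(s)
	parsed := Result[int]{val: n, err: err}
	return Then(Then(parsed, half), describe)
}

func main() {
	fmt.Println(pipeline("42").Get())
	fmt.Println(pipeline("7").Get())
	fmt.Println(pipeline("7").Recover(func(error) string { return "n/a" }).Get())
}
```

Because Then has to be a free function, the calls nest inside-out rather than reading left to right, which is part of why this style feels less natural in Go than in Java or Rust.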

Where Does This Leave Us? #

So with all that context out of the way, here are the conclusions I’ve come to regarding error handling:

1. The proposed benefits of exceptions are not compelling.

I don’t find the brevity or theoretical zero cost of happy-path execution when using exceptions to be a compelling argument for them. They are overused in the languages supporting them, and modern codebases usually throw away runtime performance in favor of functional-style logic anyway. Exceptions are usually disabled altogether in the most performance-sensitive environments.

2. Error chaining is important, and stack traces via exceptions are the easiest solution presently available.

That being said, I understand why exceptions are still defended by those who dislike the explicit style. I have yet to see a solution for error propagation that approaches the ease of exceptions. They provide automatic chaining that includes highly valuable context (stack traces) across library and ownership boundaries, without effort by either the user or library author.

Does this justify using exceptions in your code? No, but it should at least help you sleep at night if you’re stuck with them. Especially in large web backends, a significant portion of the application’s functionality may involve calling 3rd-party client libraries. An error handler that ships uncaught exceptions to Sentry or some such is invaluable for debugging these types of issues.

3. Whenever possible, return monadic Result types.

Absent other constraints, in any language that supports such a thing, any function that can fail should return the equivalent of a monadic Result rather than throw an exception. Similarly, existing or 3rd-party code that throws exceptions should be wrapped as described above.

Go is a unique exception. Its error handling conventions predate support for generics in the language, but are otherwise sufficient. It’s entirely possible (though generally less elegant) to get equivalent behavior. I believe this is what Rob Pike was trying to express in his article Errors Are Values. He implements an errWriter object with a write method that no-ops if the object is in a failure state:

type errWriter struct {
    w   io.Writer
    err error
}

func (ew *errWriter) write(buf []byte) {
    if ew.err != nil {
        return
    }
    _, ew.err = ew.w.Write(buf)
}

The article never mentions monads or hints at a more general pattern. Presumably, he was just presenting a solution in the context of the tooling available in Go, which (at the time anyway) lacked generics.
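Here is a runnable sketch of how errWriter might be used. The errWriter type is repeated from above so the example stands alone; limitedWriter, errFull, and writeAll are hypothetical helpers I've added to simulate a writer that fails partway through.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

type errWriter struct {
	w   io.Writer
	err error
}

func (ew *errWriter) write(buf []byte) {
	if ew.err != nil {
		return
	}
	_, ew.err = ew.w.Write(buf)
}

// limitedWriter is a hypothetical writer that fails once n writes have succeeded.
type limitedWriter struct {
	buf bytes.Buffer
	n   int
}

var errFull = errors.New("writer full")

func (l *limitedWriter) Write(p []byte) (int, error) {
	if l.n == 0 {
		return 0, errFull
	}
	l.n--
	return l.buf.Write(p)
}

// writeAll writes every part and checks for failure once, at the end.
func writeAll(w io.Writer, parts ...string) error {
	ew := &errWriter{w: w}
	for _, p := range parts {
		ew.write([]byte(p)) // no-op after the first failure
	}
	return ew.err
}

func main() {
	roomy := &limitedWriter{n: 3}
	fmt.Println(writeAll(roomy, "a", "b", "c"), roomy.buf.String())

	tight := &limitedWriter{n: 1}
	fmt.Println(writeAll(tight, "a", "b", "c"), tight.buf.String())
}
```

The first failure is remembered and every later write is skipped, which is the same short-circuiting a Result type provides, specialized to one operation.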

4. There is still no magic bullet when it comes to error handling.

Sometimes you have to just roll up your sleeves, let out a sigh, and return an int, C-style. At least it will fit in a register.

¹ For the enthusiasts, I’m only concerned with recoverable errors. OutOfMemoryException, a fire in the datacenter, and cosmic rays probably wouldn’t fit this definition.