Two kinds of error

(evanhahn.com)

40 points | by zdw 2 days ago

21 comments

  • pjdesno 11 hours ago

    If you're writing code professionally, then you're not in college anymore and your programs aren't simple things that run from the start of main() through to the end and then exit.

    If you're providing a service that needs to keep running, you need a strategy for handling unexpected errors. It can be as simple as "fail the request" or "reboot the system", or more complicated. But you need to consider system requirements and the recovery strategy for meeting them when you're writing your code.

    Long, long ago I worked with some engineers who thought it was just fine that our big piece of (prototype) telecom equipment took half an hour to boot because of poor choices on their part. Target availability for the device was five nines (99.999%), which allows roughly 5 minutes of downtime per year. They didn't seem to realize the contradiction.
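
The arithmetic behind that contradiction can be sketched as follows. This is only an illustration; the function and variable names are made up for the example, and a 365-day year is assumed.

```python
# Downtime budget implied by an availability target (illustrative arithmetic only).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year the system may be down and still meet the target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


five_nines = downtime_budget_minutes(0.99999)  # about 5.26 minutes per year
boot_minutes = 30  # the half-hour boot time from the comment

# How many years of downtime budget a single reboot would consume.
years_of_budget_per_reboot = boot_minutes / five_nines
```

Under these assumptions a single reboot uses up more than five years of the allowed downtime.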

    • keybored 10 hours ago

      > If you're writing code professionally, then you're not in college anymore and your programs aren't simple things that run from the start of main() through to the end and then exit.

      Or you’re developing a CLI-biased app.

      Plenty of "die" over yonder.

  • ciroduran 10 hours ago

    I don't think that this simplification is useful. From a user's point of view, a crashing program is very disruptive. Crash enough times, and a retail user might just ask for a refund, or a corporate user might start a process to get rid of the program. Some errors don't reproduce reliably from the same steps, e.g. a memory leak, and handling those is hard.

  • scuff3d 4 hours ago

    "Expected errors should not throw, raise, or panic. Instead, they should return an error result."

    Part of the problem is devs often can't tell the difference between an error and a negative result. For example, I worked in a code base once that threw errors when a database query came back empty. That's not an error, that's a result! Errors should be _exceptional_ cases, like the connection to the database dropped, or the user provided bad input making the query impossible.

    Errors as happy path control flow also drive me nuts.
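
A minimal sketch of the "empty result is a result" point, using Python's built-in sqlite3 module. The table layout and the function name `find_user_ids` are assumptions made up for this example, not taken from the code base the comment describes.

```python
import sqlite3


def find_user_ids(conn: sqlite3.Connection, name: str) -> list[int]:
    """Return matching user ids; an empty list means "no matches", not a failure."""
    rows = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('ada')")

hits = find_user_ids(conn, "ada")       # one match
misses = find_user_ids(conn, "nobody")  # empty result, still a normal return

# A genuinely exceptional case (for example, querying a closed connection)
# still surfaces as sqlite3.Error for the caller to handle or let propagate.
```

The caller can loop over the returned list either way, so the "no rows" case needs no special control flow.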

  • mjw1007 12 hours ago

    It's common that if you extract a function or a module with a well-defined interface, that function or module can't tell whether a "bad" input indicates an expected or an unexpected error.

    For example division by zero often indicates an "unexpected" error, but it wouldn't if you were implementing a spreadsheet.

    So to me the approach of using different forms of error reporting for the two kinds of error doesn't seem promising: if you imagine you had to implement division yourself, which kind of error should it report? Should you have two variants of every fallible function so the caller can choose?
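
One possible answer to that question, sketched under assumptions: the low-level function returns an error value without classifying it, and each caller decides whether it is expected. The names `DivisionByZero`, `checked_div`, `spreadsheet_cell`, and `average` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DivisionByZero:
    """Error value returned instead of raising (hypothetical name)."""
    numerator: float


def checked_div(a: float, b: float) -> Union[float, DivisionByZero]:
    # The low-level function only reports what happened; it does not classify it.
    if b == 0:
        return DivisionByZero(a)
    return a / b


def spreadsheet_cell(a: float, b: float) -> str:
    # In a spreadsheet, dividing by zero is expected and shown to the user.
    result = checked_div(a, b)
    if isinstance(result, DivisionByZero):
        return "#DIV/0!"
    return str(result)


def average(total: float, count: int) -> float:
    # Here a zero count would be a bug in the caller, so escalate loudly.
    result = checked_div(total, count)
    if isinstance(result, DivisionByZero):
        raise AssertionError("average() called with count == 0")
    return result
```

This avoids two variants of the function, at the cost of every caller having to inspect the result.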

    • allreduce 11 hours ago

      That's a deficit of most programming languages. One solution is to pass every error value up and let the caller decide. Rust does this to some extent. This leads to verbose code, however.

      For modules inside your application, designing a good interface involves exposing the right errors and crashing for the rest. This creates some coupling of course (shared assumptions of which errors need to be handled across modules). Trying to avoid that probably just leads into the circle of hell where you have more abstract beancounting than useful code.

      In the end, this is another reason why overreliance on external libraries leads to mediocre, buggy software.

  • bonoboTP 10 hours ago

    This is the same distinction and motivation that led to Java's checked exceptions vs RuntimeException. It seemed like a good idea to enforce, at the language level, the handling of what the author calls "known" errors, while treating "unknown" ones more leniently.

    It led to a lot of boilerplate, and as far as I know it is seen in hindsight as a bad design choice because the line is not so clear to draw.

    As a first-order approximation it's good to learn about though, if you haven't heard about the concept. But I'd say this idea is usually introduced in the first programming course in an undergraduate college program.

  • noelwelsh 14 hours ago

    I agree with this, and I'd add there are two modes of processing errors: fail-fast (stop on first error) and fail-last (do as much processing as possible, collecting all errors). The latter is what you want to do when, for example, validating a form: validate every field and return all the errors to the user.
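
The two modes can be sketched side by side for the form example. The field names and validator functions here are invented for illustration; a real form would have its own rules.

```python
from typing import Callable, Optional

# A validator returns an error message, or None if the value is acceptable.
Validator = Callable[[str], Optional[str]]


def require_non_empty(value: str) -> Optional[str]:
    return None if value.strip() else "must not be empty"


def require_at_sign(value: str) -> Optional[str]:
    return None if "@" in value else "must contain '@'"


RULES: dict[str, Validator] = {"name": require_non_empty, "email": require_at_sign}


def validate_fail_fast(form: dict[str, str]) -> None:
    """Stop at the first invalid field by raising."""
    for field, rule in RULES.items():
        problem = rule(form.get(field, ""))
        if problem:
            raise ValueError(f"{field}: {problem}")


def validate_fail_last(form: dict[str, str]) -> dict[str, str]:
    """Check every field and return all problems, keyed by field name."""
    errors = {}
    for field, rule in RULES.items():
        problem = rule(form.get(field, ""))
        if problem:
            errors[field] = problem
    return errors
```

With both fields invalid, the fail-fast version reports only the name problem, while the fail-last version returns both so the user can fix everything in one pass.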

  • taylorallred 13 hours ago

    It makes me think that it's worth sitting down and considering what all the valid outcomes for a piece of functionality are. A user typing in a string in the wrong format is not necessarily "exceptional", whereas running out of memory while getting the input would be. I feel like programmers too often treat perfectly valid outcomes as errors. For example, in Rust I'll see Option<Vec<Foo>> and I ask myself if we could just use the empty vector as a valid return value.

  • tl2do 12 hours ago

    The expected/unexpected distinction assumes you have the capacity to anticipate failures in the first place. But that capacity varies - even among experienced developers, everyone has blind spots shaped by their specific history. What's "obviously expected" to one senior dev is a surprise to another. The article's model is useful, but there's a prerequisite it doesn't address: the ability to expect is itself unevenly distributed.

  • joshdick 12 hours ago

    This is 4XX versus 5XX errors in HTTP.
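
A rough sketch of how that mapping might look in a request handler, assuming a hypothetical `ValidationError` for expected problems and treating anything else as a server-side bug. `handle` and `respond` are made-up names, not part of any framework.

```python
from http import HTTPStatus


class ValidationError(Exception):
    """Hypothetical exception for an expected, client-caused problem."""


def handle(request_body: str) -> str:
    # Hypothetical handler: rejects empty input, otherwise echoes it back.
    if not request_body:
        raise ValidationError("body must not be empty")
    return request_body.upper()


def respond(request_body: str) -> tuple[int, str]:
    try:
        return HTTPStatus.OK.value, handle(request_body)
    except ValidationError as exc:
        # Expected error: the client can fix the request and retry.
        return HTTPStatus.BAD_REQUEST.value, str(exc)
    except Exception:
        # Unexpected error: a server-side bug; do not leak details.
        return HTTPStatus.INTERNAL_SERVER_ERROR.value, "internal error"
```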

  • lexx 13 hours ago

    Nice article. I agree with the differentiation. You could also classify them as errors that should be fixed and errors that should exist. Some would argue that validation errors are not really errors on a system level. They are only errors on the user level. On the system level they are a feature.

  • tantalor 10 hours ago

    > Expected: user enters invalid data

    > Unexpected: function must be called with a non-empty string, and someone didn't

    These seem like the same thing, I don't get why they are treated differently.

    • MadnessASAP 9 hours ago

      They are different from both the user's and the developer's perspective. A developer can't prevent a user from entering invalid data; they can, however, check for invalid data and inform the user accordingly.

      A user can't prevent a programmer from calling a function with an empty string; a developer, however, can ensure the arguments are valid before calling a function.

      In both cases there is something the developer can do. Only in the first case is there something the user can do, but only if the program informs them of the problem.

      BUT! And this but is something that really gets up my butt. If a programmer spits out a thousand non-fatal errors that I can't do anything about, I'm likely to miss the one error I can do something about. So that's also an important distinction between these 2 cases.

  • supermdguy 13 hours ago

    I'd also add errors with third-party systems, which aren't the developer's or the user's fault, but which are probably worth handling nicely (e.g. retry with backoff).
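
A minimal sketch of retry with exponential backoff, assuming the third-party failure shows up as Python's built-in `ConnectionError`. The function name and its parameters are illustrative; the `sleep` parameter exists only so the delay can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Only `ConnectionError` is retried here, so programming mistakes still fail immediately. Production code would probably also add random jitter and a cap on the delay.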

  • thelittlenag 12 hours ago

    I don't really like this article. There isn't anything particularly noteworthy about noticing that some computations have outcomes that allow some form of recovery, and other outcomes do not.

    But there are some obvious follow-up questions that I do think need better answers:

    Why is recovery made so hard in so many languages?

    Error recovery really feels like an afterthought. Sometimes that's acceptable, what with "scripting" languages, but the poor ergonomics and design of recovery systems is just a baffling omission. We deserve better options for this type of control flow.

    Also, why do so many languages make it so hard to enumerate the possible outcomes of a computation?

    Java tried to ensure every method would have in its signature how it could either succeed or fail. That went so poorly we simply put everything under RuntimeException and gave up. Yet resilient production grade software still needs to know how things can fail, and which failures indicate a recoverable situation vs a process crash+restart.

    Languages seem to want to treat all failures as categorically similar, yet they clearly are not. Recovery/retry, logging, and accumulation all appear in the code paths production code needs to express when errors occur.

    Following programming language development, the only major advancement I've noticed myself has been the push to put more of the outcomes into the values of a computation and then further use a type system to constrain those values. That has helped with the enumeration aspect, leaving exceptions to mainly just crash a system.

    The other advancement has been in Algebraic Effects. I feel like this is the first real advancement I've observed. Yet this feature is decried as too academic and/or complex. Yes, error handling is complex and writing crappy software is easy.

    Maybe AI will help us get past the crabs at the bottom of the bucket called error handling.

  • metalliqaz 13 hours ago

    This post seems to conflate using `throw`, `raise`, etc. with crashing. The idea that 'handling' an error does not involve `throw`/`catch`, `try`/`except` is very strange to me. The exception facility is often the most elegant way to check inputs, and if I remember correctly the Python documentation says as much as well.
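
The style this comment seems to have in mind is what the Python glossary calls EAFP ("easier to ask for forgiveness than permission"). A small sketch of using `try`/`except` to check input follows; `parse_port` is a made-up helper for illustration.

```python
from typing import Optional


def parse_port(text: str) -> Optional[int]:
    """Return a valid TCP port, or None if `text` is not one."""
    try:
        port = int(text)  # EAFP: attempt the conversion, handle failure
    except ValueError:
        return None
    return port if 0 < port < 65536 else None
```

Here the exception is caught right next to the call that raises it and turned into an ordinary return value, so using `try`/`except` does not mean the program crashes.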

  • taylorallred 13 hours ago

    Another interesting article on error handling: https://www.dgtlgrove.com/p/the-easiest-way-to-handle-errors

  • IshKebab 12 hours ago

    This sounds very neat in theory, but in practice errors are a continuum between these two extremes and there isn't really a clean dividing line.

  • smitty1e 12 hours ago

    I was expecting Type I & II from statistical theory:

    "Type I error, or a false positive, is the incorrect rejection of a true null hypothesis in statistical hypothesis testing. A type II error, or a false negative, is the incorrect failure to reject a false null hypothesis"

    https://en.wikipedia.org/wiki/Type_I_and_type_II_errors