The sigmoids won't save you

(astralcodexten.com)

34 points | by Tomte 6 hours ago

22 comments

  • gm678 31 minutes ago

    I don't know what the Y-axis is supposed to be on that Wharton AI capabilities graph, but I am not really convinced that Opus 4.6 has more than double the intelligence/capability/whatever of GPT 5.1 Max.

    • NitpickLawyer 28 minutes ago

      IIRC that graph tracks capabilities as time_to_solve a task for humans (i.e. the model can now handle tasks that usually take a human ~8h). Which, depending on what tasks you look at, could be a reasonable finding. I could see Opus 4.6 handling tasks that take ~8h for humans, and that 5.1 couldn't previously handle (with 5.1 being "limited" at 4h tasks let's say). It is a bit arbitrary, but I think this is what they're tracking.

      • lukan a minute ago

        "It is a bit arbitrary, but I think this is what they're tracking."

        I don't know if they can get their numbers right this way, but it seems a far more useful metric than theoretical capabilities.

    • BoredPositron 29 minutes ago

      https://metr.org/time-horizons/ on a linear scale. Clickbait garbage article, like most of his in the last year.

      • afthonos 23 minutes ago

        …yeah, that’s where you see the exponential?

  • philipallstar 36 minutes ago

    But they do explain the improvement of AI driving in 2017-2021 vs. 2022-2026.

  • andai 31 minutes ago

    Well, curve shape aside, the high watermark might be lower than where it tapers off.

    https://news.ycombinator.com/item?id=46199723

  • krupan 11 minutes ago

    News flash: predicting the future is hard

  • inglor_cz 28 minutes ago

    Hmmm, this is quite an interesting take by Scott.

    Lindy's Law is not actually a law, and many exacting minds will be provoked by the very name; it also fails spectacularly in certain contexts (e.g. the lifetime of a single organism, though not necessarily the existence of an entire species).

    But at the same time, I am willing to take its invocation in the context of AI somewhat seriously. There is an international arms race with China, which has less compute, but more engineers and scientists. This sort of intellectual arms race does not exhaust itself easily.

    A similar space race in the 1950s and 1960s progressed from the first unmanned spaceflight to a moonwalk in a mere 12 years, which is probably less than what it takes to approve a bicycle lane in Chicago now.

    • krupan 11 minutes ago

      "There is an international arms race with China"

      I keep seeing this. Where did it come from? Has China said that they intend to attack other countries using AI? Have other countries declared that they intend to attack China with AI?

      Also, why does anyone believe that AI could actually be that dangerous, given its inherently unpredictable and unreliable performance? I would be terrified to rely on AI in a life or death situation.

      • aspenmartin a minute ago

        AI in war is basically Palantir's whole business model. You have a system that can effectively deal with ambiguity and has superhuman performance on reasoning, plus superhuman physical abilities via embodiment…

        Inherently unpredictable and unreliable performance is quite a feature of human beings as well.

      • inglor_cz 8 minutes ago

        It was a metaphor. I meant, and later clarified, an intellectual arms race.

        BTW your handle is an actual Czech word, minus a diacritic ("křupan"), and a somewhat amusing one. It basically means hillbilly. Not that it matters, just FYI.

  • devmor 31 minutes ago

    "Exponentials all tend to become sigmoids but you can't predict exactly when" is a true statement, but I'm not sure it needed an article.

    This doesn't say much, and the author fights their own points a couple of times, which suggests they perhaps didn't think through what they wanted to write until they were mid-draft and realized their assumptions didn't match what they expected the data to say.

    I really don't get the point of what I just read.

  • BoredPositron 31 minutes ago

    If you use the log scale, you'll see that the time horizon of Opus 4.6 was as expected...

    • afthonos 21 minutes ago

      As expected by the exponential. The Wharton study was predicting when the exponential would turn into a sigmoid.

    • ReptileMan a minute ago

      Everything is linear on a log log scale with a fat marker.

  • nathan_compton 32 minutes ago

    A lot of words to say "The initial part of a sigmoidal curve is not very informative about the parameters of the sigmoid function in question."

    • inglor_cz 24 minutes ago

      That is true, but I generally enjoy reading a lot of words from Scott, who has a talent for writing.

      The entire plot of the Lord of the Rings could probably be compressed into less than 10 kB of text too.

      Edit: this seems to be a controversial comment, but IMHO a blog like Scott Alexander's is an art form, not just a communication channel.
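
The point several commenters circle around (that the early segment of a sigmoid is nearly indistinguishable from a pure exponential, and so says little about where the ceiling is) can be sketched in a few lines of stdlib Python. All parameters here are hypothetical, chosen only to illustrate the shape argument:

```python
import math

# Logistic (sigmoid) curve: L / (1 + exp(-k*(t - t0))).
# Hypothetical parameters: ceiling L, growth rate k, midpoint t0.
L, k, t0 = 1000.0, 0.8, 12.0

def logistic(t):
    return L / (1.0 + math.exp(-k * (t - t0)))

# Sample only the EARLY part of the curve, well before the midpoint --
# roughly what a few years of benchmark data gives you.
ts = [float(t) for t in range(0, 7)]
ys = [logistic(t) for t in ts]

# Fit a pure exponential y = a * exp(b*t) by least squares on log(y).
n = len(ts)
logs = [math.log(y) for y in ys]
tbar = sum(ts) / n
lbar = sum(logs) / n
b = sum((t - tbar) * (l - lbar) for t, l in zip(ts, logs)) / \
    sum((t - tbar) ** 2 for t in ts)
a = math.exp(lbar - b * tbar)

# On the early data, the exponential fit is nearly perfect...
max_rel_err = max(abs(a * math.exp(b * t) - y) / y for t, y in zip(ts, ys))
print(f"max relative fit error on early data: {max_rel_err:.4f}")

# ...but extrapolated past the (invisible) midpoint, it wildly
# overshoots the logistic's true ceiling.
print(f"exponential extrapolation at t=24: {a * math.exp(b * 24):.0f}")
print(f"true logistic value at t=24:       {logistic(24):.0f}")
```

Running this, the exponential matches the early samples to well under a percent, while its extrapolation exceeds the sigmoid's ceiling by orders of magnitude: exactly why early data can't tell you when the curve bends.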