Can LLMs reason about math? The Subtraction Trick Test

(haversine.substack.com)

5 points | by MakeAJiraTicket 10 hours ago

5 comments

  • mapontosevenths 10 hours ago

    This is really clever, but I think Gemini got it on the first try. I kept my prompts close to yours, but didn't include the initial framing bit about how it was supposed to be an expert.

    https://gemini.google.com/share/b66e0158ee29

    • MakeAJiraTicket 9 hours ago

      Thank you! Gemini has consistently been the best performer I've tried, but it always requires the connection to be made explicit. The point of the test is that it is very low complexity and very targeted at what can be considered reasoning, and these models can't make the connection without prodding.

      In the ideal case of reasoning, you would simply present the methods and the model would bridge the gap on its own once both are brought to the forefront of its context together, but that doesn't happen.
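      For readers without the article: going by the blog's name, the "subtraction trick" at issue is presumably the classic fix for catastrophic cancellation that motivates the haversine formula, i.e. replacing 1 - cos(x) with the algebraically identical 2*sin(x/2)^2 so that no nearly-equal quantities are subtracted. This is an assumption about the post's content, not a quote from it; a minimal sketch:

      ```python
      import math

      def one_minus_cos_naive(x):
          # Direct subtraction: for small x, cos(x) rounds to 1.0 in
          # double precision and every significant digit is lost.
          return 1.0 - math.cos(x)

      def one_minus_cos_stable(x):
          # Half-angle identity 1 - cos(x) = 2*sin(x/2)**2:
          # no subtraction of nearly-equal values, full precision kept.
          return 2.0 * math.sin(x / 2.0) ** 2

      x = 1e-8
      print(one_minus_cos_naive(x))   # 0.0 -- cancellation destroys the result
      print(one_minus_cos_stable(x))  # ~5e-17 -- correct value, x**2/2
      ```

      The test, as described above, presents the two forms and checks whether the model connects them unprompted.
      
      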

      • mapontosevenths 9 hours ago

        ChatGPT got it with less prodding, but I had to set it to "Pro" thinking mode (ChatGPT's version of Deep Think, I suspect). I'm sure Deep Think could get it with even less prompting.

        I don't think your conclusion that they aren't really thinking holds. They're already there; it just costs more money and time to get good results.

        https://chatgpt.com/share/69a12666-64b0-8009-8dfe-59546ac400...

        EDIT - Updated the link to include the full conversation. Note that I didn't change it to pro mode until the end, and eventually got tired of waiting and just told it "answer now."
