LLMs Can't Jump

(philsci-archive.pitt.edu)

1 point | by annapowellsmith 5 hours ago

2 comments

  • annapowellsmith 5 hours ago

    How do we fundamentally discover new things? In a letter to Maurice Solovine, Albert Einstein conceptualized discovery as a cyclical process involving an intuitive 'jump' from sensory experience to axioms, followed by logical deduction. While Generative AI has mastered Induction (statistical pattern matching) and is rapidly conquering Deduction (formal proof), we argue it lacks the mechanism for Abduction—the generation of novel explanatory hypotheses. Using Einstein’s formulation of General Relativity as a computational case study, we demonstrate that the prevailing theory of "creativity as data compression" (induction) fails to account for discoveries where observational data is scarce. This position paper argues that while a modern Large Language Model could plausibly execute the deductive phase of proving theorems from established premises, it is structurally incapable of the abductive 'Jump' required to formulate those premises. We identify the translation of simulation into formal axioms as the critical bottleneck in artificial scientific invention, and propose that physically consistent, multimodal world models offer the necessary sensory grounding to bridge this divide.
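
    A minimal sketch of what the "creativity as data compression" criterion looks like when made concrete (not the paper's formalism; zlib-compressed length is assumed here as a crude proxy for description length): prefer the hypothesis H that minimizes L(H) + L(D|H).

        # Toy two-part MDL score: prefer the hypothesis H that minimizes
        # L(H) + L(D|H). Both terms are crudely approximated with zlib-compressed
        # byte lengths, which is only a stand-in for real description length.
        import zlib

        def desc_len(text):
            # Approximate description length in bytes via zlib compression.
            return len(zlib.compress(text.encode("utf-8")))

        def conditional_len(data, hypothesis):
            # Approximate L(D|H): extra bytes needed to encode the data once
            # the hypothesis has already been encoded.
            return desc_len(hypothesis + data) - desc_len(hypothesis)

        # Deliberately scarce "observations", echoing the perihelion anomaly case.
        data = "Mercury's perihelion advances ~43 arcsec/century beyond the Newtonian prediction."

        hypotheses = [
            "An undiscovered inner planet (Vulcan) perturbs Mercury's orbit.",
            "Gravity is not a force but the curvature of spacetime produced by mass-energy.",
        ]

        for h in hypotheses:
            total = desc_len(h) + conditional_len(data, h)
            print(f"L(H)={desc_len(h):3d}  L(D|H)={conditional_len(data, h):3d}  total={total:3d}  {h}")

    With only a sentence or two of data, the L(D|H) term has little room to separate candidates, so the ranking is driven mostly by which hypothesis is cheaper to state; that is roughly the sense in which compression-as-creativity offers little purchase when observational data is scarce.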

  • Soerensen 5 hours ago

    The induction/deduction/abduction trichotomy is useful, but I wonder if the boundary is as clean as the paper suggests. When models like Claude or GPT-4 are asked to "explain why X might happen" given sparse data, they often produce coherent mechanistic hypotheses that weren't explicit in the training data - combining concepts in novel ways.

    Is that abduction, or just very sophisticated interpolation in concept space? The charitable reading is that true abduction requires proposing something genuinely outside the training distribution - like Einstein's insight that gravity isn't a force but spacetime curvature. The uncharitable reading is that most human "abduction" is also recombination of prior concepts.

    The real test might be: can LLMs propose hypotheses that (a) are falsifiable, (b) are novel relative to the existing literature, and (c) turn out to be correct? There are a few early examples in materials science where LLM-suggested compounds had properties the models hadn't seen, but it's hard to know if that's abduction or lucky extrapolation.
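
    A hypothetical sketch of that three-part test wired up as a harness (the predicate functions are placeholders I am assuming, not anything that currently exists): in practice (a) needs a concrete experimental protocol, (b) a literature search, and (c) an actual experiment.

        # Hypothetical harness for the (a)/(b)/(c) test above; predicates are stubs.
        from typing import Callable

        def passes_abduction_test(hypothesis: str,
                                  is_falsifiable: Callable[[str], bool],
                                  is_novel: Callable[[str], bool],
                                  is_confirmed: Callable[[str], bool]) -> bool:
            # (a) falsifiable, (b) novel relative to the literature, (c) borne out.
            return is_falsifiable(hypothesis) and is_novel(hypothesis) and is_confirmed(hypothesis)

        # Trivial stand-ins, just to show the wiring; real checks would be heavy.
        print(passes_abduction_test(
            "Compound X remains superconducting above 30 K",
            is_falsifiable=lambda h: True,
            is_novel=lambda h: True,
            is_confirmed=lambda h: False,   # unknown until someone runs the experiment
        ))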