AI (VLM-based) radiology models can sound confident and still be wrong, hallucinating diagnoses that their own findings don't support. This is a silent and dangerous failure mode.
Our new paper introduces a verification layer that checks every diagnostic claim an AI makes before it reaches a clinician. When our system says a diagnosis is supported, it has been mathematically proven, not just guessed. Every model we tested improved significantly after verification, with our best result reaching 99% soundness.
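To give a rough flavor of what "checking whether a claim is supported by the findings" can look like, here is a minimal sketch using the Z3 SMT solver with a toy propositional encoding. The propositions, rules, and helper function are illustrative assumptions of mine, not the paper's actual pipeline or the Proof of Thought API.

```python
# Minimal sketch (illustrative, not the paper's system): check whether a
# diagnostic claim is logically entailed by the findings the model reported,
# using the Z3 SMT solver. Entailment holds iff findings + rules + NOT(claim)
# is unsatisfiable.
from z3 import Bool, Implies, And, Not, Solver, unsat

# Findings the model reported for a study (toy propositions).
consolidation = Bool("consolidation")
air_bronchograms = Bool("air_bronchograms")
pneumonia = Bool("pneumonia")

# Toy background rule linking findings to a diagnosis.
rules = [Implies(And(consolidation, air_bronchograms), pneumonia)]

# Findings asserted by the model for this particular study.
findings = [consolidation, air_bronchograms]

def claim_is_entailed(claim, findings, rules):
    """Return True iff the findings plus rules logically entail the claim."""
    s = Solver()
    s.add(*rules, *findings, Not(claim))
    return s.check() == unsat

print(claim_is_entailed(pneumonia, findings, rules))  # True: claim is supported
```

The key design point this sketch tries to convey: a "supported" verdict is a proof of unsatisfiability, so it can be trusted independently of how confident the underlying model sounded.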
We're excited about what comes next in building verifiably correct AI systems.
Nice work! I'm familiar with your work; it looks similar to these: https://arxiv.org/abs/2601.20055 and https://github.com/DebarghaG/proofofthought
Yes, indeed! This work uses the Proof of Thought library and several techniques from VERGE!