8 comments

  • createaccount99 a day ago

    Nobody in the know is paying for Claude anymore; it's so overpriced. It's just the enterprise users.

    • thelinuxkid a day ago

      Well, the API is certainly overpriced; no one will pay for that, and that's what these benchmarks use...

      But the subscription is still under 20x

  • thelinuxkid a day ago

    MiniMax M2.5 is beating Claude Opus 4.6, and MiniMax is 17x-20x cheaper… why isn’t anyone talking about this?

    • necovek a day ago

      Sounds like one peculiar benchmark that a particular model could be trained on.

      Btw, how do you get 17-20x? Cost seems to be 0.07 to 0.55 (for 4.6, or 8x) and 0.75 (for 4.5, or 11x).

      • thelinuxkid a day ago

        The 17-20x is output token pricing: MiniMax M2.5 Standard at $1.20/M output vs Opus 4.6 at $25/M = ~21x. On input it's $0.30 vs $5.00 = ~17x. Blended (3:1 input:output) works out to roughly 19x. Curious where you're getting 0.07 and 0.55 — are you looking at M2.5 Lightning pricing or a different provider?
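A quick sketch of that blended-price arithmetic (prices in USD per 1M tokens, taken from the figures quoted above; the 3:1 input:output mix is the assumption stated in the comment, not an official figure):

```python
# Check the blended price ratio quoted above.
# All prices are USD per 1M tokens, as stated in the comment.

def blended(input_price, output_price, input_share=3, output_share=1):
    """Weighted average price per 1M tokens for a given token mix."""
    total = input_share + output_share
    return (input_share * input_price + output_share * output_price) / total

minimax = blended(0.30, 1.20)   # MiniMax M2.5 Standard
opus = blended(5.00, 25.00)     # Claude Opus 4.6

print(f"MiniMax blended: ${minimax:.3f}/M")  # $0.525/M
print(f"Opus blended:    ${opus:.2f}/M")     # $10.00/M
print(f"Ratio: {opus / minimax:.1f}x")       # ~19.0x
```

At a 3:1 mix the Opus blended price comes to $10.00/M against $0.525/M for MiniMax, which is where the "roughly 19x" figure comes from; a more output-heavy mix pushes it toward the 21x output-only ratio.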

        • necovek 11 hours ago

          I looked at the "Avg $" (average task cost per instance) column on the page you linked to.

    • acro-v a day ago

      Because it's not true? The beating portion, not the cheaper portion.

      • thelinuxkid a day ago

        Why is it not true?

        SWE-bench is the standard benchmark for measuring an LLM's coding capabilities.