SWE-bench will hit 90% this year

(fabraix.com)

6 points | by asfsf23423 16 hours ago

5 comments

upmind 14 hours ago

Maybe an unpopular opinion, but I think at this point SWE-bench has done its part and we need a new benchmark, because Gemini being on or near the same level as Claude is obviously wrong.

- undefined 5 hours ago
  
  [deleted]
- amazingamazing 13 hours ago
  
  I use both and think they’re comparable. AMA.
  
  - zachdotai 5 hours ago
    
    Not sure which version of Gemini you are using, but Claude is so much better for me. Gemini is generally overeager to make a code change even when I am just asking conceptual questions, among other issues.
- lern_too_spel 13 hours ago
  
  Gemini at the same level as Claude is believable. Gemini CLI is not at the same level as Claude Code.