Parametric CAD Bench

(cadbench.ai)

10 points | by handcrafted 12 hours ago

5 comments

  • little_cad 8 hours ago

    I only see closed-source models on your leaderboard so far: https://cadbench.ai/leaderboard

    It would be interesting to see how open-source models perform on CAD tasks.

  • mjzh 10 hours ago

    Interesting. Per https://cadbench.ai/leaderboard, GPT-5.5 is the best, not Opus 4.7. Why is Opus 4.7 run with mini-swe-agent rather than Claude Code?

    • handcrafted 9 hours ago

      GPT-5.5 and Opus 4.7 are comparable when both run on the same harness, mini-swe-agent. GPT-5.5 shows a significant performance gain only when integrated with the Codex module. Opus 4.7 also performs better on mini-swe-agent than on the more complex Claude Code harness. We hypothesize that this is because mini-swe-agent's tight feedback loop (edit-run-check) is well suited to the CAD generation task.
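
      For readers unfamiliar with the term, an edit-run-check loop can be sketched roughly as below. This is a toy illustration only, not the actual mini-swe-agent or Claude Code implementation: edit_run_check, toy_propose, toy_run, and toy_check are hypothetical names, and the "CAD script" is just a string such as "width=7" standing in for real parametric code.

```python
from typing import Callable, Optional, Tuple


def edit_run_check(
    propose_edit: Callable[[str, Optional[str]], str],
    run: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    initial: str,
    max_iters: int = 5,
) -> Tuple[str, bool]:
    """Repeat edit -> run -> check until the check passes or the budget runs out."""
    candidate = initial
    feedback: Optional[str] = None
    for _ in range(max_iters):
        candidate = propose_edit(candidate, feedback)  # edit
        output = run(candidate)                        # run
        feedback = check(output)                       # check
        if feedback is None:                           # None means "passed"
            return candidate, True
    return candidate, False


# Hypothetical stand-ins: a real harness would call a model and a CAD kernel.
def toy_propose(candidate: str, feedback: Optional[str]) -> str:
    # Nudge the width by the signed error reported by the last check.
    if feedback is None:
        return candidate
    width = int(candidate.split("=")[1])
    return f"width={width + int(feedback)}"


def toy_run(candidate: str) -> str:
    # Pretend to execute the script and measure the resulting width.
    return candidate.split("=")[1]


def toy_check(output: str) -> Optional[str]:
    # Target width is 10; return None on success, else the signed error.
    error = 10 - int(output)
    return None if error == 0 else str(error)


result = edit_run_check(toy_propose, toy_run, toy_check, "width=7")
```

      The point of the sketch is only that each iteration feeds the check result straight back into the next edit, with nothing else in between.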

  • gnucleus_peggy 12 hours ago

    [dead]