Can frontier LLMs solve CAD tasks?

(kerrickstaley.com)

12 points | by KerrickStaley 2 days ago

5 comments

  • __atx__ a day ago

    Pretty interesting that simulator-only binary feedback (unless I am reading it wrong) was enough here to build some pretty robust models!

    I maintain [1], which gives the models the ability to render a screenshot from any angle, and as far as I can tell, visually driven feedback does not work that well at this point. The models probably don't get enough of "lovecraftian garbled 3D model mess" in the training data or something...

    [1] https://atx.github.io/OpenSCAD-Bench/

    • KerrickStaley a day ago

      Cool project, thanks for sharing!

      The simulator lets the LLM request renders from different angles/times, so the LLM can get visual feedback. For failures, the simulator also returns status codes like `object_fell` or `mount_initially_collided_with_object` depending on what happened. You can see what the tool call looks like by looking at the Transcript tab, e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__...

      I agree it's not clear how much benefit models get from iteration. Many of the successful runs are one-shots. You can see some examples of basic spatial reasoning e.g. here https://kerrickstaley.com/ai-cad-design-mount-viz/gso__mug__... :

      > The initial collision is because the mount was positioned at the same height as the mug's body center (z=-22), causing overlap. I need to lower the mount significantly so the mug starts above it and drops into the cradle.

      • __atx__ a day ago

        > I'll also remove the end cap to avoid it blocking the mug's descent.

        Ah yes, that matches my observations. It kinda sees that the stuff it is looking for is there, but does not see enough detail to notice that not only is there an end cap in the way, but the mug is also rotated the wrong way to sit in the holder.

        It feels like the "r's in strawberry" effect where the models do not have enough introspection into the raw input data.

    • 8note a day ago

      It's binary only in the success case. Looks like the failures have details returned.
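
      To make that shape concrete, here is a rough sketch of a result that is a bare pass on success but carries a detail code on failure. Only the two status strings come from the thread; the `SimResult` class, the `make_result` and `feedback_for_model` helpers, and the output format are assumptions for illustration, not the project's actual tool-call schema.

```python
from dataclasses import dataclass
from typing import Optional

# Only these two failure codes are mentioned in the thread; a real
# simulator may well return others.
FAILURE_CODES = {"object_fell", "mount_initially_collided_with_object"}


@dataclass
class SimResult:
    """Hypothetical result record: binary on success, detailed on failure."""
    success: bool
    status: Optional[str] = None  # failure detail; None when the run passed


def make_result(status: Optional[str]) -> SimResult:
    """Build a result: no status means success, a known code means failure."""
    if status is None:
        return SimResult(success=True)
    if status not in FAILURE_CODES:
        raise ValueError(f"unknown status code: {status}")
    return SimResult(success=False, status=status)


def feedback_for_model(result: SimResult) -> str:
    """Render the text an LLM might receive after a simulation run."""
    if result.success:
        return "success"
    return f"failure: {result.status}"


if __name__ == "__main__":
    print(feedback_for_model(make_result(None)))
    print(feedback_for_model(make_result("object_fell")))
```

      The asymmetry is the point: a passing run tells the model nothing beyond "it worked", while a failing run names what went wrong, which is the only signal iteration can build on.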
