Very nice. But "Models excel at code, but not at visual inspection" is limited: Claude excels at code because that is Anthropic's main focus. Google will leapfrog them soon.
I could never see Claude doing this without a human in the loop, while Google has probably already reverse engineered a good chunk of the software available on the web.
When dealing with binaries, Gemini 3.1 Pro is in the same tier as Opus 4.6: https://quesma.com/benchmarks/binaryaudit/. Those results are without humans in the loop, fully end-to-end.
For any practical development, you want humans in the loop exactly as much as needed (e.g. to ask the right questions, and not to get steered away), but no more.
The source code to Chromatron was made available on the game's forum at one point. Here's a copy I saved:
https://gist.github.com/duskwuff/513e2b4f38b3db2e060c8611ebf...
A couple of helper functions are missing, but nothing terribly important.
I wasn't aware it existed - and thanks for saving!
And wow, really clear and nice code.
Similar recent project for Skyroads: https://github.com/ammaarreshi/SkyRoads-Codex
Excellent article, thanks!