One thing we found interesting is that Clojure's benchmark-wide performance was dragged down by Anthropic models, whereas models from OpenAI and many other frontier labs reasoned quite well in Clojure.
The per-language, per-model data is noisy, but aggregating it per provider gives an interesting view of how companies are training their models. Data at https://gertlabs.com/rankings?mode=agentic_coding