Codex/Claude gather telemetry by default. That’s why they are subsidized. You’re giving them training data.
If you start with everything on GitHub, plus maybe some manually annotated prompts for fine-tuning, you get a decent base model of “if you see this code, then this other code follows,” but that only goes so far.
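Rough toy sketch of what that objective looks like (bigram counts standing in for a real transformer; everything here is illustrative, not how any vendor actually trains):

```python
# Toy illustration of the "see this code, predict that code" objective:
# count which token follows which over a tiny code corpus, then predict
# the most frequent continuation. Real models learn this with a
# transformer, but the supervision signal has the same shape.
from collections import Counter, defaultdict

corpus = "def add ( a , b ) : return a + b".split()

continuations: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    continuations[prev][nxt] += 1

def predict_next(token: str) -> str:
    # Most frequent continuation seen in the training data.
    return continuations[token].most_common(1)[0][0]

print(predict_next("return"))  # -> "a"
```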
If you can track how thousands of people actually use prompts, and which tool-usage patterns actually lead to success, you can fine-tune on far more data (and train the model to avoid the unsuccessful patterns). Now you’re training on much more data, grounded in how people actually use the product, not theoretical scenarios.
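A hedged sketch of how those logs could become training data, assuming a DPO-style preference setup; the field names (`prompt`, `outcome`, `completion`) and success signals are hypothetical, not any real tool’s schema:

```python
# Hypothetical: turn logged agent sessions into preference pairs by
# matching completions that succeeded against ones that failed for the
# same prompt. Treating "accepted"/"tests_passed" as success is an
# assumption; real pipelines would define their own signals.
def build_preference_pairs(sessions: list[dict]) -> list[dict]:
    by_prompt: dict[str, dict[str, list[str]]] = {}
    for s in sessions:
        bucket = by_prompt.setdefault(s["prompt"], {"good": [], "bad": []})
        key = "good" if s["outcome"] in ("accepted", "tests_passed") else "bad"
        bucket[key].append(s["completion"])

    pairs = []
    for prompt, bucket in by_prompt.items():
        for chosen in bucket["good"]:
            for rejected in bucket["bad"]:
                pairs.append(
                    {"prompt": prompt, "chosen": chosen, "rejected": rejected}
                )
    return pairs
```

The point of pairing is that the same prompt with both a good and a bad outcome teaches the model which behavior to prefer, which is exactly the signal you can’t get from static GitHub data.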
In ML it always boils down to the training data.
Quantum computing could, in theory, let a model search permutations of code-to-prompt mappings as it converges on some kind of statistically probable solution.