Ask HN: Do we need a language designed specifically for AI code generation?

3 points | by baijum 12 hours ago

5 comments

dtagames 9 hours ago

LLMs don't work the way you think. In order to be useful, a model would have to be trained on large quantities of code written in your new language, which don't exist.
Even after that, it would exhibit all the same problems as existing models do with other languages. The unreliability of LLMs comes from the fact that they make predictions rather than "retrieve" real answers the way a database would. Changing the content and context (your new language) won't change that.

- baijum 5 hours ago
  
  That's a very fair and critical point. You're right that we can't change the fundamental, probabilistic nature of LLMs themselves.
  But that makes me wonder if the goal should be reframed. Instead of trying to eliminate errors, what if we could change their nature?
  The interesting hypothesis to explore, then, is whether a language's grammar can be designed to make an LLM's probabilistic errors fail loudly as obvious syntactic errors, rather than failing silently as subtle, hard-to-spot semantic bugs.
  For instance, if a language demands extreme explicitness and has no default behaviors, an LLM's failure to generate the required explicit token becomes a simple compile-time error, not a runtime surprise.
  So while we can't "fix" the LLM's core, maybe we can design a grammar that acts as a much safer "harness" for its output.
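  A minimal sketch of the "no defaults, fail at compile time" idea, written in Rust only because Rust already enforces both rules (this illustrates the hypothesis, it is not a proposal for the new language; `RequestConfig`, `Retry`, `max_attempts`, and `summary` are hypothetical names). The struct has no `Default`, so every field must be spelled out, and the `match` must cover every variant:

  ```rust
  // Hypothetical retry policy: every case must be handled explicitly.
  #[derive(Debug, Clone, Copy, PartialEq)]
  enum Retry {
      Never,
      Times(u32),
  }

  // No `Default` impl: every construction site must write out all fields.
  struct RequestConfig {
      timeout_ms: u64,
      retry: Retry,
      follow_redirects: bool,
  }

  fn max_attempts(cfg: &RequestConfig) -> u32 {
      // Exhaustive match: adding a `Retry` variant without handling it here
      // is a compile error (E0004), not a silent fallthrough at runtime.
      match cfg.retry {
          Retry::Never => 1,
          Retry::Times(n) => 1 + n,
      }
  }

  fn summary(cfg: &RequestConfig) -> String {
      format!(
          "timeout={}ms attempts={} redirects={}",
          cfg.timeout_ms,
          max_attempts(cfg),
          cfg.follow_redirects
      )
  }

  fn main() {
      // Omitting any field, e.g. `follow_redirects`, fails with E0063
      // ("missing field") instead of quietly picking a default.
      let with_retry = RequestConfig {
          timeout_ms: 5_000,
          retry: Retry::Times(3),
          follow_redirects: false,
      };
      let no_retry = RequestConfig {
          timeout_ms: 1_000,
          retry: Retry::Never,
          follow_redirects: true,
      };
      println!("{}", summary(&with_retry)); // timeout=5000ms attempts=4 redirects=false
      println!("{}", summary(&no_retry)); // timeout=1000ms attempts=1 redirects=true
  }
  ```

  If a generator drops a field or an enum arm, the compiler rejects the program outright, which is the "fail loudly" behavior described above rather than a silently chosen default.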
  
  - dtagames 2 hours ago
    
    I would say we have this language already, too. It's machine code or its cousin, assembler. Processor instructions (machine code) that all software reduces down to are very explicit and have no default values.
    The problem is that people don't like writing assembler, which is how we got Fortran in the first place.
    The fundamental issue, then, is with the human language side of things, not the programming language side. The LLM is useful because it understands regular English, like "What is the difference between 'let' and 'const' in JS?," which is not something that can be expressed in a programming language.
    To get the useful feature we want, natural language understanding, we have to accept the unreliable and predictive nature of the entire technique.
theGeatZhopa 7 hours ago

What's needed is a formalization, and a model trained on that formalization. I'm not sure a system prompt alone is powerful enough to check and enforce input as definite, exactly formalized expression(s).
I don't think it will work out as easily as "a programming language for LLMs", but you can always have a discussion with ol' lama.
muzani 9 hours ago

Generally they work better with text that is more easily readable by humans. For example, they have a lot of trouble with JSON and handle YAML much better. Running through more tokens doesn't just increase cost, it lowers quality.
So a language designed for them would likely go the other way, toward more human-readable, redundant syntax, much like spoken languages have more redundancy built in.