Ask HN: If I cancel Codex today whats the next best local inference agent?

8 points | by Bulbasaur2015 14 hours ago ago

4 comments

kevinsmith51 2 hours ago ago

Would use a capability based routers so you can use a blend of OSS models. I.e. use the least capable model per prompt that includes the appropriate tooling capability, etc. Can even include a frontier provider subscription and get almost as many tokens at very close to the benchmarking on a $20/mo subscription as a $200/mo subscription. Easier with Claude's bearer token setup but I have seen people do it with OpenAI subscriptions as well.
bigyabai 14 hours ago ago

For local inference? It entirely depends on what your hardware is.
JojoFatsani 7 hours ago ago

Check llmfit
verdverm 14 hours ago ago

OpenCode + vllm, model will depend on your hardware, but OpenCode also has a killer $10/m plan with quotas for some top tier open weight models.
I'm using qwen3.6 on a DGX spark, llama-cpp has prompt cache bugs for qwen/gemma models (among more being reported). Using my OpenCode-go sub when I want a bigger / more capable model