Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?

22 points | by cloudking 2 hours ago

4 comments

arjie 27 minutes ago

Not “local” and not interactive coding, but sharing since it might be helpful. I have 2x RTX Pro 6000 Blackwell running DeepSeek V4 Flash. I get 160 tok/s raw, but it’s a reasoning model, so part of that output goes to reasoning tokens. For my use case, I have it auto-write code and another system auto-review the code.
I occasionally use it with pi to write some code and it’s blazing fast, but it’s mostly habit that keeps me with CC and Codex.
kertoip_1 5 minutes ago

Just attach OpenRouter to your coding agent tool and try it yourself. All the relevant open-weight models are there. Everyone has different needs and expectations.
HappySweeney 16 minutes ago

I have an Optane drive and lots of RAM, so I tried full-fat models for writing a function overnight, since I only get about 0.7 t/s. My current go-to test is to update a scalar function that transposes a bit-matrix to one using AVX-512. The cloud models all handle that like it's nothing. Kimi 2.6 and GLM 5.1 both failed miserably.
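For anyone unfamiliar with the test, the scalar starting point might look roughly like the sketch below. It is in Python purely for illustration. The function name `transpose_bits_scalar` and the layout (one integer per row, bit 0 as column 0) are assumptions, not the actual code from the test, and the AVX-512 version is not shown.

```python
def transpose_bits_scalar(rows, width):
    """Transpose a bit-matrix stored as a list of integer rows.

    Assumed layout (hypothetical): bit j of rows[i] is element (i, j),
    with bit 0 being the least significant bit. `width` is the number
    of columns. Returns `width` integers; bit i of result[j] is
    element (i, j) of the input.
    """
    out = [0] * width
    for i, row in enumerate(rows):
        for j in range(width):
            bit = (row >> j) & 1   # read element (i, j)
            out[j] |= bit << i     # write it to element (j, i)
    return out


if __name__ == "__main__":
    matrix = [0b0001, 0b0011, 0b0111, 0b1111]
    transposed = transpose_bits_scalar(matrix, 4)
    print([format(r, "04b") for r in transposed])
    # Transposing twice should give back the original matrix.
    print(transpose_bits_scalar(transposed, 4) == matrix)
```

The double loop touches one bit at a time, which is the part a vectorized rewrite would be expected to replace with wide shuffles and masks.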
tumetab1 27 minutes ago

Not yet. I tried Gemma 4 on an Apple M4, but the tok/s is significantly lower than the cloud offerings.
Also, the lack of enterprise tooling for selecting an appropriate model and for running a local LLM does not help.