4 comments

  • arjie 27 minutes ago

    Not “local” and not interactive coding but sharing since it might be helpful. I have 2x RTX Pro 6000 Blackwell running DeepSeek V4 Flash. I get 160 tok/s raw but it’s a reasoning model. For my use case, I have it auto-write code and another system auto-review the code.

    I occasionally use it with pi to write some code and it’s blazing fast but it’s mostly habit that keeps me with CC and Codex.

  • kertoip_1 5 minutes ago

    Just attach OpenRouter to your coding agent tool and try it yourself. All relevant open-weight models are there. Every person has different needs and expectations.

  • HappySweeney 16 minutes ago

    I have an Optane drive and lots of RAM, so I tried full-fat models for writing some functions overnight, as I get about 0.7 t/s. My current go-to test is to update a scalar function that transposes a bit-matrix to one using AVX-512. The cloud models all play with that like it's nothing. Kimi 2.6 and GLM 5.1 both failed miserably.
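
    For readers unfamiliar with the task, here is a minimal sketch of what a scalar bit-matrix transpose looks like. It is not the commenter's actual function, which isn't shown: the 8x8 size, the row-byte layout (most significant bit = column 0), and the name `transpose8x8` are all assumptions for illustration, and Python stands in for the C and intrinsics a real AVX-512 port would use.

```python
def transpose8x8(rows):
    """Transpose an 8x8 bit matrix stored as 8 row bytes.

    Assumed layout (hypothetical): rows[r] holds row r, and the most
    significant bit of each byte is column 0.
    """
    if len(rows) != 8:
        raise ValueError("expected exactly 8 row bytes")
    out = [0] * 8
    for r in range(8):
        for c in range(8):
            # Read bit (r, c) from the input...
            bit = (rows[r] >> (7 - c)) & 1
            # ...and write it to position (c, r) in the output.
            out[c] |= bit << (7 - r)
    return out
```

    The test then amounts to asking a model to replace the two nested loops with vectorized shuffles and shifts while producing the same output, which is easy to check because transposing twice must return the original matrix.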

  • tumetab1 27 minutes ago

    Not yet. I tried Gemma 4 on an Apple M4, but the tok/s is significantly lower than the cloud offerings.

    Also, the lack of enterprise tooling to help select an appropriate model, and of tooling to run a local LLM, does not help.