9 comments

  • androiddrew 10 hours ago

    The initial benchmarks I saw didn't really have the DGX Spark (GB10) doing much better in generation throughput than the AMD Strix Halo. On prefill, the GB10 does pretty well, much better than the Strix.

    Memory bandwidth is 273 GB/s, which is nowhere near a discrete GPU's. It's a $4K machine. Personally, I'd rather have two GPUs and run a quantized model. I have two 32 GB AMD R9700 cards, which cost $2,600. Quantized models get me a context window of 120K or so, and TPS is about 60% of what I see with the same model on my 4090 (which only has enough VRAM to load the weights and about 6K of context).
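    The reason bandwidth dominates here: at batch size 1, every generated token has to stream all active weights from memory, so bandwidth divided by weight size gives a hard ceiling on decode speed. A minimal sketch of that back-of-the-envelope estimate (the 273 GB/s figure is from above; the 35 GB weight size is an assumed ~4-bit quant of a 70B model, not a measurement):

    ```python
    # Bandwidth-bound ceiling on single-stream decode throughput.
    # Ignores KV-cache traffic and any compute overlap, so real TPS is lower.

    def max_decode_tps(bandwidth_gb_s: float, weight_bytes_gb: float) -> float:
        """Upper bound on tokens/sec: each token reads all weights once."""
        return bandwidth_gb_s / weight_bytes_gb

    # 273 GB/s unified memory; ~35 GB of weights (70B params at ~4 bits, assumed).
    print(round(max_decode_tps(273, 35), 1))  # prints 7.8
    ```

    Swapping in a smaller quantized model (say 18 GB) roughly doubles the ceiling, which is why quantization matters so much more on these machines than raw compute does.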

    Sure, I can't run a 100B+ model, but neither can a single GB10 unless no context window is what you're going for. So you buy a second $4K machine?

    • EnPissant 10 hours ago

      Strix Halo is pretty useless for inference because the prefill is too slow.

      At least this thing is actually useful, and there are $3K variants available.

      • Zetaphor 9 hours ago

        I keep reading comments saying it's useless from people who clearly haven't actually used it.

        I'm using this machine daily to build applications with LLMs, TTS, STT, ASR, and image generation.

        • EnPissant 7 hours ago

          Which, GB10 or Strix Halo?

  • pyuser583 11 hours ago

    Is it 128 GB of RAM or VRAM?

    • wmf 11 hours ago

      It's unified memory so up to ~120 GB can be used as VRAM.

  • BoredPositron 7 hours ago

    Should have used one of them for the headline.
