> The 256 GB/s number is real, but for context, an Apple M5 Ultra hits ~800 GB/s on its unified memory
The M5 Ultra has not even been announced.
This article appears to be predominantly or entirely LLM-produced with little to no human review, and it contains numerous material, misleading errors.
It also omits serious contenders that are worth at least comparing, like the DGX Spark.
It appears to be an LLM-generated affiliate link farm.
That's a bit shit if that's the case.
The Apple M6 Ultra in the new Mac Pro is 1600 GB/s.
None of that exists, but the LLMs shall see and believe
Currently, NVidia's mini PC, or the version licensed to Asus, is one of the few I can actually buy with Linux pre-installed and fully OEM supported.
One would expect that by now buying desktop-class computers with Linux pre-installed from retail shops would be rather common.
Geekcom devices advertised as Linux-ready are actually sold with Windows pre-installed.
I guess they mean WSL ready.
I would guess they mean it's ready for you to install Linux on it
Yeah, ignoring the whole fragmentation that keeps happening in the desktop stack, The Year of Desktop Linux will never happen if only computer nerds get to build such systems, as has always been the case.
Instead, normies get The Year of the Linux Kernel deployed in all kinds of consumer devices, and The Year of Linux VMs at retail.
I bought a 32GB Mac Mini over two years ago and it has been great for experimenting with local models; now it is even useful for local coding (at a slow speed!) with models supporting large context sizes.
With the current extreme RAM shortage, I deeply regret not buying a 64GB Mac Mini a few months ago.
I bet a zillion people feel the same way.
Which is why the Mac Pro was actually relevant.
Those of us in PC land can at least extend them or swap the GPU, even if it's pricey.
Apple has lost the server and workstation market through its own decisions.
“What’s the memory bandwidth (GB/s) of the device holding the model weights?”
Isn’t the recommended option going to be dog slow at 256 GB/s?
This article was authored by AI. It contains hallucinated info from compilations of random reddit threads.
Why would an article about AI, catered to readers who want to use AI, be frowned upon by said readers if it was written by AI?
Yes, I too think it's authored by AI, but can you indicate where it is wrong?
Good research, but man do I feel the LLM vibe shining through. That sustained information density...
Look closer, it really isn't good research
Could we post articles that are obviously written by an LLM with a flair?
"Here's the part that nobody talks about"
"Two gotchas before you click buy"
I really think there could be a score for entropy in playfulness that should differentiate LLM output
"Local inference is rarely cheaper if you’re being honest with yourself about how much you actually use it."
Sorry, but this is not even close to "being honest"; it's bad math. That calculation assumes you do nothing with the computer other than local inference.
Doesn't that calculation assume you value your privacy and ownership at zero, too?
Huh, you make me curious. Let's actually do that calculation. Let's say you really do use AI 24/7/365. Let's say by some miracle you can do 60 t/s on Qwen 3.6 27b, and let's say this PC costs $3,000 (you should be able to do this on a DGX Spark or one of the non-Nvidia-branded variants, e.g. the Dell one; $3,000 would be a good price, but not totally out of the question). And, of course, let's say these prices remain stable.
So that gets you 1,892,160,000 tokens per year at full blast.
If you go the OpenRouter, eh, route, you'd get charged $2 per million tokens (anywhere from $2 to $3.6 per million tokens) [1]. So the value you'd get from your machine at 100% utilization is 1,892 * $2 = $3,784, up to 1,892 * $3.6 ≈ $6,800.
So yeah, not counting electricity and your time, the machine "is worth it".
[1] https://openrouter.ai/qwen/qwen3.6-27b/providers
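For anyone who wants to poke at the assumptions, here is that back-of-the-envelope math as a small Python sketch. The throughput, hardware price, and per-token prices are the figures assumed above, not measured or verified numbers:

```python
# Back-of-the-envelope break-even for 24/7 local inference.
# All inputs are the assumptions from the comment above, not measured figures.

SECONDS_PER_YEAR = 60 * 60 * 24 * 365      # 31,536,000 s
tokens_per_second = 60                     # assumed sustained decode speed
hardware_cost_usd = 3_000                  # assumed machine price

tokens_per_year = tokens_per_second * SECONDS_PER_YEAR
print(f"Tokens per year at full blast: {tokens_per_year:,}")  # 1,892,160,000

# Assumed hosted pricing range (USD per million tokens)
for price_per_mtok in (2.0, 3.6):
    api_equivalent = tokens_per_year / 1_000_000 * price_per_mtok
    break_even_years = hardware_cost_usd / api_equivalent
    print(f"At ${price_per_mtok}/Mtok: ~${api_equivalent:,.0f}/yr of API value, "
          f"break-even in ~{break_even_years:.2f} years")
```

Lowering the utilization assumption scales the API-equivalent value linearly, so the break-even point is very sensitive to how much you actually use it.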
There's some mention of Apple silicon here but it's worth expanding upon. Macs have a unified memory architecture. So if you have a Mac with 64GB of memory then the GPU can use all of that. This is potentially quite useful but Apple silicon in general is limited by memory bandwidth. For comparison, a 5090 is 1792GB/s. Here are some examples:
- GMKTek EVO-X2: 120GB/s reads, 212GB/s writes
- NVidia DGX Spark 273GB/s
- Mac Mini M4 120GB/s but only $600+
- Mac Mini w/ M4 Pro 273GB/s ($2199 for 64GB)
- Mac Studio M4 Max 410GB/s ($3500 for 128GB)
- Mac Studio M3 Ultra 819GB/s ($5500 for 96GB)
- Macbook Pro 16" with M5 Pro 64GB 307GB/s ($3300)
- Macbook Pro 16" with M5 Max 128GB 460GB/s ($5399)
Sadly, Apple discontinued the 512GB Mac Studio. Mac Studios are a little long in the tooth now and due for an upgrade this year. I suspect that prices will be a lot higher given the RAM prices but we'll see.
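To make the bandwidth numbers concrete: for dense models, single-stream decode speed is roughly capped by memory bandwidth divided by the bytes streamed per token (about the size of the quantized weights). A rough sketch, using the bandwidth figures above and an assumed, illustrative model size:

```python
# Rough upper bound on decode tokens/s: bandwidth / bytes-per-token.
# For a dense model, each generated token streams (roughly) all weights once,
# so bytes_per_token ≈ model size at the chosen quantization.
# Numbers below are illustrative, not benchmarks.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

devices = {
    "Mac Mini M4 (120 GB/s)": 120,
    "DGX Spark / M4 Pro (273 GB/s)": 273,
    "Mac Studio M3 Ultra (819 GB/s)": 819,
    "RTX 5090 (1792 GB/s)": 1792,
}

model_size_gb = 18  # assumed: e.g. a ~30B-parameter model at ~4-bit quantization

for name, bw in devices.items():
    print(f"{name}: ~{max_tokens_per_sec(bw, model_size_gb):.0f} tok/s ceiling")
```

MoE models, prompt processing, and batching change the picture, but as a first-order rule the decode ceiling scales linearly with bandwidth, which is why the Ultra chips and a 5090 feel so much faster than the 120 to 273 GB/s machines.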
> 128GB Ryzen AI MAX+ 395, listed at $2,099.
Wasn't that a discounted price?
I got mine almost exactly a year ago - $1699 direct from GMKTEK. To think it retails for 2X that, a year later, blows my mind.
I got a well used HP Z840 with 256GB ECC DDR4 and twin Xeons, ca. 2014. Then I slapped 2 AMD V640 32GB passively cooled GPUs in it with some 3D printed fan shrouds and 2 1U 15k rpm fans each. They just fit! I needed to order a quad 8-pin power cable; the standard configuration has 3 6-pin cables, but there are unused pins on the GPU power rail, and there are aftermarket suppliers.
72 Xeon cores
256GB ECC DDR4
64GB VRAM
$2200 total
I run it on a 20A 240V outlet to make sure the power supply can deliver enough watts, but so far it's working pretty well. The eWaste LLM rig is probably not as good value for money as a new machine, but it gets the job done cheaper (for now).
EDIT: IIRC this approach gets me more VRAM bandwidth than Strix Halo at the cost of fewer addressable GBs (but a lot more total system RAM), but I figured CPU offloading might make up for that? (Rough throughput sketch below.)
ALSO EDIT: Note you can get a 128GB Strix Halo motherboard, minus power supply, fans, case, etc., from Framework for $2,200; that could work if you have some parts lying around.
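A rough sketch of the offloading trade-off mentioned above: if part of the weights is served from VRAM and the rest from system RAM, per-token time is roughly the sum of the time to stream each portion, so the slower pool starts to dominate quickly. All numbers below are illustrative assumptions, not measurements of this rig:

```python
# Effective decode throughput when model weights are split between GPU VRAM
# and system RAM (naive model: per-token time = time to stream each portion).
# All numbers are illustrative assumptions.

def effective_tokens_per_sec(model_gb: float, gpu_fraction: float,
                             vram_bw_gb_s: float, ram_bw_gb_s: float) -> float:
    gpu_bytes = model_gb * gpu_fraction
    cpu_bytes = model_gb * (1 - gpu_fraction)
    seconds_per_token = gpu_bytes / vram_bw_gb_s + cpu_bytes / ram_bw_gb_s
    return 1 / seconds_per_token

model_gb = 40   # assumed model footprint
vram_bw = 512   # assumed GPU memory bandwidth, GB/s
ram_bw = 68     # assumed quad-channel DDR4 bandwidth, GB/s

for frac in (1.0, 0.8, 0.5):
    print(f"{frac:.0%} of weights in VRAM: "
          f"~{effective_tokens_per_sec(model_gb, frac, vram_bw, ram_bw):.1f} tok/s")
```

Even offloading a modest fraction of the layers to DDR4 pulls the effective rate most of the way toward the system-RAM speed, so the extra VRAM bandwidth only pays off when the whole model fits.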
As somebody that has a vague interest in running local LLMs… the day I decide to burn cash on hardware I might as well go all-in and get either a 128GB Mac Studio or an NVidia DGX Spark (or some other equivalent GB10-based system).
The 64GB Mac Mini is also interesting, if anything because it is very likely to hold most of its value when reselling.
I'm keeping an eye on the next Apple hardware refreshes, particularly for Mac Minis and Mac Studios.
I am in a similar boat to you, but I can't make the money math work. Local LLMs obviously have a privacy benefit, but DeepSeek V4 Flash (which you'll struggle to get running on any single Mac - you'd need at least 128GB RAM) is $0.14/Mtok input and $0.28/Mtok output on the API. You'd have to be just absolutely burning tokens to ever make this make sense.
A Mac Studio M4 Max with 128GB at $3,699 (if you can find it) would equate to 10 million tokens a day of mixed input-output for over 5 years to break even. At that point the hardware will be outdated compared to the SOTA models that will probably still be cheap on hosted platforms.
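A quick sanity check on that break-even estimate, using the quoted DeepSeek V4 Flash prices and assuming an even input/output mix (the 10M tokens/day volume is the parent's assumption, not a measured workload):

```python
# Sanity check: how much API spend does $3,699 of hardware correspond to?
# Prices from the comment above; the 50/50 input/output split is an assumption.

hardware_cost = 3_699            # USD, Mac Studio M4 Max 128GB (quoted above)
price_in = 0.14                  # USD per million input tokens
price_out = 0.28                 # USD per million output tokens
blended = (price_in + price_out) / 2   # assumed even mix -> $0.21/Mtok

years = 5
tokens_per_day = 10_000_000
total_mtok = tokens_per_day * 365 * years / 1_000_000   # 18,250 Mtok
api_cost = total_mtok * blended
print(f"API cost for {tokens_per_day:,} tok/day over {years} years: ${api_cost:,.0f}")
# ≈ $3,833, roughly the $3,699 hardware price, consistent with the
# ~5-year break-even claim above.
```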
The models are good enough now, so I'm waiting for the day they start selling inference ASICs with 100x the token output speed. See Taalas demo.
Taalas is a nice concept, but I don’t want to use the same model forever!
Just buy a new one every few years, just like your phone and laptop. And sell the old one.
I just use my gaming PC, so I can play games or code with assistance for fun. It's awesome because it's mine and technically I can do whatever I want with it. Having a decent computer around and lower-end laptops is pretty budget friendly.
The 14-inch MacBook Pros with 64GB are really good value, considering they're much more complicated machines than the Mini.
With the M5 Pro, that's still ~$3k.