> The 256 GB/s number is real, but for context, an Apple M5 Ultra hits ~800 GB/s on its unified memory
The M5 Ultra has not even been announced.
This article appears to be predominantly or entirely LLM-produced with little to no human review, and it contains numerous material, misleading errors.
It also omits serious contenders that are worth at least comparing, like the DGX Spark.
It appears to be an LLM-generated affiliate link farm.
That's a bit shit if that's the case.
The Apple M6 Ultra in the new Mac Pro is 1600 GB/s.
None of that exists, but the LLMs shall see and believe
Currently, NVidia's mini PC, or the version licensed to Asus, is one of the few I can actually buy with Linux pre-installed and fully OEM supported.
One would expect that by now buying desktop-class computers with Linux pre-installed from retail shops would be rather common.
Geekcom devices advertised as Linux-ready are actually sold with Windows pre-installed.
I guess they mean WSL ready.
I would guess they mean it's ready for you to install Linux on it
Yeah, ignoring the whole fragmentation that keeps happening in the desktop stack, The Year of Desktop Linux will never happen if only computer nerds get to build such systems, as has always been the case.
Instead, normies get The Year of the Linux Kernel deployed in all kinds of consumer devices, and The Year of Linux VMs at retail.
I bought a 32GB Mac Mini over two years ago and it has been great for experimenting with local models; now it is even useful for local coding (at a slow speed!) with models supporting large context sizes.
With the current extreme RAM shortage, I deeply regret not buying a 64GB Mac Mini a few months ago.
I bet a zillion people feel the same way.
Which is why the Mac Pro was actually relevant.
Those of us in PC land can at least extend them or swap the GPU, even if it's pricey.
Apple has lost the server and workstation market through its own decisions.
“What’s the memory bandwidth (GB/s) of the device holding the model weights?”
Isn’t the recommended option going to be dog slow at 256 GB/s?
This article was authored by AI. It contains hallucinated info from compilations of random reddit threads.
Why would an article about AI, catered to readers who want to use AI, be frowned upon by said readers if it was written by AI?
Yes, I too think it's authored by AI, but can you indicate where it is wrong?
Good research, but man do I feel the LLM vibe shining through. That sustained information density...
Look closer, it really isn't good research
Could we post articles that are obviously written by an LLM with a flair?
"Here's the part that nobody talks about"
"Two gotchas before you click buy"
I really think there could be a score for entropy in playfulness that should differentiate LLM output
"Local inference is rarely cheaper if you’re being honest with yourself about how much you actually use it."
Sorry, but this is not even close to "being honest"; it's bad math. That calculation assumes you do nothing with the computer other than local inference.
Doesn't that calculation assume you value your privacy and ownership at zero, too?
Huh, you make me curious. Let's actually do that calculation. Let's say you really do use AI 24/7/365. Let's say by some miracle you can do 60 t/s on Qwen 3.6 27b, and let's say this PC costs $3,000 (you should be able to do this on a DGX Spark or one of the non-Nvidia-branded variants, e.g. the Dell one; $3,000 would be a good price, but not totally out of the question). And, of course, let's say these prices remain stable.
So that gets you 1,892,160,000 tokens per year at full blast.
If you go the OpenRouter, eh, route, you'd get charged $2 per million tokens (anywhere from $2 to $3.6 per million tokens) [1]. So the value you'd get from your machine at 100% utilization is 1,892 * $2 = $3,784, up to 1,892 * $3.6 ≈ $6,800.
So yeah, not counting electricity and your time, the machine "is worth it".
[1] https://openrouter.ai/qwen/qwen3.6-27b/providers
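For anyone who wants to poke at the assumptions, here is that back-of-the-envelope math as a small Python sketch. The throughput, hardware price, and per-token prices are the figures assumed above, not measured or verified numbers:

```python
# Back-of-the-envelope break-even for 24/7 local inference.
# All inputs are the assumptions from the comment above, not measured figures.

SECONDS_PER_YEAR = 60 * 60 * 24 * 365      # 31,536,000 s
tokens_per_second = 60                     # assumed sustained decode speed
hardware_cost_usd = 3_000                  # assumed machine price

tokens_per_year = tokens_per_second * SECONDS_PER_YEAR
print(f"Tokens per year at full blast: {tokens_per_year:,}")  # 1,892,160,000

# Assumed hosted pricing range (USD per million tokens)
for price_per_mtok in (2.0, 3.6):
    api_equivalent = tokens_per_year / 1_000_000 * price_per_mtok
    break_even_years = hardware_cost_usd / api_equivalent
    print(f"At ${price_per_mtok}/Mtok: ~${api_equivalent:,.0f}/yr of API value, "
          f"break-even in ~{break_even_years:.2f} years")
```

Lowering the utilization assumption scales the API-equivalent value linearly, so the break-even point is very sensitive to how much you actually use it.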
There's some mention of Apple silicon here but it's worth expanding upon. Macs have a unified memory architecture. So if you have a Mac with 64GB of memory then the GPU can use all of that. This is potentially quite useful but Apple silicon in general is limited by memory bandwidth. For comparison, a 5090 is 1792GB/s. Here are some examples:
- GMKTek EVO-X2: 120GB/s reads, 212GB/s writes
- NVidia DGX Spark 273GB/s
- Mac Mini M4 120GB/s but only $600+
- Mac Mini w/ M4 Pro 273GB/s ($2199 for 64GB)
- Mac Studio M4 Max 410GB/s ($3500 for 128GB)
- Mac Studio M3 Ultra 819GB/s ($5500 for 96GB)
- Macbook Pro 16" with M5 Pro 64GB 307GB/s ($3300)
- Macbook Pro 16" with M5 Max 128GB 460GB/s ($5399)
Sadly, Apple discontinued the 512GB Mac Studio. Mac Studios are a little long in the tooth now and due for an upgrade this year. I suspect that prices will be a lot higher given the RAM prices but we'll see.
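To make the bandwidth numbers concrete: for dense models, single-stream decode speed is roughly capped by memory bandwidth divided by the bytes streamed per token (about the size of the quantized weights). A rough sketch, using the bandwidth figures above and an assumed, illustrative model size:

```python
# Rough upper bound on decode tokens/s: bandwidth / bytes-per-token.
# For a dense model, each generated token streams (roughly) all weights once,
# so bytes_per_token ≈ model size at the chosen quantization.
# Numbers below are illustrative, not benchmarks.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

devices = {
    "Mac Mini M4 (120 GB/s)": 120,
    "DGX Spark / M4 Pro (273 GB/s)": 273,
    "Mac Studio M3 Ultra (819 GB/s)": 819,
    "RTX 5090 (1792 GB/s)": 1792,
}

model_size_gb = 18  # assumed: e.g. a ~30B-parameter model at ~4-bit quantization

for name, bw in devices.items():
    print(f"{name}: ~{max_tokens_per_sec(bw, model_size_gb):.0f} tok/s ceiling")
```

MoE models, prompt processing, and batching change the picture, but as a first-order rule the decode ceiling scales linearly with bandwidth, which is why the Ultra chips and a 5090 feel so much faster than the 120 to 273 GB/s machines.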
> 128GB Ryzen AI MAX+ 395, listed at $2,099.
Wasn't that a discounted price?
I got mine almost exactly a year ago - $1699 direct from GMKTEK. To think it retails for 2X that, a year later, blows my mind.
I got a well used HP Z840 with 256GB ECC DDR4 and twin Xeons, ca. 2014. Then I slapped 2 AMD V640 32GB passively cooled GPUs in it with some 3D printed fan shrouds and 2 1U 15k rpm fans each. They just fit! I needed to order a quad 8-pin power cable; the standard configuration has 3 6-pin cables, but there are unused pins on the GPU power rail, and there are aftermarket suppliers.
72 Xeon cores
256GB ECC DDR4
64GB VRAM
$2200 total
I run it on a 20A 240V outlet to make sure the power supply can deliver enough watts, but so far it's working pretty well. The eWaste LLM rig is probably not as good value for money as a new machine, but it gets the job done cheaper (for now).
EDIT: IIRC this approach gets me more VRAM bandwidth than Strix Halo at the cost of fewer addressable GBs (but a lot more total system RAM), but I figured CPU offloading might make up for that? (Rough throughput sketch below.)
ALSO EDIT: Note you can get a 128GB Strix Halo motherboard, minus power supply, fans, case, etc., from Framework for $2,200; that could work if you have some parts lying around.
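A rough sketch of the offloading trade-off mentioned above: if part of the weights is served from VRAM and the rest from system RAM, per-token time is roughly the sum of the time to stream each portion, so the slower pool starts to dominate quickly. All numbers below are illustrative assumptions, not measurements of this rig:

```python
# Effective decode throughput when model weights are split between GPU VRAM
# and system RAM (naive model: per-token time = time to stream each portion).
# All numbers are illustrative assumptions.

def effective_tokens_per_sec(model_gb: float, gpu_fraction: float,
                             vram_bw_gb_s: float, ram_bw_gb_s: float) -> float:
    gpu_bytes = model_gb * gpu_fraction
    cpu_bytes = model_gb * (1 - gpu_fraction)
    seconds_per_token = gpu_bytes / vram_bw_gb_s + cpu_bytes / ram_bw_gb_s
    return 1 / seconds_per_token

model_gb = 40   # assumed model footprint
vram_bw = 512   # assumed GPU memory bandwidth, GB/s
ram_bw = 68     # assumed quad-channel DDR4 bandwidth, GB/s

for frac in (1.0, 0.8, 0.5):
    print(f"{frac:.0%} of weights in VRAM: "
          f"~{effective_tokens_per_sec(model_gb, frac, vram_bw, ram_bw):.1f} tok/s")
```

Even offloading a modest fraction of the layers to DDR4 pulls the effective rate most of the way toward the system-RAM speed, so the extra VRAM bandwidth only pays off when the whole model fits.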
As somebody that has a vague interest in running local LLMs… the day I decide to burn cash on hardware I might as well go all-in and get either a 128GB Mac Studio or an NVidia DGX Spark (or some other equivalent GB10-based system).
The 64GB Mac Mini is also interesting, if anything because it is very likely to hold most of its value when reselling.
I'm keeping an eye on the next Apple hardware refreshes, particularly for Mac Minis and Mac Studios.
I am in a similar boat to you, but I can't make the money math work. Local LLMs obviously have a privacy benefit, but DeepSeek V4 Flash (which you'll struggle to get running on any single Mac - you'd need at least 128GB RAM) is $0.14/Mtok input and $0.28/Mtok output on the API. You'd have to be just absolutely burning tokens to ever make this make sense.
A Mac Studio M4 Max with 128GB at $3,699 (if you can find it) would equate to 10 million tokens a day of mixed input-output for over 5 years to break even. At that point the hardware will be outdated compared to the SOTA models that will probably still be cheap on hosted platforms.
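A quick sanity check on that break-even estimate, using the quoted DeepSeek V4 Flash prices and assuming an even input/output mix (the 10M tokens/day volume is the parent's assumption, not a measured workload):

```python
# Sanity check: how much API spend does $3,699 of hardware correspond to?
# Prices from the comment above; the 50/50 input/output split is an assumption.

hardware_cost = 3_699            # USD, Mac Studio M4 Max 128GB (quoted above)
price_in = 0.14                  # USD per million input tokens
price_out = 0.28                 # USD per million output tokens
blended = (price_in + price_out) / 2   # assumed even mix -> $0.21/Mtok

years = 5
tokens_per_day = 10_000_000
total_mtok = tokens_per_day * 365 * years / 1_000_000   # 18,250 Mtok
api_cost = total_mtok * blended
print(f"API cost for {tokens_per_day:,} tok/day over {years} years: ${api_cost:,.0f}")
# ≈ $3,833, roughly the $3,699 hardware price, consistent with the
# ~5-year break-even claim above.
```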
The models are good enough now, so I'm waiting for the day they start selling inference ASICs with 100x the token output speed. See Taalas demo.
Taalas is a nice concept, but I don’t want to use the same model forever!
Just buy a new one every few years, just like your phone and laptop. And sell the old one.
I just use my gaming PC, so I can play games or code with assistance for fun. It's awesome because it's mine and technically I can do whatever I want with it. Having a decent computer around and lower-end laptops is pretty budget friendly.
The 14-inch MacBook Pros with 64GB are really good value, considering they're much more complicated machines than the Mini.
With the M5 Pro, that's still ~$3k.