2 points | by wonderwhyer 7 hours ago
1 comment
The debate around local vs API vs subscriptions feels mostly anecdotal. I tried building a tool that compares them using “quality-adjusted tokens per dollar.”
The idea:
Tokens per dollar
Weighted input/output pricing (75/25 assumption)
Benchmark-normalized quality (Arena, Aider, SWE-bench)
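A minimal sketch of how those three pieces could combine into one number. The function names, the per-million-token price unit, the example prices, and the reading of "75/25" as 75% input and 25% output tokens are all assumptions for illustration, not the tool's actual code:

```python
def blended_price_per_million(input_price, output_price, input_share=0.75):
    """Blend input and output prices (USD per 1M tokens) into one price.

    input_share=0.75 is the assumed 75/25 input/output token mix.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def quality_adjusted_tokens_per_dollar(input_price, output_price, quality,
                                       input_share=0.75):
    """Tokens per dollar scaled by a quality score in [0, 1]."""
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0 and 1")
    price = blended_price_per_million(input_price, output_price, input_share)
    if price <= 0:
        raise ValueError("blended price must be positive")
    tokens_per_dollar = 1_000_000 / price
    return quality * tokens_per_dollar


# Hypothetical model: $3 per 1M input tokens, $15 per 1M output tokens,
# normalized quality 0.8.
score = quality_adjusted_tokens_per_dollar(3.0, 15.0, 0.8)
print(round(score))
```

Multiplying quality by tokens per dollar treats quality as linear: a model scoring 0.4 is worth exactly half as many tokens as one scoring 0.8. That is a strong assumption and probably the first thing worth questioning.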
Early results surprised me (local often loses economically unless privacy is heavily valued).
I’m mostly looking for critique of the methodology:
Is quality-adjusted tokens per dollar even the right metric?
Is normalizing Elo ratings to a percentage defensible?
What benchmarks am I missing?
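To make the Elo question concrete, here is a sketch of two ways a rating could be turned into a 0-1 score. The post does not say which one the tool uses, so both functions and the ratings below are hypothetical. The second uses the standard Elo expected-score formula with its usual 400-point scale:

```python
def min_max_normalize(ratings):
    """Rescale ratings so the lowest maps to 0.0 and the highest to 1.0.

    The result depends on which models are in the comparison set.
    """
    lo, hi = min(ratings.values()), max(ratings.values())
    if hi == lo:
        return {name: 1.0 for name in ratings}
    return {name: (r - lo) / (hi - lo) for name, r in ratings.items()}


def expected_score(rating, reference):
    """Standard Elo expected score of `rating` against `reference`.

    Reads as an estimated win probability in a head-to-head comparison.
    """
    return 1.0 / (1.0 + 10 ** ((reference - rating) / 400))


# Hypothetical ratings, for illustration only.
ratings = {"model_a": 1000, "model_b": 1200, "model_c": 1400}

print(min_max_normalize(ratings))
print({name: round(expected_score(r, 1200), 3) for name, r in ratings.items()})
```

Min-max scores shift whenever a model is added to or dropped from the set, and the weakest model always scores 0. The expected-score version stays stable as the set changes, but it measures win probability against one chosen reference rather than absolute quality, so neither is obviously "percent quality."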