2 comments

  • cmeshare 7 hours ago

    I've been trying to find a better way to evaluate LLMs without keeping 10 tabs open, so I built LLMCode Lab, an open-source tool that lets you compare up to 5 models side by side (picked from 30+ available) and then fuse their responses into a single synthesized answer.

    Perplexity just launched Model Council, which has multiple models collaborate on an answer, but it's locked behind their $200/month Max plan ($2,000/year billed annually) and Perplexity picks which models run. A capped version for the Pro plan is promised later. LLMCode Lab gives you the same pattern with your own API keys: you choose every model in the pipeline, and a typical fusion run costs a fraction of a cent.

    How it works:

    Standard Lab: pick up to 5 models, enter a prompt, and compare the responses side by side. Fusion Lab: take the completed results, choose a synthesizer model, and it combines them into one answer. You decide which models compete and which model does the fusion. Full control, no black box.
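
    Roughly, the compare-then-fuse flow looks like the sketch below. This is a simplified illustration, not the tool's actual code, and it assumes every provider exposes an OpenAI-compatible /chat/completions endpoint, which isn't true across the board (Anthropic and Google have their own API shapes):

      // Hypothetical sketch of the compare-then-fuse pipeline (TypeScript).
      interface ModelTarget {
        name: string;     // model id, e.g. "gpt-5"
        endpoint: string; // provider base URL
        apiKey: string;   // user-supplied key
      }

      async function ask(t: ModelTarget, prompt: string): Promise<string> {
        const res = await fetch(`${t.endpoint}/chat/completions`, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
            Authorization: `Bearer ${t.apiKey}`,
          },
          body: JSON.stringify({
            model: t.name,
            messages: [{ role: "user", content: prompt }],
          }),
        });
        const data = await res.json();
        return data.choices[0].message.content;
      }

      // Standard Lab: fan the same prompt out to up to 5 models in parallel.
      const compare = (models: ModelTarget[], prompt: string) =>
        Promise.all(models.map((m) => ask(m, prompt)));

      // Fusion Lab: hand every candidate answer to a user-chosen synthesizer.
      async function fuse(synth: ModelTarget, prompt: string, answers: string[]) {
        const fusionPrompt = [
          `Original question: ${prompt}`,
          ...answers.map((a, i) => `Candidate ${i + 1}:\n${a}`),
          "Combine the candidates into one best answer.",
        ].join("\n\n");
        return ask(synth, fusionPrompt);
      }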

    Funded and Free API Keys:

    Free: HuggingFace token (Llama, Qwen, Gemma) + Google free tier (Gemini 2.5 Flash, Flash-Lite, Gemini 3 Flash)

    Funded: OpenAI (GPT-5, o3), Anthropic (Claude Opus 4.6, Sonnet 4.5), Google paid (Gemini 3 Pro)

    The comparison step can run entirely on free models. Fusion requires a funded model as the synthesizer, but you're paying per-query at API rates, not $200/month for a subscription.
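
    In code terms, the free/funded split is just a flag on each provider. Something like this (illustrative names only, not the tool's real config):

      // Illustrative provider registry; key names are made up for the example.
      const providers = {
        huggingface: { keyName: "hf_token",          funded: false },
        googleFree:  { keyName: "google_api_key",    funded: false },
        openai:      { keyName: "openai_api_key",    funded: true  },
        anthropic:   { keyName: "anthropic_api_key", funded: true  },
      } as const;

      // The rule from above: any provider can compete in the comparison step,
      // but the fusion synthesizer must come from a funded provider.
      function canSynthesize(p: keyof typeof providers): boolean {
        return providers[p].funded;
      }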

    Privacy: your API keys are stored in your browser's localStorage and auto-cleared after 24 hours; there are no cookies and no backend storage.
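
    The 24-hour expiry is straightforward to do client-side. A minimal sketch of how such a scheme can work (not necessarily the exact implementation):

      // Minimal sketch of expiring key storage in localStorage.
      const TTL_MS = 24 * 60 * 60 * 1000; // 24 hours

      function saveKey(name: string, value: string): void {
        localStorage.setItem(name, JSON.stringify({ value, storedAt: Date.now() }));
      }

      function loadKey(name: string): string | null {
        const raw = localStorage.getItem(name);
        if (!raw) return null;
        const { value, storedAt } = JSON.parse(raw);
        if (Date.now() - storedAt > TTL_MS) {
          localStorage.removeItem(name); // auto-clear expired keys on read
          return null;
        }
        return value;
      }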

    No sign-up. No paywall. Bring your own keys.

  • beernet 7 hours ago

    So how many more LLM-generated "products" do we need to see here? This one is inconsistent, contains contradictory information, is partly blatantly wrong, and is horrible from start to finish.