Show HN: CLaaS – Update your local LLM's weights in real time from text feedback

(github.com)

5 points | by kfallah 5 hours ago

2 comments

matiszz 29 minutes ago

Cool project!

kfallah 5 hours ago

CLaaS is an open-source system that uses self-distillation to move feedback from context into model weights. Current approaches rely on system prompts and memory to personalize your model, but every token spent reminding the model of your preferences is a token it can't use for the actual task. Instead, CLaaS triggers a weight update with every piece of feedback, while avoiding the catastrophic forgetting you get with standard fine-tuning. The updated LoRA adapter hot-reloads into vLLM, so your next response comes from a better model.
Right now it runs on a single consumer GPU (tested on an RTX 5090) with Qwen3-8B. It's easy to set up with Docker Compose alongside a locally hosted OpenClaw, but the API works with any local model.
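
For anyone curious what the hot-reload step can look like, here is a minimal sketch. It is not CLaaS's own code. It assumes a vLLM OpenAI-compatible server started with VLLM_ALLOW_RUNTIME_LORA_UPDATING=True, which exposes the /v1/load_lora_adapter and /v1/unload_lora_adapter endpoints. The function names, adapter name, and adapter path are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_reload_requests(base_url, adapter_name, adapter_path):
    """Return the (url, payload) pairs for swapping in a fresh LoRA adapter.

    Assumption: the adapter is unloaded first and then loaded again under the
    same name, so clients keep requesting the same model name.
    """
    base = base_url.rstrip("/")
    return [
        (f"{base}/v1/unload_lora_adapter", {"lora_name": adapter_name}),
        (
            f"{base}/v1/load_lora_adapter",
            {"lora_name": adapter_name, "lora_path": adapter_path},
        ),
    ]


def hot_reload(base_url, adapter_name, adapter_path, timeout=30.0):
    """POST the unload and load requests; return the HTTP status of each."""
    statuses = []
    for url, payload in build_reload_requests(base_url, adapter_name, adapter_path):
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                statuses.append(response.status)
        except urllib.error.HTTPError as err:
            # Unloading fails if the adapter was never loaded; record the
            # status code and continue with the load request.
            statuses.append(err.code)
    return statuses
```

After a training step writes new adapter weights to disk, calling hot_reload("http://localhost:8000", "feedback", "/adapters/latest") would ask the server to pick them up, and later requests that set model to "feedback" would use the new adapter.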