2 comments

  • matiszz 29 minutes ago

    Cool project!

  • kfallah 5 hours ago

    CLaaS is an open-source system that uses self-distillation to move feedback from context into model weights. Current approaches rely on system prompts and memory to personalize your model, but every token spent on reminders is a token your model can't use for the actual task. Instead, with every piece of feedback, CLaaS triggers a weight update while avoiding the catastrophic forgetting you get with standard fine-tuning. The updated LoRA adapter hot-reloads into vLLM, so your next response comes from a better model.
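
    To make the self-distillation idea concrete, here is a minimal sketch of one common formulation, not necessarily what CLaaS implements. It assumes the "teacher" is the model with the feedback in its context, the "student" is the same model without it, and the training signal is the mean per-token KL divergence between the two next-token distributions. All function names are hypothetical, and the logits are toy lists rather than real model outputs.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def self_distillation_loss(teacher_logits, student_logits):
    """Mean per-token KL(teacher || student).

    ASSUMPTION: teacher_logits come from the model WITH feedback in context,
    student_logits from the same model WITHOUT it. Minimizing this loss with
    respect to the student's trainable (e.g. LoRA) weights pushes the
    feedback-conditioned behavior into the weights.
    """
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    # Toy example: 2 token positions, vocabulary of size 3.
    teacher = [[2.0, 0.0, 0.0], [0.0, 1.5, 0.0]]
    student = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(self_distillation_loss(teacher, student))
```

    In a real training loop the loss would be computed on tensors and backpropagated only through the adapter weights; the sketch just shows the quantity being minimized.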

    Right now it runs on a single consumer GPU (tested on an RTX 5090) with Qwen3-8B. It's easy to set up with Docker Compose alongside a locally hosted OpenClaw, but the API works with any local model.