I use Claude and Codex heavily for coding, and I kept burning through my quota halfway through the week. When I looked at my logs, most of my prompts were things like "summarize this" or "reformat this JSON." Stuff any small model handles fine.
NadirClaw is a Python proxy that classifies each prompt in ~10ms and routes simple ones to Gemini Flash, Ollama, or whatever cheap model you want. Only complex prompts hit your premium API. It's OpenAI-compatible, so you just point your existing tools at it.
In practice I went from burning through my Claude quota in 2 days to having it last the full week. Costs dropped around 60%.
pip install nadirclaw
Still early. The classifier is simple (token count + pattern matching + optional embeddings). Curious what breaks first.
I use Claude and Codex heavily for coding, and I kept burning through my quota halfway through the week. When I looked at my logs, most of my prompts were things like "summarize this" or "reformat this JSON." Stuff any small model handles fine.
NadirClaw is a Python proxy that classifies each prompt in ~10ms and routes simple ones to Gemini Flash, Ollama, or whatever cheap model you want. Only complex prompts hit your premium API. It's OpenAI-compatible, so you just point your existing tools at it.
In practice I went from burning through my Claude quota in 2 days to having it last the full week. Costs dropped around 60%.
pip install nadirclaw
Still early. The classifier is simple (token count + pattern matching + optional embeddings). Curious what breaks first.