1 comment

  • guimaster97 5 hours ago

    "OP here. I built this because I noticed two problems scaling my internal RAG tools:

    Redundant Costs: Users asking the same questions (or slight variations) were costing me redundant tokens.

    Compliance Anxiety: I didn't want PII (names, emails, IDs) hitting OpenAI/DeepSeek servers directly.

    I looked for existing gateways but most were heavy Docker containers (requiring a VPS). I wanted something 'Zero DevOps' that could run on the Edge.

    The Solution: It's a lightweight Gateway built with Hono running on Cloudflare Workers.

    Smart Caching: Hashes the prompt body (SHA-256) and serves from KV if it's a hit (<50ms latency).

    PII Shield: Uses Regex/NER to replace sensitive data with placeholders ([EMAIL_1]) before forwarding to the LLM.

    Re-hydration: When the LLM responds, the Worker puts the real data back into the response, so the user's context remains intact.
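The redact-then-re-hydrate round trip above can be sketched as follows. This is a simplified illustration covering only the regex/email case (no NER), and the function names, placeholder format, and email pattern are assumptions for the example:

```typescript
// Simple email pattern (assumption; the real gateway may use stricter regexes + NER).
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Replace each email with a numbered placeholder and remember the mapping.
function redact(text: string): { clean: string; map: Map<string, string> } {
  const map = new Map<string, string>();
  let i = 0;
  const clean = text.replace(EMAIL_RE, (match) => {
    i += 1;
    const placeholder = `[EMAIL_${i}]`;
    map.set(placeholder, match);
    return placeholder;
  });
  return { clean, map };
}

// After the LLM responds, swap the placeholders back for the real values.
function rehydrate(text: string, map: Map<string, string>): string {
  let out = text;
  for (const [placeholder, original] of map) {
    out = out.split(placeholder).join(original);
  }
  return out;
}
```

The key property is that the placeholder-to-value map never leaves the Worker, so the upstream LLM only ever sees `[EMAIL_1]`-style tokens.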

    It's open-source (MIT). I'm currently looking for feedback on how to implement semantic caching (Vectorize) to catch non-identical prompts.

    Happy to answer any questions about the implementation!"