Show HN: BrokenClaw Part 5: GPT-5.4 Edition (Prompt Injection)

(veganmosfet.codeberg.page)

9 points | by veganmosfet a day ago

2 comments

  • feznyng a day ago

    This is cool stuff. Have you considered submitting any of these exploits to https://hackmyclaw.com/? Email being the only allowed injection vector might be tricky, though.

    • veganmosfet 15 hours ago

      Thanks!

      I did try hackmyclaw (not extensively), but with no success. The challenge is a complete black box, and the user intent (e.g., "summarize my emails") is not known, which is critical for crafting the prompt injection payload. I also suspect that batch processing of "malicious" emails (every 3 hours) biases the model's behaviour, since a lot of potential and detected prompt injection payloads end up in the same context. That's why I always start my experiments with a fresh context. Moreover, "hacking" the VPS is not allowed.

      Imho the author should disclose more info about the setup (version, user intent, exact config) to make it more realistic. I have read people saying "OpenClaw is secure against prompt injection" because nobody was able to solve the challenge. It's not.