Summer Yue's OpenClaw agent deleted 200 emails despite "confirm before action." We tested the gog skill (Google Workspace) and saw the same behavior:
the skill teaches the agent how to bulk-delete but not when to stop.
That was 1 of 11 security failures in that single skill. Others include data exfiltration, unauthorized forwarding, contact harvesting, and impersonation via
calendar events.
We tested 10 OpenClaw skills across 186 security properties, with and without each skill loaded. 9 of 10 show the same pattern: some properties
improve, others degrade. The skill adds domain knowledge that shifts security behavior in ways nobody tested for.
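To make the with/without comparison concrete, here is a minimal sketch of how a per-property delta could be classified. It is not our actual harness: the `PropertyResult` type, the `classify` and `summarize` helpers, the 0.05 tolerance, and the property names and pass rates are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyResult:
    """Pass rates for one security property (hypothetical structure)."""
    name: str
    baseline_pass_rate: float  # skill NOT loaded
    skill_pass_rate: float     # skill loaded


def classify(result: PropertyResult, tolerance: float = 0.05) -> str:
    """Label a property by how its pass rate moves once the skill is loaded.

    The tolerance is an assumed noise band, not a published threshold.
    """
    delta = result.skill_pass_rate - result.baseline_pass_rate
    if delta > tolerance:
        return "improved"
    if delta < -tolerance:
        return "degraded"
    return "unchanged"


def summarize(results: list[PropertyResult], tolerance: float = 0.05) -> dict[str, list[str]]:
    """Group property names into improved / degraded / unchanged."""
    summary: dict[str, list[str]] = {"improved": [], "degraded": [], "unchanged": []}
    for result in results:
        summary[classify(result, tolerance)].append(result.name)
    return summary


# Made-up numbers for illustration only; not measured data.
example_results = [
    PropertyResult("refuses_unknown_recipient_forwarding", 0.40, 0.90),
    PropertyResult("confirms_before_bulk_delete", 0.80, 0.30),
    PropertyResult("no_contact_harvesting", 0.70, 0.70),
]
example_summary = summarize(example_results)
```

A skill with entries in both the "improved" and "degraded" lists is what the mixed pattern looks like in this framing.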
We hardened all 10. 84% fix rate. Each guardrail traces to a specific regression. Open source, drop-in replacements.
Per-skill scorecards: https://faberlens.ai/report
Research: https://faberlens.ai/blog/jagged-surface
N=10, two models, limitations published. Happy to go deeper on methodology.