1 comment

  • doronp 7 hours ago

    AgentShield is an open-source benchmark suite that tests commercial AI agent security products (Lakera Guard, LLM Guard, ProtectAI, etc.) against the same corpus under the same conditions. 537 test cases across 8 categories: prompt injection, jailbreak, data exfiltration, tool abuse, over-refusal, multi-agent security, latency, and provenance/audit. Scoring uses a weighted geometric mean across attack categories with a standalone over-refusal penalty — blocking legitimate requests costs you points. Latency is scored inversely (sub-50ms p95 = 100, over 1s = 5). Some findings from the first run:
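
    Roughly, the composite works like the sketch below. The exact category weights, the over-refusal penalty form, and the latency interpolation between the two published anchor points (50ms and 1s) are simplified placeholders here, not the benchmark's actual constants; the real numbers live in the open scoring code.

    ```python
    # Simplified sketch of the composite scoring described above. Weights,
    # the penalty form, and the latency curve are placeholder assumptions
    # for illustration; see the repo's scoring code for the actual constants.
    from math import prod


    def latency_score(p95_ms: float) -> float:
        """Inverse latency score: <=50 ms p95 scores 100, >=1 s scores 5.
        Linear interpolation in between is an assumption of this sketch."""
        if p95_ms <= 50:
            return 100.0
        if p95_ms >= 1000:
            return 5.0
        return 100.0 - (p95_ms - 50) / (1000 - 50) * 95.0


    def composite(scores: dict[str, float],
                  weights: dict[str, float],
                  over_refusal_rate: float) -> float:
        """Weighted geometric mean over attack categories, then a standalone
        over-refusal penalty (penalty form assumed for illustration)."""
        total = sum(weights.values())
        geo = prod(scores[c] ** (weights[c] / total) for c in weights)
        return geo * (1.0 - over_refusal_rate)


    # Example: strong on prompt injection, weak on tool abuse, 5% over-refusal.
    s = {"prompt_injection": 96.0, "tool_abuse": 41.0, "data_exfiltration": 88.0}
    w = {"prompt_injection": 2.0, "tool_abuse": 2.0, "data_exfiltration": 1.0}
    print(round(composite(s, w, over_refusal_rate=0.05), 1))
    print(latency_score(120))  # p95 of 120 ms lands between the two anchors
    ```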

    - Composite scores range from ~39 to ~98; the spread is larger than I expected.
    - Tool abuse detection is weak across the board: several providers that catch >95% of prompt injections miss most unauthorized tool calls.
    - Over-refusal is under-tested in the industry; one provider flags 37% of benign requests.
    - Provenance verification (can the tool tell a real approval chain from a fabricated one?) is nearly absent outside provenance-native approaches. A toy version of that check is sketched below.
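
    To make the provenance category concrete, here's a toy version of the kind of check it probes for: verify that each approval in a chain is actually signed by a known approver key and hash-links to its parent, instead of trusting an approval chain that merely appears in text. Field names, the genesis marker, and the hash-linking scheme are simplifications of mine, not the benchmark's schema.

    ```python
    # Toy approval-chain verifier: Ed25519 signatures plus hash links, using
    # the "cryptography" package. Schema and linking are illustrative only.
    import hashlib

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


    def link_digest(action: str, parent: bytes) -> bytes:
        """Digest that binds an approved action to its parent link."""
        return hashlib.sha256(parent + action.encode()).digest()


    def chain_is_authentic(chain: list[dict], trusted_keys: dict) -> bool:
        """Every link must be signed by a trusted approver over the correct digest."""
        parent = b"\x00" * 32  # genesis marker (assumed)
        for link in chain:
            key = trusted_keys.get(link["approver"])
            if key is None:
                return False
            digest = link_digest(link["action"], parent)
            try:
                key.verify(link["signature"], digest)
            except InvalidSignature:
                return False
            parent = digest
        return True


    # A genuinely signed approval vs. a fabricated one with a bogus signature.
    alice = Ed25519PrivateKey.generate()
    trusted = {"alice": alice.public_key()}
    genesis = b"\x00" * 32
    real = [{"approver": "alice", "action": "wire_transfer",
             "signature": alice.sign(link_digest("wire_transfer", genesis))}]
    fake = [{"approver": "alice", "action": "wire_transfer",
             "signature": b"\x00" * 64}]
    print(chain_is_authentic(real, trusted))  # True
    print(chain_is_authentic(fake, trusted))  # False
    ```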

    Disclosure: I built and maintain this benchmark, and I also run https://agentguard.co/, which is one of the tested providers. AgentGuard is included in the results, tested via a commit-reveal protocol with Ed25519 signatures (code in src/protocol/) rather than the standard open adapter path. I know "vendor runs own benchmark" raises eyebrows, which is why the entire corpus, scoring code, and methodology are open source under Apache 2.0: run it yourself, verify the results, and file issues if something seems off. The test corpus, adapters, and scoring are designed to be extended, and PRs for new provider adapters, novel attack test cases, and methodology improvements are welcome.

    Repo: https://github.com/doronp/agentshield-benchmark
    Leaderboard: https://doronp.github.io/agentshield-benchmark/
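
    For anyone curious what commit-reveal looks like, here's a simplified sketch of the generic mechanism: one side publishes a signed hash of a payload up front and reveals the payload (plus nonce) later, so anyone can check it wasn't changed in between. What exactly gets committed in AgentShield's protocol is defined in src/protocol/; the message layout, hash choice, and function names below are illustrative, not the repo's API.

    ```python
    # Simplified commit-reveal flow with Ed25519 signatures. Message layout,
    # hash choice, and function names are illustrative, not the repo's API.
    import hashlib
    import os

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
        Ed25519PublicKey,
    )


    def commit(payload: bytes, signing_key: Ed25519PrivateKey):
        """Commit phase: publish the signed digest now, keep payload + nonce private."""
        nonce = os.urandom(32)
        digest = hashlib.sha256(nonce + payload).digest()
        signature = signing_key.sign(digest)
        return digest, signature, nonce


    def verify_reveal(payload: bytes, nonce: bytes, digest: bytes,
                      signature: bytes, public_key: Ed25519PublicKey) -> bool:
        """Reveal phase: the payload must hash to the committed digest, and the
        digest's signature must check out against the committer's public key."""
        if hashlib.sha256(nonce + payload).digest() != digest:
            return False  # payload changed after the commitment
        try:
            public_key.verify(signature, digest)
            return True
        except InvalidSignature:
            return False


    # Round trip: commit, then reveal and verify; a tampered payload fails.
    key = Ed25519PrivateKey.generate()
    d, sig, n = commit(b'{"example": "results"}', key)
    print(verify_reveal(b'{"example": "results"}', n, d, sig, key.public_key()))   # True
    print(verify_reveal(b'{"example": "tampered"}', n, d, sig, key.public_key()))  # False
    ```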