1 point | by sjmaplesec 13 hours ago
2 comments
There's so much more we can do around skill activation and skill creation. Looking at the eval results, there are even cases where the added context makes the agent worse.
Scenario 5, test 1: 72% -> 22%
https://tessl.io/eval-runs/019cc02f-bb26-76e0-a7c9-598a7337e...
All the review scans are linked here, mostly in the 50-70% range: https://tessl.io/registry/skills/github/googleworkspace/cli