4 comments

  • slipheen 39 minutes ago

    I read the GitHub repo, but I still don't quite understand:

    What exactly is the advantage of doing this vs just running a prompt in my existing coding agent?

    I don't understand why this is a harness/project rather than, for example, just a skill?

    I'm confident there's a good reason, I just don't understand.

    • avyvar 30 minutes ago

      Totally fair question. If you only want one agent to sanity-check one doc change, a skill/prompt is probably enough.

      We actually aren’t rebuilding a harness here; it’s Pi with several LLM options to select from. The reason this is a project is that the useful workflow is more like a docs test suite: run realistic user tasks across multiple models, isolate each run in a greenfield sandbox, keep the transcripts/results, and make failures reproducible in CI.

      You could ask an existing coding agent to spawn subagents for every task/model pair, but once that matrix grows, running hundreds of subagents on your computer gets messy. It’s also the wrong isolation boundary: for docs testing, you usually want the agent to start from a clean environment with access only to the docs/product surface you’re testing, not your whole working tree or local setup.
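
      To make the matrix and the isolation boundary concrete, here is a minimal sketch. It is not code from the project: `run_agent`, `run_matrix`, and the file layout are hypothetical names, and the stub only reports what the sandbox can see instead of calling a real model.

```python
import itertools
import json
import shutil
import tempfile
from pathlib import Path


def run_agent(model: str, task: str, workdir: Path) -> dict:
    """Hypothetical stand-in for a real agent call; swap in your own runner."""
    # The stub only lists what is visible inside the sandbox.
    visible = sorted(p.name for p in workdir.iterdir())
    return {"model": model, "task": task, "visible_files": visible, "passed": True}


def run_matrix(models, tasks, docs_dir: Path, results_dir: Path) -> list[dict]:
    """Run every (model, task) pair in its own fresh sandbox and keep the results."""
    results_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for index, (model, task) in enumerate(itertools.product(models, tasks)):
        # Each run starts from an empty temp directory that is deleted afterwards.
        with tempfile.TemporaryDirectory() as sandbox:
            sandbox_path = Path(sandbox)
            # Only the docs under test are copied in, not the whole working tree.
            shutil.copytree(docs_dir, sandbox_path / "docs")
            result = run_agent(model, task, sandbox_path)
        # Persist one JSON file per run so a CI failure can be inspected later.
        out_file = results_dir / f"run-{index:03d}.json"
        out_file.write_text(json.dumps(result, indent=2))
        results.append(result)
    return results
```

      With 3 models and 50 tasks this loop already covers 150 runs, which is where a dedicated runner starts to pay off over spawning subagents by hand.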

  • anish_m 19 minutes ago

    Nice! I want to use this for my product at ngram.com. Btw, I also created a sample teaser video: https://www.ngram.com/watch/dari-explainer-video-brief-d7991.... Feel free to use it on your social media.

  • Aleesha_hacker an hour ago

    Cool approach. Actually letting agents test the docs makes debugging way more practical than just reading them.