Trajectly – deterministic regression tests for AI agents

(trajectly.dev)

3 points | by ashmawy 8 hours ago

1 comment

  • ashmawy 8 hours ago

    Hi HN — I built Trajectly, a tool for deterministic regression testing of AI agents.

    Problem: agent “evals” are often flaky (network, time, tool nondeterminism, model drift), so it’s hard to tell if a change actually broke behavior.

    What Trajectly does:

    - records an agent run once (inputs, tool calls, outputs)

    - replays it deterministically offline as a test fixture (so CI is stable)

    - checks a TRT “contract” (allowed tools/sequence, budgets, invariants, etc.)

    - when something breaks, pinpoints the earliest violating step and can shrink the run to a minimal counterexample
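    To make the record/replay idea concrete, here's a minimal generic sketch (not Trajectly's actual API; all names here are hypothetical). A recorder wraps a live tool and logs each call; a replayer serves the logged results offline and fails loudly if the agent diverges from the recording, which is what makes the replayed test deterministic.

```python
class ToolRecorder:
    """Wraps a live tool function and records each call's args and result."""

    def __init__(self, tool):
        self.tool = tool
        self.log = []

    def __call__(self, *args):
        result = self.tool(*args)  # real call, done once during recording
        self.log.append({"args": list(args), "result": result})
        return result


class ToolReplayer:
    """Serves recorded results in order; raises on any divergence."""

    def __init__(self, log):
        self.log = list(log)
        self.step = 0

    def __call__(self, *args):
        entry = self.log[self.step]
        if entry["args"] != list(args):
            raise AssertionError(
                f"step {self.step}: agent called tool with {list(args)}, "
                f"but the recording has {entry['args']}"
            )
        self.step += 1
        return entry["result"]  # no network, no nondeterminism


# Record once against the live tool...
live = ToolRecorder(lambda x: x * 2)
assert live(21) == 42

# ...then replay offline: same inputs yield the recorded outputs.
replay = ToolReplayer(live.log)
assert replay(21) == 42
```

    The key design point is that replay is strict: any change in the agent's tool-call sequence surfaces immediately as a failure at a specific step, rather than as a flaky downstream assertion.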

    You can try it locally (no signup):

    - pip install trajectly

    - run one of the standalone demos:

      - procurement approval agent demo

      - support escalation agent demo (or clone the main repo and run the GitHub Actions example)

    Repo: https://github.com/trajectly/trajectly

    I’m around to answer questions. I’d love feedback on:

    - what contract checks would be most useful in real agent deployments?

    - which integrations you'd want first (LangGraph / LangChain / custom tool runners)?

    - whether the “shrink to minimal failing trace” output is understandable.
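    For context on the last question, the “shrink” step is in the spirit of delta debugging: keep removing steps from the failing trace as long as the failure still reproduces, until no single remaining step can be dropped. A toy sketch (a generic illustration with a made-up failure predicate, not the repo's actual algorithm):

```python
def shrink(trace, still_fails):
    """Greedily remove steps while the failure predicate still holds.

    trace: list of steps; still_fails: callable(list) -> bool.
    Returns a 1-minimal failing subsequence: removing any single
    remaining step makes the failure disappear.
    """
    changed = True
    while changed:
        changed = False
        for i in range(len(trace)):
            candidate = trace[:i] + trace[i + 1:]
            if still_fails(candidate):  # step i wasn't needed to reproduce
                trace = candidate
                changed = True
                break
    return trace


# Toy failure: the run breaks whenever both "login" and "delete_db" occur.
fails = lambda t: "login" in t and "delete_db" in t
run = ["login", "search", "summarize", "delete_db", "email"]
assert shrink(run, fails) == ["login", "delete_db"]
```

    The minimal counterexample is much easier to read than the full run, since every step left in it is load-bearing for the failure.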