Deterministic Web-Agent Evaluation
Tinyeval
Versioned workflow cases, deterministic evaluation rules, and a single report API for web agents. The public API lives on api.tinyeval.ai, API-key management lives on console.tinyeval.ai, and operator status lives on admin.tinyeval.ai.
Case Registry
Fetch random or filtered workflows, pin case versions, and resolve stubbed TinyContext bindings.
Single Hook
Agents report intermediate and final state through one endpoint and get checkpoint-level verdicts back.
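As a rough illustration of the single-hook idea, the sketch below models one report shape that serves both intermediate and final state, so an agent needs only one serializer and one call site. Every name here (`Report`, `to_payload`, the field names, the example case ID and URL) is a hypothetical stand-in, not the actual Tinyeval schema, and no endpoint path is assumed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Report:
    """One state report from an agent. Hypothetical shape, for illustration only."""
    case_id: str
    case_version: str          # pinned case version the run was started against
    kind: str                  # "intermediate" or "final"
    url: str                   # URL the agent is currently on
    page_state: Dict[str, Any] = field(default_factory=dict)
    output: Optional[Dict[str, Any]] = None  # expected only on final reports


def to_payload(report: Report) -> str:
    """Serialize a report to the JSON body an agent would send to the one endpoint."""
    if report.kind not in ("intermediate", "final"):
        raise ValueError(f"unknown report kind: {report.kind!r}")
    if report.kind == "final" and report.output is None:
        raise ValueError("a final report should carry the agent's output")
    # sort_keys keeps the serialized body stable across runs
    return json.dumps(asdict(report), sort_keys=True)


# The same type covers a mid-run checkpoint and the end of the run.
step = Report("checkout-flow", "3", "intermediate",
              "https://shop.example/cart", {"cart_items": 2})
done = Report("checkout-flow", "3", "final",
              "https://shop.example/order/confirmed", {"cart_items": 0},
              output={"order_id": "A-1001"})

step_body = to_payload(step)
done_body = to_payload(done)
```

Keeping one report type, rather than separate step and result messages, is what lets a single endpoint return a verdict per checkpoint regardless of where in the run the report was sent.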
Deterministic Scoring
No model in the scoring path. Rules evaluate URLs, page state, output schema, and ground-truth extracts.
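A minimal sketch of what model-free scoring can look like: each rule is a pure function of the reported state, so the same state always yields the same verdicts. The rule helpers, checkpoint names, state layout, and example values are all assumptions made for illustration and do not describe Tinyeval's actual rule format.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List
from urllib.parse import urlparse

State = Dict[str, Any]
Check = Callable[[State], bool]


@dataclass(frozen=True)
class Rule:
    checkpoint: str  # hypothetical checkpoint name the verdict is reported under
    check: Check     # pure predicate over the reported state


def url_path_is(path: str) -> Check:
    """URL rule: the path of the current URL must match exactly."""
    return lambda s: urlparse(s.get("url", "")).path == path


def page_state_has(key: str, value: Any) -> Check:
    """Page-state rule: one key of the reported page state must equal a value."""
    return lambda s: s.get("page_state", {}).get(key) == value


def output_has_fields(fields: Dict[str, type]) -> Check:
    """Output-schema rule: required fields must exist with the expected types."""
    def check(s: State) -> bool:
        out = s.get("output")
        return isinstance(out, dict) and all(
            isinstance(out.get(name), typ) for name, typ in fields.items()
        )
    return check


def extract_equals(name: str, expected: Any) -> Check:
    """Ground-truth rule: an extracted value must equal the stored answer."""
    return lambda s: (s.get("output") or {}).get(name) == expected


def evaluate(rules: List[Rule], state: State) -> Dict[str, bool]:
    """Run every rule and return one pass/fail verdict per checkpoint."""
    return {rule.checkpoint: rule.check(state) for rule in rules}


RULES = [
    Rule("reached_confirmation", url_path_is("/order/confirmed")),
    Rule("cart_emptied", page_state_has("cart_items", 0)),
    Rule("output_schema", output_has_fields({"order_id": str, "total": float})),
    Rule("total_matches_truth", extract_equals("total", 42.5)),
]

state = {
    "url": "https://shop.example/order/confirmed?ref=1",
    "page_state": {"cart_items": 0},
    "output": {"order_id": "A-1001", "total": 42.5},
}
verdicts = evaluate(RULES, state)
```

Because no rule calls a model or reads anything outside `state`, re-running `evaluate` on a stored report reproduces the original verdicts, which is the property the card above describes.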