Deterministic Web-Agent Evaluation

Tinyeval

Versioned workflow cases, deterministic evaluation rules, and a single reporting API for web agents. The public API is served from api.tinyeval.ai, API keys are managed on console.tinyeval.ai, and operator status is published on admin.tinyeval.ai.

Case Registry

Fetch random or filtered workflows, pin case versions, and resolve stubbed TinyContext bindings.
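The registry behaviour can be pictured with a small in-memory stand-in. This is a minimal sketch under stated assumptions. The `Case` and `CaseRegistry` names, the tag-based filter, and the `seed` parameter are hypothetical illustrations and are not Tinyeval's API. TinyContext binding resolution is not modelled. The sketch shows how pinning an exact version differs from taking the latest one, and how a seeded RNG keeps "random" selection reproducible.

```python
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Case:
    # Hypothetical shape of a workflow case; not Tinyeval's actual schema.
    case_id: str
    version: int
    tags: frozenset = field(default_factory=frozenset)


class CaseRegistry:
    """In-memory stand-in for a versioned case registry (illustrative only)."""

    def __init__(self, cases):
        self._cases = {(c.case_id, c.version): c for c in cases}

    def latest(self, case_id):
        # Highest version wins when the caller does not pin one.
        versions = [v for (cid, v) in self._cases if cid == case_id]
        if not versions:
            raise KeyError(case_id)
        return self._cases[(case_id, max(versions))]

    def pinned(self, case_id, version):
        # Pinning returns exactly the requested version or raises KeyError.
        return self._cases[(case_id, version)]

    def filtered(self, required_tags):
        # Latest version of every case whose tags include all required tags.
        ids = sorted({cid for (cid, _) in self._cases})
        return [c for c in (self.latest(cid) for cid in ids)
                if set(required_tags) <= c.tags]

    def sample(self, k, seed, required_tags=()):
        # A seeded RNG makes "random" selection reproducible across runs.
        pool = self.filtered(required_tags)
        return random.Random(seed).sample(pool, k)
```

Seeding the sampler is one way to keep random case selection compatible with deterministic evaluation: rerunning with the same seed replays the same cases, and pinned versions keep those cases from drifting between runs.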

Single Hook

Agents report intermediate and final state through one endpoint and get checkpoint-level verdicts back.
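The single-hook flow can be simulated in-process to show how intermediate and final reports share one entry point. This is a minimal sketch, not Tinyeval's endpoint. The `Session` type, the `report` function, the `"intermediate"`/`"final"` kind strings, and the rule that unreported checkpoints fail once a run is final are all hypothetical assumptions. No HTTP is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical per-run state behind the single hook; not Tinyeval's real model.
    checkpoints: dict  # checkpoint name -> predicate over the reported state
    verdicts: dict = field(default_factory=dict)
    finished: bool = False


def report(session, kind, checkpoint, state):
    """Single entry point for both intermediate and final reports (illustrative)."""
    if session.finished:
        raise RuntimeError("run already finalized")
    if kind not in ("intermediate", "final"):
        raise ValueError(f"unknown report kind: {kind!r}")
    if checkpoint not in session.checkpoints:
        raise KeyError(checkpoint)
    # Evaluate only the checkpoint this report targets; earlier verdicts are kept.
    session.verdicts[checkpoint] = bool(session.checkpoints[checkpoint](state))
    if kind == "final":
        session.finished = True
        # Assumption: checkpoints never reported count as failed once the run ends.
        for name in session.checkpoints:
            session.verdicts.setdefault(name, False)
    # Return a copy so callers see checkpoint-level verdicts without mutating state.
    return dict(session.verdicts)
```

Returning the verdicts accumulated so far on every call gives the agent checkpoint-level feedback without needing a second endpoint.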

Deterministic Scoring

No model sits in the scoring path. Rules evaluate URLs, page state, output schema, and ground-truth extracts.
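Because scoring is rule-based, each rule can be pictured as a pure function from a report to a verdict. This is a minimal sketch. The report shape (`url`, `page_state`, `output`), the case shape (`url_pattern`, `expected_state`, `output_schema`, `ground_truth`), and the whitespace/case normalization are hypothetical assumptions, not Tinyeval's actual schema or rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Verdict:
    rule: str
    passed: bool
    detail: str = ""


def check_url(url, pattern):
    # The final URL must fully match the case's regex.
    ok = re.fullmatch(pattern, url) is not None
    return Verdict("url", ok, "" if ok else f"{url!r} does not match {pattern!r}")


def check_page_state(state, expected):
    # Every expected key must be present with an exactly equal value.
    wrong = sorted(k for k, v in expected.items() if state.get(k) != v)
    return Verdict("page_state", not wrong, f"mismatched: {wrong}" if wrong else "")


def check_schema(output, schema):
    # Hypothetical schema format: required field name -> required Python type.
    wrong = sorted(k for k, t in schema.items() if not isinstance(output.get(k), t))
    return Verdict("schema", not wrong, f"bad fields: {wrong}" if wrong else "")


def _normalize(value):
    # Collapse whitespace and case so formatting noise cannot flip a verdict.
    return " ".join(str(value).split()).casefold()


def check_extracts(output, ground_truth):
    # Each ground-truth field must equal the agent's extract after normalization.
    wrong = sorted(
        k for k, truth in ground_truth.items()
        if k not in output or _normalize(output[k]) != _normalize(truth)
    )
    return Verdict("extract", not wrong, f"wrong extracts: {wrong}" if wrong else "")


def score(report, case):
    # Same report + same case always yields the same verdicts: no model, no sampling.
    return [
        check_url(report["url"], case["url_pattern"]),
        check_page_state(report["page_state"], case["expected_state"]),
        check_schema(report["output"], case["output_schema"]),
        check_extracts(report["output"], case["ground_truth"]),
    ]
```

Keeping every rule free of randomness and external calls is what makes re-scoring a stored report reproduce the original verdicts exactly.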