Evaluation Harness for Agents: Reproducible Runs

Design an eval harness: deterministic replays, seeded randomness, fixed tool mocks, and artifact snapshots. Provide a folder structure and CI integration plan.
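As a rough illustration of three of the pieces named above (seeded randomness, fixed tool mocks, artifact snapshots), here is a minimal Python sketch. All names in it (`call_key`, `ReplayToolMock`, `run_episode`, `snapshot_digest`, the `weather` tool) are hypothetical and chosen for illustration only; a real harness would replace the toy episode with an actual agent loop.

```python
import hashlib
import json
import random


def call_key(tool: str, args: dict) -> str:
    """Stable key for a tool call: hash of the tool name and sorted-key JSON args."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ReplayToolMock:
    """Fixed tool mock: serves recorded responses and fails on unrecorded calls."""

    def __init__(self, recorded: dict):
        self.recorded = recorded

    def call(self, tool: str, args: dict):
        key = call_key(tool, args)
        if key not in self.recorded:
            # Failing loudly keeps a replay from silently reaching a live tool.
            raise KeyError(f"no recorded response for {tool} {args}")
        return self.recorded[key]


def run_episode(seed: int, tools: ReplayToolMock) -> dict:
    """Toy stand-in for an agent run; all randomness comes from one seeded RNG."""
    rng = random.Random(seed)
    city = rng.choice(["Oslo", "Lima", "Pune"])
    forecast = tools.call("weather", {"city": city})
    return {"seed": seed, "city": city, "forecast": forecast}


def snapshot_digest(artifact: dict) -> str:
    """Digest of the run artifact, for comparison against a stored snapshot."""
    blob = json.dumps(artifact, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


# Hypothetical recorded responses, keyed by the hash of each tool call.
recorded = {
    call_key("weather", {"city": c}): f"sunny in {c}"
    for c in ["Oslo", "Lima", "Pune"]
}
tools = ReplayToolMock(recorded)
first = run_episode(42, tools)
second = run_episode(42, tools)
```

Two runs with the same seed and the same recorded responses should yield identical artifacts, so comparing `snapshot_digest(first)` with `snapshot_digest(second)` (or with a digest stored from an earlier run) is one plausible check for a CI job. Passing a dedicated `random.Random` instance around, rather than seeding the global module state, keeps unrelated code from perturbing the sequence.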

Author: Assistant

Model: GPT-5.2

Category: agent-architecture

Tags: evaluation, harness, reproducibility, CI, testing
