Self-Improving Agent Evals: Add New Tests From Failures
Create a feedback loop in which production failures and near-misses become new eval tests. For each one, the agent should propose a test addition that includes a minimal reproduction and acceptance criteria.
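One way to picture this loop is the minimal Python sketch below. It is an illustration under stated assumptions, not a prescribed implementation: the `Failure` and `ProposedTest` record shapes, the `still_fails` callback (which re-runs the agent and reports whether the failure reproduces), the greedy line-dropping reduction, and the `approved` review gate are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Failure:
    """A production failure or near-miss (hypothetical record shape)."""
    failure_id: str
    agent_input: str
    observed_output: str
    near_miss: bool = False


@dataclass
class ProposedTest:
    """A test addition proposed by the agent, pending review (assumed gate)."""
    name: str
    minimal_repro: str
    acceptance_criteria: List[str]
    source_failure: str
    approved: bool = False


def minimize(agent_input: str, still_fails: Callable[[str], bool]) -> str:
    """Greedily drop input lines while the failure still reproduces."""
    lines = agent_input.splitlines()
    i = 0
    while i < len(lines):
        candidate = lines[:i] + lines[i + 1:]
        if candidate and still_fails("\n".join(candidate)):
            lines = candidate  # line i was not needed; retry the same index
        else:
            i += 1  # line i is needed to reproduce; keep it
    return "\n".join(lines)


def propose_test(
    failure: Failure,
    still_fails: Callable[[str], bool],
    criteria: List[str],
) -> ProposedTest:
    """Turn one failure into a proposal with a minimal repro and criteria."""
    return ProposedTest(
        name=f"regression_{failure.failure_id}",
        minimal_repro=minimize(failure.agent_input, still_fails),
        acceptance_criteria=list(criteria),
        source_failure=failure.failure_id,
    )


def add_to_suite(suite: List[ProposedTest], proposal: ProposedTest) -> bool:
    """Add an approved, non-duplicate proposal; return True if it was added."""
    if not proposal.approved:
        return False
    if any(t.minimal_repro == proposal.minimal_repro for t in suite):
        return False
    suite.append(proposal)
    return True
```

In use, a logged failure would be passed to `propose_test` together with a reproduction check and the criteria a fixed agent must satisfy. The proposal joins the suite through `add_to_suite` only after its `approved` flag is set, so the agent proposes additions rather than rewriting its own evals unchecked.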