Self-Improving Agent Evals: Add New Tests From Failures
Create a loop in which production failures and near-misses become new eval tests. For each one, the agent should propose a test addition that includes a minimal reproduction and explicit acceptance criteria.
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
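
The loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not a prescribed design: the names `FailureRecord`, `ProposedTest`, `EvalSuite`, `minimize`, and `propose_test` are hypothetical, the `still_fails` callback stands in for whatever check reproduces the failure in a real system, and the reducer is a simple greedy token-removal pass rather than full delta debugging.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FailureRecord:
    """A production failure or near-miss (hypothetical schema)."""
    input_text: str
    expected_behavior: str      # what the agent should have done
    severity: str = "failure"   # "failure" or "near_miss"


@dataclass(frozen=True)
class ProposedTest:
    """A proposed eval test, intended for review before it is merged."""
    test_id: str
    minimal_repro: str
    acceptance_criteria: str
    source_severity: str


def minimize(tokens: List[str], still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop tokens while the failure still reproduces."""
    current = list(tokens)
    i = 0
    while i < len(current):
        candidate = current[:i] + current[i + 1:]
        if candidate and still_fails(candidate):
            current = candidate   # token i was not needed; retry same index
        else:
            i += 1                # token i is needed; keep it and move on
    return current


def propose_test(failure: FailureRecord,
                 still_fails: Callable[[List[str]], bool]) -> ProposedTest:
    """Turn one failure into a test proposal with repro and criteria."""
    repro = " ".join(minimize(failure.input_text.split(), still_fails))
    # Assumption: a short content hash is enough to detect duplicate repros.
    digest = hashlib.sha256(repro.encode("utf-8")).hexdigest()[:12]
    return ProposedTest(
        test_id=f"regress-{digest}",
        minimal_repro=repro,
        acceptance_criteria=failure.expected_behavior,
        source_severity=failure.severity,
    )


@dataclass
class EvalSuite:
    """In-memory stand-in for the eval suite, keyed by test id."""
    tests: Dict[str, ProposedTest] = field(default_factory=dict)

    def add_if_new(self, proposal: ProposedTest) -> bool:
        """Add the proposal unless an identical repro is already covered."""
        if proposal.test_id in self.tests:
            return False
        self.tests[proposal.test_id] = proposal
        return True
```

In use, each logged failure would be passed through `propose_test` and the result offered to `EvalSuite.add_if_new`, so repeated incidents with the same minimal reproduction yield a single test. Keeping the proposal separate from the suite leaves room for a human review step before a test is accepted.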