Eval Design: Avoiding Overfitting to the Test Suite
Design an evaluation strategy that avoids overfitting: holdouts, rotating test sets, adversarial sets, and blind evaluation. Include rules for when to refresh benchmarks.
Ratings
Average Rating: 0
Total Ratings: 0