Evaluation Rubric (Human-in-the-Loop)
Create a rubric that human raters can use to score the model’s outputs: 5 criteria, each on a 5-point scale, with behavioral anchors for every score. Also provide a one-page training note for raters and tips for improving inter-rater reliability.
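One common way to check inter-rater reliability once ratings come in is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch, assuming two raters have scored the same outputs on a single criterion using the 1 to 5 scale; the function name and the sample scores are hypothetical and not part of the prompt.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")

    n = len(rater_a)

    # Observed agreement: share of items where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: based on how often each rater used each score.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical score for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical scores for one criterion across eight outputs.
scores_rater_1 = [5, 4, 4, 3, 2, 5, 1, 3]
scores_rater_2 = [5, 4, 3, 3, 2, 4, 1, 3]
print(round(cohens_kappa(scores_rater_1, scores_rater_2), 3))
```

Unweighted kappa treats a 4-versus-5 disagreement the same as a 1-versus-5 disagreement. For an ordinal 5-point scale, a weighted variant may be a better fit, and with more than two raters a different statistic would be needed.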