Evaluation Clinic: Good vs Faithful
Design an evaluation harness that measures both relevance and faithfulness for IR+LLM answers. Include a human labeling rubric and inter-rater agreement checks.
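One way to approach the inter-rater check is Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The sketch below is a minimal illustration, not a prescribed solution: the function names `cohens_kappa` and `agreement_report`, the two-rater setup, and the label values (`"faithful"`, `"relevant"`, and so on) are all assumptions made for this example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items in the same order."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters chose the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch reports full agreement (an assumption, not a standard).
        return 1.0
    return (observed - expected) / (1 - expected)


def agreement_report(labels_a, labels_b):
    """Kappa per rubric dimension; inputs map dimension name -> list of labels."""
    return {dim: cohens_kappa(labels_a[dim], labels_b[dim]) for dim in labels_a}


# Hypothetical labels from two raters on four answers.
rater_1 = {
    "relevance": ["relevant", "relevant", "irrelevant", "irrelevant"],
    "faithfulness": ["faithful", "faithful", "unfaithful", "unfaithful"],
}
rater_2 = {
    "relevance": ["relevant", "relevant", "irrelevant", "irrelevant"],
    "faithfulness": ["faithful", "unfaithful", "unfaithful", "unfaithful"],
}
report = agreement_report(rater_1, rater_2)
```

Scoring relevance and faithfulness as separate dimensions keeps the two from being conflated: an answer can be on topic yet unsupported by the retrieved passages, and low kappa on one dimension points to the part of the rubric that needs tightening.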