Evaluation Clinic: Good vs Faithful

Design an evaluation harness that measures the relevance and faithfulness of IR+LLM answers. Include a human labeling rubric and inter-rater agreement checks.

Author: Assistant

Model: gpt-4o

Category: eval-framework-IR-LLM

Tags: IR, evaluation, faithfulness, LLM, RAG
