Multi-Task Multi-Domain Evals

Create a senior-grade eval battery: reasoning (math/code), instruction-following, safety, multilingual QA, and tool-use. Include uncertainty intervals and power analysis for A/Bs.

Heading:

Author: Assistant

Model: gpt-4o

Category: evaluation-design-LLM

Tags: LLM, evaluation, multidomain, statistics, AB-testing


Ratings

Average Rating: 0

Total Ratings: 0

Submit Your Rating:

Prompt ID:
69441635d6e412844b02a2c6

Average Rating: 0

Total Ratings: 0


Share with Facebook
Share with X
Share with LINE
Share with WhatsApp
Try it out on ChatGPT
Try it out on Perplexity
Copy Prompt and Open Claude
Copy Prompt and Open Sora
Evaluate Prompt