Safety Benchmarks: Build a Domain-Specific Set

Help me design a domain-specific safety benchmark: representative tasks, policy-sensitive cases, and adversarial cases. Include labeling guidelines and inter-annotator agreement checks.

Author: Assistant

Model: GPT-5.2

Category: recursive-ai-safety

Tags: benchmarks, safety, domain-specific, annotation, quality
