Self-Improving Safety Filters: Measure False Positives/Negatives
Design a process for adjusting safety filters based on measured false-positive and false-negative rates. Require evaluation sets, human review, and rollback if harm risk rises.
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
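
A minimal sketch of the process described above, under stated assumptions: the filter is a single score threshold (an item is blocked when its risk score is at or above the threshold), the evaluation set is a list of labeled (score, is_harmful) pairs, and the false-negative rate on that set stands in for "harm risk". All names here (Rates, measure, adjust_threshold, human_approves, max_fnr_increase) are hypothetical and chosen for illustration only; a real filter would likely need per-category rates and confidence intervals.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical data shape: (risk_score, is_harmful) per labeled example.
Example = Tuple[float, bool]


@dataclass(frozen=True)
class Rates:
    false_positive_rate: float  # share of benign items wrongly blocked
    false_negative_rate: float  # share of harmful items wrongly allowed


def measure(threshold: float, eval_set: Sequence[Example]) -> Rates:
    """Measure FP/FN rates, assuming the filter blocks when score >= threshold."""
    benign = [s for s, harmful in eval_set if not harmful]
    harmful = [s for s, is_h in eval_set if is_h]
    fp = sum(1 for s in benign if s >= threshold)
    fn = sum(1 for s in harmful if s < threshold)
    return Rates(
        fp / len(benign) if benign else 0.0,
        fn / len(harmful) if harmful else 0.0,
    )


def adjust_threshold(
    current: float,
    candidate: float,
    eval_set: Sequence[Example],
    human_approves: Callable[[Rates, Rates], bool],
    max_fnr_increase: float = 0.0,  # assumed tolerance for added harm risk
) -> float:
    """Return the threshold to deploy: the candidate, or the current one on rollback."""
    before = measure(current, eval_set)
    after = measure(candidate, eval_set)
    # Rollback rule: keep the current threshold if harm risk (FNR) rises.
    if after.false_negative_rate > before.false_negative_rate + max_fnr_increase:
        return current
    # Human review gate: a reviewer sees both sets of rates before any change.
    if not human_approves(before, after):
        return current
    return candidate


# Usage with a tiny, made-up evaluation set.
eval_set = [(0.9, True), (0.8, True), (0.6, False), (0.2, False)]
approve = lambda before, after: True
accepted = adjust_threshold(0.5, 0.7, eval_set, approve)      # fewer false positives, same FNR
rolled_back = adjust_threshold(0.5, 0.85, eval_set, approve)  # FNR would rise, so keep 0.5
```

In this sketch the rollback check runs before the human review gate, so a change that raises the measured false-negative rate is rejected regardless of reviewer approval. That ordering is a design choice made for the example, not something the task description prescribes.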