Design an agent that tunes security scanners (SAST rules, allowlists) based on confirmed findings and false positives. Require approvals for any rule weakening.
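One way to picture the approval requirement is a gate that classifies each proposed rule change before it is applied. The sketch below is illustrative only: `RuleChange`, `SEVERITY_RANK`, and `can_apply` are hypothetical names, and the definition of "weakening" (lower severity, a new allowlist entry, or disabling a rule) is an assumption rather than any scanner's actual model.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical severity ordering; real scanners define their own levels.
SEVERITY_RANK = {"info": 0, "low": 1, "medium": 2, "high": 3, "critical": 4}

@dataclass(frozen=True)
class RuleChange:
    rule_id: str
    old_severity: str
    new_severity: str
    adds_allowlist_entry: bool = False
    disables_rule: bool = False

def is_weakening(change: RuleChange) -> bool:
    """Assumed definition: lower severity, a new allowlist entry, or a disabled rule."""
    lowered = SEVERITY_RANK[change.new_severity] < SEVERITY_RANK[change.old_severity]
    return lowered or change.adds_allowlist_entry or change.disables_rule

def can_apply(change: RuleChange, approvers: set[str], required: int = 1) -> bool:
    """Non-weakening changes pass; weakening changes need enough human approvers."""
    if not is_weakening(change):
        return True
    return len(approvers) >= required
```

With this shape, a false-positive fix that adds an allowlist entry stays blocked until a reviewer signs off, while a change that raises severity can proceed without one.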
Design a multi-repo agent that only edits repos it is authorized to modify, respects CODEOWNERS, and applies per-repo policies. Include cross-repo dependency coordination rules.
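A minimal sketch of the authorization check and the coordination rule might look like the following. `RepoPolicy`, `check_edit`, and `merge_order` are hypothetical names. The ownership matching uses `fnmatch` globs as a simplification of real CODEOWNERS pattern syntax, and the rule "merge upstream repos first" is an assumed coordination policy.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from fnmatch import fnmatch
from graphlib import TopologicalSorter

@dataclass
class RepoPolicy:
    agent_may_edit: bool
    protected_paths: list[str] = field(default_factory=list)      # glob patterns
    codeowners: dict[str, list[str]] = field(default_factory=dict)  # pattern -> owners

def required_reviewers(policy: RepoPolicy, path: str) -> list[str]:
    """Return owners for a path; the last matching pattern wins."""
    owners: list[str] = []
    for pattern, names in policy.codeowners.items():
        if fnmatch(path, pattern):
            owners = names
    return owners

def check_edit(policies: dict[str, RepoPolicy], repo: str, path: str):
    """Return (allowed, reviewers). Unknown repos are denied by default."""
    policy = policies.get(repo)
    if policy is None or not policy.agent_may_edit:
        return (False, [])
    if any(fnmatch(path, p) for p in policy.protected_paths):
        return (False, [])
    return (True, required_reviewers(policy, path))

def merge_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order repos so that each one merges after the repos it depends on."""
    return list(TopologicalSorter(depends_on).static_order())
```

Denying unknown repos by default keeps the agent's reach limited to what a policy explicitly grants.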
Self-Improving Agent Evals: Add New Tests From Failures
Create a loop where production failures and near-misses become new eval tests. The agent should propose test additions with minimal reproductions and acceptance criteria.
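The proposal step of such a loop could be sketched as below. All names (`Failure`, `EvalCase`, `propose_eval`) are hypothetical, the duplicate check is a simple normalized-text comparison, and real input minimization is not implemented here; the sketch only trims whitespace.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Failure:
    incident_id: str
    input_text: str
    observed: str
    expected: str
    kind: str  # assumed values: "failure" or "near_miss"

@dataclass
class EvalCase:
    case_id: str
    prompt: str
    acceptance: str
    status: str = "proposed"  # a human promotes it to "accepted"

def _normalize(text: str) -> str:
    """Collapse whitespace and case so near-identical inputs compare equal."""
    return " ".join(text.split()).lower()

def propose_eval(failure: Failure, existing: list[EvalCase]) -> EvalCase | None:
    """Turn a failure into a proposed eval case, or None if it is a duplicate."""
    key = _normalize(failure.input_text)
    if any(_normalize(case.prompt) == key for case in existing):
        return None
    return EvalCase(
        case_id=f"regress-{failure.incident_id}",
        prompt=failure.input_text.strip(),
        acceptance=f"Output must match: {failure.expected}",
    )
```

Keeping new cases in a `proposed` state means the agent can suggest tests but cannot change the eval suite on its own.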
Design a process to adjust safety filters based on measured false positive/negative rates. Require evaluation sets, human review, and rollback if harm risk rises.
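As a rough illustration, a threshold change can be scored against a labeled evaluation set before anyone reviews it. In this sketch each sample is a `(score, is_harmful)` pair and content is blocked when `score >= threshold`. The function names, the `fn_tolerance` parameter, and the rule "roll back if the false negative rate rises" are assumptions made for the example.

```python
from __future__ import annotations

def error_rates(samples: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """Return (false_positive_rate, false_negative_rate) at a given threshold."""
    benign = [score for score, harmful in samples if not harmful]
    harmful = [score for score, is_harmful in samples if is_harmful]
    false_pos = sum(1 for score in benign if score >= threshold)   # benign but blocked
    false_neg = sum(1 for score in harmful if score < threshold)   # harmful but allowed
    fp_rate = false_pos / len(benign) if benign else 0.0
    fn_rate = false_neg / len(harmful) if harmful else 0.0
    return fp_rate, fn_rate

def review_change(samples: list[tuple[float, bool]], current: float,
                  candidate: float, fn_tolerance: float = 0.0) -> str:
    """Reject changes that let more harmful content through; never auto-accept."""
    _, fn_now = error_rates(samples, current)
    _, fn_new = error_rates(samples, candidate)
    if fn_new > fn_now + fn_tolerance:
        return "rollback"
    return "needs_human_review"
```

Note that the function has no "accept" outcome: even a change that looks safe on the evaluation set is routed to a human reviewer.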
Agent Governance: Approvals, Logs, and Periodic Audits
Design governance for self-improving systems: approval rules, quarterly audits, access reviews, and incident drills. Include “who can change the agent” controls.
Self-Improving Feature Discovery: Product Analytics to PRs
Create a pipeline where analytics identifies friction points and the agent proposes small UX or performance fixes. Require user impact estimation and safe rollout.
Self-Improving Error Messages: Reduce Support Tickets
Create an agent that rewrites error messages for clarity, adds error codes, and links to docs. Verify improvements via support ticket taxonomy and A/B testing.
Design a self-improving CLI agent that adds features while preserving backward compatibility, adds help text/tests, and uses semantic versioning rules.
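The versioning rule can be sketched by comparing the old and new flag sets. Here a CLI surface is modeled as a hypothetical mapping from flag name to default value. Treating a changed default as a breaking change is an assumption of this example, since projects differ on that point.

```python
from __future__ import annotations

def required_bump(old_flags: dict[str, object], new_flags: dict[str, object]) -> str:
    """Classify a CLI change as 'major', 'minor', or 'patch'."""
    removed = old_flags.keys() - new_flags.keys()
    added = new_flags.keys() - old_flags.keys()
    changed = {f for f in old_flags.keys() & new_flags.keys()
               if old_flags[f] != new_flags[f]}
    if removed or changed:
        return "major"   # existing invocations may break
    if added:
        return "minor"   # new, backward-compatible functionality
    return "patch"       # no change to the public surface

def bump(version: str, level: str) -> str:
    """Apply a bump to a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, adding a `--json` flag to a CLI at version `1.4.2` would yield `bump("1.4.2", required_bump(old, new))`, which evaluates to `1.5.0`.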
Safe Self-Improvement for Internationalization Logic
Create a plan to safely modify locale/date/number formatting: add tests for edge locales, verify backward compatibility, and avoid breaking user settings.
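One simple backward-compatibility check is to run the current and candidate formatters over the same locales and sample values and list every output that differs. The sketch below uses a toy formatter driven by a hypothetical separator table. It is not a real localization library, and the names `make_formatter` and `regressions` are illustrative.

```python
from __future__ import annotations
from typing import Callable

Formatter = Callable[[float, str], str]

def make_formatter(separators: dict[str, tuple[str, str]]) -> Formatter:
    """Build a toy number formatter from {locale: (group_sep, decimal_sep)}."""
    def fmt(value: float, locale: str) -> str:
        group, decimal = separators[locale]
        text = f"{value:,.2f}"  # e.g. "1,234.50"
        # Swap separators via a placeholder so the two replacements do not collide.
        return text.replace(",", "\0").replace(".", decimal).replace("\0", group)
    return fmt

def regressions(old: Formatter, new: Formatter,
                locales: list[str], samples: list[float]):
    """List (locale, value, old_output, new_output) wherever the outputs differ."""
    return [
        (loc, value, old(value, loc), new(value, loc))
        for loc in locales
        for value in samples
        if old(value, loc) != new(value, loc)
    ]
```

Any non-empty result is a visible formatting change that should be either justified by a test for that locale or reverted before release.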