Search Results
Showing results for "regression"
Self-Improving Safety Regression Suite
Create a safety regression suite for tool-using agents: prompt injection tests, permission misuse tests, and data leakage tests. Gate deployments on this suite.
Tags: safety, regression-suite, tooling, leak-tests, permissions
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
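A deployment gate over such a suite can be as simple as refusing to ship when any case fails, or when a required category produced no results at all. The sketch below assumes each test run yields a record with a case ID, a category, and a pass/fail flag. The category names, the `SafetyResult` type, and `deployment_allowed` are hypothetical and do not come from any specific framework.

```python
from dataclasses import dataclass

# Hypothetical category names for the three test families named above.
REQUIRED_CATEGORIES = {"prompt_injection", "permission_misuse", "data_leakage"}


@dataclass(frozen=True)
class SafetyResult:
    case_id: str
    category: str
    passed: bool


def deployment_allowed(results: list[SafetyResult]) -> tuple[bool, list[str]]:
    """Return (allowed, reasons); any failure or any uncovered category blocks deployment."""
    reasons = []
    covered = {r.category for r in results}
    # A category with no results counts as a failure, so a broken test loader cannot pass silently.
    for missing in sorted(REQUIRED_CATEGORIES - covered):
        reasons.append(f"no results for category {missing!r}")
    for r in results:
        if not r.passed:
            reasons.append(f"{r.category}: {r.case_id} failed")
    return (not reasons, reasons)


if __name__ == "__main__":
    run = [
        SafetyResult("inj-001", "prompt_injection", True),
        SafetyResult("perm-004", "permission_misuse", False),
        SafetyResult("leak-002", "data_leakage", True),
    ]
    allowed, reasons = deployment_allowed(run)
    print(allowed, reasons)  # False ['permission_misuse: perm-004 failed']
```

Treating a missing category as a failure is a design choice: it keeps a misconfigured run from looking like a clean pass.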
Regression Detective Agent Using Golden Tests
Design golden tests for key outputs and an agent that compares before/after behavior. Include tolerance rules and how to prevent “golden drift” over time.
Tags: golden-tests, regression, behavioral-testing, drift
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
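One way to express tolerance rules is to compare numeric fields with a relative tolerance and all other fields exactly. This minimal sketch assumes outputs are flat dictionaries. `compare_to_golden` and its 2% default tolerance are illustrative choices, not a prescribed standard.

```python
import math


def _is_number(x) -> bool:
    # bool is a subclass of int in Python; flags are compared exactly, not with a tolerance.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def compare_to_golden(actual: dict, golden: dict, rel_tol: float = 0.02) -> list[str]:
    """Return human-readable differences; an empty list means the output matches."""
    diffs = []
    for key in sorted(set(golden) | set(actual)):
        if key not in actual:
            diffs.append(f"{key}: missing from actual output")
        elif key not in golden:
            diffs.append(f"{key}: not present in golden record")
        elif _is_number(actual[key]) and _is_number(golden[key]):
            # Numeric fields pass if within rel_tol of the golden value.
            if not math.isclose(actual[key], golden[key], rel_tol=rel_tol):
                diffs.append(f"{key}: {actual[key]} vs golden {golden[key]} (outside {rel_tol:.0%})")
        elif actual[key] != golden[key]:
            # Everything else must match exactly.
            diffs.append(f"{key}: {actual[key]!r} vs golden {golden[key]!r}")
    return diffs
```

To limit golden drift, a reasonable policy is that a non-empty diff list fails the check and the golden record is only rewritten through a separately reviewed change, never auto-accepted by the agent that produced the new output.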
Evidence-Driven Refactoring: Metrics Before Changes
Build a refactoring agent that only changes code when it can show measurable gains (latency, memory, error rate). Include baseline capture and regression detection.
Tags: refactoring, metrics, performance, regression, CI
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Self-Improving Benchmark Suite
Design an agent that expands benchmarks as new features land: add workloads, track performance trends, and alert on regressions. Include benchmark governance.
Tags: benchmarks, performance-trends, regression-alerts, governance
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
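Alerting on regressions usually means comparing a new measurement against a baseline built from recent history. The sketch below assumes lower-is-better timings and uses the median of the last few runs as the baseline. The 10% threshold, five-run window, and `find_regressions` name are placeholders to tune per workload.

```python
from statistics import median


def find_regressions(
    history: dict[str, list[float]],
    latest: dict[str, float],
    threshold: float = 0.10,
    window: int = 5,
) -> list[str]:
    """Flag workloads whose latest timing exceeds the recent median by more than `threshold`."""
    alerts = []
    for workload, value in sorted(latest.items()):
        past = history.get(workload, [])[-window:]
        if not past:
            continue  # new workload: no baseline yet, so nothing to compare against
        baseline = median(past)
        if baseline > 0 and (value - baseline) / baseline > threshold:
            change = (value - baseline) / baseline
            alerts.append(f"{workload}: {value:.1f} vs baseline {baseline:.1f} (+{change:.0%})")
    return alerts
```

Using the median of several runs rather than only the previous run makes the baseline less sensitive to a single noisy measurement.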
Bug Reproduction Agent: Minimize Steps to Reproduce
Create an agent that turns bug reports into minimal reproducible cases, adds regression tests, and proposes fixes. Include a triage rubric and severity mapping.
Tags: bugfix, repro, minimization, regression-tests, triage
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
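A common technique for minimizing reproduction steps is delta debugging (Zeller's ddmin). It repeatedly tries removing chunks of the input and keeps any smaller version that still fails. The sketch below is a simplified variant that only tests complements. `still_fails` is a hypothetical stand-in for whatever harness replays the steps against the system, which the prompt above does not specify.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def ddmin(steps: List[T], still_fails: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `steps` until removing any single step makes the failure disappear."""
    if not still_fails(steps):
        raise ValueError("the full step list must reproduce the failure")
    current = list(steps)
    n = 2  # number of chunks to split the current steps into
    while len(current) >= 2:
        chunk = len(current) // n
        reduced = False
        start = 0
        while start < len(current):
            # Try the steps with one chunk removed.
            candidate = current[:start] + current[start + chunk:]
            if still_fails(candidate):
                current = candidate
                n = max(n - 1, 2)  # coarser split next time, since the list got shorter
                reduced = True
                break
            start += chunk
        if not reduced:
            if n == len(current):
                break  # every single-step removal was tried: the result is 1-minimal
            n = min(n * 2, len(current))  # refine the granularity and retry
    return current
```

Each probe replays the steps, so the number of harness runs can grow quadratically with the number of steps in the worst case. That cost is usually acceptable for bug-report-sized inputs.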
Memory Leak Hunter: Reproduce, Fix, Verify
Design a loop to detect memory leaks: reproduce with stress tests, capture heap snapshots, propose a fix, and verify stability. Include regression tests.
Tags: memory-leak, debugging, stress-tests, verification, CI
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Refactor Safety Net: Characterization Tests
Design characterization tests for legacy code before refactoring: capture I/O behavior, edge cases, and performance baselines. Use them as a gate for changes.
Tags: legacy, characterization-tests, refactoring, regression
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Safety Regression Suite (What Must Never Break)
Create a safety regression suite: prompt injection tests, data leakage tests, refusal/guardrail tests, and policy adherence checks. Include how to maintain and evolve the suite over time.
Tags: safety-regression, testing, prompt-injection, privacy, guardrails
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Prompt Injection Defense Plan (Tool-Using Agents)
Design defenses against prompt injection for tool-using agents: content provenance, allowlists, tool policy, and sandboxing. Include a suite of adversarial prompts for regression testing.
Tags: prompt-injection, agents, tooling, security, testing
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Prompt Injection in Retrieved Pages: Sanitization Plan
Design a sanitization pipeline for retrieved content: strip instructions, isolate quotes, and prevent tool-use hijacks. Include adversarial test cases and a regression suite.
Tags: prompt-injection, sanitization, security, RAG, testing
Author: Assistant
Category: research-bot | Model: GPT-5.2
Hallucination Reduction Plan (RAG + Verification)
Design a hallucination reduction plan: retrieval augmentation, answer verification steps, consistency checks, and refusal behaviors. Include evaluation metrics and regression tests.
Tags: hallucination, RAG, verification, consistency, testing
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Post-Mortem Template for AI Regressions
Create a post-mortem template tailored to AI regressions: data/prompt/model diffs, evaluation gaps, monitoring misses, and remediation tasks. Include a ‘lessons to tests’ section.
Tags: postmortem, regression, ops, testing, remediation
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Agent Runbooks: On-Call Playbook for Failures
Create operational runbooks: common failures, triage steps, rollback, and user comms. Include SLO breaches, tool outages, and prompt regressions.
Tags: runbooks, ops, on-call, incident-response, reliability
Author: Assistant
Category: agent-architecture | Model: GPT-5.2
Guardrails Layering: Policy + Technical Controls
Design layered guardrails: policy rules, tool allowlists, output filters, and human escalation. Include regression tests to prevent guardrail drift during iterations.
Tags: guardrails, policy, allowlists, regression-tests, safety
Author: Assistant
Category: agent-architecture | Model: GPT-5.2
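The allowlist and human-escalation layers can be pinned with a small table of expected decisions that runs on every iteration. A loosened rule then shows up as a failing check instead of passing silently. The tool names, `ToolPolicy`, `decide`, and the allow/escalate/deny values below are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolPolicy:
    allowed: frozenset[str]
    needs_approval: frozenset[str] = field(default_factory=frozenset)


def decide(policy: ToolPolicy, tool: str) -> str:
    """Return 'deny', 'escalate', or 'allow' for a requested tool call."""
    if tool not in policy.allowed:
        return "deny"  # default-deny: anything not on the allowlist is blocked
    if tool in policy.needs_approval:
        return "escalate"  # route to a human before executing
    return "allow"


# Expected decisions pinned as a regression table (hypothetical tools).
EXPECTED = {
    "web_search": "allow",
    "send_email": "escalate",
    "delete_repo": "deny",
}


def guardrail_drift(policy: ToolPolicy, expected: dict[str, str] = EXPECTED) -> list[str]:
    """List every tool whose decision no longer matches the pinned expectation."""
    drift = []
    for tool, want in sorted(expected.items()):
        got = decide(policy, tool)
        if got != want:
            drift.append(f"{tool}: expected {want}, got {got}")
    return drift
```

A non-empty result from `guardrail_drift` would fail the iteration in CI, so a change to the policy has to be accompanied by a deliberate change to the pinned table.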
Lint + CDC + Formal in CI: Practical Pipeline
Design a CI pipeline for hardware: lint rules, CDC checks, reset checks, basic formal proofs, and regression simulation tiers. Include pass/fail gates and artifact retention for debug.
Tags: CI, lint, CDC, formal, regression, EDA
Author: Assistant
Category: fpga-asic-design | Model: gpt-4o
LLMOps 2026: Evaluation-First Operating System
Create an eval-first LLMOps design: golden sets, adversarial tests, continuous regression, cost/latency tracking, and release gates. Include a ‘model change control’ policy.
Tags: LLMOps, evaluation, guardrails, regression, change-control
Author: Assistant
Category: ai-strategy-2026 | Model: gpt-4o
Performance Regression Hunter
Define performance budgets and benchmarks; ChatGPT proposes profiling plan; Cursor instruments code (e.g., pprof/Flamegraphs); Antigravity runs load tests, flags regressions, and files targeted PRs. O...
Tags: performance, profiling, benchmarks, Cursor, Antigravity, ChatGPT
Author: Assistant
Category: performance-engineering | Model: gpt-4o
Attribution-Aware Eval Harness
Build an eval that scores ground-truth attribution (exact passage match), answer faithfulness, and coverage. Provide dataset schema and a nightly regression plan.
Tags: evaluation, attribution, faithfulness, coverage, datasets
Author: Assistant
Category: evaluation-frameworks | Model: gpt-4o
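Exact passage matching for attribution can be scored as the precision and recall of cited passages against the ground-truth passages. A minimal sketch, assuming passages are compared as text after collapsing whitespace and lowercasing. That normalization is an assumption; a stricter harness might compare passage IDs instead. Faithfulness and coverage would need their own scorers.

```python
def _normalize(text: str) -> str:
    # Collapse runs of whitespace and ignore case before the exact comparison.
    return " ".join(text.split()).lower()


def attribution_scores(cited: list[str], gold: list[str]) -> dict[str, float]:
    """Exact-match attribution precision and recall over normalized passage text."""
    cited_set = {_normalize(p) for p in cited}
    gold_set = {_normalize(p) for p in gold}
    hits = cited_set & gold_set
    precision = len(hits) / len(cited_set) if cited_set else 0.0
    recall = len(hits) / len(gold_set) if gold_set else 0.0
    return {"precision": precision, "recall": recall}
```

For a nightly regression plan, per-example scores like these could be averaged over the dataset and compared against the previous night's run.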
LLM Prompt Registry & Eval Harness
ChatGPT drafts prompts and adversarial tests; Cursor integrates an eval harness; Antigravity schedules nightly evals and posts regressions with diffs. Include versioning and approval flow.
Tags: LLM, prompts, evaluation, registry, Cursor, Antigravity, ChatGPT
Author: Assistant
Category: mlops-llm-quality | Model: gpt-4o
Spec → Tests → Code Chain
Given a product brief, have ChatGPT write acceptance criteria and test outlines, ask Cursor to generate unit/integration tests and the minimal code, then let Antigravity agents execute the suite and o...
Tags: TDD, testing, ChatGPT, Cursor, Antigravity, automation
Author: Assistant
Category: quality-engineering | Model: gpt-4o
Meta-Analysis with Bias Diagnostics
Act as a meta-analyst. Provide a DerSimonian–Laird vs REML plan, heterogeneity measures, subgroup/meta-regression, and publication bias checks (funnel/Egger/trim-and-fill).
Tags: meta-analysis, heterogeneity, bias, REML, PRISMA
Author: Assistant
Category: biostats-methods | Model: gpt-5
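On the DerSimonian–Laird side of that plan, the between-study variance has a closed-form moment estimator built from Cochran's Q. With inverse-variance weights w_i = 1/v_i, the estimator is tau² = max(0, (Q − (k − 1)) / (Σw − Σw²/Σw)). The sketch below assumes each study supplies an effect estimate and its within-study variance. REML has no closed form and needs an iterative solver, so it is not shown.

```python
import math


def dersimonian_laird(effects: list[float], variances: list[float]) -> dict[str, float]:
    """Random-effects pooling with the DerSimonian-Laird tau^2 estimator."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # Higgins' I^2 as a proportion
    return {"tau2": tau2, "mu": mu, "se": se, "Q": q, "I2": i2}
```

For three studies with effects 0.1, 0.3, 0.5 and equal variances 0.01, this gives Q = 8, tau² = 0.03, a pooled effect of 0.3, and I² = 0.75.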
Timing ECO Cookbook (Minimal PPA Hit)
Provide a timing ECO guide: buffer sizing rules, cell Vt swaps, re-route constraints, shielding for aggressors, hold-fix ordering, and checks to avoid IR/EM regression. Include a before/after metrics ...
Tags: IC, timing, ECO, SI, cell-sizing, Vt-swap
Author: Assistant
Category: chip-design | Model: gpt-4
Automated Signoff and ECO Loop
Define a push-button signoff pipeline: reproducible EDA containers, golden rule decks, regression checklists, run orchestration, artifact retention, and ECO automation for timing/IR/DRC. Output a CI/C...
Tags: IC, signoff, automation, ECO, CI/CD, flows
Author: Assistant
Category: chip-design | Model: gpt-4
Robotics Dev Co.: Sim-in-the-Loop CI Pipeline
You are a robotics CI lead. Design a sim-in-the-loop pipeline for perception + planning stacks. Deliver: scenario library spec (lighting, occlusion, rare edge cases), metrics (success %, collision=0, ...
Tags: robotics, simulation, CI/CD, testing, telemetry, metrics
Author: Tsubasa Kato
Category: Engineering | Model: GPT-5 Thinking