Search Results - Curioprompt

Self-Improving Safety Regression Suite

Create a safety regression suite for tool-using agents: prompt injection tests, permission misuse tests, and data leakage tests. Gate deployments on this suite.

Tags: safety, regression-suite, tooling, leak-tests, permissions

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Regression Detective Agent Using Golden Tests

Design golden tests for key outputs and an agent that compares before/after behavior. Include tolerance rules and how to prevent “golden drift” over time.

Tags: golden-tests, regression, behavioral-testing, drift

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Evidence-Driven Refactoring: Metrics Before Changes

Build a refactoring agent that only changes code when it can show measurable gains (latency, memory, error rate). Include baseline capture and regression detection.

Tags: refactoring, metrics, performance, regression, CI

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Self-Improving Benchmark Suite

Design an agent that expands benchmarks as new features land: add workloads, track performance trends, and alert on regressions. Include benchmark governance.

Tags: benchmarks, performance-trends, regression-alerts, governance

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Bug Reproduction Agent: Minimize Steps to Reproduce

Create an agent that turns bug reports into minimal reproducible cases, adds regression tests, and proposes fixes. Include a triage rubric and severity mapping.

Tags: bugfix, repro, minimization, regression-tests, triage

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Memory Leak Hunter: Reproduce, Fix, Verify

Design a loop to detect memory leaks: reproduce with stress tests, capture heap snapshots, propose a fix, and verify stability. Include regression tests.

Tags: memory-leak, debugging, stress-tests, verification, CI

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Refactor Safety Net: Characterization Tests

Design characterization tests for legacy code before refactoring: capture I/O behavior, edge cases, and performance baselines. Use them as a gate for changes.

Tags: legacy, characterization-tests, refactoring, regression

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Safety Regression Suite (What Must Never Break)

Create a safety regression suite: prompt injection tests, data leakage tests, refusal/guardrail tests, and policy adherence checks. Include how to maintain and evolve the suite over time.

Tags: safety-regression, testing, prompt-injection, privacy, guardrails

Author: Assistant

Category: recursive-ai-safety | Model: GPT-5.2

Prompt Injection Defense Plan (Tool-Using Agents)

Design defenses against prompt injection for tool-using agents: content provenance, allowlists, tool policy, and sandboxing. Include a suite of adversarial prompts for regression testing.

Tags: prompt-injection, agents, tooling, security, testing

Author: Assistant

Category: recursive-ai-safety | Model: GPT-5.2

Prompt Injection in Retrieved Pages: Sanitization Plan

Design a sanitization pipeline for retrieved content: strip instructions, isolate quotes, and prevent tool-use hijacks. Include adversarial test cases and regression suite.

Tags: prompt-injection, sanitization, security, RAG, testing

Author: Assistant

Category: research-bot | Model: GPT-5.2

Hallucination Reduction Plan (RAG + Verification)

Design a hallucination reduction plan: retrieval augmentation, answer verification steps, consistency checks, and refusal behaviors. Include evaluation metrics and regression tests.

Tags: hallucination, RAG, verification, consistency, testing

Author: Assistant

Category: recursive-ai-safety | Model: GPT-5.2

Post-Mortem Template for AI Regressions

Create a post-mortem template tailored to AI regressions: data/prompt/model diffs, evaluation gaps, monitoring misses, and remediation tasks. Include a ‘lessons to tests’ section.

Tags: postmortem, regression, ops, testing, remediation

Author: Assistant

Category: recursive-ai-safety | Model: GPT-5.2

Agent Runbooks: On-Call Playbook for Failures

Create operational runbooks: common failures, triage steps, rollback, and user comms. Include SLO breaches, tool outages, and prompt regressions.

Tags: runbooks, ops, on-call, incident-response, reliability

Author: Assistant

Category: agent-architecture | Model: GPT-5.2

Guardrails Layering: Policy + Technical Controls

Design layered guardrails: policy rules, tool allowlists, output filters, and human escalation. Include regression tests to prevent guardrail drift during iterations.

Tags: guardrails, policy, allowlists, regression-tests, safety

Author: Assistant

Category: agent-architecture | Model: GPT-5.2

Lint + CDC + Formal in CI: Practical Pipeline

Design a CI pipeline for hardware: lint rules, CDC checks, reset checks, basic formal proofs, and regression simulation tiers. Include pass/fail gates and artifact retention for debug.

Tags: CI, lint, CDC, formal, regression, EDA

Author: Assistant

Category: fpga-asic-design | Model: gpt-4o

LLMOps 2026: Evaluation-First Operating System

Create an eval-first LLMOps design: golden sets, adversarial tests, continuous regression, cost/latency tracking, and release gates. Include a ‘model change control’ policy.

Tags: LLMOps, evaluation, guardrails, regression, change-control

Author: Assistant

Category: ai-strategy-2026 | Model: gpt-4o

Performance Regression Hunter

Define performance budgets and benchmarks; ChatGPT proposes a profiling plan; Cursor instruments code (e.g., pprof/flame graphs); Antigravity runs load tests, flags regressions, and files targeted PRs. O...

Tags: performance, profiling, benchmarks, Cursor, Antigravity, ChatGPT

Author: Assistant

Category: performance-engineering | Model: gpt-4o

Attribution-Aware Eval Harness

Build an eval that scores ground-truth attribution (exact passage match), answer faithfulness, and coverage. Provide dataset schema and a nightly regression plan.

Tags: evaluation, attribution, faithfulness, coverage, datasets

Author: Assistant

Category: evaluation-frameworks | Model: gpt-4o

LLM Prompt Registry & Eval Harness

ChatGPT drafts prompts and adversarial tests; Cursor integrates an eval harness; Antigravity schedules nightly evals and posts regressions with diffs. Include versioning and approval flow.

Tags: LLM, prompts, evaluation, registry, Cursor, Antigravity, ChatGPT

Author: Assistant

Category: mlops-llm-quality | Model: gpt-4o

Spec → Tests → Code Chain

Given a product brief, have ChatGPT write acceptance criteria and test outlines, ask Cursor to generate unit/integration tests and the minimal code, then let Antigravity agents execute the suite and o...

Tags: TDD, testing, ChatGPT, Cursor, Antigravity, automation

Author: Assistant

Category: quality-engineering | Model: gpt-4o

Meta-Analysis with Bias Diagnostics

Act as a meta-analyst. Provide a DerSimonian–Laird vs REML plan, heterogeneity measures, subgroup/meta-regression, and publication bias checks (funnel/Egger/trim-and-fill).

Tags: meta-analysis, heterogeneity, bias, REML, PRISMA

Author: Assistant

Category: biostats-methods | Model: gpt-5

Timing ECO Cookbook (Minimal PPA Hit)

Provide a timing ECO guide: buffer sizing rules, cell Vt swaps, re-route constraints, shielding for aggressors, hold-fix ordering, and checks to avoid IR/EM regression. Include a before/after metrics ...

Tags: IC, timing, ECO, SI, cell-sizing, Vt-swap

Author: Assistant

Category: chip-design | Model: gpt-4

Automated Signoff and ECO Loop

Define a push-button signoff pipeline: reproducible EDA containers, golden rule decks, regression checklists, run orchestration, artifact retention, and ECO automation for timing/IR/DRC. Output a CI/C...

Tags: IC, signoff, automation, ECO, CI/CD, flows

Author: Assistant

Category: chip-design | Model: gpt-4

Robotics Dev Co.: Sim-in-the-Loop CI Pipeline

You are a robotics CI lead. Design a sim-in-the-loop pipeline for perception + planning stacks. Deliver: scenario library spec (lighting, occlusion, rare edge cases), metrics (success %, collision=0, ...

Tags: robotics, simulation, CI/CD, testing, telemetry, metrics

Author: Tsubasa Kato

Category: Engineering | Model: GPT-5 Thinking