Search Results
Showing results for "adversarial"
Adversarial Robustness: Stress Testing Inputs
Create a stress test plan: malformed inputs, long-context traps, conflicting instructions, and toxic content probes. Explain how to automate the tests and score robustness over time.
Tags: robustness, adversarial, testing, stress-tests, quality
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Red Team Program for Recursive Systems
Design a continuous red team program: scenarios, cadence, severity scoring, triage workflow, and how findings feed back into the improvement loop. Include a template for red-team reports.
Tags: red-teaming, security, adversarial-testing, governance, safety
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Eval Design: Avoiding Overfitting to the Test Suite
Design an evaluation strategy that avoids overfitting: holdouts, rotating test sets, adversarial sets, and blind evaluation. Include rules for when to refresh benchmarks.
Tags: evaluation, overfitting, benchmarks, holdout, testing
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Prompt Injection in Retrieved Pages: Sanitization Plan
Design a sanitization pipeline for retrieved content: strip instructions, isolate quotes, and prevent tool-use hijacks. Include adversarial test cases and regression suite.
Tags: prompt-injection, sanitization, security, RAG, testing
Author: Assistant
Category: research-bot | Model: GPT-5.2
Prompt Injection Defense Plan (Tool-Using Agents)
Design defenses against prompt injection for tool-using agents: content provenance, allowlists, tool policy, and sandboxing. Include a suite of adversarial prompts for regression testing.
Tags: prompt-injection, agents, tooling, security, testing
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
Safety Benchmarks: Build a Domain-Specific Set
Help me design a domain-specific safety benchmark: representative tasks, policy-sensitive cases, and adversarial cases. Include labeling guidelines and inter-annotator agreement checks.
Tags: benchmarks, safety, domain-specific, annotation, quality
Author: Assistant
Category: recursive-ai-safety | Model: GPT-5.2
LLMOps 2026: Evaluation-First Operating System
Create an eval-first LLMOps design: golden sets, adversarial tests, continuous regression, cost/latency tracking, and release gates. Include a ‘model change control’ policy.
Tags: LLMOps, evaluation, guardrails, regression, change-control
Author: Assistant
Category: ai-strategy-2026 | Model: gpt-4o
Adversarial ML Primer (Postdoc)
Summarize poisoning/evasion threats to IR/LLM systems. Provide a lab with simple attacks and defenses and a measurement plan.
Tags: adversarial-ml, IR, LLM, security, postdoc
Author: Assistant
Category: advanced-research-MLSec | Model: gpt-4o
Safety Red Team & Taxonomy
Create a safety taxonomy (harm classes) and a multilingual red-team plan with auto-generation of adversarial prompts. Provide block/transform policies and human review paths.
Tags: LLM, safety, red-team, taxonomy, policy, multilingual
Author: Assistant
Category: safety-program-LLM | Model: gpt-4o
Red-Team Your Thesis
Generate adversarial questions that would falsify your stock thesis. Provide data sources to check and a pre-commit exit criterion list.
Tags: investing, debiasing, red-team, thesis, exits
Author: Assistant
Category: investing-discipline | Model: gpt-4o
LLM Prompt Registry & Eval Harness
ChatGPT drafts prompts and adversarial tests; Cursor integrates an eval harness; Antigravity schedules nightly evals and posts regressions with diffs. Include versioning and approval flow.
Tags: LLM, prompts, evaluation, registry, Cursor, Antigravity, ChatGPT
Author: Assistant
Category: mlops-llm-quality | Model: gpt-4o