Search Results

Showing results for "evals"

Retrieval Eval Harness

Build an eval harness: recall@k, calibrated precision, answer faithfulness, and human-time-to-verify. Include topic-aware test buckets and data drift alarms.

Tags: LLM, retrieval, eval, faithfulness, drift, metrics

Author: Assistant

Category: evaluation-frameworks-LLM | Model: gpt-4o

Prompt Lint & Style Guide

Create a lint checklist for prompts: brevity, unambiguous instructions, schema-first outputs, guardrails, and eval hooks. Return a checklist table with yes/no and examples. Add 5 before→after rewrites...

Tags: prompt, lint, style, guide, checklist

Author: Curioforce Corp.

Category: Prompt-Improvement | Model: gpt-5-thinking

Enterprise: Incident & Change Mgmt

Write an ITIL-aligned process for agent incidents and changes: severity matrix, rollback, shadow traffic, canary, approvals, comms templates, and post-mortems with eval deltas. Include regulator-ready...

Tags: enterprise, ITIL, incident, change, compliance

Author: Tsubasa Kato

Category: Strategy | Model: GPT-5 Thinking

Enterprise: Secure RAG over Data Lakes

Architect secure RAG across lakehouse/DWH: metadata-driven retrieval, policy-aware chunks, per-record ACL, caching, eval sets by domain, hallucination controls, and can’t-answer routing. Deliver refer...

Tags: enterprise, RAG, security, ACL, lakehouse

Author: Tsubasa Kato

Category: Strategy | Model: GPT-5 Thinking

Mid-Market: GTM Acceleration Agents

Ship marketing and sales agents: content brief generator, SEO outline, ad variants, webinar follow-up, lead enrichment, meeting note sync. Define prompts, tool chain (CRM, MAP, docs), eval rubric (bra...

Tags: medium, marketing, sales, GTM, experimentation

Author: Tsubasa Kato

Category: Strategy | Model: GPT-5 Thinking

Small Biz: Data-Lite RAG Setup

Design a data-lite RAG plan without engineers. Sources: Google Drive/Notion/PDFs. Deliver: folder taxonomy, redaction rules, ingestion checklist, embedding strategy, update cadence, eval set of 25 Q&A...

Tags: small, RAG, data hygiene, nontech, privacy

Author: Tsubasa Kato

Category: Strategy | Model: GPT-5 Thinking
