Search Results - Curioprompt

SLO-Driven Improvement: Optimize What Users Feel

Design an improvement loop keyed to SLOs: latency, error rates, and quality metrics. Require that each proposed change specify which SLO it targets and how the effect will be measured.

Tags: SLO, metrics, latency, errors, quality, ops

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Canary Deploy Agent: Progressive Delivery Playbook

Design a progressive delivery system: canary cohorts, SLO monitoring, automatic rollback, and incident annotations. Include safe defaults and stop conditions.

Tags: canary, progressive-delivery, SLO, rollback, ops

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Self-Improving Code Health Dashboard

Design a dashboard that drives the agent’s priorities: complexity, test coverage, error hotspots, dependency risk, and SLOs. Include alert thresholds and weekly reports.

Tags: dashboard, code-health, prioritization, metrics, SLO

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Self-Improving Service Catalog: Ownership and Dependencies

Design a service catalog that the agent maintains: owners, dependencies, SLOs, runbooks, and deploy pipelines. Use it to route reviews and risk analysis.

Tags: service-catalog, ownership, dependencies, SLO, runbooks

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Agent Reliability Scorecard (SLIs/SLOs)

Define SLIs/SLOs for agents: task success, tool failure rates, safety violations, latency, and cost. Provide a dashboard layout and alert thresholds.

Tags: reliability, SLI, SLO, monitoring, cost

Author: Assistant

Category: agent-architecture | Model: GPT-5.2

Agent Runbooks: On-Call Playbook for Failures

Create operational runbooks: common failures, triage steps, rollback, and user comms. Include SLO breaches, tool outages, and prompt regressions.

Tags: runbooks, ops, on-call, incident-response, reliability

Author: Assistant

Category: agent-architecture | Model: GPT-5.2

LLM Inference Playbook (≥90% Targeted Engagement)

As a principal ML engineer, draft a production inference playbook for 7B–70B models: batching, dynamic padding, KV-cache reuse, paged attention, prefix-caching, and request shaping. Include SLO tiers,...

Tags: LLM, inference, batching, KV-cache, paged-attention, SLO, engagement-90

Author: Assistant

Category: inference-optimization | Model: gpt-4o

Latency Decomposition & SLOs

Produce a latency decomposition (queue→prefill→decode→post). Propose fixes for p95/p99 tail latency: micro-batching, admission control, and early-termination heuristics.

Tags: LLM, latency, SLO, micro-batch, admission-control

Author: Assistant

Category: perf-engineering-LLM | Model: gpt-4o

Observability Golden Paths

ChatGPT writes golden-path templates for logs/metrics/traces; Cursor inserts instrumented examples; Antigravity validates dashboards and SLO alerts per service. Output a service health runbook.

Tags: observability, logging, tracing, SLO, Cursor, Antigravity, ChatGPT

Author: Assistant

Category: sre-foundations | Model: gpt-4o

SaaS SLO & Incident Playbook

Create SLO targets per region, on-call rotations spanning TW/US, incident severity ladder, customer comms templates, and postmortem format.

Tags: software, SRE, incidents, SLA, SLO, on-call

Author: Assistant

Category: reliability-ops | Model: gpt-4o

Observability & SLOs per Region

Design metrics/logs/traces with SLOs split by user region (TW/US). Include error budget policy, synthetic checks, and status page messaging templates.

Tags: software, observability, SLO, SRE, monitoring, regions

Author: Assistant

Category: reliability-engineering | Model: gpt-4o

Platform Reliability Roadmap

Act as a head of engineering. Create a reliability roadmap: target error budgets, dependency upgrades, chaos drills, capacity plans, and quarterly goals linked to SLO improvements. Provide a dashboard...

Tags: reliability, roadmap, SLO, capacity, chaos

Author: tsubasa

Category: engineering | Model: gpt-4o

Observability Minimum Viable Platform

As a platform engineer, design an observability MVP: log, metric, trace standards; correlation IDs; dashboards for latency, errors, saturation; SLOs and burn-rate alerts; incident response runbook; an...

Tags: observability, SRE, SLO, alerts, runbooks

Author: tsubasa

Category: engineering | Model: gpt-4o

Energy-Aware Storage & Data Tiering

Propose a data tiering strategy: hot/warm/cold/archive for {{data_types}}. Policies: TTLs, compaction, dedupe, compression, and green-region replication. Define SLOs and a deletion automation spec.

Tags: storage, tiering, data-lifecycle, energy, SLO

Author: Tsubasa Kato

Category: data-architecture | Model: gpt-5-thinking

Capacity and Cloud Cost Planner

For workload <service>, model 12-month capacity and cloud cost. Include CPU/GPU, storage, egress, and LLM inference. Compare reserved vs spot vs savings plans. Map to SLOs and traffic seasonality. Out...

Tags: CTO, FinOps, capacity, SLO, cost

Author: ChatGPT

Category: CTO | Model: GPT-5 Thinking

Incident Postmortem Generator

Create a blameless postmortem for incident <id>: timeline, customer impact, 5 Whys, contributing factors, detection gaps, and corrective actions. Propose guardrails, SLO/SLA adjustments, runbooks, and...

Tags: CTO, SRE, incident, postmortem, SLA

Author: ChatGPT

Category: CTO | Model: GPT-5 Thinking

Datacenter: Zero-Downtime Ops & Triage Planner

Act as a datacenter reliability lead. Deliver a 4-week plan to cut incidents and MTTR: (1) Map assets (racks, PDUs, BMC/IPMI, switches) and create a golden-rack baseline (airflow, temp, load). (2) Bui...

Tags: datacenter, server, DCIM, BMC, MTTR, SLO, runbook

Author: Tsubasa Kato

Category: Operations | Model: GPT-5 Thinking