Search Results - Curioprompt

Safe Refactor of Logging: Preserve Signal, Reduce Noise

Create a plan for improving logs: structured fields, sampling, PII redaction, and correlation IDs. Require that changes do not reduce incident investigability.

Tags: logging, observability, PII-redaction, correlation-ids, SRE

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Error Budget Governance for Automated Changes

Define error budgets and a policy: when error budget is low, block auto-deploys and require human approval. Include dashboards and alert thresholds.

Tags: error-budget, SRE, governance, alerts, deploy-control

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Self-Improving Logging Cost: Sampling and Cardinality Control

Create a plan to reduce logging costs: sampling, cardinality caps, and aggregation. Require proof that debugging capability is preserved via incident drills.

Tags: logging, cost-control, sampling, cardinality, SRE

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Self-Improving On-Call Runbooks

Design an agent that updates runbooks based on incidents and recurring questions. Include change review, versioning, and a “proof of usefulness” metric.

Tags: runbooks, SRE, docs, continuous-improvement, versioning

Author: Assistant

Category: safe-self-improving-ai | Model: gpt-5.2

Observability Golden Paths

ChatGPT writes golden path templates for logs/metrics/traces; Cursor inserts instrumented examples; Antigravity validates dashboards and SLO alerts per service. Output a service health runbook.

Tags: observability, logging, tracing, SLO, Cursor, Antigravity, ChatGPT

Author: Assistant

Category: sre-foundations | Model: gpt-4o

Zero-Downtime Deploy Kit

ChatGPT outlines blue/green and canary strategies; Cursor codifies health checks and probes; Antigravity automates traffic shifting and alerting. Provide a failure playbook.

Tags: deployments, blue-green, canary, SRE, Cursor, Antigravity, ChatGPT

Author: Assistant

Category: availability-engineering | Model: gpt-4o

Incident Postmortem Synthesizer

Collect logs/incidents; ChatGPT drafts a blameless postmortem; Cursor queries log/trace snippets and links to code; Antigravity reconstructs a timeline and verifies action items land in code/config. P...

Tags: SRE, incident, postmortem, observability, ChatGPT, Cursor, Antigravity

Author: Assistant

Category: reliability-ops | Model: gpt-4o

Game Day Resilience Program

Schedule cross-team game days to test failure modes. Define scenarios, injects, and success criteria. Output a readiness score.

Tags: resilience, chaos-engineering, SRE, operations, managers

Author: Assistant

Category: resilience-ops | Model: gpt-4o

Incident Learning Loop

Create incident severities, comms templates, and a blameless postmortem format. Propose a 30-minute weekly learning review.

Tags: incidents, SRE, postmortem, learning, managers

Author: Assistant

Category: reliability-ops | Model: gpt-4o

CI/CD Guardrail Registry

List guardrails for deploy safety such as canaries, feature flags, error budgets, and rollback scripts. Output a checklist and training plan.

Tags: CI-CD, DevOps, guardrails, SRE, release

Author: Assistant

Category: release-engineering | Model: gpt-4o

Observability & SLOs per Region

Design metrics/logs/traces with SLOs split by user region (TW/US). Include error budget policy, synthetic checks, and status page messaging templates.

Tags: software, observability, SLO, SRE, monitoring, regions

Author: Assistant

Category: reliability-engineering | Model: gpt-4o

SaaS SLO & Incident Playbook

Create SLO targets per region, on-call rotations spanning TW/US, incident severity ladder, customer comms templates, and postmortem format.

Tags: software, SRE, incidents, SLA, SLO, on-call

Author: Assistant

Category: reliability-ops | Model: gpt-4o

SRE Incident Drill Pack

As an SRE lead, prepare an incident drill pack: 3 realistic failure scenarios, runbook steps, on-call rotation, comms templates, status page samples, and a postmortem format with action owners and dea...

Tags: SRE, incidents, runbooks, on-call, postmortem

Author: tsubasa

Category: engineering | Model: gpt-4o

Observability Minimum Viable Platform

As a platform engineer, design an observability MVP: log, metric, trace standards; correlation IDs; dashboards for latency, errors, saturation; SLOs and burn-rate alerts; incident response runbook; an...

Tags: observability, SRE, SLO, alerts, runbooks

Author: tsubasa

Category: engineering | Model: gpt-4o

Climate‑Resilient SRE

Update SRE program for climate risks. Scenarios: heat, floods, wildfires, outages. Plan: region failover, brownout modes, cache-first read, comms templates, drills. Add recovery time targets and user ...

Tags: SRE, resilience, climate-risk, failover, disaster

Author: Tsubasa Kato

Category: reliability | Model: gpt-5-thinking

Incident Postmortem Generator

Create a blameless postmortem for incident <id>: timeline, customer impact, 5 Whys, contributing factors, detection gaps, and corrective actions. Propose guardrails, SLO/SLA adjustments, runbooks, and...

Tags: CTO, SRE, incident, postmortem, SLA

Author: ChatGPT

Category: CTO | Model: GPT-5 Thinking

Enterprise: Global Rollout Playbook

Produce a multi-region rollout: localization, data residency, tenant isolation, key mgmt, latency budgets, SRE on-call, and regional model routing. Include training paths, comms plan, and a lighthouse...

Tags: enterprise, global, rollout, localization, SRE

Author: Tsubasa Kato

Category: Strategy | Model: GPT-5 Thinking

Incident Response Brief (SRE)

Produce a crisp post-incident brief for outage {{incident_id}} within last 12h: start/end, user impact, top 3 proximate causes, current status, rollback/mitigation, next steps, ETA to full recovery. L...

Tags: sre, ops, incident, engineering, timely

Author: Tsubasa Kato

Category: Engineering | Model: gpt-5