Search Results
Showing results for "SRE"
Safe Refactor of Logging: Preserve Signal, Reduce Noise
Create a plan for improving logs: structured fields, sampling, PII redaction, and correlation IDs. Require that changes do not reduce incident investigability.
Tags: logging, observability, PII-redaction, correlation-ids, SRE
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Error Budget Governance for Automated Changes
Define error budgets and a policy: when error budget is low, block auto-deploys and require human approval. Include dashboards and alert thresholds.
Tags: error-budget, SRE, governance, alerts, deploy-control
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Self-Improving Logging Cost: Sampling and Cardinality Control
Create a plan to reduce logging costs: sampling, cardinality caps, and aggregation. Require proof that debugging capability is preserved via incident drills.
Tags: logging, cost-control, sampling, cardinality, SRE
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Self-Improving On-Call Runbooks
Design an agent that updates runbooks based on incidents and recurring questions. Include change review, versioning, and a “proof of usefulness” metric.
Tags: runbooks, SRE, docs, continuous-improvement, versioning
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Observability Golden Paths
ChatGPT writes golden-path templates for logs/metrics/traces; Cursor inserts instrumented examples; Antigravity validates dashboards and SLO alerts per service. Output a service health runbook.
Tags: observability, logging, tracing, SLO, Cursor, Antigravity, ChatGPT
Author: Assistant
Category: sre-foundations | Model: gpt-4o
Zero-Downtime Deploy Kit
ChatGPT outlines blue/green and canary strategies; Cursor codifies health checks and probes; Antigravity automates traffic shifting and alerting. Provide a failure playbook.
Tags: deployments, blue-green, canary, SRE, Cursor, Antigravity, ChatGPT
Author: Assistant
Category: availability-engineering | Model: gpt-4o
Incident Postmortem Synthesizer
Collect logs/incidents; ChatGPT drafts a blameless postmortem; Cursor queries log/trace snippets and links to code; Antigravity reconstructs a timeline and verifies action items land in code/config. P...
Tags: SRE, incident, postmortem, observability, ChatGPT, Cursor, Antigravity
Author: Assistant
Category: reliability-ops | Model: gpt-4o
Game Day Resilience Program
Schedule cross-team game days to test failure modes. Define scenarios, injects, and success criteria. Output a readiness score.
Tags: resilience, chaos-engineering, SRE, operations, managers
Author: Assistant
Category: resilience-ops | Model: gpt-4o
Incident Learning Loop
Create incident severities, comms templates, and a blameless postmortem format. Propose a 30-minute weekly learning review.
Tags: incidents, SRE, postmortem, learning, managers
Author: Assistant
Category: reliability-ops | Model: gpt-4o
CI/CD Guardrail Registry
List guardrails for deploy safety such as canaries, feature flags, error budgets, and rollback scripts. Output a checklist and training plan.
Tags: CI-CD, DevOps, guardrails, SRE, release
Author: Assistant
Category: release-engineering | Model: gpt-4o
Observability & SLOs per Region
Design metrics/logs/traces with SLOs split by user region (TW/US). Include error budget policy, synthetic checks, and status page messaging templates.
Tags: software, observability, SLO, SRE, monitoring, regions
Author: Assistant
Category: reliability-engineering | Model: gpt-4o
SaaS SLO & Incident Playbook
Create SLO targets per region, on-call rotations spanning TW/US, incident severity ladder, customer comms templates, and postmortem format.
Tags: software, SRE, incidents, SLA, SLO, on-call
Author: Assistant
Category: reliability-ops | Model: gpt-4o
SRE Incident Drill Pack
As an SRE lead, prepare an incident drill pack: 3 realistic failure scenarios, runbook steps, on-call rotation, comms templates, status page samples, and a postmortem format with action owners and dea...
Tags: SRE, incidents, runbooks, on-call, postmortem
Author: tsubasa
Category: engineering | Model: gpt-4o
Observability Minimum Viable Platform
As a platform engineer, design an observability MVP: log, metric, trace standards; correlation IDs; dashboards for latency, errors, saturation; SLOs and burn-rate alerts; incident response runbook; an...
Tags: observability, SRE, SLO, alerts, runbooks
Author: tsubasa
Category: engineering | Model: gpt-4o
Climate‑Resilient SRE
Update the SRE program for climate risks. Scenarios: heat, floods, wildfires, outages. Plan: region failover, brownout modes, cache-first read, comms templates, drills. Add recovery time targets and user ...
Tags: SRE, resilience, climate-risk, failover, disaster
Author: Tsubasa Kato
Category: reliability | Model: gpt-5-thinking
Incident Postmortem Generator
Create a blameless postmortem for incident <id>: timeline, customer impact, 5 Whys, contributing factors, detection gaps, and corrective actions. Propose guardrails, SLO/SLA adjustments, runbooks, and...
Tags: CTO, SRE, incident, postmortem, SLA
Author: ChatGPT
Category: CTO | Model: GPT-5 Thinking
Enterprise: Global Rollout Playbook
Produce a multi-region rollout: localization, data residency, tenant isolation, key mgmt, latency budgets, SRE on-call, and regional model routing. Include training paths, comms plan, and a lighthouse...
Tags: enterprise, global, rollout, localization, SRE
Author: Tsubasa Kato
Category: Strategy | Model: GPT-5 Thinking
Incident Response Brief (SRE)
Produce a crisp post-incident brief for outage {{incident_id}} within the last 12h: start/end, user impact, top 3 proximate causes, current status, rollback/mitigation, next steps, ETA to full recovery. L...
Tags: sre, ops, incident, engineering, timely
Author: Tsubasa Kato
Category: Engineering | Model: gpt-5