Search Results
Showing results for "SLO"
SLO-Driven Improvement: Optimize What Users Feel
Design an improvement loop keyed to SLOs: latency, error rates, and quality metrics. Require that each proposed change specify which SLO it targets and how the effect will be measured.
Tags: SLO, metrics, latency, errors, quality, ops
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Canary Deploy Agent: Progressive Delivery Playbook
Design a progressive delivery system: canary cohorts, SLO monitoring, automatic rollback, and incident annotations. Include safe defaults and stop conditions.
Tags: canary, progressive-delivery, SLO, rollback, ops
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Self-Improving Code Health Dashboard
Design a dashboard that drives the agent’s priorities: complexity, test coverage, error hotspots, dependency risk, and SLOs. Include alert thresholds and weekly reports.
Tags: dashboard, code-health, prioritization, metrics, SLO
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Self-Improving Service Catalog: Ownership and Dependencies
Design a service catalog that the agent maintains: owners, dependencies, SLOs, runbooks, and deploy pipelines. Use it to route reviews and risk analysis.
Tags: service-catalog, ownership, dependencies, SLO, runbooks
Author: Assistant
Category: safe-self-improving-ai | Model: gpt-5.2
Agent Reliability Scorecard (SLIs/SLOs)
Define SLIs/SLOs for agents: task success, tool failure rates, safety violations, latency, and cost. Provide a dashboard layout and alert thresholds.
Tags: reliability, SLI, SLO, monitoring, cost
Author: Assistant
Category: agent-architecture | Model: GPT-5.2
Agent Runbooks: On-Call Playbook for Failures
Create operational runbooks: common failures, triage steps, rollback, and user comms. Include SLO breaches, tool outages, and prompt regressions.
Tags: runbooks, ops, on-call, incident-response, reliability
Author: Assistant
Category: agent-architecture | Model: GPT-5.2
LLM Inference Playbook (≥90% Targeted Engagement)
As a principal ML engineer, draft a production inference playbook for 7B–70B models: batching, dynamic padding, KV-cache reuse, paged attention, prefix-caching, and request shaping. Include SLO tiers,...
Tags: LLM, inference, batching, KV-cache, paged-attention, SLO, engagement-90
Author: Assistant
Category: inference-optimization | Model: gpt-4o
Latency Decomposition & SLOs
Produce a latency decomposition (queue→prefill→decode→post). Propose tail-p95/p99 fixes: micro-batching, admission control, and early-termination heuristics.
Tags: LLM, latency, SLO, micro-batch, admission-control
Author: Assistant
Category: perf-engineering-LLM | Model: gpt-4o
Observability Golden Paths
ChatGPT writes golden path templates for logs/metrics/traces; Cursor inserts instrumented examples; Antigravity validates dashboards and SLO alerts per service. Output a service health runbook.
Tags: observability, logging, tracing, SLO, Cursor, Antigravity, ChatGPT
Author: Assistant
Category: sre-foundations | Model: gpt-4o
SaaS SLO & Incident Playbook
Create SLO targets per region, on-call rotations spanning TW/US, incident severity ladder, customer comms templates, and postmortem format.
Tags: software, SRE, incidents, SLA, SLO, on-call
Author: Assistant
Category: reliability-ops | Model: gpt-4o
Observability & SLOs per Region
Design metrics/logs/traces with SLOs split by user region (TW/US). Include error budget policy, synthetic checks, and status page messaging templates.
Tags: software, observability, SLO, SRE, monitoring, regions
Author: Assistant
Category: reliability-engineering | Model: gpt-4o
Platform Reliability Roadmap
Act as a head of engineering. Create a reliability roadmap: target error budgets, dependency upgrades, chaos drills, capacity plans, and quarterly goals linked to SLO improvements. Provide a dashboard...
Tags: reliability, roadmap, SLO, capacity, chaos
Author: tsubasa
Category: engineering | Model: gpt-4o
Observability Minimum Viable Platform
As a platform engineer, design an observability MVP: log, metric, trace standards; correlation IDs; dashboards for latency, errors, saturation; SLOs and burn-rate alerts; incident response runbook; an...
Tags: observability, SRE, SLO, alerts, runbooks
Author: tsubasa
Category: engineering | Model: gpt-4o
Energy‑Aware Storage & Data Tiering
Propose a data tiering strategy: hot/warm/cold/archive for {{data_types}}.
Policies: TTLs, compaction, dedupe, compression, and green-region replication.
Define SLOs and a deletion automation spec.
Tags: storage, tiering, data-lifecycle, energy, SLO
Author: Tsubasa Kato
Category: data-architecture | Model: gpt-5-thinking
Capacity and Cloud Cost Planner
For workload <service>, model 12-month capacity and cloud cost. Include CPU/GPU, storage, egress, and LLM inference. Compare reserved vs spot vs savings plans. Map to SLOs and traffic seasonality. Out...
Tags: CTO, FinOps, capacity, SLO, cost
Author: ChatGPT
Category: CTO | Model: GPT-5 Thinking
Incident Postmortem Generator
Create a blameless postmortem for incident <id>: timeline, customer impact, 5 Whys, contributing factors, detection gaps, and corrective actions. Propose guardrails, SLO/SLA adjustments, runbooks, and...
Tags: CTO, SRE, incident, postmortem, SLA
Author: ChatGPT
Category: CTO | Model: GPT-5 Thinking
Datacenter: Zero-Downtime Ops & Triage Planner
Act as a datacenter reliability lead. Deliver a 4-week plan to cut incidents and MTTR: (1) Map assets (racks, PDUs, BMC/IPMI, switches) and create a golden-rack baseline (airflow, temp, load). (2) Bui...
Tags: datacenter, server, DCIM, BMC, MTTR, SLO, runbook
Author: Tsubasa Kato
Category: Operations | Model: GPT-5 Thinking