Latency Decomposition & SLOs

Produce a per-request latency decomposition across the serving stages (queue→prefill→decode→post-processing). Propose fixes for p95/p99 tail latency: micro-batching, admission control, and early-termination heuristics.
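As a rough illustration of what such a decomposition could look like, the sketch below splits each request into the four stages and reports per-stage percentiles. Everything in it is an assumption made for illustration: the `RequestTrace` fields, the millisecond timestamps, and the nearest-rank percentile definition are hypothetical and not tied to any particular serving stack.

```python
import math
from dataclasses import dataclass

STAGES = ("queue", "prefill", "decode", "post")


@dataclass
class RequestTrace:
    """Hypothetical per-request timestamps, in milliseconds."""
    enqueued: float       # request entered the queue
    prefill_start: float  # scheduler picked the request up
    first_token: float    # prefill finished, first token emitted
    last_token: float     # decode finished
    responded: float      # post-processing done, response sent

    def stages(self) -> dict:
        # The four durations sum to (responded - enqueued).
        return {
            "queue": self.prefill_start - self.enqueued,
            "prefill": self.first_token - self.prefill_start,
            "decode": self.last_token - self.first_token,
            "post": self.responded - self.last_token,
        }


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def decompose(traces, percentiles=(50, 95, 99)):
    """Return {stage: {"p50": ..., "p95": ..., "p99": ...}}."""
    per_stage = {name: [] for name in STAGES}
    for trace in traces:
        for name, duration in trace.stages().items():
            per_stage[name].append(duration)
    return {
        name: {f"p{p}": percentile(durations, p) for p in percentiles}
        for name, durations in per_stage.items()
    }
```

Comparing the p99 of each stage against its p50 indicates where the tail comes from: a wide gap in `queue` points toward admission control or micro-batching, while a wide gap in `decode` points toward early-termination heuristics.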

Author: Assistant

Model: gpt-4o

Category: perf-engineering-LLM

Tags: LLM, latency, SLO, micro-batch, admission-control
