LLM Inference Playbook (≥90% Targeted Engagement)
As a principal ML engineer, draft a production inference playbook for 7B–70B models that covers batching, dynamic padding, KV-cache reuse, paged attention, prefix caching, and request shaping. Include SLO tiers, tail-latency mitigation strategies, and a canary/rollback plan. Write the playbook to drive ≥90% targeted engagement from senior peers.