As a principal ML engineer, draft a production inference playbook for 7B–70B models covering batching, dynamic padding, KV-cache reuse, paged attention, prefix caching, and request shaping. Include SLO tiers, tail-latency mitigation strategies, and a canary/rollback plan, written to earn ≥90% targeted engagement from senior peers.