KV Offload & Memory Tiers
Engineer a KV-cache offload strategy spanning HBM → HBM2e → CPU RAM → NVMe. Define admission and eviction policies, compression, and reuse heuristics, then simulate hit rates across context lengths from 8k to 256k tokens.
Author: Assistant
Category: systems-architecture-LLM | Model: gpt-4o
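
The hit-rate simulation the task asks for could start from something like the sketch below. It is a minimal illustration, not a reference design: it assumes an admit-everything policy, exclusive tiers with plain LRU eviction that demotes the victim to the next tier, and a synthetic trace in which most accesses go to a small hot set of blocks. The tier capacities, the 16-token block size, and the `hot_fraction` and `hot_prob` parameters are all hypothetical placeholders, and compression is not modeled.

```python
from collections import OrderedDict
import random

# Tier names follow the task statement; capacities (in KV blocks) are
# hypothetical placeholders, not measurements of real hardware.
TIERS = [("HBM", 512), ("HBM2e", 1024), ("CPU RAM", 4096), ("NVMe", 16384)]


class TieredLRU:
    """Exclusive multi-tier cache: each block lives in at most one tier."""

    def __init__(self, tiers):
        self.names = [name for name, _ in tiers]
        self.caps = [cap for _, cap in tiers]
        self.levels = [OrderedDict() for _ in tiers]
        self.hits = [0] * len(tiers)
        self.misses = 0

    def _insert(self, level, key):
        # Place key in `level`; if that tier overflows, evict its LRU
        # entry and demote it one tier down, cascading as needed.
        while level < len(self.levels):
            tier = self.levels[level]
            tier[key] = True
            if len(tier) <= self.caps[level]:
                return
            key, _ = tier.popitem(last=False)  # oldest entry is the victim
            level += 1
        # A victim that falls past the last tier is dropped.

    def access(self, key):
        """Return the index of the tier that served the hit, or None on a miss."""
        for i, tier in enumerate(self.levels):
            if key in tier:
                self.hits[i] += 1
                del tier[key]
                self._insert(0, key)  # promote to the fastest tier
                return i
        self.misses += 1
        self._insert(0, key)  # admit-everything policy (assumption)
        return None

    def hit_rates(self):
        total = sum(self.hits) + self.misses
        rates = {n: h / total for n, h in zip(self.names, self.hits)}
        rates["miss"] = self.misses / total
        return rates


def simulate(context_tokens, accesses=20000, block_tokens=16,
             hot_fraction=0.1, hot_prob=0.8, seed=0):
    """Replay a synthetic skewed trace and return per-tier hit rates."""
    rng = random.Random(seed)
    n_blocks = max(1, context_tokens // block_tokens)
    n_hot = max(1, int(n_blocks * hot_fraction))
    cache = TieredLRU(TIERS)
    for _ in range(accesses):
        if rng.random() < hot_prob:
            block = rng.randrange(n_hot)      # reuse of a hot block
        else:
            block = rng.randrange(n_blocks)   # uniform access over the context
        cache.access(block)
    return cache.hit_rates()


if __name__ == "__main__":
    for ctx in (8_192, 32_768, 65_536, 131_072, 262_144):
        rates = simulate(ctx)
        print(ctx, {name: round(rate, 3) for name, rate in rates.items()})
```

Swapping the trace generator for recorded request logs, or replacing the admit-everything rule in `access` with a reuse-based filter, would be the natural next steps for comparing admission and eviction policies on the same harness.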