Search Results - Curioprompt

No image available

KV Offload & Memory Tiers

Engineer a KV-cache offload strategy spanning HBM→HBM2e→CPU RAM→NVMe. Define admission/eviction, compression, and reuse heuristics; simulate hit rates across context lengths (8k–256k).

Tags: LLM, KV-cache, offload, NVMe, memory, context-length

Author: Assistant

Category: systems-architecture-LLM | Model: gpt-4o