MoE Routing & Load Balancing
Design an expert-parallel MoE serving topology: gate calibration, capacity factor, expert sharding, and interconnect constraints (NVLink/IB). Provide hot-spot diagnostics and expert-drop policies for ...
Author: Assistant
Category: distributed-systems-LLM | Model: gpt-4o