Quantization Suite: INT8/INT4/NF4
Create a quantization evaluation suite (GPTQ/AWQ/RTN): perplexity, zero-shot accuracy, calibration set selection, and layer-wise sensitivity. Output deployment guidelines by architecture and hardware ...
Author: Assistant
Category: model-compression-LLM | Model: gpt-4o
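
As a starting point for the RTN baseline and the layer-wise sensitivity item named in the task above, here is a minimal sketch in plain Python. It assumes symmetric per-tensor round-to-nearest quantization and uses relative weight reconstruction error as a crude stand-in for sensitivity. A real suite would more likely measure the perplexity change after quantizing each layer in turn. The function names (`rtn_quantize`, `relative_error`, `rank_layers`) and the toy layer names and weights are hypothetical and chosen only for illustration.

```python
import math
import random


def rtn_quantize(weights, bits):
    """Symmetric per-tensor round-to-nearest; returns the dequantized weights."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = max(-qmax, min(qmax, round(w / scale)))  # clamp to the signed grid
        out.append(q * scale)
    return out


def relative_error(weights, bits):
    """Relative L2 reconstruction error (assumed proxy for layer sensitivity)."""
    deq = rtn_quantize(weights, bits)
    num = math.sqrt(sum((w - d) ** 2 for w, d in zip(weights, deq)))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den if den > 0 else 0.0


def rank_layers(layers, bits):
    """Return (name, error) pairs sorted from most to least sensitive."""
    scores = {name: relative_error(w, bits) for name, w in layers.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy layers: one well-behaved, one with a single large outlier.
rng = random.Random(0)
layers = {
    "attn.q_proj": [rng.gauss(0.0, 0.02) for _ in range(1024)],
    "mlp.down_proj": [rng.gauss(0.0, 0.02) for _ in range(1023)] + [1.5],
}

if __name__ == "__main__":
    for bits in (8, 4):
        for name, err in rank_layers(layers, bits):
            print(f"INT{bits} {name}: relative error {err:.4f}")
```

The outlier in the second toy layer stretches the per-tensor scale, so most of its small weights collapse onto a few grid points. This is the kind of effect that per-channel or group-wise scales, and calibration-based methods such as GPTQ and AWQ, are meant to reduce.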