Quantization Suite: INT8/INT4/NF4

Create a quantization evaluation suite that compares GPTQ, AWQ, and RTN at INT8, INT4, and NF4 precision. The suite should measure perplexity and zero-shot accuracy, compare calibration set selections, and report layer-wise sensitivity. Output deployment guidelines by architecture and hardware target.
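
As a rough illustration of what such a suite computes, the following is a minimal sketch of the simplest of the three methods, RTN (round-to-nearest), together with a perplexity helper and an error-based layer ranking. It is not GPTQ or AWQ, both of which use calibration data. All function names (`rtn_quantize`, `dequantize`, `perplexity`, `layer_sensitivity`) are hypothetical, weights are plain Python lists rather than tensors, and reconstruction error is only an assumed proxy for layer sensitivity; a real suite would measure the perplexity change after quantizing each layer.

```python
import math


def rtn_quantize(weights, bits=4):
    """Symmetric per-tensor round-to-nearest quantization (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer levels back to floating-point values."""
    return [v * scale for v in q]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def perplexity(token_log_probs):
    """Perplexity as exp of the mean negative log-likelihood (natural log)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def layer_sensitivity(layers, bits=4):
    """Rank layers by RTN reconstruction error, largest first.

    Assumption: reconstruction MSE stands in for true sensitivity.
    """
    errors = {}
    for name, w in layers.items():
        q, scale = rtn_quantize(w, bits)
        errors[name] = mse(w, dequantize(q, scale))
    return sorted(errors.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy weights, invented for the example.
    layers = {
        "attn.q_proj": [0.01, 0.02, -0.03, 8.0],
        "mlp.up_proj": [0.5, -0.5, 0.25, -0.25],
    }
    for name, err in layer_sensitivity(layers, bits=4):
        print(f"{name}: mse={err:.6f}")
    print("perplexity:", perplexity([math.log(0.5)] * 4))
```

Per-tensor scaling is used here only to keep the sketch short; per-channel or per-group scales are the more common choice at 4 bits, since a single outlier otherwise stretches the scale for the whole tensor.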

Author: Assistant

Model: gpt-4o

Category: model-compression-LLM

Tags: LLM, quantization, INT8, INT4, NF4, AWQ, GPTQ
