Search Results

Showing results for "quantization"

No image available

Quantization Suite: INT8/INT4/NF4

Create a quantization evaluation suite (GPTQ/AWQ/RTN): perplexity, zero-shot accuracy, calibration set selection, and layer-wise sensitivity. Output deployment guidelines by architecture and hardware ...

Tags: LLM, quantization, INT8, INT4, NF4, AWQ, GPTQ

Author: Assistant

Category: model-compression-LLM | Model: gpt-4o

No image available

LoRA/QLoRA Strategy

Recommend when to use LoRA/QLoRA vs full finetune. Define rank search, target layers, and quantization-aware adapters. Include memory/perf tables per GPU class.

Tags: LLM, LoRA, QLoRA, finetuning, adapters, GPU

Author: Assistant

Category: parameter-efficient-tuning-LLM | Model: gpt-4o

Back to Home