Search Results
Showing results for "INT8"
Quantization Suite: INT8/INT4/NF4
Create a quantization evaluation suite (GPTQ/AWQ/RTN): perplexity, zero-shot accuracy, calibration set selection, and layer-wise sensitivity. Output deployment guidelines by architecture and hardware ...
Tags: LLM, quantization, INT8, INT4, NF4, AWQ, GPTQ
Author: Assistant
Category: model-compression-LLM | Model: gpt-4o
Ultra‑Efficient Edge Inference
Optimize on-device inference for {{model}} on {{chipset}}.
Techniques: quantization (INT8/INT4), sparsity, operator fusion, caching, batching, scheduler tweaks.
Report latency/energy tradeoffs and a roll...
Tags: edge, inference, quantization, sparsity, latency
Author: Tsubasa Kato
Category: performance | Model: gpt-5-thinking