Search Results

Showing results for "INT8"


Quantization Suite: INT8/INT4/NF4

Create a quantization evaluation suite (GPTQ/AWQ/RTN): perplexity, zero-shot accuracy, calibration set selection, and layer-wise sensitivity. Output deployment guidelines by architecture and hardware ...

Tags: LLM, quantization, INT8, INT4, NF4, AWQ, GPTQ

Author: Assistant

Category: model-compression-LLM | Model: gpt-4o
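RTN (round-to-nearest), the simplest baseline named in this suite, can be sketched as symmetric per-tensor quantization. This is a minimal pure-Python sketch that assumes a single scale per tensor. The function names `rtn_quantize`, `dequantize`, and `max_abs_error` are hypothetical. GPTQ and AWQ are not shown: GPTQ adds error-compensating weight updates on top of this rounding, and AWQ rescales activation-salient channels before it.

```python
def rtn_quantize(weights, bits=8):
    """Symmetric per-tensor round-to-nearest quantization (illustrative sketch)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for INT4; the most negative code is left unused
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds to the nearest integer, with ties going to the even value
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def max_abs_error(weights, bits):
    """Worst-case reconstruction error; a crude per-layer sensitivity signal."""
    q, scale = rtn_quantize(weights, bits)
    return max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))


w = [0.5, -1.27, 0.003, 1.0]
for bits in (8, 4):
    q, s = rtn_quantize(w, bits)
    print(bits, q, max_abs_error(w, bits))
```

Each weight's error is bounded by half a quantization step, so shrinking from INT8 to INT4 widens the step about 18-fold here and raises the error accordingly. A real suite would measure the effect on perplexity and zero-shot accuracy rather than on raw weights.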


Ultra‑Efficient Edge Inference

Optimize on-device inference for {{model}} on {{chipset}}. Techniques: quantization (INT8/INT4), sparsity, operator fusion, caching, batching, scheduler tweaks. Report latency/energy trade-offs and a roll...

Tags: edge, inference, quantization, sparsity, latency

Author: Tsubasa Kato

Category: performance | Model: gpt-5-thinking
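For the INT8/INT4 quantization listed here, INT4 only halves weight memory relative to INT8 when two values share each byte. Below is a minimal sketch of that packing. It assumes signed values in -8..7 and a low-nibble-first layout. Both the layout and the names `pack_int4` and `unpack_int4` are illustrative assumptions, and the runtime for a given chipset may use a different layout.

```python
def pack_int4(values):
    """Pack signed INT4 values (-8..7) two per byte, low nibble first (assumed layout)."""
    values = list(values)
    if len(values) % 2:
        values.append(0)  # pad odd-length input with a zero nibble
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in 4 signed bits")
        # & 0x0F keeps the low 4 bits of the two's-complement form
        out.append((lo & 0x0F) | ((hi & 0x0F) << 4))
    return bytes(out)


def unpack_int4(packed, count):
    """Recover the first `count` signed INT4 values from packed bytes."""
    vals = []
    for byte in packed:
        for nibble in (byte & 0x0F, byte >> 4):
            vals.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend 4 bits
    return vals[:count]


weights = [3, -7, 0, 6, -8]
packed = pack_int4(weights)
print(len(packed), unpack_int4(packed, len(weights)))
```

Five INT4 weights fit in 3 bytes instead of the 5 bytes INT8 would need. The extra unpack work is one reason INT4 latency gains depend on the chipset having kernels that operate on packed nibbles directly.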
