AwesomeQuantizationMethod

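Quantization: Quantization is the process of reducing the number of bits (i.e., the precision) used for model parameters while keeping the loss in inference quality to a minimum. Quantizing a model has two benefits: a smaller model size and faster inference. During inference, model weights and intermediate activations must be read from memory continuously, so after quantization more values can be read in the same amount of time. Processors generally execute integer arithmetic much faster than floating-point arithmetic, and hardware with dedicated low-precision compute units can increase computation speed significantly. Uniform Quantization: Comparison between uniform quantization (left) and non-uniform quantization (right). ...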

Oct-13-2025 · 52 min · 25827 words · WITHER

HASTILY

Paper reading of HASTILY.

Oct-07-2025 · 8 min · 3516 words · WITHER

MixQ

Paper reading of MixQ.

Oct-07-2025 · 7 min · 3332 words · WITHER

SpecInfer

Paper reading of SpecInfer.

Oct-06-2025 · 10 min · 4779 words · WITHER

APTMOE

Paper reading of APTMOE.

Oct-06-2025 · 9 min · 4428 words · WITHER

StreamingGS

Paper reading of StreamingGS.

Oct-05-2025 · 9 min · 4282 words · WITHER

HybridMoE

Paper reading of HybridMoE.

Oct-04-2025 · 7 min · 3071 words · WITHER

SpARC

Paper reading of SpARC.

Oct-03-2025 · 6 min · 2618 words · WITHER

Oltron

Paper reading of Oltron.

Oct-02-2025 · 8 min · 3894 words · WITHER

SpInfer

Paper reading of SpInfer.

Oct-01-2025 · 8 min · 3847 words · WITHER