Outlier Suppression+: Accurate quantization of large language models by equivalent and effective shifting and scaling |
Lossy and Lossless (L2) Post-training Model Size Compression. Integrate lossless and lossy compression techniques in a post-training setting. [Paper] |
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models. Identify the outlier phenomena (channel concentration and token discrepancy) that arise when quantizing transformer language models. Propose a framework to suppress these outliers. [Paper] [Code] |
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-training Quantization. Investigate how activation quantization affects weight tuning. Establish the relationship between activation quantization and the flatness of quantized weights. Propose randomly dropping activation quantization to achieve flatter optimized weights. [Paper] [Code] |