Official PyTorch implementation of EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
- [2024/08] Added support for Mistral-Large-Instruct quantization. W2g64 Mistral-Large-Instruct compressed to 35 GB with only 4% accuracy loss.
- [2024/07] New feature: transfer EfficientQAT quantized models to GPTQ v2 and BitBLAS formats, loadable through GPTQModel.
- [2024/07] Initial release of EfficientQAT, pushing the limits of uniform (INT) quantization efficiently.
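The 35 GB figure above can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes Mistral-Large-Instruct has roughly 123B parameters and that w2g64 stores a 16-bit scale and 16-bit zero-point per 64-weight group (both are assumptions here, not details from this README):

```python
# Illustrative size estimate for a w2g64 model (assumed: ~123B params,
# 16-bit scale + 16-bit zero-point stored per group of 64 weights).
params = 123e9
bits_per_weight = 2 + (16 + 16) / 64   # 2-bit weights + per-group metadata = 2.5
size_gib = params * bits_per_weight / 8 / 2**30
print(f"{size_gib:.1f} GiB")  # -> 35.8 GiB, consistent with the ~35 GB above
```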
- Clone the repository:
git clone https://github.com/OpenGVLab/EfficientQAT.git
cd EfficientQAT
- Set up the environment:
conda create -n efficientqat python=3.11
conda activate efficientqat
pip install -r requirements.txt
We provide pre-quantized EfficientQAT models. For details, see the full model table in the original README.
EfficientQAT involves two consecutive training phases: block-wise training of all parameters (Block-AP) and end-to-end training of quantization parameters (E2E-QP).
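Both phases optimize a uniform, group-wise quantizer: weights are mapped to low-bit integers with one scale and zero-point per group (e.g. w2g64 means 2-bit weights with groups of 64). The sketch below illustrates the fake-quantize round trip with min/max-derived parameters; EfficientQAT instead *learns* the scale and zero-point during training, so this is only a simplified illustration:

```python
# Illustrative group-wise uniform fake quantization (not the training code).
def fake_quant(weights, bits=2, group_size=4):
    qmax = 2 ** bits - 1
    out = []
    for i in range(0, len(weights), group_size):
        g = weights[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / qmax or 1.0           # one scale per group
        zero = round(-lo / scale)                 # integer zero-point
        for w in g:
            q = max(0, min(qmax, round(w / scale) + zero))  # low-bit code
            out.append((q - zero) * scale)        # dequantized weight
    return out

w = [0.1, -0.4, 0.3, 0.2, 1.5, -1.2, 0.0, 0.7]
print(fake_quant(w, bits=2, group_size=4))
```

With 2 bits there are only four codes per group, so the reconstruction is coarse; Block-AP and E2E-QP exist precisely to recover the accuracy this rounding destroys.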
Modify the --model path in the script, then run:
examples/block_ap/LlamaForCasualLM/w2g64.bat
Modify the --quant_model_path in the script, then run:
For RedPajama dataset:
examples/e2e_qp/Llama-2-7b/w2g64-redpajama.bat
For Alpaca dataset:
examples/e2e_qp/Llama-2-7b/w2g64-alpaca.bat
- Download pre-quantized models:
pip install huggingface_hub
huggingface-cli download ChenMnZ/Llama-2-7b-EfficientQAT-w2g64 --local-dir ./output/pre_quantized_models/Llama-2-7b-EfficientQAT-w2g64
- Evaluate:
@echo off
set CUDA_VISIBLE_DEVICES=0
python main_block_ap.py ^
--resume_quant ./output/pre_quantized_models/Llama-2-7b-EfficientQAT-w2g64 ^
--net Llama-2 ^
--wbits 2 ^
--group_size 64 ^
--output_dir ./output/inference_results/ ^
--eval_ppl ^
--eval_tasks piqa,arc_easy,arc_challenge,hellaswag,winogrande
Install gptqmodel:
git clone https://github.com/ModelCloud/GPTQModel.git && cd GPTQModel
bash install.sh
Transfer options:
- To GPTQ format:
examples/model_transfer/efficientqat_to_gptq/LlamaForCasualLM.bat
- To BitBLAS format:
examples/model_transfer/efficientqat_to_bitblas/LlamaForCasualLM.bat
- Convert fp32 to half-precision:
examples/model_transfer/fp32_to_16/LlamaForCasualLM.bat
Example for GPTQ or BitBLAS formats:
from transformers import AutoTokenizer
from gptqmodel import GPTQModel
quant_dir = "ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-GPTQ"
# or "ChenMnZ/Llama-2-7b-EfficientQAT-w2g128-BitBLAS"
tokenizer = AutoTokenizer.from_pretrained(quant_dir, use_fast=True)
model = GPTQModel.from_quantized(quant_dir)
inputs = tokenizer("Model quantization is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs)[0]))
If you find this work useful, please cite:
@article{efficientqat,
title={EfficientQAT: Efficient Quantization-Aware Training for Large Language Models},
author={Chen, Mengzhao and Shao, Wenqi and Xu, Peng and Wang, Jiahao and Gao, Peng and Zhang, Kaipeng and Qiao, Yu and Luo, Ping},
journal={arXiv preprint arXiv:2407.11062},
year={2024}
}