
Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference

CMoE

Dependencies

```shell
conda create -n cmoe python=3.11
conda activate cmoe
conda install pytorch==2.8.0+cu128 torchvision==0.23.0+cu128 torchaudio==2.8.0+cu128 pytorch-cuda=12.8 -c pytorch -c nvidia
pip install datasets==4.4.1
pip install transformers==4.57.5
pip install accelerate==1.12.0
pip install sentencepiece==0.2.0
pip install protobuf==6.33.3
pip install matplotlib==3.10.0
pip install lap==0.5.12
pip install peft==0.14.0
```

Note: adjust the package versions as needed for your own environment.

Supported Models

MoE Models:

OLMoE / DeepSeek(v1)-MoE-16B-base / DeepSeek-V2-Lite (16B-A3B) / Moonlight (16B-A3B) / Qwen3-30B-A3B

Dense Models:

Llama-2-7B / Llama-2-13B / Llama-3-8B / Qwen3-8B

Quick Start

You can run the pre-defined testing script `run.sh`:

```shell
bash run.sh

# or invoke the carving script directly:
python run_cmoe.py $MODEL_PATH wikitext2 \
    --nshared 2 \
    --nactivated 2 \
    --nexperts 16 \
    --nsamples 2048 \
    --extra-lr 0.001 --bias-speed 0.001 --new-eval
```
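The flags map onto a standard MoE layout: `--nshared` experts process every token unconditionally, while a router picks `--nactivated` of the `--nexperts` routed experts per token. The following is a minimal, illustrative sketch of that routing scheme in plain Python (function names and the scalar "experts" are assumptions for illustration, not CMoE's actual implementation):

```python
import math

def topk_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in idx]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(idx, exps)]

def moe_forward(x, shared_experts, routed_experts, router_logits, n_activated):
    """Hypothetical MoE layer: shared experts always fire; the router
    selects n_activated of the routed experts per token."""
    y = sum(f(x) for f in shared_experts)              # shared path, always on
    for i, w in topk_route(router_logits, n_activated):
        y += w * routed_experts[i](x)                  # weighted routed path
    return y

# Toy usage mirroring the flags above: 2 shared experts, 2 activated of 4 routed
shared = [lambda x: x, lambda x: 2 * x]
routed = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_forward(1.0, shared, routed, router_logits=[0.1, 0.4, 0.2, 0.3], n_activated=2)
```

With `--nexperts 16`, `--nshared 2`, and `--nactivated 2`, each token would pass through the 2 shared experts plus a router-selected 2 of the 16 routed experts.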

Evaluation

```shell
bash run.sh

# or invoke the evaluation script directly:
python eval_cmoe.py $MODEL_PATH
```
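Evaluation on wikitext2 is typically reported as perplexity, i.e. the exponential of the mean per-token negative log-likelihood. A minimal sketch of the metric itself (not `eval_cmoe.py`'s actual code):

```python
import math

def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a uniform model over a vocabulary of V tokens has
# NLL = ln(V) per token, so its perplexity is exactly V.
uniform_nlls = [math.log(32000)] * 10
ppl = perplexity(uniform_nlls)
```

Lower is better; a carved MoE model that tracks its dense source closely should show only a small perplexity gap on the same evaluation split.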

Code sources

Framework code is adapted from: https://github.com/JarvisPei/CMoE

GPTQ code is adapted from: https://github.com/cat538/MxMoE

Parts of the code are inspired by: https://github.com/xuyuzhuang11/CAMERA

Cite

If you find this work useful, please consider citing:

