Official implementation of **Multi-Turn Code Generation Through Single-Step Rewards**
by Arnav Kumar Jain*, Gonzalo Gonzalez-Pumariega*, Wayne Chen, Alexander Rush, Wenting Zhao†, and Sanjiban Choudhury†
Create a virtual environment and activate it:

```bash
# Conda
conda create -n mucode python=3.10
conda activate mucode

# PyEnv
pyenv virtualenv 3.10 mucode
pyenv activate mucode
```

First, clone Open-Instruct, place the directory under `/src/` in this repository, and follow their installation guide.
Then, install this repository's required packages:

```bash
pip install -r requirements.txt --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/ --extra-index-url https://download.pytorch.org/whl/cu121
pip install flash-attn==2.6.3
```

To train $\mu$Code with Llama-3.1-8B-Instruct as the base model on 4 GPUs, run:

```bash
bash bash/mucode.sh Llama-3.1-8B-Instruct ./output/
```
To evaluate the performance of the trained model at pass@k (greedy decoding with temperature 0.0), run:

```bash
bash bash/pass_at_k.sh $GENERATOR_CHECKPOINT_PATH ./output/
```

`$GENERATOR_CHECKPOINT_PATH` is either (1) the Hugging Face repository path of the trained model or (2) the local path to the checkpoint directory. For instance, if you trained the model using the `bash/mucode.sh` command above, the final checkpoint path would be `./output/mucode/SFT/mbpp_train_iter2/`.
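For reference, pass@k is commonly computed with the standard unbiased estimator over n samples per problem. This is an illustrative sketch only, not necessarily how `bash/pass_at_k.sh` computes the metric:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n total samples of which c are
    correct, passes. Sketch only; the repo's script may differ."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With greedy decoding (temperature 0.0) there is a single sample per
# problem, so pass@1 reduces to the fraction of problems whose sample passes.
```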
We provide scripts for multi-turn best-of-N (BoN) search with verifiers at inference time. The following command generates `$N` responses (with temperature `$TEMPERATURE`) and filters them with `$VERIFIER_SETTING` at each turn:

```bash
bash bash/best_of_n.sh $GENERATOR_CHECKPOINT_PATH $VERIFIER_CHECKPOINT_PATH $EXPERIMENT_NAME $VERIFIER_SETTING $N $TEMPERATURE $RESULTS_DIR
```

For instance, to obtain results with public tests and the learned verifier (`pt+lv`), generating 5 solutions at each turn with temperature 0.7, run:

```bash
bash bash/best_of_n.sh $GENERATOR_CHECKPOINT_PATH $VERIFIER_CHECKPOINT_PATH bon_exp pt+lv 5 0.7 ./output/
```

For `$VERIFIER_SETTING`, use `pt` for public tests, `lv` for the learned verifier, and `rand` for random selection.
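Conceptually, each turn of BoN search samples N candidate solutions, scores each with the chosen verifier, and keeps the highest-scoring one. A minimal sketch of that selection step (function and variable names here are hypothetical, not the repository's API):

```python
from typing import Callable, Sequence

def select_best_of_n(candidates: Sequence[str],
                     score: Callable[[str], float]) -> str:
    """Return the candidate with the highest verifier score.
    Hypothetical sketch: the actual scripts rank candidates using
    public-test results (pt), a learned verifier (lv), or both (pt+lv)."""
    return max(candidates, key=score)

# Toy example: a scorer that maps each candidate to a verifier score,
# e.g. the fraction of public tests it passes.
toy_scores = {"sol_a": 0.2, "sol_b": 0.9, "sol_c": 0.5}
best = select_best_of_n(list(toy_scores), toy_scores.get)
```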
You should be all set to run your own experiments!

If you build on our work or find it useful, please cite it using the following BibTeX:
```bibtex
@inproceedings{
  jain2025multiturn,
  title={Multi-Turn Code Generation Through Single-Step Rewards},
  author={Arnav Kumar Jain and Gonzalo Gonzalez-Pumariega and Wayne Chen and Alexander M Rush and Wenting Zhao and Sanjiban Choudhury},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=aJeLhLcsh0}
}
```