Official implementation of **Multi-Turn Code Generation Through Single-Step Rewards**
by Arnav Kumar Jain*, Gonzalo Gonzalez-Pumariega*, Wayne Chen, Alexander Rush, Wenting Zhao†, and Sanjiban Choudhury†
Create a virtual environment and activate it:

```bash
# Conda
conda create -n mucode python=3.10
conda activate mucode

# PyEnv
pyenv virtualenv 3.10 mucode
pyenv activate mucode
```

First, clone Open-Instruct, place the directory under `/src/` in this repository, and follow their installation guide.
Then, install this repository's required packages:

```bash
pip install -r requirements.txt --find-links https://flashinfer.ai/whl/cu121/torch2.4/flashinfer/ --extra-index-url https://download.pytorch.org/whl/cu121
pip install flash-attn==2.6.3
```

To train $\mu$Code with Llama-3.1-8B-Instruct as the base model on 4 GPUs, run:

```bash
bash bash/mucode.sh Llama-3.1-8B-Instruct ./output/
```
To evaluate the performance of the trained model at pass@k (greedy decoding with temperature 0.0), run:

```bash
bash bash/pass_at_k.sh $GENERATOR_CHECKPOINT_PATH ./output/
```

`$GENERATOR_CHECKPOINT_PATH` is either (1) the Hugging Face repository path of the trained model or (2) the local path to the checkpoint directory. For instance, if you trained the model using the `bash/mucode.sh` command above, the final checkpoint path would be `./output/mucode/SFT/mbpp_train_iter2/`.
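For reference, pass@k is commonly computed with the standard unbiased estimator over n samples per problem. This is an illustrative sketch only, not necessarily how `bash/pass_at_k.sh` computes the metric:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    samples, drawn without replacement from n total samples of which c are
    correct, passes. Sketch only; the repo's script may differ."""
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k must
        # include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With greedy decoding (temperature 0.0) there is a single sample per
# problem, so pass@1 reduces to the fraction of problems whose sample passes.
```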
We provide scripts for multi-turn best-of-N (BoN) search with verifiers at inference time. The following command generates `$N` responses (with temperature `$TEMPERATURE`) and filters them with `$VERIFIER_SETTING` at each turn:

```bash
bash bash/best_of_n.sh $GENERATOR_CHECKPOINT_PATH $VERIFIER_CHECKPOINT_PATH $EXPERIMENT_NAME $VERIFIER_SETTING $N $TEMPERATURE $RESULTS_DIR
```

For instance, to obtain results with public tests and the learned verifier (`pt+lv`), generating 5 solutions at each turn with temperature 0.7, run:

```bash
bash bash/best_of_n.sh $GENERATOR_CHECKPOINT_PATH $VERIFIER_CHECKPOINT_PATH bon_exp pt+lv 5 0.7 ./output/
```

For `$VERIFIER_SETTING`, use `pt` for public tests, `lv` for the learned verifier, and `rand` for random selection.
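Conceptually, each turn of BoN search samples N candidate solutions, scores each with the chosen verifier, and keeps the highest-scoring one. A minimal sketch of that selection step (function and variable names here are hypothetical, not the repository's API):

```python
from typing import Callable, Sequence

def select_best_of_n(candidates: Sequence[str],
                     score: Callable[[str], float]) -> str:
    """Return the candidate with the highest verifier score.
    Hypothetical sketch: the actual scripts rank candidates using
    public-test results (pt), a learned verifier (lv), or both (pt+lv)."""
    return max(candidates, key=score)

# Toy example: a scorer that maps each candidate to a verifier score,
# e.g. the fraction of public tests it passes.
toy_scores = {"sol_a": 0.2, "sol_b": 0.9, "sol_c": 0.5}
best = select_best_of_n(list(toy_scores), toy_scores.get)
```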
You should be all set to run your own experiments!

If you build on our work or find it useful, please cite it using the following BibTeX:
```bibtex
@inproceedings{
  jain2025multiturn,
  title={Multi-Turn Code Generation Through Single-Step Rewards},
  author={Arnav Kumar Jain and Gonzalo Gonzalez-Pumariega and Wayne Chen and Alexander M Rush and Wenting Zhao and Sanjiban Choudhury},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=aJeLhLcsh0}
}
```