When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair
This is the artifact for the work "When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair".
- Recommended system: Ubuntu 20.04
- Transformers version 4.38.2
- Federated learning package & others (e.g., Python, PyTorch, CUDA, etc.): please refer to the installation instructions of FederatedScope
├── Model # directory where the LLMs and the fine-tuned LLMs are saved
├── Data # directory of the datasets
│   ├── eval_java # the EvalRepair-Java benchmark
├── FederatedScope # the federated learning package on which we develop our components
│   ├── federatedscope
│   │   ├── llm
│   │   │   ├── MyScripts # the fine-tuning scripts
│   │   │   ├── MyUtils # preprocessing and intermediate utilities
├── RoPGen # the coding-style extractor
The code LLMs used in this study are available via HuggingFace:
- CodeLlama-13B-Instruct
- CodeLlama-7B-Instruct
- DeepseekCoder-7B-Instruct-V1.5
- WizardCoder-15B-V1.0
- Mistral-7B-Instruct-v0.2
- CodeQwen1.5-7B-Chat
The fine-tuning dataset TutorCode restricts public access to prevent data leakage: web crawlers collect web data as pretraining corpora for LLMs. Please access the dataset via the official TutorCode API.
After acquiring the original json files from the TutorCode API, the formatted fine-tuning dataset can be constructed with extract_from_tutorcode.py:
python extract_from_tutorcode.py <directory of the json files> <path to the fine-tuning dataset>
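The internals of extract_from_tutorcode.py are not shown here; as a rough illustration of the extraction step, a record-to-sample conversion might look like the sketch below. The field names `buggy_code` and `fixed_code` are assumptions, not the actual TutorCode schema; check the json files you receive from the API.

```python
import json

def format_pair(record):
    """Turn one TutorCode record into an instruction-style fine-tuning sample.

    The keys read from `record` are hypothetical placeholders; the real
    TutorCode json fields may be named differently.
    """
    return {
        "instruction": "Fix the following buggy program.",
        "input": record["buggy_code"],   # assumed field name
        "output": record["fixed_code"],  # assumed field name
    }

def extract(records):
    return [format_pair(r) for r in records]

# Toy usage with an in-memory record instead of a real json file:
sample = [{"buggy_code": "int f(){return 1}", "fixed_code": "int f(){return 1;}"}]
dataset = extract(sample)
print(json.dumps(dataset[0]))
```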
In the directory eval_java is the EvalRepair-Java benchmark, which consists of augmented test cases from HumanEval-Java.
python FederatedScope/federatedscope/main.py --cfg federatedscope/llm/MyScripts/<model name>/rq1/finetune_<model name>.yaml
python FederatedScope/federatedscope/main.py --cfg federatedscope/llm/MyScripts/<model name>/rq1/finetune_<model name>_global.yaml
python FederatedScope/federatedscope/main.py --cfg federatedscope/llm/MyScripts/<model name>/rq1/finetune_<model name>_local.yaml
The `<model name>` placeholder takes the following values:
| Model | <model name> |
|---|---|
| CodeLlama-13B-Instruct | codellama13b |
| CodeLlama-7B-Instruct | codellama7b |
| DeepseekCoder-7B-Instruct-V1.5 | deepseek7b |
| WizardCoder-15B-V1.0 | wizard15b |
| Mistral-7B-Instruct-v0.2 | mistral7b |
| CodeQwen1.5-7B-Chat | codeqwen7b |
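For example, with CodeLlama-7B-Instruct the three RQ1 configs resolve to the paths below. The snippet only prints the launch commands so it can be previewed without starting a fine-tuning run; drop the `echo` to actually launch them.

```shell
model=codellama7b  # any value from the table above
for variant in "" "_global" "_local"; do
  cfg="federatedscope/llm/MyScripts/${model}/rq1/finetune_${model}${variant}.yaml"
  echo "python FederatedScope/federatedscope/main.py --cfg ${cfg}"
done
```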
1. Heterogeneous Coding Style
# preprocessing
# extract the cpp files from the original crawled TutorCode json files and save them to the specified directory
python FederatedScope/federatedscope/llm/MyUtils/extract_cpp.py <directory of the original TutorCode json files> "RoPGen/src/coding style attacks/author-style-transform/transform/program_file/target_author_file"
# put each cpp file in its own directory to satisfy the RoPGen input format
python FederatedScope/federatedscope/llm/MyUtils/put_in_dir.py "RoPGen/src/coding style attacks/author-style-transform/transform/program_file/target_author_file"
# extract coding styles
# the extracted coding styles will be saved in 'RoPGen/src/coding style attacks/author-style-transform/transform/author_style'
python "RoPGen/src/coding style attacks/author-style-transform/transform/get_style.py"
# encode and cluster coding styles
python FederatedScope/federatedscope/llm/MyUtils/style_clustering.py "RoPGen/src/coding style attacks/author-style-transform/transform/author_style" <fine-tuning dataset path> <clustered fine-tuning dataset path> <cluster number>
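The internals of style_clustering.py are not reproduced here; as a minimal illustration of the clustering idea, a dependency-free k-means pass over (hypothetical) numeric style vectors could look like this sketch:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means over lists of floats (illustration only, not the artifact's code)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k initial centers from the data
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Two obvious style groups: "low" vectors and "high" vectors.
styles = [[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.0, 0.8]]
labels = kmeans(styles, k=2)
```

Each cluster label would then mark the client a bug-fix pair is assigned to, producing the heterogeneous coding-style split.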
2. Heterogeneous Code Complexity
The number of modified hunks is used to indicate the code complexity of each bug-fix pair; the per-pair counts are recorded in data/TutorCode/hunks.json.
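Hunk counts of this kind can be derived from a unified diff of the buggy and fixed programs. A minimal sketch (not the script used to produce hunks.json) simply counts the `@@ -a,b +c,d @@` hunk headers:

```python
def count_hunks(unified_diff: str) -> int:
    """Count modified hunks in a unified diff via its `@@ ... @@` headers."""
    return sum(1 for line in unified_diff.splitlines() if line.startswith("@@"))

diff = """--- a/Main.java
+++ b/Main.java
@@ -1,3 +1,3 @@
-int x = 0
+int x = 0;
@@ -10,2 +10,2 @@
-return x
+return x;
"""
n = count_hunks(diff)  # two hunks: a more scattered, hence more complex, fix
```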
3. Heterogeneous Code Embedding
We use CodeBERT, which is pretrained to capture contextual information from NL-PL pairs.
python FederatedScope/federatedscope/llm/MyUtils/extract_embeddings.py
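The pooling used by extract_embeddings.py is not documented here; common choices with CodeBERT are the final-layer [CLS] vector or mean pooling over token states. A library-free sketch of mean pooling, with a small matrix standing in for the model's hidden states, is shown below (an illustration of the idea, not the artifact's code):

```python
def mean_pool(hidden_states, attention_mask):
    """Mean-pool token vectors, ignoring padded positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    kept = 0
    for vec, m in zip(hidden_states, attention_mask):
        if m:
            kept += 1
            for j in range(dim):
                total[j] += vec[j]
    return [t / kept for t in total]

# Stand-in for a model output: 3 tokens (the last one is padding), dim 2.
states = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
emb = mean_pool(states, mask)  # averages only the two unmasked tokens
```

The resulting fixed-size vectors can then be clustered to form the heterogeneous code-embedding split.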
Fine-tuning with the IID distribution:
python FederatedScope/federatedscope/main.py --cfg federatedscope/llm/MyScripts/<model name>/rq2/<code feature>/finetune_<model name>.yaml
Fine-tuning with the Non-IID distribution:
python FederatedScope/federatedscope/main.py --cfg federatedscope/llm/MyScripts/<model name>/rq2/<code feature>/finetune_<model name>_<non-iid degree>.yaml
The `<code feature>` and `<non-iid degree>` placeholders take the following values:
| Code Feature | <code feature> |
|---|---|
| Coding Style | cs |
| Code Complexity | cc |
| Code Embedding | ce |

| Non-IID Degree | <non-iid degree> |
|---|---|
| Mild Non-IID | mild |
| Medium Non-IID | medium |
| Extreme Non-IID | ex |
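The shipped yaml files already encode each degree. If you want to tune the skew yourself, FederatedScope usually controls Non-IID partitioning through its data splitter config. The fragment below follows FederatedScope's config style but is an assumption; verify the exact keys against the provided finetune yaml files.

```yaml
# assumed fragment -- check the shipped finetune_<model name>_<non-iid degree>.yaml
data:
  splitter: 'lda'                   # Dirichlet (LDA) partition across clients
  splitter_args: [{'alpha': 0.5}]   # smaller alpha -> stronger Non-IID skew
```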
python FederatedScope/federatedscope/main.py --cfg federatedscope/llm/MyScripts/<model name>/rq3/finetune_<model name>_<algorithm>.yaml
The `<algorithm>` placeholder takes the following values:
| Federated Algorithm | <algorithm> |
|---|---|
| FedAvg | fedavg |
| FedOPT | fedopt |
| FedProx | fedprox |
| FedSWA | fedswa |
| pFedMe | per |
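These names correspond to different server-side aggregation and client-side regularization strategies. As a minimal illustration of the baseline only, FedAvg aggregates client updates as a data-size-weighted average; the sketch below shows the idea on flat parameter vectors (not FederatedScope's implementation):

```python
def fedavg(client_weights, client_sizes):
    """Data-size-weighted average of client parameter vectors (FedAvg aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    agg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for j in range(dim):
            agg[j] += (n / total) * w[j]
    return agg

# Two clients with 100 and 300 samples: the larger client dominates the average.
global_w = fedavg([[1.0, 0.0], [3.0, 4.0]], [100, 300])
```

The other algorithms modify this loop (e.g., FedProx adds a proximal term on the client side, FedOPT replaces the server update with an adaptive optimizer).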
After fine-tuning the LLMs, merge the fine-tuned adapter with the base model:
python FederatedScope/federatedscope/llm/MyUtils/merge_model.py <directory of the original model> <directory of the fine-tuned adapter> <directory of the merged model>
python FederatedScope/federatedscope/llm/eval/eval_for_code/inference_java.py <model name> <base model> <model name> <device> <directory of fixes>
python FederatedScope/federatedscope/llm/eval/eval_for_code/calc_java.py <model name> <directory of fixes>