
Dual Guidance for LLM-based Code Optimization

🚀 Try Our Online Demo!

Here is a snapshot of the web interface:

[Screenshot: the GBLLM web interface]

Boost your code efficiency with GBLLM now!

We have developed an online demo based on the method proposed in our paper "Dual Guidance for LLM-based Code Optimization". You can directly experience the powerful code optimization capabilities of GBLLM without the need for local environment configuration.

👉 Click here to access the Code Efficiency Optimization Tool: http://www.codeoptimization.xyz

✨ Key Features:

  • Auto-Optimization: Simply input your slow code, and the system will automatically generate efficient Fast Code.
  • Multi-language Support: Full support for Python and C++.
  • I/O Validation: Supports custom input/output test cases to ensure the optimized code is not only faster but also correct.

Introduction

[Figure: overview of the GBLLM framework]

GBLLM is a guidance-based framework for high-level source code optimization that addresses two persistent limitations of existing LLM-driven optimizers: (i) correctness risks caused by shallow, token-level pattern matching, and (ii) efficacy bottlenecks arising from an enormous and sparsely rewarding optimization search space. Rather than treating optimization as a direct “slow tokens → fast tokens” translation, GBLLM reformulates the process into a semantically constrained and performance-directed workflow that jointly enforces functional equivalence and pursues measurable speedups.

At its core, GBLLM operationalizes dual guidance in functionality and efficiency through three synergistic components: Functional Semantic Guidance (natural-language functional summaries and I/O descriptions to anchor semantics), Algorithmic Semantic Retrieval and Efficiency Guidance (via compact algorithm-level representations such as Effi-CFG to provide a reusable optimization roadmap), and a Dynamic Iterative Optimization Mechanism that builds “slow–medium–fast” performance trajectories to set explicit quantitative targets per iteration. This combination transforms open-ended generation into a structured optimization sequence with clear constraints, direction, and endpoints.
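
To make the workflow concrete, here is a minimal sketch of the dual-guidance loop with every helper passed in as a callable; the helper names, the per-iteration target factor, and the round/candidate counts are illustrative assumptions, not the repository's actual API:

```python
# Illustrative sketch of GBLLM's dual-guidance loop. All callables
# (summarize, retrieve_roadmap, generate, run_time, passes_tests) are
# hypothetical stand-ins for the repository's internals.
def optimize(slow_code, tests, *, summarize, retrieve_roadmap, generate,
             run_time, passes_tests, rounds=3, candidates_per_round=5):
    summary = summarize(slow_code)         # functional semantic guidance (NL summary + I/O)
    roadmap = retrieve_roadmap(slow_code)  # Effi-CFG retrieval: algorithm-level optimization plan
    best_code = slow_code
    best_time = run_time(slow_code, tests)
    for _ in range(rounds):                # dynamic iterative optimization
        target = best_time * 0.5           # explicit quantitative target (assumed factor)
        for cand in generate(slow_code, summary, roadmap,
                             trajectory=(slow_code, best_code),  # slow–medium–fast trajectory
                             target=target, n=candidates_per_round):
            if passes_tests(cand, tests):  # enforce functional equivalence first
                t = run_time(cand, tests)
                if t < best_time:          # keep only measurable speedups
                    best_code, best_time = cand, t
    return best_code, best_time
```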

Dependency

  • Python == 3.13.7
  • C++20
  • GCC 13.1.0
  • Linux

Run the following command in the root directory of this repo:

pip install -r requirements.txt

Replication Guide

To reproduce the results presented in the paper, please follow the instructions outlined below. The generated results for each research question are also provided at the end of the corresponding subsection, so you can download them directly and skip the generation steps.

First, install the required dependencies as described above.

This section provides instructions for reproducing the experimental results presented in the paper, with the following research questions (RQ):

  • RQ1: How effective is GBLLM in enhancing code efficiency?

  • RQ2: What are the fine-grained performance characteristics of the code generated by GBLLM at various optimization levels?

  • RQ3: How do individual components of GBLLM contribute to the overall performance?

  • RQ4: Can GBLLM perform effective optimization on code from real-world open-source projects without relying on specific project history data and knowledge bases?

Comparison of GBLLM and Baselines (RQ1)

Baselines: The source code of other baseline methods is in the baselines/ folder. Detailed instructions on how to use them can be found in the README.md file within the baselines/ folder.

GBLLM: To reproduce the experimental results, follow these steps:

  1. Data Access: Download the necessary datasets for using GBLLM, specifically the Slow-to-Fast Effi-CFG Knowledge Base and PIE_processed_data.
  2. Running GBLLM and Obtaining Results: Use GBLLM to generate code data on five different LLMs (Large Language Models). The generated results can be found in the Results Generated by GBLLM section.
  3. Generate Code Evaluation Results Using Metrics (OPT and SP): Analyze the code generated by GBLLM to obtain the OPT (Optimization) and SP (Speedup) metrics.

1. Data Access

We use the following scripts to process the data, which are described as follows:

  • API__code_sanitization.py: This script is used for dataset and code preprocessing, which is performed at two levels: simple sanitization and complex sanitization. The dataset undergoes complex sanitization, while GBLLM uses simple sanitization.

  • API__Remove_Inline_Breaks.py: This script addresses line-breaking issues caused by the Abstract Syntax Tree (AST) sanitization, which results in excessively long lines of code.

  • API__unify_variable_name_function.py: This script standardizes the variable and function names across the code; a conceptual sketch follows this list.
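
For intuition, here is a conceptual sketch of variable-name unification using Python's standard ast module; it is not the actual script, which also covers function names and many more edge cases:

```python
# Conceptual sketch of variable-name unification with the standard library;
# the real API__unify_variable_name_function.py also handles function names.
import ast

class Renamer(ast.NodeTransformer):
    """Rewrite every variable name to a canonical var_<i> form."""
    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node):
        if node.id not in self.mapping:
            self.mapping[node.id] = f"var_{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node

source = "total = 0\nfor item in data:\n    total += item"
print(ast.unparse(Renamer().visit(ast.parse(source))))
# -> var_0 = 0 / for var_1 in var_2: var_0 += var_1
```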

The download links for the datasets are provided below:

Download Datasets:

| Dataset | Language | Dataset File | Slow-to-Fast Effi-CFG Knowledge Base |
| --- | --- | --- | --- |
| PIE-C++ | C++ | PIE_Cpp.csv | Cpp__Slow_to_Fast_Effi_CFG_Knowledge_Base.csv |
| PIE-Python | Python | PIE_Python.csv | Python__Slow_to_Fast_Effi_CFG_Knowledge_Base.csv |
| PPIE | Python | PPIE.csv | Python__Slow_to_Fast_Effi_CFG_Knowledge_Base.csv |

2. Running GBLLM and Obtaining Results

In this section, you will generate results using GBLLM.

Our code relies on the services of OpenAI (for ChatGPT and GPT-4), Google (for Gemini), DeepSeek, and DeepInfra (for CodeLlama), so you first need to obtain their API keys. After obtaining the API keys, execute the following command to generate code data from GBLLM across five different LLMs.

cd GBLLM_RQ1
bash Generate_Code.sh
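
The scripts read provider credentials before issuing requests; below is a minimal sketch of a single OpenAI-style call. The environment-variable name and the prompt are assumptions, not necessarily what Generate_Code.sh expects:

```python
# Minimal single-provider sketch using the official OpenAI SDK; the actual
# wrapper (API__Single_Generation.py) targets five providers and may read
# keys differently. OPENAI_API_KEY is an assumed variable name.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",  # one of the five LLMs evaluated in the paper
    messages=[
        {"role": "system", "content": "You rewrite code to run faster while preserving behavior."},
        {"role": "user", "content": "Optimize this function:\n\ndef f(xs):\n    return sorted(xs)[0]"},
    ],
)
print(response.choices[0].message.content)
```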

Description of the Python scripts used in Generate_Code.sh:

  • API__Single_Generation.py: A wrapper for generating code using LLMs; it supports the five LLMs evaluated in the paper.

  • Large_model_API_generation.py: A Python script for generating code functionality descriptions and optimized fast code using GBLLM.

Results Generated by GBLLM: The following are the results generated by GBLLM on five different LLMs. These include both the generated descriptions of slow code functionalities and the optimized fast code. For each case, GBLLM performs up to three generation rounds. In each round, the functionality description contains a single entry, and five versions of the fast code are generated.

| Dataset | Language | GBLLM Generated Code (includes CodeLlama-13b-Instruct-hf, CodeLlama-34b-Instruct-hf, Gemini-2.5-flash, GPT-3.5-turbo-0125, and DeepSeek-V3.2-Exp) |
| --- | --- | --- |
| PIE-C++ | C++ | PIE C++ Generated Code |
| PIE-Python | Python | PIE Python Generated Code |
| PPIE | Python | PPIE Python Generated Code |

3. Generate Code Evaluation Results Using Metrics (OPT and SP)

You can use the following script to calculate and report the OPT and SP metrics for the files generated by GBLLM:

cd GBLLM_RQ1
bash Statistical_Generation_Code_Data_RQ1.sh

Description of the Python script used in Statistical_Generation_Code_Data_RQ1.sh:

Statistical_Generation_Code_Data_RQ1.py: A script for calculating and reporting the OPT and SP metrics for the files generated by GBLLM.
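
For orientation, OPT is commonly reported as the fraction of problems whose best functionally correct candidate beats a speedup threshold, and SP as the mean speedup over correct candidates. Here is a minimal sketch under those assumptions; the real script may differ in thresholds and aggregation:

```python
# Sketch of OPT/SP computation over (t_slow, t_fast, is_correct) records.
# The 1.1x threshold and the aggregation are assumptions, not necessarily
# what Statistical_Generation_Code_Data_RQ1.py implements.
def opt_and_sp(records, threshold=1.1):
    speedups = [t_slow / t_fast for t_slow, t_fast, ok in records if ok]
    opt = sum(s >= threshold for s in speedups) / len(records)  # share of problems optimized
    sp = sum(speedups) / len(speedups) if speedups else 1.0     # mean speedup (correct only)
    return opt, sp

# Three problems; the third candidate fails its tests and is excluded from SP.
print(opt_and_sp([(2.0, 1.0, True), (1.0, 0.95, True), (3.0, 1.0, False)]))
```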

Fine-Grained Analysis of GBLLM (RQ2)

Using the code data generated in RQ1, you can use the following script to generate the bar charts for RQ2, as presented in the paper:

cd GBLLM_RQ2_Drafting/Histogram
bash Drawing_Bar_Charts_Versus_Humans.sh

To generate the Venn plot for RQ2 in the paper, use the following script:

cd GBLLM_RQ2_Drafting/Venn
bash Venn.sh
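
As a rough illustration of the kind of figure Venn.sh produces, here is a sketch with invented counts using the third-party matplotlib-venn package; it is not the repository's plotting code:

```python
# Invented-count illustration of an RQ2-style Venn diagram; requires
# matplotlib-venn (pip install matplotlib-venn).
import matplotlib.pyplot as plt
from matplotlib_venn import venn2

venn2(subsets=(120, 80, 200),  # (GBLLM only, human only, both)
      set_labels=("GBLLM", "Human reference"))
plt.title("Problems successfully optimized (illustrative counts)")
plt.savefig("venn_rq2_sketch.png")
```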

Ablation Study of GBLLM (RQ3)

Please use the following scripts to generate the code for the ablation study:

cd GBLLM_RQ3_Ablation 
bash GBLLM_Ablation_Remove_NL.sh
bash GBLLM_Ablation_Remove_IO.sh
bash GBLLM_Ablation_Remove_CFG.sh
bash GBLLM_Ablation_Replace_CFG.sh
bash GBLLM_Ablation_Remove_Time.sh
bash GBLLM_Ablation_Remove_Trajectory.sh
bash GBLLM_Ablation_Remove_All.sh

You can use the following script to calculate and report the OPT and SP metrics for the data generated by GBLLM in the ablation study:

cd GBLLM_RQ3_Ablation 
bash Statistical_Generation_Code_Data_RQ3.sh

Ablation Results of GBLLM: The following are the ablation results of GBLLM, which include both the generated descriptions of slow code functionalities and the optimized fast code.

Ablation Results: GBLLM__Ablation.csv

Generalization to Real-World Projects (RQ4)

To address RQ4, we evaluate whether GBLLM can generalize to real-world open-source projects without relying on project-specific history or domain-specific knowledge bases. This section details the datasets used, the generation process, and the evaluation results.

1. Data Access

For this experiment, we curated a dataset comprising code snippets from diverse real-world open-source repositories. To strictly test generalization, these samples are distinct from the PIE/PPIE datasets used in the previous RQs.

The download links for the real-world datasets are provided below:

Download Real-World Datasets

2. Running GBLLM and Obtaining Results

In this section, you will apply GBLLM to the real-world dataset. As with RQ1, ensure you have configured your API keys for the respective LLMs (OpenAI, Google, DeepSeek, DeepInfra) before proceeding.

Execute the following command to generate optimized code for the real-world dataset:

cd GBLLM_RQ4
bash Generate_Code_RQ4.sh

Results Generated by GBLLM: The results include the functional analysis and the optimized code candidates generated across different LLMs.

| Language | GBLLM Generated Code (Real-World Generalization) |
| --- | --- |
| | Real-World Generated Code |

3. Generate Code Evaluation Results (RQ4)

To quantify the performance improvements on these real-world projects, use the following script to calculate the OPT (Optimization) and SP (Speedup) metrics:

cd GBLLM_RQ4
bash Statistical_Generation_Code_Data_RQ4.sh

Note on Analysis: This step compares the execution time and memory usage of the GBLLM-optimized code against the original open-source implementations to verify whether significant efficiency gains are achieved in a "zero-shot" optimization setting.
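
As an illustration of such a comparison, the standard library alone can time a function and record its peak memory; this is a generic sketch, not the repository's evaluation harness:

```python
# Generic runtime/peak-memory comparison sketch (standard library only);
# the repository's RQ4 evaluation script may measure differently.
import time
import tracemalloc

def profile(fn, *args):
    tracemalloc.start()
    start = time.perf_counter()
    fn(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak

slow = lambda n: sum([i * i for i in range(n)])  # materializes a full list
fast = lambda n: sum(i * i for i in range(n))    # streams via a generator
for name, fn in (("original", slow), ("optimized", fast)):
    t, mem = profile(fn, 1_000_000)
    print(f"{name}: {t:.3f}s, peak {mem / 1024:.0f} KiB")
```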
