An LLM-Powered Agent for Automated Single-Line Bug Repair in Python Programs from the QuixBugs Benchmark
Automated Code Correction Task
Python-Error-Fixer is a debugging system that uses a large language model to automatically identify and repair single-line bugs in Python programs. It targets the defective implementations in the QuixBugs benchmark, a collection of classic algorithmic problems with known bugs.
This project demonstrates a fully functional prototype, featuring a Gradio-based interface, bug classification heuristics, automated test validation, and complete program evaluation with detailed logging.
- Model-Driven Repair: Uses FLAN-T5 (google/flan-t5-small) to fix code.
- Automated Test Execution: Uses test functions to validate each fix.
- Bug Classification: Heuristic bug class mapping from error patterns.
- Gradio UI: Interactive interface for user-driven or file-based debugging.
- Evaluation Pipeline: Batch processes all programs and logs results.
.
├── app.py                        # Core logic: bug fixing, classification, test running
├── evaluate.py                   # Batch evaluation of all buggy programs
├── results.txt                   # Output log of evaluation with success/fail reports
├── Code-Refactoring-QuixBugs/
│   └── python_programs/          # Buggy Python implementations from QuixBugs
├── requirements.txt              # Required dependencies
└── README.md                     # This file
Batch Evaluation
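As a rough illustration of what a batch run might involve, the sketch below loops over a directory of programs, applies a repair function, and writes one PASS/FAIL line per file. It is not the contents of evaluate.py: the function name `evaluate_directory`, the `repair` and `is_fixed` callbacks, and the log format are assumptions, and the model call and real tests are replaced by stand-in lambdas.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def evaluate_directory(
    programs_dir: Path,
    repair: Callable[[str], str],
    is_fixed: Callable[[str, str], bool],
    log_path: Path,
) -> List[Tuple[str, bool]]:
    """Repair every .py file in `programs_dir` and log one PASS/FAIL line per file."""
    results: List[Tuple[str, bool]] = []
    for path in sorted(programs_dir.glob("*.py")):
        source = path.read_text(encoding="utf-8")
        candidate = repair(source)            # model call in a real pipeline
        ok = is_fixed(path.stem, candidate)   # test-based check in a real pipeline
        results.append((path.name, ok))

    passed = sum(ok for _, ok in results)
    lines = [f"{name}: {'PASS' if ok else 'FAIL'}" for name, ok in results]
    lines.append(f"Fixed {passed}/{len(results)} programs")
    log_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return results


# Demo on a throwaway directory with one made-up buggy program.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    programs = root / "python_programs"
    programs.mkdir()
    (programs / "add.py").write_text(
        "def add(a, b):\n    return a - b\n", encoding="utf-8"
    )
    log = root / "results.txt"
    results = evaluate_directory(
        programs,
        repair=lambda src: src.replace("a - b", "a + b"),  # stand-in for the model
        is_fixed=lambda name, code: "a + b" in code,       # stand-in for real tests
        log_path=log,
    )
    print(log.read_text(encoding="utf-8"))
```

Passing the repair and validation steps in as callables keeps the loop independent of any particular model or test harness, which makes it easy to dry-run without loading FLAN-T5.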
Bug Classes (Heuristic Mapping)
The system uses string heuristics to classify bugs into 14 common types:
Off-by-one
Incorrect condition
Wrong operator
Wrong return
Wrong assignment
Missing initialization
Extra statement
Infinite loop
Wrong function call
Missing edge case
Logic bug
Data structure misuse
API misuse
Syntax error
These labels help in tracking and potentially adapting future repair strategies.
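To make the idea of a string-heuristic mapping concrete, here is a minimal sketch that assigns one of the labels above based on substrings in an error message. The specific patterns in `RULES`, the fallback label, and the function name `classify_bug` are assumptions for illustration; the project's actual rules may differ and may cover all 14 classes.

```python
from typing import List, Tuple

# Hypothetical (substring, label) rules; the project's actual patterns may differ.
RULES: List[Tuple[str, str]] = [
    ("SyntaxError", "Syntax error"),
    ("IndexError", "Off-by-one"),
    ("RecursionError", "Infinite loop"),
    ("timed out", "Infinite loop"),
    ("KeyError", "Data structure misuse"),
    ("AttributeError", "API misuse"),
    ("NameError", "Missing initialization"),
    ("UnboundLocalError", "Missing initialization"),
]

# Assumed fallback when no pattern matches.
DEFAULT_LABEL = "Logic bug"


def classify_bug(error_text: str) -> str:
    """Return the label of the first rule whose pattern occurs in the error text."""
    lowered = error_text.lower()
    for pattern, label in RULES:
        if pattern.lower() in lowered:
            return label
    return DEFAULT_LABEL


print(classify_bug("IndexError: list index out of range"))  # Off-by-one
print(classify_bug("AssertionError: expected 6, got 5"))    # Logic bug
```

Rules are checked in order, so more specific patterns should be listed before more general ones.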
Model & Prompt
Model Used: google/flan-t5-small
Prompt Format:
You are an expert Python developer. The following Python code has a single-line bug:
Please identify and fix the bug, preserving the original algorithm and style. Return ONLY the fixed code inside a ```python ... ``` code block.
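The sketch below shows one way the prompt could be assembled and the model's reply parsed. It assumes the buggy code is placed between the two sentences of the prompt, which the trailing colon suggests but the text does not confirm; `build_prompt` and `extract_code` are hypothetical helper names, and the call to the model itself is omitted.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown

# Assumption: the buggy code is inserted where the prompt's first sentence ends.
PROMPT_TEMPLATE = (
    "You are an expert Python developer. "
    "The following Python code has a single-line bug:\n"
    "{code}\n"
    "Please identify and fix the bug, preserving the original algorithm and style. "
    "Return ONLY the fixed code inside a " + FENCE + "python ... " + FENCE + " code block."
)

# Matches the first fenced block, with or without a "python" language tag.
_CODE_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(buggy_code: str) -> str:
    """Fill the template with the buggy program."""
    return PROMPT_TEMPLATE.format(code=buggy_code.strip())


def extract_code(model_output: str) -> str:
    """Return the first fenced block's contents, or the raw text if none is found."""
    match = _CODE_BLOCK.search(model_output)
    return (match.group(1) if match else model_output).strip()


# A made-up model reply, used only to exercise the parser.
reply = "Here is the fix:\n" + FENCE + "python\ndef f(x):\n    return x + 1\n" + FENCE
print(build_prompt("def f(x):\n    return x - 1\n"))
print(extract_code(reply))
```

Falling back to the raw text matters for a small model such as flan-t5-small, which may not follow the fencing instruction; the extracted string can then be checked by the test step before it is accepted.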
Future Work
Improve Prompt Engineering: Restrict output to valid Python only.
Handle Multi-Line Bugs: Extend beyond single-line fixes.
Cross-Language Support: Add Java support from QuixBugs dataset.
Custom Metrics: Better evaluate repair quality beyond pass/fail.
License
MIT License. See LICENSE for details.
Acknowledgments
Dataset: Code-Refactoring-QuixBugs
Model: FLAN-T5
Interface: Gradio