Lkrepair: An Automatic Generation Method of Linux Kernel Defect Repair
The source code will be released after the article is accepted.
Manual patch writing remains the primary means of repairing kernel vulnerabilities. Although large language models (LLMs) excel at code generation and error correction, research on applying them to kernel defect repair has progressed slowly because dedicated datasets and end-to-end frameworks are lacking. Existing frameworks do not adapt the LLM to the kernel domain, and they struggle to generate high-quality patches because of input-length constraints and the "central forgetting" problem. This paper therefore presents a systematic study of rapid patch generation with LLMs. First, lkrd, a large-scale dataset that closely reflects production environments, is built from a large number of real crash reports and official patches. Second, lkrepair, an end-to-end intelligent repair framework covering defect localization, patch generation, automatic compilation, and automatic verification, is proposed and implemented. Finally, to address the limited context window that LLMs face with long prompts, this paper designs colpo, a long-prompt optimization method dedicated to kernel code repair, which effectively improves the model's patch generation ability when the input contains a very large number of tokens. Experiments show that lkrepair can efficiently generate patches given only crash logs and buggy code blocks, while colpo, which supports syntax-level cutting, overcomes the LLM prompt-length bottleneck so that the model can handle contexts of nearly unlimited length. Overall, this work provides a new technical route and basis for kernel defect repair and lays an important foundation for the intelligent repair of operating system defects.
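To make the idea of syntax-level cutting concrete, the following is a minimal illustrative sketch in Python and not the colpo implementation, whose details the abstract does not give. It assumes that the prompt is trimmed in whole top-level C blocks rather than at arbitrary character offsets, that blocks are ranked by how many identifiers they share with the crash log, and that token counts can be approximated by whitespace splitting. The names `split_top_level_blocks`, `approx_tokens`, and `trim_to_budget` are hypothetical, as are the sample source and crash log.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def split_top_level_blocks(source: str) -> list[str]:
    """Split C-like source into top-level, brace-balanced blocks.

    Simplification (assumption): braces inside strings, character
    literals, and comments are not handled.
    """
    blocks, depth, start = [], 0, 0
    for i, ch in enumerate(source):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                blocks.append(source[start:i + 1].strip())
                start = i + 1
    tail = source[start:].strip()
    if tail:
        blocks.append(tail)
    return blocks


def approx_tokens(text: str) -> int:
    """Crude token estimate (assumption): whitespace-separated words."""
    return len(text.split())


def trim_to_budget(source: str, crash_log: str, budget: int) -> str:
    """Keep whole blocks, most crash-relevant first, within the budget."""
    log_symbols = set(IDENT.findall(crash_log))
    blocks = split_top_level_blocks(source)

    def score(block: str) -> int:
        # Relevance = number of identifiers shared with the crash log.
        return len(set(IDENT.findall(block)) & log_symbols)

    ranked = sorted(range(len(blocks)),
                    key=lambda i: score(blocks[i]), reverse=True)
    kept, used = set(), 0
    for i in ranked:
        cost = approx_tokens(blocks[i])
        if used + cost <= budget:
            kept.add(i)
            used += cost
    # Emit the surviving blocks in their original source order.
    return "\n\n".join(blocks[i] for i in sorted(kept))


# Hypothetical inputs, for illustration only.
source = """
static int helper(int x)
{
    return x + 1;
}

static int buggy_read(struct foo *f)
{
    return f->len;
}
"""
crash_log = "BUG: KASAN: null-ptr-deref in buggy_read+0x10/0x40"
trimmed = trim_to_budget(source, crash_log, budget=10)
```

With a budget too small for both functions, the sketch keeps the function named in the crash log and drops the other one intact, so the text sent to the model never ends in the middle of a function body.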