Authors:
Ziyou Jiang, Lin Shi, Guowei Yang, Qing Wang
Paper:
https://arxiv.org/abs/2408.08619
Introduction
Security patches are crucial for maintaining the stability and robustness of projects in the Open-Source Software (OSS) community. Despite the importance of patching vulnerabilities before they are disclosed, many organizations struggle with this task. Security practitioners typically track vulnerable issue reports (IRs) and analyze the relevant insecure code to generate potential patches. However, the insecure code may not always be explicitly specified, making it difficult to generate patches. PatUntrack is an automated approach designed to generate patch examples from IRs without tracked insecure code, utilizing auto-prompting to optimize Large Language Models (LLMs) for this purpose.
Preliminaries
Preliminary Study of Vulnerability Patching
A preliminary study analyzed the time cost of raising patches for vulnerable IRs and the ratio of exploited IRs with and without tracked insecure commits. Nearly 20% of vulnerable IRs take over 150 days to receive a patch, and 38% need more than three commits to fix the vulnerability. Additionally, 69.0% of vulnerable IRs have no tracked insecure commits, and 71.7% of these vulnerabilities are successfully exploited by attackers, which highlights the importance of timely and appropriate patches.
Motivating Example
An example of a vulnerable IR is provided to illustrate the motivation behind PatUntrack. The example demonstrates how a vulnerability is triggered based on the textual description of the IR. The vulnerability involves an OS command injection (CWE-78) and can be patched by validating the input with Regex Testing.
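To make the CWE-78 case concrete, here is a minimal Python sketch of an insecure snippet and a regex-validated patch. It is not the example from the paper: the `ping` command, the function names, and the allow-list pattern are all assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical allow-list: a hostname that starts with a letter or digit and
# contains only letters, digits, dots, and hyphens.
SAFE_HOST = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")


def build_ping_insecure(host: str) -> str:
    """Insecure: the user-controlled value is spliced into a shell command string."""
    # An input such as "example.com; rm -rf /" would append a second command.
    return f"ping -c 1 {host}"


def is_safe_host(host: str) -> bool:
    """Accept the value only if the whole string matches the allow-list."""
    return SAFE_HOST.fullmatch(host) is not None


def build_ping_patched(host: str) -> List[str]:
    """Patched: validate first, then return an argument list that needs no shell."""
    if not is_safe_host(host):
        raise ValueError(f"rejected suspicious host: {host!r}")
    return ["ping", "-c", "1", host]
```

The patched variant returns an argument list so that a caller could pass it to `subprocess.run` without `shell=True`, which means the regex check is not the only line of defense.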
Approach
The overall framework of PatUntrack consists of three main steps: generating the complete Vulnerability Triggering Path (VTP) description, correcting potential hallucinations in the VTP description with external golden knowledge, and generating Top-K pairs of insecure code and patch examples.
Generating Complete VTP Description
The VTP description captures how the vulnerability is triggered. It involves extracting the original VTP description from the IR’s textual descriptions and completing the missing nodes and edges to update the vulnerable IRs. The VTP description includes various types of operations such as Src-Load, Func-Call, VulData-Transmit, SecData-Transmit, and Vul-Trigger.
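One way to picture a VTP description is as a small labeled graph. The sketch below is an assumption about the representation, not the paper's data model: it treats the five operation types as edge labels, and the class names and the `is_complete` check are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OpType(Enum):
    """The operation types named in the VTP description."""
    SRC_LOAD = "Src-Load"
    FUNC_CALL = "Func-Call"
    VULDATA_TRANSMIT = "VulData-Transmit"
    SECDATA_TRANSMIT = "SecData-Transmit"
    VUL_TRIGGER = "Vul-Trigger"


@dataclass
class VTPEdge:
    source: str
    target: str
    op: OpType


@dataclass
class VTP:
    nodes: List[str] = field(default_factory=list)
    edges: List[VTPEdge] = field(default_factory=list)

    def add_edge(self, source: str, target: str, op: OpType) -> None:
        """Add an edge, registering any node that has not been seen yet."""
        for node in (source, target):
            if node not in self.nodes:
                self.nodes.append(node)
        self.edges.append(VTPEdge(source, target, op))

    def is_complete(self) -> bool:
        """Hypothetical check: the path loads a source and reaches a trigger."""
        ops = {edge.op for edge in self.edges}
        return OpType.SRC_LOAD in ops and OpType.VUL_TRIGGER in ops


# A path extracted from an IR may stop before the trigger; completion adds it.
vtp = VTP()
vtp.add_edge("user_input", "host", OpType.SRC_LOAD)
vtp.add_edge("host", "cmd", OpType.VULDATA_TRANSMIT)
extracted_is_complete = vtp.is_complete()   # trigger edge still missing
vtp.add_edge("cmd", "os.system", OpType.VUL_TRIGGER)
completed_is_complete = vtp.is_complete()
```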
Correcting Hallucinatory VTP Description
To address hallucinations in the VTP description, PatUntrack uses VulCoK, which utilizes external golden knowledge for hallucination correction. The process involves detecting hallucinations in the VTP description and correcting them based on the retrieved golden knowledge.
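The detect-then-correct idea can be illustrated with a toy sketch. This is not how VulCoK is implemented: a small dictionary stands in for the retrieved golden knowledge, each claim is reduced to a hypothetical `(sink API, CWE)` pair, and the function names are assumptions.

```python
from typing import Dict, List, Tuple

# Toy stand-in for external golden knowledge: sink API -> expected CWE.
GOLDEN_KNOWLEDGE: Dict[str, str] = {
    "os.system": "CWE-78",       # OS command injection
    "cursor.execute": "CWE-89",  # SQL injection
}

Claim = Tuple[str, str]  # (sink API, CWE claimed by the generated description)


def detect_hallucinations(claims: List[Claim], knowledge: Dict[str, str]) -> List[int]:
    """Return indices of claims that contradict the retrieved knowledge."""
    return [
        i for i, (sink, cwe) in enumerate(claims)
        if sink in knowledge and knowledge[sink] != cwe
    ]


def correct_claims(claims: List[Claim], knowledge: Dict[str, str]) -> List[Claim]:
    """Replace contradicted claims with the value from the knowledge source."""
    corrected = list(claims)
    for i in detect_hallucinations(claims, knowledge):
        sink, _ = claims[i]
        corrected[i] = (sink, knowledge[sink])
    return corrected


claims = [
    ("os.system", "CWE-79"),       # contradicts the knowledge source
    ("cursor.execute", "CWE-89"),  # consistent
    ("unknown_api", "CWE-20"),     # no knowledge available, left untouched
]
flagged = detect_hallucinations(claims, GOLDEN_KNOWLEDGE)
fixed = correct_claims(claims, GOLDEN_KNOWLEDGE)
```

Claims with no matching knowledge entry are left unchanged in this sketch, since there is nothing to correct them against.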
Generating Insecure Code & Patch Example
PatUntrack predicts the patch types based on the corrected VTP description and jointly generates the insecure code and patch examples. This joint generation approach improves the accuracy of patch generation by reducing biases.
Experimental Design
Dataset Preparation
The dataset for evaluating PatUntrack was collected from three major sources: GHArchive, D2A, and PatchDB. The dataset was denoised and preprocessed to improve its quality. The final dataset consisted of 5,465 vulnerable IRs, with 1,992 used for auto-prompting and 3,473 for evaluation.
Experimental Baselines
Various baselines were used for comparison, including non-LLM baselines for code generation and APR, as well as generative LLMs such as CodeT5, Codex, and ChatGPT.
Metrics and Experimental Settings
The performance of PatUntrack was evaluated using metrics such as MatchLine, MatchTrig, MatchFix, AccType, Trig@K, and Fix@K. The experiments were conducted on a PC with Windows 11 OS and an NVIDIA GeForce RTX 2060.
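The summary does not define Trig@K and Fix@K. A plausible reading, stated here as an assumption, is a top-K hit rate: the fraction of IRs for which at least one of the first K generated examples triggers (or fixes) the vulnerability. A minimal sketch under that assumption, with a hypothetical function name:

```python
from typing import Sequence


def hit_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of IRs with at least one success among the first k candidates.

    results[i][j] is True if the j-th ranked candidate for the i-th IR
    succeeded (triggered the vulnerability for Trig@K, fixed it for Fix@K).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not results:
        return 0.0
    hits = sum(1 for ranked in results if any(ranked[:k]))
    return hits / len(results)


# Four IRs with three ranked candidates each (made-up outcomes).
outcomes = [
    [False, True, False],
    [False, False, False],
    [True, False, False],
    [False, False, True],
]
at_1 = hit_at_k(outcomes, 1)  # 0.25
at_2 = hit_at_k(outcomes, 2)  # 0.5
at_3 = hit_at_k(outcomes, 3)  # 0.75
```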
Results
Performances on Insecure Code Generation
PatUntrack significantly outperformed the baselines in generating insecure code examples. The highest performance was achieved by ChatGPT+PatUntrack, with improvements of +37.1% (MatchLine), +25.9% (MatchTrig), and +10.5% (Trig@10) on average.
Performances on Patch Example Generation
PatUntrack also excelled in generating patch examples, with ChatGPT+PatUntrack achieving the highest performance. The improvements were +8.7% (MatchLine), +17.7% (MatchFix), and +14.6% (Fix@10) on average.
Effect of IR’s Detailed Information
PatUntrack handled IRs with varying levels of detail. When the IRs lacked detailed information, it outperformed the LLM baselines by more than 14.1% on Trig@10 and 27.3% on Fix@10.
Ablation Study
The ablation study showed that removing key components of PatUntrack, such as the VTP extractor, VulCoK, joint patch generator, and auto-prompting, led to significant decreases in performance. This highlights the importance of each component in the overall approach.
Human Evaluation
A human evaluation assessed the practical usefulness of PatUntrack. For 76 newly disclosed vulnerable IRs, 27 of the pairs generated by ChatGPT+PatUntrack were accepted by the authors of those IRs, indicating the practical utility of the generated patch examples.
Discussion
Effect of Joint Code Generation
The joint code generation approach used in PatUntrack was shown to improve the accuracy of patch generation by leveraging the generated insecure code examples.
Unsuccessful Cases
Some of PatUntrack's failures to generate correct patches were caused by incomplete VTP descriptions. Enhancing the input sources for VTP description generation could address this issue.
Threats to Validity
The internal, external, and construct threats to validity were discussed, along with measures taken to mitigate these threats.
Related Works
The related works section discussed various approaches to vulnerability detection and analysis in OSS projects, as well as patch generation methods. PatUntrack was distinguished from these works by its ability to generate patch examples based on IR textual descriptions without tracked insecure code.
Conclusion
PatUntrack is an automated approach for generating patch examples from IRs without tracked insecure code. It optimizes LLMs to analyze vulnerabilities and generate appropriate patch examples. Experimental results demonstrated the effectiveness of PatUntrack, with significant improvements over traditional LLM baselines. Future work will focus on enhancing PatUntrack by introducing additional third-party resources and analyzing its impact on traditional APR tools.