PatUntrack: Automated Generating Patch Examples for Issue Reports without Tracked Insecure Code

Authors:

Ziyou Jiang, Lin Shi, Guowei Yang, Qing Wang

Paper:

Introduction

Security patches are crucial for maintaining the stability and robustness of projects in the Open-Source Software (OSS) community. Vulnerabilities should ideally be patched before they are disclosed, yet many organizations struggle to do so. Security practitioners typically track vulnerable issue reports (IRs) and analyze the relevant insecure code to generate potential patches. However, the insecure code is not always explicitly specified, which makes patch generation difficult. PatUntrack is an automated approach that generates patch examples from IRs without tracked insecure code, using auto-prompting to optimize Large Language Models (LLMs) for this purpose.

Preliminaries

Preliminary Study of Vulnerability Patching

A preliminary study analyzed the time cost of raising patches for vulnerable IRs and the exploited ratio of IRs with and without tracked insecure commits. Nearly 20% of vulnerable IRs require over 150 days to raise patches, and 38% require more than three commits to fix the vulnerability. Additionally, 69.0% of vulnerable IRs have no tracked insecure commits, and 71.7% of the vulnerabilities in these IRs are successfully exploited by attackers, which highlights the importance of timely and appropriate patches.

Motivating Example

The motivating example is a vulnerable IR whose textual description shows how the vulnerability is triggered. The vulnerability is an OS command injection (CWE-78), and it can be patched by validating the input with a regular-expression test.
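
To make this class of vulnerability concrete, here is a minimal Python sketch of an OS command injection and a regular-expression patch. It is not the code from the IR discussed in the paper: the `ping` scenario, the function names, and the allow-list pattern are all assumptions chosen for illustration.

```python
import re
from typing import List

# Hypothetical allow-list: a host must start with a letter or digit and may
# then contain only letters, digits, dots, and hyphens.
HOST_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]{0,252}")


def is_valid_host(host: str) -> bool:
    # fullmatch requires the whole string to match, so trailing shell
    # metacharacters such as ";" or "|" cause a rejection.
    return HOST_PATTERN.fullmatch(host) is not None


def build_ping_command_insecure(host: str) -> str:
    # Insecure: user-controlled input is spliced into a shell command string.
    # A value like "example.com; rm -rf /" would append a second command.
    return f"ping -c 1 {host}"


def build_ping_command_patched(host: str) -> List[str]:
    # Patch: validate the input first, then return an argument list so that
    # no shell parsing is involved when the command is eventually executed.
    if not is_valid_host(host):
        raise ValueError(f"rejected suspicious host value: {host!r}")
    return ["ping", "-c", "1", host]
```

The patched function only builds the argument list; a caller would pass it to something like `subprocess.run` without `shell=True`.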

Approach

The overall framework of PatUntrack consists of three main steps: generating the complete Vulnerability Triggering Path (VTP) description, correcting potential hallucinations in the VTP description with external golden knowledge, and generating Top-K pairs of insecure code and patch examples.
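
The three steps compose naturally as a pipeline. The sketch below shows only that composition: every name in it is a hypothetical placeholder, and the three stages are passed in as callables because the source does not specify how they are implemented.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder types, not PatUntrack's actual data model.
Vtp = str
CodePair = Tuple[str, str]  # (insecure code example, patch example)


def run_pipeline(
    ir_text: str,
    generate_vtp: Callable[[str], Vtp],
    correct_vtp: Callable[[Vtp], Vtp],
    generate_pairs: Callable[[Vtp, int], List[CodePair]],
    k: int = 10,
) -> List[CodePair]:
    # Step 1: derive the complete VTP description from the IR text.
    vtp = generate_vtp(ir_text)
    # Step 2: correct hallucinations in the VTP description.
    vtp = correct_vtp(vtp)
    # Step 3: generate candidate pairs and keep at most the top K.
    return generate_pairs(vtp, k)[:k]
```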

Generating Complete VTP Description

The VTP description captures how the vulnerability is triggered. It involves extracting the original VTP description from the IR’s textual descriptions and completing the missing nodes and edges to update the vulnerable IRs. The VTP description includes various types of operations such as Src-Load, Func-Call, VulData-Transmit, SecData-Transmit, and Vul-Trigger.
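
One way to picture a VTP description is as a small labeled graph. The sketch below is an assumption-laden illustration, not the paper's representation: it treats the five operation types as edge labels, and it uses a toy completeness rule (a path needs at least one Src-Load and one Vul-Trigger) that the source does not state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Op(Enum):
    # Operation types named in the text; modeling them as edge labels is an assumption.
    SRC_LOAD = "Src-Load"
    FUNC_CALL = "Func-Call"
    VULDATA_TRANSMIT = "VulData-Transmit"
    SECDATA_TRANSMIT = "SecData-Transmit"
    VUL_TRIGGER = "Vul-Trigger"


@dataclass
class VtpGraph:
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[str, Op, str]] = field(default_factory=list)

    def add_edge(self, src: str, op: Op, dst: str) -> None:
        # Register unseen endpoints as nodes, then record the labeled edge.
        for name in (src, dst):
            if name not in self.nodes:
                self.nodes.append(name)
        self.edges.append((src, op, dst))

    def missing_ops(self) -> List[Op]:
        # Toy completeness rule (hypothetical): report which of the two
        # endpoint operations no edge carries yet.
        present = {op for _, op, _ in self.edges}
        return [op for op in (Op.SRC_LOAD, Op.VUL_TRIGGER) if op not in present]
```

For a command-injection IR, a caller might add a Src-Load edge from the request input to a variable, a VulData-Transmit edge into the command string, and a Vul-Trigger edge into the shell call; `missing_ops()` then shows which endpoint is still absent.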

Correcting Hallucinatory VTP Description

To address hallucinations in the VTP description, PatUntrack uses VulCoK, which utilizes external golden knowledge for hallucination correction. The process involves detecting hallucinations in the VTP description and correcting them based on the retrieved golden knowledge.

Generating Insecure Code & Patch Example

PatUntrack predicts the patch types based on the corrected VTP description and jointly generates the insecure code and patch examples. This joint generation approach improves the accuracy of patch generation by reducing biases.

Experimental Design

Dataset Preparation

The dataset for evaluating PatUntrack was collected from three major sources: GHArchive, D2A, and PatchDB. The dataset was denoised and preprocessed to improve its quality. The final dataset consisted of 5,465 vulnerable IRs, with 1,992 used for auto-prompting and 3,473 for evaluation.

Experimental Baselines

The baselines include non-LLM approaches for code generation and automated program repair (APR), as well as generative LLMs such as CodeT5, Codex, and ChatGPT.

Metrics and Experimental Settings

The performance of PatUntrack was evaluated using the metrics MatchLine, MatchTrig, MatchFix, AccType, Trig@K, and Fix@K. The experiments were conducted on a PC running Windows 11 with an NVIDIA GeForce RTX 2060 GPU.
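
The summary does not define Trig@K and Fix@K. A common reading of an "@K" metric, assumed here, is the fraction of IRs for which at least one of the top-K generated examples is judged correct (it triggers the vulnerability for Trig@K, or fixes it for Fix@K). The sketch below computes that generic hit rate; the function name and input layout are hypothetical.

```python
from typing import Sequence


def hit_at_k(judgements: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of IRs with at least one correct example among the top k.

    judgements[i][j] is True when the j-th ranked example for IR i was
    judged correct (assumed to be decided by some external check).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not judgements:
        return 0.0
    hits = sum(1 for ranked in judgements if any(ranked[:k]))
    return hits / len(judgements)
```

With three IRs whose ranked judgements are `[False, True, False]`, `[False, False, False]`, and `[True]`, one IR scores a hit at K=1 and two score a hit at K=2.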

Results

Performance on Insecure Code Generation

PatUntrack significantly outperformed the baselines in generating insecure code examples. The highest performance was achieved by ChatGPT+PatUntrack, with improvements of +37.1% (MatchLine), +25.9% (MatchTrig), and +10.5% (Trig@10) on average.

Performance on Patch Example Generation

PatUntrack also excelled in generating patch examples, with ChatGPT+PatUntrack achieving the highest performance. The improvements were +8.7% (MatchLine), +17.7% (MatchFix), and +14.6% (Fix@10) on average.

Effect of IR’s Detailed Information

PatUntrack handled IRs with varying levels of detailed information. When the IRs lacked detailed information, it outperformed the LLM baselines by more than 14.1% (Trig@10) and 27.3% (Fix@10).

Ablation Study

The ablation study showed that removing key components of PatUntrack, such as the VTP extractor, VulCoK, joint patch generator, and auto-prompting, led to significant decreases in performance. This highlights the importance of each component in the overall approach.

Human Evaluation

A human evaluation assessed the practical usefulness of PatUntrack. For 76 newly disclosed vulnerable IRs, 27 pairs generated by ChatGPT+PatUntrack were accepted by the authors of those IRs, indicating the practical utility of the generated patch examples.

Discussion

Effect of Joint Code Generation

The joint code generation approach used in PatUntrack was shown to improve the accuracy of patch generation by leveraging the generated insecure code examples.

Unsuccessful Cases

Some failures to generate correct patches were caused by incomplete VTP descriptions. Enhancing the input sources for VTP description generation could address this issue.

Threats to Validity

The internal, external, and construct threats to validity were discussed, along with the measures taken to mitigate them.

Related Works

The related works section discussed various approaches to vulnerability detection and analysis in OSS projects, as well as patch generation methods. PatUntrack was distinguished from these works by its ability to generate patch examples based on IR textual descriptions without tracked insecure code.

Conclusion

PatUntrack is an automated approach for generating patch examples from IRs without tracked insecure code. It optimizes LLMs to analyze vulnerabilities and generate appropriate patch examples. Experimental results demonstrated the effectiveness of PatUntrack, with significant improvements over traditional LLM baselines. Future work will focus on enhancing PatUntrack by introducing additional third-party resources and analyzing its impact on traditional APR tools.
