Authors:

Peng Zhou, Yongdong Liu, Lixun Ma, Weiye Zhang, Haohan Tan, Zhenguang Liu, Butian Huang

Paper:

https://arxiv.org/abs/2408.10657

Introduction

The increasing adoption of encryption protocols has inadvertently provided a cover for malicious activities, making it challenging to detect such threats. Power grid systems, being critical infrastructure, are particularly vulnerable to these attacks. Traditional methods for detecting malicious encrypted traffic often rely on static pre-trained models, which are not well-suited for dynamic environments like blockchain-based power grid systems. These methods struggle to adapt to new types of encrypted attacks, leading to significant performance drops.

To address these challenges, the authors propose ETGuard, a framework that automatically detects malicious encrypted traffic in blockchain-based power grid systems. ETGuard uses incremental learning to adapt to new attack patterns while retaining knowledge of old ones. This post covers ETGuard's methodology, its experimental design, and the results it achieved.

Related Work

Malicious Encrypted Traffic Detection

Traditional methods for detecting malicious encrypted traffic often rely on signature-based techniques, which depend heavily on the quality of decryption operations and predefined rules. With the advent of artificial intelligence, machine learning methods have been increasingly adopted to enhance detection capabilities. These methods extract statistical features from traffic, offering faster and more accurate results. However, they still fall short in detecting new types of encrypted traffic attacks.

Incremental Learning

Incremental learning aims to balance retaining information from old tasks with absorbing information from new ones, which addresses the catastrophic forgetting problem. Existing methods fall into two categories: replay-based methods and parameter optimization-based methods. Replay-based methods mitigate catastrophic forgetting by replaying samples from old tasks while learning new ones. However, replay-based methods often overfit to the stored old data.

Research Methodology

Method Overview

ETGuard’s architecture consists of three key components:

  1. Data Preprocessing: Raw packets are cleaned and processed into distinct sequences for different clients. An unsupervised auto-encoder with stacked bi-GRUs is used to extract features from these sequences.
  2. Incremental Learning Module: This module adapts to novel attacks while preventing catastrophic forgetting. It uses a sample buffer to store representative traffic samples and employs an anti-forgetting loss function for incremental updates.
  3. Detection Module: This module learns the feature distinctions between benign and malicious sequences to continuously identify potential attacks. The learning process is supervised by classification loss and incremental learning losses.

Data Preprocessing

The preprocessing module cleans and processes raw packet data into distinct sequences for different clients. Packets are grouped by a five-tuple (source and destination IP addresses, source and destination ports, and transport layer protocol) and sorted chronologically. Irrelevant packet information is removed, and the resulting sequences are formatted with components such as packet length, duration, and mean time interval between packets.
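
The grouping step described above can be sketched in a few lines of plain Python. This is only an illustration, not the authors' code: the packet dictionary keys (`src_ip`, `dst_ip`, `src_port`, `dst_port`, `proto`, `ts`, `length`) and the function name `build_sequences` are assumptions made for this example, and the real pipeline may compute more features than the three mentioned in the text.

```python
from collections import defaultdict
from statistics import mean


def build_sequences(packets):
    """Group packets by five-tuple and summarize each flow.

    Each packet is a dict with hypothetical keys:
    src_ip, dst_ip, src_port, dst_port, proto, ts (seconds), length (bytes).
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt["src_ip"], pkt["dst_ip"],
               pkt["src_port"], pkt["dst_port"], pkt["proto"])
        flows[key].append(pkt)

    sequences = {}
    for key, pkts in flows.items():
        pkts.sort(key=lambda p: p["ts"])  # chronological order within the flow
        times = [p["ts"] for p in pkts]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        sequences[key] = {
            "lengths": [p["length"] for p in pkts],
            "duration": times[-1] - times[0],
            # A single-packet flow has no gaps, so fall back to 0.0.
            "mean_interval": mean(gaps) if gaps else 0.0,
        }
    return sequences
```

Each resulting entry holds the per-packet lengths in time order plus two flow-level statistics, which is the kind of sequence a downstream feature extractor could consume.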

Incremental Learning Objectives

The incremental learning module uses experience replay with targeted modifications for encrypted traffic scenarios. A sample buffer stores representative traffic samples, and the buffer is dynamically updated using a reservoir sampling algorithm. The incremental learning loss function balances learning between new and old data, mitigating catastrophic forgetting.
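
To make the buffer update concrete, here is a minimal sketch of classic reservoir sampling (Algorithm R), which keeps a fixed-size uniform sample of everything seen so far. The class name `ReservoirBuffer`, its `capacity` and `seed` parameters, and the `replay` helper are assumptions for illustration; the paper's buffer may differ in how it selects or weights samples.

```python
import random


class ReservoirBuffer:
    """Fixed-size buffer maintained with reservoir sampling (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.samples = []
        self.seen = 0  # total number of samples offered so far
        self._rng = random.Random(seed)

    def add(self, sample):
        self.seen += 1
        if len(self.samples) < self.capacity:
            # Buffer not full yet: always keep the sample.
            self.samples.append(sample)
            return
        # Keep the new sample with probability capacity / seen by drawing
        # a slot index uniformly from [0, seen).
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.samples[j] = sample

    def replay(self, k):
        """Draw up to k stored samples to mix into a new training batch."""
        return self._rng.sample(self.samples, min(k, len(self.samples)))
```

The appeal of this scheme for streaming traffic is that the buffer never grows, yet every sample seen so far has the same chance of being retained.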

Detection Module

Given the high volume and rate of traffic data in blockchain-based power grid scenarios, the detection module employs a multi-layer perceptron (MLP) model. The MLP model is chosen for its simplicity, lower computing resource requirements, and faster training speeds, making it suitable for real-time traffic monitoring.
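
For readers unfamiliar with how lightweight an MLP is, the following dependency-free sketch shows the forward pass of a small one. It is an assumption-laden illustration: the layer sizes, the ReLU hidden activation, the sigmoid output, and the function names `init_mlp` and `forward` are chosen for this example and are not taken from the paper. Training is omitted.

```python
import math
import random


def sigmoid(v):
    # Numerically stable form that avoids overflow for large |v|.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def init_mlp(layer_sizes, seed=0):
    """Create (weights, biases) pairs for consecutive layer sizes."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """ReLU on hidden layers, sigmoid on the final layer."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i == len(layers) - 1:
            x = [sigmoid(v) for v in z]
        else:
            x = [max(0.0, v) for v in z]
    return x
```

Calling `forward(init_mlp([4, 8, 1]), features)` on a four-value feature vector returns a single score between 0 and 1, which could be thresholded to flag a sequence as malicious.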

Experimental Design

Datasets

The evaluation of ETGuard was conducted on three datasets:

  1. CIRA-CIC-DoHBrw-2020 (DoHBrw): This dataset includes a mix of benign and malicious DNS-over-HTTPS (DoH) traffic.
  2. CIC-AndMal2017 (CIC): This dataset contains a variety of malicious attacks from 42 unique malware families.
  3. GridET-2024 (GridET): This dataset was created to detect encrypted attacks in real-world blockchain-based power grid scenarios. It includes benign traffic samples from power grid system interactions and malicious traffic samples from various sources.

Implementation Details

The detection framework and baselines were implemented using Python 3.8.5 and run on a Linux server with an NVIDIA GeForce RTX 3090 GPU. Parameters were fine-tuned using Grid Search to optimize performance.
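
Grid search simply evaluates every combination of candidate values and keeps the best one. The sketch below shows the idea with the standard library only; the function name `grid_search`, the `score_fn` callback, and any parameter names passed in are hypothetical, since the post does not list the actual search space.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and return the best one.

    param_grid maps a parameter name to a list of candidate values.
    score_fn takes a dict of parameters and returns a score (higher is better).
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `score_fn` would train a model with the given parameters and return a validation metric such as F1.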

Evaluation Metrics

Accuracy (ACC) and F1 score were used to evaluate ETGuard on both tasks: malicious traffic detection and incremental learning.
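
As a reminder of what these two metrics measure, here is a small sketch that computes both from label lists. It assumes a binary setting with the malicious class as the positive label (`positive=1`), which is an assumption of this example; the paper may average F1 differently across classes.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels, treating `positive` as malicious."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1
```

F1 matters here because traffic datasets are often imbalanced: a detector that labels everything benign can score high accuracy while its F1 on the malicious class is zero.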

Results and Analysis

Performance on Malicious Encrypted Traffic Detection

ETGuard was benchmarked against state-of-the-art methods on the DoHBrw, CIC, and GridET datasets. The results showed that ETGuard consistently outperformed existing methods across all datasets. For instance, ETGuard achieved an F1 score of 0.92 on DoHBrw, compared to 0.88 for the state-of-the-art method RAPIER.

Performance on Incremental Learning

To evaluate incremental learning performance, a new dataset combining benign samples from DoHBrw and malicious samples from CIC was created. ETGuard was compared against five incremental learning methods and a variant without the incremental learning module (ETGuard-V). The results demonstrated that ETGuard achieved state-of-the-art performance in almost all settings, with a significant performance gap between ETGuard and ETGuard-V as the number of rounds increased.


Overall Conclusion

ETGuard addresses the challenges of detecting malicious encrypted traffic in blockchain-based power grid systems by incorporating incremental learning to adapt to new attack patterns. The proposed framework demonstrated state-of-the-art performance on multiple benchmark datasets, including a newly created dataset for blockchain-based power grid scenarios. ETGuard’s ability to retain knowledge of old attack patterns while learning new ones makes it a robust solution for dynamic environments.

The research was supported by the State Grid Zhejiang Electric Power Company, LTD. Information and Communication Branch, China.


By leveraging incremental learning and advanced feature extraction techniques, ETGuard sets a new standard for detecting malicious encrypted traffic in critical infrastructure systems. The introduction of the GridET-2024 dataset further enhances the research community’s ability to evaluate and improve upon existing methods.

Code:

https://github.com/pppmzt/etguard
