Authors:
Binbin Ding, Penghui Yang, Zeqing Ge, Shengjun Huang
Paper:
https://arxiv.org/abs/2408.08655
Introduction
Federated Learning (FL) is a distributed machine learning framework that allows multiple clients to collaboratively train models while preserving data privacy. However, this decentralized nature also opens up vulnerabilities, particularly to backdoor attacks. These attacks embed malicious behaviors into the model, which remain dormant under normal conditions but activate when specific triggers are present. This paper introduces a novel defense mechanism called Flipping Weight Updates of Low-Activation Input Neurons (FLAIN) to mitigate such backdoor attacks.
Related Work
Backdoor Attacks in FL
Backdoor attacks manipulate models to make specific predictions by embedding triggers within samples. These triggers can be numerical, semantic, or even imperceptible. Strategies like decomposing a global trigger into localized variants across multiple malicious clients enhance attack stealth.
Mid-Training Defenses in FL
Mid-training defenses aim to identify and filter out malicious updates by scrutinizing discrepancies among the submitted updates. Algorithms like Krum, Trimmed Mean, and Bulyan focus on minimizing the influence of outliers. However, these methods struggle in non-IID data scenarios, where variability in data distribution can amplify discrepancies among benign updates.
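To illustrate the outlier-suppression idea behind these aggregators, a coordinate-wise Trimmed Mean can be sketched as follows (NumPy, illustrative names, not the paper's code):

```python
import numpy as np

def trimmed_mean(client_updates, trim_k):
    """Coordinate-wise trimmed mean: for every parameter, discard the trim_k
    largest and trim_k smallest client values, then average the rest."""
    assert len(client_updates) > 2 * trim_k
    stacked = np.sort(np.stack(client_updates), axis=0)  # sort each coordinate across clients
    return stacked[trim_k:len(client_updates) - trim_k].mean(axis=0)
```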
Post-Training Defenses in FL
Post-training defenses eliminate the need to identify update discrepancies across clients. Techniques like pruning low-activation neurons have shown effectiveness in mitigating backdoor attacks while preserving overall model performance. However, pruning can degrade model performance when backdoor and benign behaviors rely on the same neurons.
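A minimal sketch of this pruning-style baseline, assuming per-neuron activation statistics on clean data are already available, could look like this (illustrative names):

```python
import torch

def prune_low_activation_neurons(fc_weight, activations, lam):
    """Zero the FC weights attached to input neurons whose mean activation on
    clean data falls below lam (sketch of the pruning baseline)."""
    pruned = fc_weight.clone()
    pruned[:, activations < lam] = 0.0   # cut connections from dormant neurons
    return pruned
```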
Problem Setup
Federated Learning
In an FL system with $N$ clients, each client $k$ holds a local dataset $D_k$. The server manages client participation in each training round, with the global training objective defined as

$$
w^* = \arg\min_w \sum_{k=1}^{N} \lambda_k f(w, D_k),
$$

where $w^*$ denotes the optimal global model parameters and $\lambda_k$ is the weight assigned to the loss of client $k$.
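For concreteness, a FedAvg-style weighted aggregation consistent with this objective, assuming $\lambda_k$ is proportional to the size of $D_k$, might look like the sketch below (illustrative names, not the paper's implementation):

```python
import numpy as np

def aggregate(client_params, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style).

    client_params: list of flattened parameter vectors, one per client
    client_sizes:  number of local samples per client, used to form lambda_k
    """
    sizes = np.asarray(client_sizes, dtype=float)
    lambdas = sizes / sizes.sum()            # lambda_k = |D_k| / sum_j |D_j|
    stacked = np.stack(client_params)        # shape: (N, num_params)
    return (lambdas[:, None] * stacked).sum(axis=0)

# Example: three clients with different amounts of local data
params = [np.array([1.0, 2.0]), np.array([0.0, 1.0]), np.array([2.0, 0.0])]
global_params = aggregate(params, client_sizes=[100, 300, 600])
```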
Threat Model
The server lacks access to local training data and processes, making it vulnerable to backdoor attacks. Malicious attackers can perform white-box attacks, having access to both the training data and model parameters of compromised clients. The backdoored model is engineered to predict a specific class for samples embedded with triggers.
Defense Model
Defender’s Capabilities
The server maintains a small auxiliary dataset to compute the activation inputs of the fully connected layer and evaluate performance variations in the model.
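One plausible way to obtain these activation inputs is a forward hook on the fully connected layer, evaluated over the auxiliary dataset; the sketch below assumes a PyTorch model and a small `aux_loader` (illustrative names, not the paper's code):

```python
import torch

def mean_activation_inputs(model, fc_layer, aux_loader, device="cpu"):
    """Average the inputs fed into the fully connected layer over the
    auxiliary dataset; returns one value per input neuron."""
    captured = []

    def hook(module, inputs, output):
        captured.append(inputs[0].detach().cpu())  # tensor entering the FC layer

    handle = fc_layer.register_forward_hook(hook)
    model.eval()
    with torch.no_grad():
        for x, _ in aux_loader:
            model(x.to(device))
    handle.remove()
    return torch.cat(captured).mean(dim=0)  # per-neuron mean activation input
```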
Defender’s Goals
The defender aims to reduce the model’s accuracy on trigger samples while maintaining its performance on clean data.
Methodology
Flipping
FLAIN focuses on low-activation input neurons in the fully connected layer. Instead of pruning these neurons, FLAIN flips the weight updates they accumulated during global training (a minimal code sketch follows the list). The process involves:
- Initializing the global model parameters and recording the initial weights.
- Completing global training and computing the weight updates.
- Using an auxiliary dataset to assess the model’s performance and retrieve the activation input vector.
- Setting a threshold $\lambda$ and flipping the updates for neurons whose activation inputs fall below $\lambda$.
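A minimal sketch of the flipping step, assuming the per-neuron activation inputs have already been computed on the auxiliary data (for example with a hook as sketched earlier), might look like this:

```python
import torch

def flip_low_activation_updates(w_init, w_final, activations, lam):
    """Flip the accumulated updates of FC input neurons whose mean activation
    input is below lam (illustrative sketch, not the paper's code).

    w_init, w_final: FC weight matrix before and after global training,
                     shape (out_features, in_features)
    activations:     per-input-neuron mean activation on the auxiliary data
    """
    delta = w_final - w_init                         # accumulated weight update
    low = activations < lam                          # low-activation input neurons
    w_new = w_final.clone()
    w_new[:, low] = w_init[:, low] - delta[:, low]   # flip the update instead of pruning
    return w_new
```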
Adaptive Threshold
The threshold $\lambda$ determines the tradeoff between clearing backdoors and maintaining performance on clean data. The server incrementally increases $\lambda$ until the resulting performance drop exceeds a predefined threshold $\rho$.
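The search itself can be sketched as a simple loop; `eval_acc` below is a hypothetical callback returning the clean accuracy of the model after flipping with a given threshold, and the fixed increment is an assumption rather than the paper's exact schedule:

```python
def adaptive_threshold(activations, eval_acc, base_acc, rho, step=0.05):
    """Raise the flipping threshold until the drop in clean accuracy relative
    to the unmodified model exceeds rho, then return the last safe value."""
    max_act = float(activations.max())
    lam, best_lam = 0.0, 0.0
    while lam <= max_act:
        if base_acc - eval_acc(lam) > rho:   # clean-performance drop too large
            break
        best_lam = lam
        lam += step
    return best_lam
```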
Performance Evaluation
Experiment Setup
The experiments use the Adam optimizer with a learning rate of 0.001. Evaluation metrics include Attack Success Rate (ASR) and Natural Accuracy (ACC). Various datasets like MNIST, Fashion-MNIST, EMNIST, and CIFAR-10 are used.
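ACC and ASR can be computed with the same evaluation loop; in the sketch below, `trigger_fn` and `target_class` are illustrative stand-ins for the attacker's trigger-stamping function and target label (samples already belonging to the target class are often excluded from ASR, which is omitted here for brevity):

```python
import torch

def evaluate(model, loader, trigger_fn=None, target_class=None, device="cpu"):
    """Return clean accuracy (ACC) when no trigger is given, or the attack
    success rate (ASR) when a trigger function and target class are supplied."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            if trigger_fn is not None:
                x = trigger_fn(x)                      # stamp the backdoor trigger
                y = torch.full_like(y, target_class)   # attacker's target label
            pred = model(x).argmax(dim=1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```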
Experiment Results
Attack Tasks under IID
FLAIN, Pruning, and RLR effectively reduce the ASR to near 0 for certain tasks on MNIST, FMNIST, and EMNIST, with negligible impact on ACC. However, their performance varies significantly on CIFAR-10.
Attack Tasks under non-IID
Under non-IID data distribution conditions, FLAIN and Pruning outperform RLR, maintaining higher ACC and achieving lower ASR in the evaluated tasks.
Different Pixel Backdoor Patterns
Across various backdoor patterns, FLAIN limits the average ACC reduction to only about 3.2% while keeping the ASR at approximately 0.8%.
Impact of Varying PDR
Across different poisoned data rate (PDR) settings, FLAIN keeps the ASR at a minimal level while preserving the model's performance on the main task.
Diverse MCR Settings
FLAIN demonstrates stable defense performance across various malicious client ratio (MCR) settings, with the maximum ACC reduction below 2% and the ASR kept under 2%.
Conclusion
FLAIN is a backdoor defense method that flips the weight updates of low-activation input neurons. The adaptive threshold balances the tradeoff between eliminating backdoors and maintaining performance on clean data. FLAIN exhibits superior adaptability compared to direct pruning methods across a broader range of attack scenarios, including non-IID data distribution and high MCR settings. Future work will explore further enhancements to FLAIN, including its application in data-free scenarios and more complex models.