Authors:

Thomas Thebaud, Gaël Le Lan, Anthony Larcher

Paper:

https://arxiv.org/abs/2408.08918

Introduction

Biometric recognition systems have become integral to modern security frameworks, leveraging intrinsic properties of users such as voice, handwriting, and other behavioral traits. These systems encode user data into high-dimensional vectors known as embeddings. The theft of these embeddings poses a significant threat, as they are far more difficult to replace than traditional passwords or keys. This study, conducted by Thomas Thebaud, Gaël Le Lan, and Anthony Larcher, explores the vulnerabilities of behavioral biometric systems to spoofing attacks, specifically focusing on automatic speaker verification and handwritten digit analysis systems.

Related Work

Behavioral Biometric Systems

Authentication systems are categorized into three types: knowledge-based (e.g., passwords), possession-based (e.g., keys or cards), and biometric-based (e.g., face, fingerprint, voice, handwriting). This study focuses on behavioral biometrics, particularly speech and handwriting.

Automatic Speaker Verification

Speaker verification systems have evolved from statistical models to neural networks, with ResNet architectures significantly improving performance. The study uses variations of ResNet34 trained on VoxCeleb datasets.

Handwritten Digit Analysis

Handwritten digit authentication systems are less common, with most research focusing on digit identification. The study uses a Bi-LSTM-based system for digit analysis, trained on datasets like eBioDigit and MobileDigit.

Reconstruction of Speech and Handwriting

Speech Reconstruction

Voice conversion systems like AutoVC are used to reconstruct speech from embeddings. These systems can generate speech that mimics a target speaker’s voice, even if that speaker was never seen during training.

Handwriting Reconstruction

Handwriting reconstruction involves generating sequences of points in two dimensions. Systems like LSTM-MDN are used to reconstruct handwriting from embeddings.

Supervised and Unsupervised Rotational Alignments

Aligning embeddings from different encoders is crucial for spoofing attacks. The study uses Procrustes analysis for supervised alignment and Wasserstein Procrustes for unsupervised alignment.
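To make the supervised case concrete, here is a minimal sketch of orthogonal Procrustes alignment in NumPy. It is an illustration under stated assumptions, not the authors' code: the function name `procrustes_rotation`, the synthetic data, and the choice to map the target encoder's space onto the attacker's space are all hypothetical.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F."""
    # The optimum is U @ Vt, where U, Vt come from the SVD of source^T target.
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

rng = np.random.default_rng(0)
dim = 8

# Synthetic stand-in for two encoders: the "target" space is assumed to be
# an exact rotation of the "attacker" space (real encoders will only be
# approximately related like this).
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
attacker_emb = rng.normal(size=(200, dim))
target_emb = attacker_emb @ true_rotation

# Paired embeddings of the same inputs are what makes this "supervised".
w = procrustes_rotation(target_emb, attacker_emb)
aligned = target_emb @ w  # stolen embeddings moved into the attacker's space
```

In this noise-free toy setting the recovered matrix undoes the rotation exactly; with real encoders the fit would only be approximate.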

Template Reconstruction Attacks

Template reconstruction attacks use stolen embeddings to reconstruct the original biometric data. The study evaluates the effectiveness of these attacks using metrics like the False Acceptance Rate (FAR) and the Spoofing False Acceptance Rate (sFAR).
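As a rough illustration of how FAR and sFAR relate, the sketch below computes both as acceptance rates at a shared threshold. The definitions are my assumption (FAR over ordinary impostor trials, sFAR over trials that use reconstructed data), and the scores and threshold are made-up numbers, not results from the paper.

```python
import numpy as np

def acceptance_rate(scores, threshold):
    """Fraction of trials whose score reaches the acceptance threshold."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores >= threshold))

# Hypothetical similarity scores against enrolled users.
threshold = 0.5
impostor_scores = [0.1, 0.2, 0.6, 0.3]  # other users' genuine data
spoof_scores = [0.7, 0.4, 0.8, 0.9]     # data reconstructed from stolen embeddings

far = acceptance_rate(impostor_scores, threshold)   # 0.25
sfar = acceptance_rate(spoof_scores, threshold)     # 0.75
```

An sFAR well above the FAR at the same operating point would indicate that reconstructed samples pass verification more often than ordinary impostors do.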

Research Methodology

Threat Model

The study considers two attack scenarios: black-box (the attacker can query the encoder but does not know its parameters) and architecture-only (the attacker knows the encoder’s architecture but cannot access the encoder itself). In both cases, the goal is to reconstruct biometric data from stolen embeddings using a trained decoder and alignment techniques.

Datasets

Handwritten Digit Datasets

The study uses datasets like eBioDigit, MobileDigit, and a private dataset from Orange Innovation. These datasets contain sequences of points representing handwritten digits.

Speech Datasets

Speech datasets include VoxCeleb1, VoxCeleb2, and VCTK, containing speech extracts used for training and evaluating speaker verification systems.

Experimental Design

Attacking a Handwritten Digit Analysis System

The Digits Attack Scenario

The dataset is split into training and validation sets for both target and attack encoders. The study uses Bi-LSTM encoders and compares different decoders (LSTM and LSTM-MDN) for reconstructing handwritten digits.

Choosing a Digit’s Decoder

The study evaluates the performance of LSTM and LSTM-MDN decoders using metrics like accuracy, EER, and sFAR.
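For readers unfamiliar with the EER, the following sketch shows one simple way to estimate it from two lists of scores. It assumes higher scores mean "more likely genuine" and sweeps the observed scores as thresholds; the function name and the toy scores are hypothetical.

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Estimate the EER and the threshold where FAR and FRR are closest."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # Candidate thresholds: every observed score, in increasing order.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # FAR: impostors accepted; FRR: genuine users rejected.
    far = np.array([np.mean(impostor >= t) for t in thresholds])
    frr = np.array([np.mean(genuine < t) for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2), float(thresholds[idx])

# Toy scores: one genuine trial scores low, one impostor trial scores high.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]
eer, eer_threshold = equal_error_rate(genuine_scores, impostor_scores)
```

With few trials the estimate is coarse; larger score sets are usually interpolated between the two nearest operating points.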

Choosing a Digits Alignment

Various alignment techniques are compared, including identity matrix, Procrustes analysis, fine-tuned Procrustes, and Wasserstein Procrustes. An oracle alignment is used to measure the upper limit of rotational alignments.
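The comparison can be sketched on synthetic data as follows. This is a toy illustration only: it assumes the oracle is a Procrustes rotation fitted directly on the evaluation pairs (my reading, not something stated in this summary), it omits the fine-tuned and Wasserstein variants, and all names and numbers are hypothetical.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal W minimizing ||source @ W - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def alignment_error(source, target, w):
    """Frobenius distance between aligned source and target embeddings."""
    return float(np.linalg.norm(source @ w - target))

rng = np.random.default_rng(1)
dim = 16
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

def make_pairs(n):
    # Synthetic paired embeddings: a noisy rotation links the two spaces.
    stolen = rng.normal(size=(n, dim))
    attacker = stolen @ rotation + 0.05 * rng.normal(size=(n, dim))
    return stolen, attacker

train_stolen, train_attacker = make_pairs(50)
eval_stolen, eval_attacker = make_pairs(200)

alignments = {
    "identity": np.eye(dim),                                          # no alignment
    "procrustes": procrustes_rotation(train_stolen, train_attacker),  # fitted on training pairs
    "oracle": procrustes_rotation(eval_stolen, eval_attacker),        # fitted on evaluation pairs
}
errors = {name: alignment_error(eval_stolen, eval_attacker, w)
          for name, w in alignments.items()}
```

Because the oracle is fitted on the very pairs it is scored on, no other rotation can achieve a lower error on that set, which is what makes it a useful upper bound.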

Attacking a Speaker Verification System

The Speech Attack Scenarios

The study uses Fast ResNet34 encoders for both the target and the attacker. The AutoVC voice conversion system is used to reconstruct speech from embeddings.

The Speech Alignment Experiments

The study compares supervised and unsupervised techniques for aligning speech embeddings before reconstruction.
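The unsupervised case has no paired embeddings, so the correspondence between the two sets has to be estimated along with the rotation. The sketch below shows the general alternating idea behind Wasserstein Procrustes: estimate a soft matching with Sinkhorn iterations, then solve a Procrustes problem for that matching. It is a simplified full-batch version with hypothetical names and parameters (`reg`, `n_rounds`), not the procedure used in the study, and it can settle in a poor local optimum when started from the identity.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iter=100):
    """Entropy-regularized transport plan between two uniform point clouds."""
    n, m = cost.shape
    # Scale by the largest cost so the exponent stays in a safe range.
    kernel = np.exp(-cost / (reg * cost.max()))
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]

def wasserstein_procrustes(x, y, n_rounds=20):
    """Alternate between soft matching and orthogonal Procrustes updates."""
    w = np.eye(x.shape[1])
    for _ in range(n_rounds):
        xw = x @ w
        # Squared Euclidean distance between every aligned x and every y.
        cost = ((xw[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        plan = sinkhorn_plan(cost)
        # Procrustes step for the current matching: SVD of x^T P y.
        u, _, vt = np.linalg.svd(x.T @ plan @ y)
        w = u @ vt
    return w

rng = np.random.default_rng(2)
dim = 4
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
x = rng.normal(size=(60, dim))
y = (x @ rotation)[rng.permutation(60)]  # same points, rotated and shuffled

w = wasserstein_procrustes(x, y)
```

Practical implementations typically use mini-batches and a more careful initialization than the identity matrix used here.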

Results and Analysis

Handwritten Digit Analysis System

The study shows that the LSTM-MDN decoder performs better than the LSTM decoder. The Wasserstein Procrustes alignment provides the best results among unsupervised techniques, but the oracle alignment shows the upper limit of performance.

Speaker Verification System

The study finds that even without alignment, the decoder can partially spoof the system. The Wasserstein Procrustes alignment improves spoofing performance, but the oracle alignment shows the best possible results.

Overall Conclusion

This study highlights the vulnerabilities of behavioral biometric systems to template reconstruction attacks. By using supervised and unsupervised alignment techniques, the study demonstrates the feasibility of reconstructing biometric data from stolen embeddings. The findings underscore the need for enhanced security measures, such as bio-hashing, to protect biometric data. Future research should explore the efficacy of alignment techniques against different network architectures and extend the study to other biometric modalities.
