Authors:
Thomas Thebaud, Gaël Le Lan, Anthony Larcher
Paper:
https://arxiv.org/abs/2408.08918
Introduction
Biometric recognition systems have become integral to modern security frameworks. They rely on intrinsic properties of users, such as voice, handwriting, and other behavioral traits, and encode user data into high-dimensional vectors known as embeddings. The theft of these embeddings poses a significant threat because a biometric trait, unlike a password or key, cannot easily be replaced once compromised. This study, conducted by Thomas Thebaud, Gaël Le Lan, and Anthony Larcher, explores the vulnerability of behavioral biometric systems to spoofing attacks, focusing on automatic speaker verification and handwritten digit analysis systems.
Related Work
Behavioral Biometric Systems
Authentication systems fall into three categories: knowledge-based (e.g., passwords), possession-based (e.g., keys or cards), and biometric-based (e.g., face, fingerprint, voice, handwriting). This study focuses on behavioral biometrics, particularly speech and handwriting.
Automatic Speaker Verification
Speaker verification systems have evolved from statistical models to neural networks, with ResNet architectures significantly improving performance. The study uses variations of ResNet34 trained on VoxCeleb datasets.
Handwritten Digit Analysis
Handwritten digit authentication systems are less common, with most research focusing on digit identification. The study uses a Bi-LSTM-based system for digit analysis, trained on datasets like eBioDigit and MobileDigit.
Reconstruction of Speech and Handwriting
Speech Reconstruction
Voice conversion systems like AutoVC are used to reconstruct speech from speaker embeddings. These systems can generate speech that mimics a target speaker’s voice, even if that speaker was never seen during training.
Handwriting Reconstruction
Handwriting reconstruction involves generating sequences of points in two dimensions. Systems like LSTM-MDN are used to reconstruct handwriting from embeddings.
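As an illustration of how a mixture density output can produce a pen position, the sketch below samples one 2-D point from a Gaussian mixture. The function name, the parameter layout, and the diagonal-covariance assumption are illustrative choices, not details taken from the study.

```python
import numpy as np

def sample_mdn_point(weights, means, stds, rng):
    """Draw one 2-D point from a mixture of axis-aligned Gaussians.

    weights: (K,) mixture weights summing to 1
    means:   (K, 2) component means
    stds:    (K, 2) per-axis standard deviations (diagonal covariance, an assumption)
    """
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Pick a mixture component according to its weight.
    k = rng.choice(len(weights), p=weights)
    # Sample the (x, y) coordinates from the chosen component.
    return rng.normal(loc=means[k], scale=stds[k])
```

In a decoder of this kind, the recurrent network would emit `weights`, `means`, and `stds` at each time step, and sampling step by step would yield the sequence of points.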
Supervised and Unsupervised Rotational Alignments
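Aligning embeddings from different encoders is crucial for spoofing attacks: the attacker's decoder is trained on embeddings from the attacker's own encoder, so stolen embeddings from the target encoder must first be mapped into that space. The study uses Procrustes analysis for supervised alignment, which relies on paired embeddings of the same samples from both encoders, and Wasserstein Procrustes for unsupervised alignment, which needs no pairing.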
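A minimal sketch of supervised rotational alignment, assuming paired embedding matrices `X` (target-encoder space) and `Y` (attacker-encoder space) whose rows correspond to the same samples. The function name is hypothetical; the closed-form SVD solution is the standard orthogonal Procrustes result.

```python
import numpy as np

def procrustes_align(X, Y):
    """Return the orthogonal matrix W minimising ||X @ W - Y||_F.

    X, Y: (n_samples, dim) arrays with row-wise correspondence.
    """
    # The minimiser is U @ Vt, where U, Vt come from the SVD of X^T Y.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt
```

Once `W` is estimated, a stolen embedding `e` from the target space would be mapped with `e @ W` before being passed to the decoder.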
Template Reconstruction Attacks
Template reconstruction attacks use stolen embeddings (templates) to reconstruct biometric data that the target system will accept. The study evaluates the effectiveness of these attacks using metrics like the False Acceptance Rate (FAR) and the Spoofing False Acceptance Rate (sFAR), the rate at which reconstructed (spoofed) samples are accepted.
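The two rates can be sketched as the same computation applied to different trial sets. This assumes that a higher score means a closer match and that a trial is accepted when its score reaches the threshold; the function names are illustrative.

```python
import numpy as np

def acceptance_rate(scores, threshold):
    """Fraction of trials whose score is at or above the decision threshold."""
    scores = np.asarray(scores, dtype=float)
    return float(np.mean(scores >= threshold))

def far_and_sfar(impostor_scores, spoof_scores, threshold):
    """FAR over ordinary impostor trials and sFAR over spoofed trials,
    both measured at the same threshold."""
    far = acceptance_rate(impostor_scores, threshold)
    sfar = acceptance_rate(spoof_scores, threshold)
    return far, sfar
```

Comparing the two values at a fixed threshold shows how much more often reconstructed samples are accepted than ordinary impostor attempts.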
Research Methodology
Threat Model
The study considers two attack scenarios: black-box (the attacker can query the encoder but does not know its parameters) and architecture-only (the attacker knows the encoder’s architecture but cannot query it). In both, the goal is to reconstruct biometric data from stolen embeddings using a decoder trained by the attacker and an alignment between embedding spaces.
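The attack flow described here can be sketched as a single trial. Everything in this block is a hypothetical stand-in: `decode` and `target_encode` are placeholder callables, `W` is an alignment matrix obtained elsewhere, and cosine scoring against the stolen template is an assumption rather than a detail from the study.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def spoofing_trial(stolen, W, decode, target_encode, threshold):
    """Return True if the reconstructed sample is accepted by the target system.

    stolen:        embedding taken from the target system (1-D array)
    W:             matrix mapping target-space embeddings to the attacker's space
    decode:        attacker's decoder, embedding -> biometric sample (placeholder)
    target_encode: target system's encoder, sample -> embedding (placeholder)
    """
    aligned = stolen @ W            # move the template into the decoder's space
    sample = decode(aligned)        # reconstruct a speech or handwriting sample
    probe = target_encode(sample)   # the target system re-encodes the sample
    # The target system compares the probe with the enrolled (stolen) template.
    return cosine_similarity(probe, stolen) >= threshold
```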
Datasets
Handwritten Digit Datasets
The study uses datasets like eBioDigit, MobileDigit, and a private dataset from Orange Innovation. These datasets contain sequences of points representing handwritten digits.
Speech Datasets
Speech datasets include VoxCeleb1, VoxCeleb2, and VCTK, containing speech extracts used for training and evaluating speaker verification systems.
Experimental Design
Attacking a Handwritten Digit Analysis System
The Digits Attack Scenario
The dataset is split into training and validation sets for both target and attack encoders. The study uses Bi-LSTM encoders and compares different decoders (LSTM and LSTM-MDN) for reconstructing handwritten digits.
Choosing a Digit’s Decoder
The study evaluates the performance of LSTM and LSTM-MDN decoders using metrics like accuracy, EER, and sFAR.
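For reference, the Equal Error Rate (EER) is the operating point where the false acceptance and false rejection rates coincide. The sketch below estimates it by sweeping the observed scores as thresholds; the function name and the "higher score means closer match" convention are assumptions.

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER and the threshold at which it occurs."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2), float(thresholds[i])
```

With finite score sets the two rates rarely cross exactly, so averaging them at the closest point is a common approximation.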
Choosing a Digits Alignment
Several alignment techniques are compared: the identity matrix (no alignment), Procrustes analysis, fine-tuned Procrustes, and Wasserstein Procrustes. An oracle alignment serves as a reference for the upper limit of what a rotational alignment can achieve.
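A rough sketch of the unsupervised case, assuming unpaired embedding sets `X` and `Y` with unit-norm rows. It alternates between a soft matching of the two sets and a Procrustes update. The matching step here uses Sinkhorn iterations, which is one common choice and not necessarily what the study used; all names and default parameters are illustrative.

```python
import numpy as np

def sinkhorn_plan(cost, reg=0.05, n_iters=200):
    """Entropy-regularised transport plan between two uniformly weighted sets."""
    n, m = cost.shape
    K = np.exp(-cost / reg)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    u = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def wasserstein_procrustes(X, Y, n_rounds=20, reg=0.05, W0=None):
    """Alternate between matching rows of X @ W to rows of Y and refitting W."""
    d = X.shape[1]
    W = np.eye(d) if W0 is None else W0
    for _ in range(n_rounds):
        XW = X @ W
        # Squared Euclidean distance between every aligned row and every row of Y.
        cost = ((XW[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
        P = sinkhorn_plan(cost, reg)
        # Procrustes step: the orthogonal W maximising trace(W^T X^T P Y).
        U, _, Vt = np.linalg.svd(X.T @ P @ Y)
        W = U @ Vt
    return W
```

This alternating scheme can settle in a poor local optimum, so the starting matrix `W0` matters in practice. The result is always orthogonal, but it is not guaranteed to recover the true rotation.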
Attacking a Speaker Verification System
The Speech Attack Scenarios
The study uses Fast ResNet34 encoders for both the target and the attacker. The AutoVC voice conversion system is used to reconstruct speech from speaker embeddings.
The Speech Alignment Experiments
The study compares supervised and unsupervised alignment techniques for mapping stolen speaker embeddings into the attacker's embedding space before reconstructing speech.
Results and Analysis
Handwritten Digit Analysis System
The study shows that the LSTM-MDN decoder performs better than the LSTM decoder. The Wasserstein Procrustes alignment provides the best results among unsupervised techniques, but the oracle alignment shows the upper limit of performance.
Speaker Verification System
The study finds that even without alignment, the decoder can partially spoof the system. The Wasserstein Procrustes alignment improves spoofing performance, but the oracle alignment shows the best possible results.
Overall Conclusion
This study highlights the vulnerabilities of behavioral biometric systems to template reconstruction attacks. By using supervised and unsupervised alignment techniques, the study demonstrates the feasibility of reconstructing biometric data from stolen embeddings. The findings underscore the need for enhanced security measures, such as bio-hashing, to protect biometric data. Future research should explore the efficacy of alignment techniques against different network architectures and extend the study to other biometric modalities.