Authors:
Vijul Shah, Brian B. Moser, Ko Watanabe, Andreas Dengel
Paper:
https://arxiv.org/abs/2408.10397
Introduction
The ability to accurately measure pupil diameter is crucial for assessing various psychological and physiological states, such as stress levels and cognitive load. However, the low resolution of images in many eye-tracking datasets often hampers precise measurement. This study investigates the impact of various upscaling methods on pupil diameter predictions from webcam images. By comparing several pre-trained super-resolution (SR) methods, the study aims to determine how upscaling can enhance the accuracy of pupil diameter prediction models.
Related Work
Super-Resolution as Pre-Processing
Image super-resolution (SR) is the process of converting low-resolution (LR) images into high-resolution (HR) images. This technique has been widely used in fields such as medical imaging, satellite imagery, and consumer electronics to enhance image clarity and detail. SR models can be broadly categorized into regression-based models and generative models, each with distinct training objectives and architectures.
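The two families are commonly distinguished by their training losses. The formulation below is a generic sketch, not the exact objective of any model evaluated in this study. Here f_θ is the SR network, (x_i^LR, y_i^HR) are low/high-resolution training pairs, D is a discriminator, L_perc is a perceptual (feature-space) loss, and the λ weights are assumed hyperparameters.

```latex
% Regression-based SR: fit f_\theta to HR targets with a pixel-wise loss (L1 shown)
\mathcal{L}_{\mathrm{reg}}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left\lVert f_\theta\!\left(x_i^{\mathrm{LR}}\right) - y_i^{\mathrm{HR}}\right\rVert_1

% Generative (GAN-based) SR: add perceptual and adversarial terms
\mathcal{L}_{\mathrm{gen}}(\theta) = \mathcal{L}_{\mathrm{reg}}(\theta)
  + \lambda_{\mathrm{perc}}\,\mathcal{L}_{\mathrm{perc}}(\theta)
  - \lambda_{\mathrm{adv}}\,\frac{1}{N}\sum_{i=1}^{N}\log D\!\left(f_\theta\!\left(x_i^{\mathrm{LR}}\right)\right)
```

Pixel-wise objectives tend to favour smooth reconstructions, while the adversarial term pushes the output toward sharper, more realistic-looking textures, at the risk of introducing detail that is not present in the input.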
Pupil Diameter Estimation
Previous methods for pupil diameter estimation have relied on techniques such as dual-camera setups and image processing algorithms. However, these methods often require specific recording conditions, and progress has been limited by the lack of publicly available datasets. The introduction of the EyeDentify dataset, which offers webcam-based eye images with corresponding pupil diameters, marks a significant advancement in this field.
Research Methodology
The primary goal of this study is to apply SR models to improve the quality of eye images derived from webcam images, thereby improving the accuracy of pupil diameter estimation. The methodology uses pre-trained SR models to upscale entire face images before the eye regions are extracted and analyzed.
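The order of operations described above (upscale the whole face, then crop the eyes) can be sketched as follows. This is a minimal illustration under stated assumptions: the eye bounding box is assumed to be known in low-resolution coordinates, and nearest_upscale is a hypothetical stand-in for an SR model that simply replicates pixels. Any upscaler exposed as a function with the same (image, factor) signature could be passed in its place.

```python
import numpy as np
from typing import Callable, Tuple

# Hypothetical box format: (top, left, height, width) of an eye region in the
# ORIGINAL low-resolution face image. How it is detected is out of scope here.
Box = Tuple[int, int, int, int]


def nearest_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Stand-in upscaler: nearest-neighbour pixel replication (NOT an SR model)."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)


def upscale_then_crop(
    face: np.ndarray,
    eye_box: Box,
    factor: int,
    upscaler: Callable[[np.ndarray, int], np.ndarray],
) -> np.ndarray:
    """Upscale the whole face image first, then crop the eye region.

    The box is given in low-resolution coordinates, so every component is
    multiplied by `factor` to index into the upscaled image.
    """
    hr_face = upscaler(face, factor)
    top, left, h, w = (v * factor for v in eye_box)
    return hr_face[top:top + h, left:left + w]
```

For example, upscale_then_crop(face, (10, 5, 8, 16), 4, nearest_upscale) returns a 32 × 64 eye crop from a ×4 upscaled face.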
SR Techniques
The study evaluates several SR methods, including:
- Regression-based Models:
  - SRResNet: A residual network for SR, inspired by the ResNet architecture.
  - HAT (Hybrid Attention Transformer): A state-of-the-art vision transformer for image SR.
- Generative Models:
  - GFPGAN: A face-oriented SR GAN model.
  - CodeFormer: A face-oriented VQ-VAE-based model.
  - Real-ESRGAN: A generalized SR GAN approach for photorealistic textures.
Experimental Design
Model Details
For pupil diameter prediction, the study employs ResNet18, ResNet50, and ResNet152 as regression models. The upscaled datasets produced by each SR method are used to train and evaluate these models. The eye images are upscaled by factors of 2 (×2) and 4 (×4) using bicubic interpolation and refined with the SR models.
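Bicubic interpolation can be written as a separable cubic convolution. The sketch below uses NumPy and the Keys kernel with a = -0.5, replicates edge pixels at the borders, and maps output pixel centres back to the input with a half-pixel offset. These choices are assumptions: image libraries differ in the kernel parameter and border handling (OpenCV, for instance, uses a = -0.75), and the exact bicubic implementation used in the study is not stated here.

```python
import numpy as np


def keys_kernel(x: np.ndarray, a: float = -0.5) -> np.ndarray:
    """Keys cubic convolution kernel; a = -0.5 is a common choice."""
    x = np.abs(x)
    w = np.zeros_like(x)
    near = x <= 1
    far = (x > 1) & (x < 2)
    w[near] = (a + 2) * x[near] ** 3 - (a + 3) * x[near] ** 2 + 1
    w[far] = a * x[far] ** 3 - 5 * a * x[far] ** 2 + 8 * a * x[far] - 4 * a
    return w


def resize_axis(img: np.ndarray, factor: int, axis: int) -> np.ndarray:
    """Bicubically upsample one axis by an integer factor (edge pixels replicated)."""
    n = img.shape[axis]
    out_idx = np.arange(n * factor)
    # Map output pixel centres back to input coordinates (half-pixel convention).
    src = (out_idx + 0.5) / factor - 0.5
    base = np.floor(src).astype(int)
    t = src - base
    shape = [1] * img.ndim
    shape[axis] = -1
    result = 0.0
    for k in range(-1, 3):  # four taps: base-1, base, base+1, base+2
        idx = np.clip(base + k, 0, n - 1)
        weight = keys_kernel(t - k)  # kernel evaluated at distance src - (base + k)
        result = result + np.take(img, idx, axis=axis) * weight.reshape(shape)
    return result


def bicubic_upscale(img: np.ndarray, factor: int) -> np.ndarray:
    """Separable bicubic upscaling of a 2-D (H, W) image by an integer factor."""
    img = img.astype(np.float64)
    return resize_axis(resize_axis(img, factor, axis=0), factor, axis=1)
```

Because the four kernel weights sum to one for every offset, a constant image stays constant after upscaling, and a linear intensity ramp is reproduced exactly away from the borders.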
Training Details
The training setup follows the original EyeDentify work, using 5-fold cross-validation. The models are trained for 50 epochs with a batch size of 128, using the AdamW optimizer with default settings.
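The cross-validation protocol can be sketched as a shuffled 5-fold split over sample indices. The helper names and the seed are hypothetical, and the split here is per sample; if the EyeDentify protocol splits by participant instead, indices should be grouped by participant before folding. For reference, if the models were trained in PyTorch (an assumption), the defaults of torch.optim.AdamW are lr=1e-3, betas=(0.9, 0.999), eps=1e-8 and weight_decay=1e-2.

```python
import numpy as np

# Values stated in the text; the helpers below are illustrative assumptions.
EPOCHS = 50
BATCH_SIZE = 128
N_FOLDS = 5


def kfold_indices(n_samples: int, n_folds: int = N_FOLDS, seed: int = 0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)  # fold sizes differ by at most one
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, val_idx


def num_batches(n_train: int, batch_size: int = BATCH_SIZE) -> int:
    """Mini-batches per epoch, keeping the final partial batch (ceiling division)."""
    return -(-n_train // batch_size)
```

Each sample lands in exactly one validation fold, so every image is evaluated once by a model that never saw it during training.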
Results and Analysis
Quantitative Results
The study reports 5-fold cross-validation results for ResNet18, ResNet50, and ResNet152 on the ×2 (SRx2) and ×4 (SRx4) datasets. The results indicate that upscaling can substantially benefit pupil diameter prediction, although the size of the benefit varies across SR methods and scaling factors.
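Cross-validated results like these are usually summarized by averaging a per-fold error. The sketch below assumes mean absolute error (MAE) between true and predicted pupil diameters as the metric; the exact metric and reporting format used in the study are not restated here, and the helper names are hypothetical.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error between true and predicted pupil diameters."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def summarize_folds(fold_errors) -> tuple[float, float]:
    """Mean and population standard deviation of per-fold errors."""
    errs = np.asarray(fold_errors, dtype=float)
    return float(errs.mean()), float(errs.std())


def rank_methods(errors_by_method: dict) -> list:
    """Order upscaling methods by mean cross-validated error (lowest first)."""
    return sorted(errors_by_method, key=lambda m: summarize_folds(errors_by_method[m])[0])
```

Reporting the standard deviation alongside the mean helps judge whether a difference between two upscaling methods is larger than the variation between folds.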
Visualizations
Class Activation Maps (CAMs) computed from the final convolutional layer of each model reveal that upscaling affects where the prediction models focus their attention. The top-performing models show high activation that follows the shape of the eye, indicating that image upscaling influences both the model's focus and its performance.
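A minimal CAM computation for a single-output regression network can follow the original CAM formulation: a weighted sum of the last convolutional feature maps, using the weights of the linear layer that follows global average pooling. Whether the study used this exact variant is an assumption. In torchvision's ResNet implementations, the feature maps would be the output of layer4 and the weights those of fc; the shapes and function names below are hypothetical.

```python
import numpy as np


def raw_cam(features: np.ndarray, fc_weight: np.ndarray) -> np.ndarray:
    """Weighted sum of feature maps for a single regression output.

    features:  (C, h, w) activations of the final convolutional block.
    fc_weight: (C,) weights of the linear head after global average pooling.
    """
    return np.tensordot(fc_weight, features, axes=([0], [0]))  # shape (h, w)


def normalize_cam(cam: np.ndarray) -> np.ndarray:
    """Rescale a CAM to [0, 1] for overlaying on the eye image."""
    cam = cam - cam.min()
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

Because global average pooling and the linear head are both linear, the spatial mean of the raw map plus the head's bias equals the model's prediction, which is why the map can be read as a spatial breakdown of the predicted diameter. For display, the h × w map is usually upsampled to the input resolution.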
Limitations
The study faces several challenges, including inconsistencies in participant posture, gaze shifts, head movements, and variations in lighting conditions. Additionally, GAN-based models introduce artifacts that complicate training.
Overall Conclusion
This study demonstrates that SR techniques can significantly enhance the accuracy of pupil diameter prediction from webcam images. While traditional bicubic upscaling often performs well, advanced SR methods such as Real-ESRGAN and SRResNet can achieve lower error rates under specific conditions. Future work will explore additional SR methods and integrate more diverse data conditions to ensure the robustness and applicability of pupil diameter estimation techniques in real-world scenarios.
By leveraging SR technology, this research advances our understanding of image upscaling in pupillometry and sets a strong foundation for future advancements in eye-tracking technologies.