Authors:
Rasha Alshawi, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Kendall Niles, Ken Pathak, Steve Sloan
Paper:
https://arxiv.org/abs/2408.10181
Introduction
Inspecting culverts and sewer pipes is crucial for maintaining the integrity and longevity of water management systems. Traditional inspection methods, such as manual review of inspection videos, are time-consuming and prone to human error. Automated semantic segmentation offers a promising alternative that can improve both the accuracy and the efficiency of inspections. However, imbalanced datasets, in which certain defect types are underrepresented, pose a significant hurdle. This paper introduces the Enhanced Feature Pyramid Network (E-FPN), a deep learning model designed to address these challenges by improving feature extraction and handling object variations in imbalanced datasets.
Related Work
Evolution of Semantic Segmentation Techniques
Semantic segmentation has evolved significantly with the advent of deep learning. Early methods relied on hand-crafted features and conventional classifiers, but the introduction of Convolutional Neural Networks (CNNs) marked a transformative shift. Fully Convolutional Networks (FCNs) facilitated dense predictions over arbitrary-sized inputs, laying the groundwork for subsequent architectures like U-Net, Feature Pyramid Networks (FPNs), and Vision Transformers (ViTs).
- Encoder-Decoder Architectures and U-Net Variants: U-Net and its variants have advanced the field by effectively combining low-level and high-level features, particularly in medical imaging. Innovations like Convolutional Block Attention Modules (CBAM) and Attention Sparse Convolutional U-Net (ASCU-Net) have further refined performance.
- FPNs and Multi-Scale Feature Representation: FPNs address multi-scale feature extraction by constructing a hierarchical pyramid of feature maps. This approach integrates contextual information at different scales, making it well suited to tasks with significant variations in scale and orientation.
- ViTs: ViTs adapt self-attention mechanisms from natural language processing to visual data, capturing long-range dependencies and global context. However, they are computationally demanding and require extensive datasets for optimal performance.
Research Gap and Motivation
Despite these advances, existing models such as U-Net, FPN, and ViTs struggle with imbalanced datasets and widely varying object scales. The distinctive characteristics of culvert-sewer defect datasets call for specialized approaches. This paper introduces E-FPN, designed to improve object segmentation and manage class imbalance without adding computational overhead.
Research Methodology
E-FPN Architecture
The E-FPN builds on the traditional FPN, adding enhancements that handle object variations and improve feature extraction. The architecture consists of two core components:
- Bottom-up Pathway: This pathway extracts multi-scale features through convolutional operations and downsampling stages. It incorporates custom Inception-like blocks with 3×3 and 5×5 filters, along with depth-wise separable convolutions that reduce the parameter count without compromising performance.
- Top-down Pathway: This pathway builds on the bottom-up features by upsampling coarser feature maps and merging them with finer ones to produce higher-resolution feature maps. It employs feature fusion, aliasing mitigation, a consistent output configuration, and efficient upsampling to preserve spatial detail and ensure accurate defect localization.
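The two pathways can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation. The summary gives neither channel widths nor the exact fusion and upsampling operators, so the sketch makes several assumptions:

- upsampling is 2× nearest-neighbour followed by element-wise addition (the merge used in the original FPN);
- maps are single-channel;
- the lateral input is already channel-matched.

The parameter-count helpers show why depth-wise separable convolutions shrink the bottom-up pathway. Bias terms are ignored, and the 256-channel figures are illustrative.

```python
from typing import List

Grid = List[List[float]]  # single-channel 2-D feature map: [row][col]


def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a regular k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Depth-wise k x k conv (one filter per input channel) plus a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def upsample_nearest_2x(x: Grid) -> Grid:
    """Nearest-neighbour 2x upsampling: every value becomes a 2 x 2 block."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two rows are independent lists
    return out


def top_down_merge(coarse: Grid, lateral: Grid) -> Grid:
    """Assumed FPN-style merge: upsample the coarser map, then add the lateral map element-wise."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(lateral) or len(up[0]) != len(lateral[0]):
        raise ValueError("lateral map must be exactly twice the size of the coarse map")
    return [[a + b for a, b in zip(u_row, l_row)] for u_row, l_row in zip(up, lateral)]


if __name__ == "__main__":
    print(standard_conv_params(256, 256, 3))        # 589824
    print(depthwise_separable_params(256, 256, 3))  # 67840
    print(top_down_merge([[1.0, 2.0]], [[0.5] * 4, [0.0] * 4]))  # [[1.5, 1.5, 2.5, 2.5], [1.0, 1.0, 2.0, 2.0]]
```

Consider a 3×3 convolution with 256 input and 256 output channels. The separable version needs 67,840 weights instead of 589,824, roughly 8.7× fewer. With the 5×5 filters used in the Inception-like blocks, the gap widens to 71,936 versus 1,638,400.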
Progressive Enhancement of FPN Architectures
The development of E-FPN involved extensive experiments with various modifications to the original FPN architecture, including:
- Atrous Convolutions
- Attention Gates (AGs)
- Self-Attention Mechanisms
- Enhanced Squeeze-and-Excitation (SE) Blocks
- Inception and Residual Blocks
- Factorized Inception Blocks
These modifications aimed to enhance feature extraction and representation, leading to the evolution of the proposed E-FPN model.
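Of the modifications listed, the squeeze-and-excitation idea is compact enough to sketch. The summary does not say what makes the SE blocks "enhanced". The code below therefore shows only the standard SE operation: global average pooling, a two-layer bottleneck, and sigmoid channel gating. It is written in plain Python with biases omitted and hand-picked weights. Treat it as an illustration of the mechanism, not the authors' block.

```python
import math
from typing import List

FeatureMap = List[List[List[float]]]  # [channel][row][col]
Matrix = List[List[float]]


def matvec(w: Matrix, v: List[float]) -> List[float]:
    """Dense matrix-vector product (no bias)."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in w]


def squeeze_excite(x: FeatureMap, w1: Matrix, w2: Matrix) -> FeatureMap:
    """Standard SE gating (no biases); w1 is (C/r) x C, w2 is C x (C/r)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck FC -> ReLU -> FC -> sigmoid yields one gate in (0, 1) per channel.
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    gates = [1.0 / (1.0 + math.exp(-a)) for a in matvec(w2, hidden)]
    # Scale: multiply every activation in channel c by gates[c].
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


if __name__ == "__main__":
    x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]  # C = 2, H = W = 2
    w1 = [[1.0, 0.0]]         # reduction ratio r = 2 -> one hidden unit
    w2 = [[100.0], [-100.0]]  # opens channel 0, nearly closes channel 1
    y = squeeze_excite(x, w1, w2)
    print(y[0][0][0], y[1][0][0])  # channel 0 kept at 1.0, channel 1 pushed close to 0
```

In a trained network, `w1` and `w2` are learned rather than chosen by hand. If the second-layer weights are all zero, every gate equals sigmoid(0) = 0.5 and the block simply halves its input.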
Experimental Design
Datasets
Culvert-Sewer Defects Dataset
The dataset comprises 6,300 annotated frames from 580 videos of underground infrastructure inspections. The frames were annotated with pixel-wise labels for nine defect classes, reflecting real-world conditions and class imbalances.
Aerial Semantic Segmentation Drone Dataset
This dataset includes high-resolution images with pixel-accurate annotations across 22 classes, captured from varying altitudes and perspectives. It serves as a benchmark to evaluate the model’s robustness and adaptability.
Imbalance Handling Techniques
To address class imbalance, the study employed two techniques:
- Class Decomposition and Ensemble Learning: The dataset was partitioned into smaller, more homogeneous groups based on defect characteristics. Models were trained on these groups, and their predictions were combined using ensemble learning techniques.
- Data Augmentation: Techniques such as horizontal flipping, Gaussian blur, color jittering, shearing, rotation, random noise, and random cropping were applied to increase dataset diversity and improve the representation of underrepresented defect classes.
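The summary does not specify how the group models' predictions are combined. One simple possibility, shown here as a hypothetical sketch, is per-pixel maximum-score fusion:

- each group model scores only the defect classes in its own group;
- each pixel takes the highest-scoring class across all groups, if that score exceeds a threshold;
- otherwise the pixel is labelled background.

The function name, the group and class names, and the 0.5 threshold are all illustrative assumptions. Averaging or voting would be equally plausible alternatives.

```python
from typing import Dict, List

# scores[group][pixel] maps each class handled by that group's model to a score in [0, 1].
GroupScores = Dict[str, List[Dict[str, float]]]


def fuse_group_predictions(scores: GroupScores, threshold: float = 0.5,
                           background: str = "background") -> List[str]:
    """Hypothetical fusion rule: each pixel takes the highest-scoring class proposed
    by any group model if that score exceeds `threshold`; otherwise it is background."""
    n_pixels = len(next(iter(scores.values())))
    if any(len(per_pixel) != n_pixels for per_pixel in scores.values()):
        raise ValueError("all group models must score the same pixels")
    labels: List[str] = []
    for p in range(n_pixels):
        best_label, best_score = background, threshold
        for per_pixel in scores.values():
            for cls, s in per_pixel[p].items():
                if s > best_score:
                    best_label, best_score = cls, s
        labels.append(best_label)
    return labels


if __name__ == "__main__":
    # Two hypothetical groups scoring three flattened pixels.
    scores = {
        "structural": [{"crack": 0.9}, {"crack": 0.2}, {"crack": 0.1}],
        "obstruction": [{"deposit": 0.3}, {"deposit": 0.7}, {"deposit": 0.4}],
    }
    print(fuse_group_predictions(scores))  # ['crack', 'deposit', 'background']
```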
Results and Analysis
Comparison with Baseline Architectures
The E-FPN was compared with state-of-the-art models, including U-Net, CBAM U-Net, ASCU-Net, and Swin Transformer. The E-FPN demonstrated superior performance, achieving an average IoU improvement of 13.8% on the culvert-sewer defects dataset and 27.3% on the aerial drone dataset.
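These comparisons are reported in terms of IoU (intersection over union). Per-class IoU divides the number of pixels that both the prediction and the ground truth assign to a class by the number of pixels that either assigns to it. Mean IoU then averages over classes. The sketch below works on flattened label lists and skips classes absent from both prediction and ground truth. That skipping convention is an assumption: benchmarks handle absent classes differently, and the summary does not say which convention the paper uses.

```python
import math
from typing import List, Sequence


def per_class_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> List[float]:
    """IoU for each class id over flattened per-pixel labels; NaN if the class appears in neither."""
    if len(pred) != len(target):
        raise ValueError("prediction and ground truth must have the same number of pixels")
    ious: List[float] = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        ious.append(inter / union if union else float("nan"))
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Average of the per-class IoUs, skipping classes absent from both inputs."""
    valid = [v for v in per_class_iou(pred, target, num_classes) if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


if __name__ == "__main__":
    pred = [0, 0, 1, 1, 2]    # five flattened pixels
    target = [0, 1, 1, 1, 2]
    print(per_class_iou(pred, target, 3))  # class 0: 1/2, class 1: 2/3, class 2: 1/1
    print(mean_iou(pred, target, 3))
```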
Impact of Data Imbalance Mitigation Techniques
The combined use of class decomposition and data augmentation led to significant improvements in model performance. The E-FPN achieved a 6.97% enhancement in IoU, demonstrating the effectiveness of these techniques in addressing class imbalance.
Ablation Study
The ablation study quantified how individual architectural components contribute to E-FPN’s performance. Enhancements such as SE blocks and Inception blocks with residual connections produced noticeable gains, whereas the factorized variants and additional layers did not deliver the expected benefits.
Overall Conclusion
The E-FPN presents a robust solution for semantic segmentation in imbalanced culvert-sewer datasets. It incorporates architectural innovations and data balancing strategies to enhance performance and computational efficiency. The model’s superior performance on diverse datasets underscores its potential for real-world applications in infrastructure inspection. Future research will focus on integrating temporal information, exploring unsupervised pre-training, and developing adaptive techniques for resource-constrained environments.