Authors:
Huafeng Chen, Pengxu Wei, Guangqian Guo, Shan Gao
Paper:
https://arxiv.org/abs/2408.10760
Introduction
Camouflaged Object Detection (COD) is the task of identifying objects that blend into their surroundings. Traditional COD methods rely heavily on mask annotations, which are labor-intensive and time-consuming to produce. This paper introduces SAM-COD, a framework designed to address the limitations of existing weakly-supervised COD methods. SAM-COD builds on the Segment Anything Model (SAM) and adds four components to improve performance under weak supervision: a Prompt Adapter, a Response Filter, a Semantic Matcher, and Prompt-Adaptive Knowledge Distillation.
Related Work
Camouflaged Object Detection
COD aims to detect objects that are visually indistinguishable from their backgrounds. Previous works like SINet and ZoomNet have made significant strides in this area but rely on fully-supervised learning with pixel-level annotations. Weakly-supervised methods like CRNet and WS-SAM have attempted to reduce annotation costs but suffer from performance gaps compared to fully-supervised methods.
Segment Anything Model (SAM)
SAM has shown promise in traditional segmentation tasks but faces challenges in COD due to issues like extreme responses and semantically erroneous responses. SAM-Adapter has been proposed to align SAM with COD data, but it still falls short in weakly-supervised settings.
Knowledge Distillation
Knowledge distillation trains a smaller network to mimic a larger one. While effective for model compression, traditional distillation methods are not well suited to COD scenarios with limited supervision.
Research Methodology
Prompt Adapter
The Prompt Adapter converts scribble annotations into discrete points, making them compatible with SAM. This is achieved using the Zhang-Suen algorithm to extract the skeleton of the scribble and then performing discrete sampling.
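The two steps above can be sketched in plain Python. The thinning routine follows the standard Zhang-Suen two-pass rules; the even-stride sampling rule, the function names, and the `num_points` parameter are assumptions made for illustration, not details taken from the paper.

```python
def zhang_suen_skeleton(grid):
    """Thin a binary image (list of rows of 0/1) with Zhang-Suen thinning."""
    h, w = len(grid), len(grid[0])
    # Pad with a one-pixel background border so every pixel has 8 neighbours.
    img = [[0] * (w + 2)]
    img += [[0] + [1 if v else 0 for v in row] + [0] for row in grid]
    img += [[0] * (w + 2)]

    def neighbours(y, x):
        # P2..P9, clockwise, starting from the pixel directly above.
        return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1],
                img[y + 1][x + 1], img[y + 1][x], img[y + 1][x - 1],
                img[y][x - 1], img[y - 1][x - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):  # the two sub-iterations of each pass
            to_clear = []
            for y in range(1, h + 1):
                for x in range(1, w + 1):
                    if img[y][x] != 1:
                        continue
                    p = neighbours(y, x)
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the ring P2..P9, P2.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1
                            for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((y, x))
            # Delete only after the full scan, as the algorithm requires.
            for y, x in to_clear:
                img[y][x] = 0
            changed = changed or bool(to_clear)
    # Strip the padding again.
    return [row[1:-1] for row in img[1:-1]]


def sample_points(skeleton, num_points):
    """Pick up to num_points skeleton pixels as (x, y) point prompts.

    Assumption: an even stride over pixels in raster order; the paper's
    sampling rule may differ.
    """
    pts = [(x, y) for y, row in enumerate(skeleton)
           for x, v in enumerate(row) if v]
    if len(pts) <= num_points:
        return pts
    stride = len(pts) / num_points
    return [pts[int(i * stride)] for i in range(num_points)]
```

Sampling from the skeleton rather than from the raw scribble keeps the points near the centre line of the stroke, so stroke thickness does not affect which pixels are chosen.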
Response Filter
The Response Filter addresses SAM's extreme responses. It computes the ratio of mask area to image area and discards masks whose ratio indicates an extreme response.
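A minimal sketch of such an area-ratio filter is shown below. The thresholds `low` and `high` and the function names are hypothetical; the summary does not state the values the paper uses.

```python
def mask_ratio(mask):
    """Fraction of pixels in a binary mask (list of rows) that are foreground."""
    total = sum(len(row) for row in mask)
    foreground = sum(1 for row in mask for v in row if v)
    return foreground / total


def filter_extreme_responses(masks, low=0.001, high=0.9):
    """Keep masks whose area ratio lies in [low, high].

    Assumption: both thresholds are illustrative placeholders.
    """
    return [m for m in masks if low <= mask_ratio(m) <= high]
```

With these placeholder thresholds, a mask that is empty or that covers the whole image is dropped, while a mask covering a moderate share of the image is kept.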
Semantic Matcher
The Semantic Matcher improves the semantic accuracy of the masks generated by SAM. It measures the semantic score of the mask using semantic entropy and selects masks that balance segmentation details and accurate semantics.
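This summary does not give the paper's definition of semantic entropy, so the sketch below substitutes a stand-in: the mean per-pixel binary Shannon entropy of a foreground-probability map, where lower entropy means a more confident prediction. The function names and the "pick the lowest entropy" rule are assumptions, and the rule simplifies the balance between detail and semantics that the paper describes.

```python
import math


def mean_binary_entropy(prob_map, eps=1e-12):
    """Mean binary Shannon entropy (in bits) of per-pixel probabilities."""
    total, count = 0.0, 0
    for row in prob_map:
        for p in row:
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            total += -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
            count += 1
    return total / count


def select_most_confident(prob_maps):
    """Index of the candidate map with the lowest mean entropy (assumed rule)."""
    return min(range(len(prob_maps)),
               key=lambda i: mean_binary_entropy(prob_maps[i]))
```

A map of all 0.5 values scores the maximum of 1 bit per pixel, while a map with values close to 0 or 1 scores near zero, so the second would be selected.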
Prompt-Adaptive Knowledge Distillation
This component enhances knowledge distillation by introducing prompt-guided knowledge. It constructs a prompt-adaptive mask for knowledge distillation based on the input prompt, focusing on high-value regions within the camouflage scene.
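One way to picture a prompt-adaptive mask is as a per-pixel weight map that is highest near the prompt points and is then used to weight a student-teacher loss. The Gaussian weighting, the `sigma` and `floor` parameters, the squared-error loss, and the function names below are all assumptions for illustration; the paper's actual mask construction and loss may differ.

```python
import math


def prompt_weight_map(height, width, points, sigma=2.0, floor=0.1):
    """Weight map that peaks at the (x, y) prompt points and decays to floor."""
    weights = []
    for y in range(height):
        row = []
        for x in range(width):
            # Gaussian response of the nearest prompt point.
            near = max(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                for px, py in points
            )
            row.append(floor + (1.0 - floor) * near)
        weights.append(row)
    return weights


def weighted_distillation_loss(student, teacher, weights):
    """Weighted mean squared error between student and teacher maps."""
    num = sum(
        w * (s - t) ** 2
        for srow, trow, wrow in zip(student, teacher, weights)
        for s, t, w in zip(srow, trow, wrow)
    )
    den = sum(w for wrow in weights for w in wrow)
    return num / den
```

Under this weighting, the same student error costs more when it falls near a prompt point than when it falls far from one, which is one reading of "focusing on high-value regions".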
Experimental Design
Datasets
Experiments were conducted on three COD benchmarks: CAMO, COD10K, and NC4K. The network was trained on a scribble-annotated dataset (S-COD) and re-annotated datasets for point (P-COD) and bounding box (B-COD) supervision.
Evaluation Metrics
Four evaluation metrics were used: Mean Absolute Error (MAE), S-measure (Sm), E-measure (Em), and weighted F-measure (F^w_β).
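Of the four metrics, MAE is the simplest to state: it is the average absolute difference between the predicted map and the ground-truth mask, with both scaled to [0, 1], and lower is better. A minimal sketch follows; the function name is illustrative.

```python
def mean_absolute_error(pred, gt):
    """Mean absolute per-pixel difference between two maps with values in [0, 1]."""
    total = sum(
        abs(p - g)
        for prow, grow in zip(pred, gt)
        for p, g in zip(prow, grow)
    )
    count = sum(len(row) for row in gt)
    return total / count


pred = [[0.0, 1.0], [0.5, 0.5]]
gt = [[0, 1], [0, 1]]
print(mean_absolute_error(pred, gt))  # 0.25
```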
Implementation Details
The method was implemented in PyTorch and run on a GeForce RTX 4090 GPU, with PVT-B4 as the encoder. Training has two main steps: first the encoder and decoder in the Semantic Matcher are trained, and then the distillation source is used for knowledge distillation.
Results and Analysis
Quantitative Comparison
According to the reported results, SAM-COD outperforms state-of-the-art weakly-supervised methods such as WS-SAM, with substantial improvements in MAE, Sm, and Em. It also surpasses fully-supervised methods like ZoomNet.
Qualitative Evaluation
The prediction maps generated by SAM-COD are clearer and more complete than those of other methods, with sharper contours. The method performs well in challenging scenarios, including tiny objects, huge objects, and complex backgrounds.
Parameter Complexity
Under similar parameter complexity and computational cost, SAM-COD outperforms fully-supervised methods, demonstrating its efficiency.
Ablation Study
Ablation experiments confirm the effectiveness of each component of SAM-COD. The Prompt Adapter, Response Filter, and Semantic Matcher all contribute to improved performance.
Extension to SOD
SAM-COD also performs well on Salient Object Detection (SOD), which suggests the framework extends beyond COD.
Overall Conclusion
SAM-COD is a framework for weakly-supervised camouflaged object detection. It integrates various weakly-supervised labels (scribbles, points, and bounding boxes) and addresses the limitations of SAM in COD tasks. Its components, the Prompt Adapter, Response Filter, Semantic Matcher, and Prompt-Adaptive Knowledge Distillation, each target a specific weakness of SAM under weak supervision. The reported experiments show it outperforming state-of-the-art methods, making it a promising solution for COD and related tasks.