Authors:
Xingyue Lin, Xingjian Hu, Shuai Peng, Jianhua Zhu, Liangcai Gao
Paper:
https://arxiv.org/abs/2408.08623
Introduction
Sketching is a fundamental artistic technique that captures the essence of real-world objects through lines and contours. Despite their simplicity, sketches can convey significant visual information, making them recognizable to humans. Recent advancements in deep learning have led to the development of automated sketch synthesis methods, which can save time and reduce costs compared to manual sketching. However, evaluating the quality of synthesized sketches remains a challenge due to the lack of a unified benchmark dataset and appropriate evaluation metrics.
SketchRef Benchmark Dataset
To address the limitations in current sketch evaluation methods, the authors introduce SketchRef, a comprehensive benchmark dataset designed for evaluating sketch synthesis algorithms. SketchRef includes four sub-datasets: Human Body, Human Face, Animal, and Things. Each sub-dataset contains reference photos, synthesized sketches, and shared annotations, including visual keypoints and semantic class labels.
Dataset Composition
- Human Body: 1,137 photos with 17 keypoints annotated in the COCO format.
- Human Face: 950 photos from the FFHQ dataset with 106 dense keypoints.
- Animal: 950 photos from the Animal-Pose evaluation dataset with five animal class labels and 20 keypoints.
- Things: 1,500 photos from SEVA, segmented with U2Net to ensure a blank background.
Synthesized sketches are generated using existing methods such as CLIPasso, PhotoSketch, UPDG, and LineDrawing, resulting in multiple styles for each reference photo.
Evaluation Methods
Structure-level & Category-level Recognizability
The authors propose two types of recognizability metrics to evaluate sketches comprehensively:
- Structure-level Recognizability: Measures whether a sketch retains the key structural features of the reference photo. The mean Object Keypoint Similarity (mOKS) metric is introduced, which uses pose estimation to predict keypoints in sketches and compare them to the reference photo.
- Category-level Recognizability: Assesses the ability of a sketch to be identified as the category of the reference photo. This is measured using the average cosine similarity between the CLIP embeddings of class names and the images.
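The two metrics above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. `oks` follows the standard COCO definition of Object Keypoint Similarity, and `cosine_similarity` works on embedding vectors that are assumed to have been computed beforehand (for example by a CLIP model). The function names, the `area` scale term, and the per-keypoint `sigmas` values are assumptions made for this example.

```python
import numpy as np


def oks(pred, ref, area, sigmas, visible=None):
    """COCO-style Object Keypoint Similarity for one sketch/photo pair.

    pred, ref : (K, 2) keypoint coordinates (sketch prediction, photo annotation).
    area      : squared object scale, e.g. the bounding-box area (assumption).
    sigmas    : (K,) per-keypoint falloff constants (assumption).
    visible   : optional (K,) boolean mask of annotated keypoints.
    """
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    if visible is None:
        visible = np.ones(len(ref), dtype=bool)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        return 0.0
    d2 = np.sum((pred - ref) ** 2, axis=1)   # squared pixel distances
    k2 = (2.0 * sigmas) ** 2                 # COCO uses k = 2 * sigma
    sim = np.exp(-d2 / (2.0 * area * k2))    # per-keypoint similarity in (0, 1]
    return float(sim[visible].mean())


def mean_oks(values):
    """Average per-sketch OKS values into a single mOKS score."""
    return float(np.mean(values))


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A sketch whose predicted keypoints coincide with the photo's annotations scores an OKS of 1, and the score decays toward 0 as the keypoints drift apart relative to the object's scale.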
Recognizability Constrained by Simplicity
Sketching involves a tradeoff between simplicity and recognizability. To ensure fair evaluation, the authors introduce a method to measure simplicity using the relative Simplicity Ratio (SR), which compares the complexity of the sketch to its reference photo. Recognizability metrics are then calculated constrained by simplicity levels, ensuring fair comparison across sketches with varying levels of detail.
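As a rough illustration of comparing a sketch's complexity with that of its reference photo, the snippet below uses losslessly compressed byte size as a stand-in for complexity. This proxy, the function names, and the choice of `zlib` are assumptions made for the example. The summary does not give the exact formula for the ratio, so this should not be read as the authors' definition.

```python
import zlib

import numpy as np


def compressed_size(image):
    """Size in bytes of the zlib-compressed raw pixels (a crude complexity proxy)."""
    data = np.ascontiguousarray(image, dtype=np.uint8).tobytes()
    return len(zlib.compress(data, 9))


def relative_complexity(sketch, photo):
    """Compressed size of the sketch divided by that of its reference photo.

    Both images are assumed to have the same shape and bit depth.
    Smaller values mean the sketch is simpler relative to the photo.
    """
    return compressed_size(sketch) / compressed_size(photo)
```

Under this proxy, a sparse line drawing on a blank background compresses far better than a textured photo, so its ratio is well below 1.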
Collecting Human Assessment
To validate the proposed evaluation methods, the authors conducted a user experiment involving 198 art enthusiasts. Participants rated the recognizability and simplicity of synthesized sketches, providing human judgments against which the proposed metrics can be compared.
Experiment
Experimental Setups
- Datasets: SketchRef’s four sub-datasets are used for evaluation.
- Evaluation Metrics: mRS@α and mRC@α are used to quantify structure-level and category-level recognizability at fixed simplification levels.
- Sketch Synthesis Methods: Eight methods are evaluated: CLIPasso, Contour, Anime, OpenSketch, PhotoSketch, and three UPDG styles.
- Pose Estimation Models: RTMPose is chosen for its generality, and ten other models are compared for alignment with human perception.
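The summary does not spell out how mRS@α and mRC@α are computed. One plausible reading, offered here only as an assumption, is to average a recognizability score over the sketches whose simplicity lies close to the target level α. The helper name `mean_score_at_level` and the tolerance `tol` are hypothetical.

```python
import numpy as np


def mean_score_at_level(scores, simplicity, alpha, tol=0.05):
    """Mean recognizability over sketches whose simplicity is within tol of alpha.

    scores     : per-sketch recognizability values (structure- or category-level).
    simplicity : per-sketch simplicity values on the same scale as alpha.
    Returns NaN when no sketch falls inside the window.
    """
    scores = np.asarray(scores, dtype=float)
    simplicity = np.asarray(simplicity, dtype=float)
    mask = np.abs(simplicity - alpha) <= tol   # keep sketches near the target level
    if not mask.any():
        return float("nan")
    return float(scores[mask].mean())
```

Comparing methods within the same simplicity window keeps a very detailed sketch from being ranked above a sparse one purely because it retains more of the photo.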
Quantitative Results
The authors evaluate the mean Recognizability on Structure (mRS) and mean Recognizability on Category (mRC) of the eight sketch methods across the datasets. Results show that methods like Anime and CLIPasso perform well in terms of recognizability, while PhotoSketch tends to have lower recognizability due to its coarse outlines.
Human Assessment
The alignment of various metrics with human perception is analyzed, showing that mOKS obtained from pose estimation models correlates well with user-assessed recognizability. The Compression Ratio method for measuring simplicity also shows high correlation with human perception.
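Agreement between an automatic metric and human ratings is commonly summarized with a rank correlation. The snippet below computes Spearman's coefficient with NumPy alone. The summary does not state which correlation measure the authors used, so the choice of Spearman and the function names are assumptions.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tie = x == value
        ranks[tie] = ranks[tie].mean()
    return ranks


def spearman(metric_scores, human_scores):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    r_metric = average_ranks(metric_scores)
    r_human = average_ranks(human_scores)
    return float(np.corrcoef(r_metric, r_human)[0, 1])
```

A value near 1 means the metric orders sketches the same way the participants do, and a value near 0 means the two orderings are unrelated.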
Visualization
The proposed metrics are applied to evaluate individual sketches, demonstrating their effectiveness in capturing both structure-level and category-level recognizability. For example, Anime sketches exhibit high structure-level recognizability, while PhotoSketch shows high simplicity but low recognizability.
Conclusion
SketchRef provides a comprehensive benchmark for evaluating sketch synthesis methods, addressing the current gaps in structural and simplicity-aware evaluation. The proposed metrics align well with human perception, offering a fair and detailed assessment of sketch quality. This benchmark is expected to guide future research in sketch synthesis and understanding.