SketchRef: A Benchmark Dataset and Evaluation Metrics for Automated Sketch Synthesis

Authors:

Xingyue Lin, Xingjian Hu, Shuai Peng, Jianhua Zhu, Liangcai Gao

Paper:

Introduction

Sketching is a fundamental artistic technique that captures the essence of real-world objects through lines and contours. Despite their simplicity, sketches can convey significant visual information, making them recognizable to humans. Recent advancements in deep learning have led to the development of automated sketch synthesis methods, which can save time and reduce costs compared to manual sketching. However, evaluating the quality of synthesized sketches remains a challenge due to the lack of a unified benchmark dataset and appropriate evaluation metrics.

SketchRef Benchmark Dataset

To address the limitations in current sketch evaluation methods, the authors introduce SketchRef, a comprehensive benchmark dataset designed for evaluating sketch synthesis algorithms. SketchRef includes four sub-datasets: Human Body, Human Face, Animal, and Things. Each sub-dataset contains reference photos, synthesized sketches, and shared annotations, including visual keypoints and semantic class labels.

Dataset Composition

Human Body: 1,137 photos with 17 keypoints annotated in the COCO format.
Human Face: 950 photos from the FFHQ dataset with 106 dense keypoints.
Animal: 950 photos from the Animal-Pose evaluation dataset with five animal class labels and 20 keypoints.
Things: 1,500 photos from SEVA, segmented with U2Net to ensure a blank background.

Synthesized sketches are generated using existing methods such as CLIPasso, PhotoSketch, UPDG, and LineDrawing, so each reference photo has sketches in multiple styles.

Evaluation Methods

Structure-level & Category-level Recognizability

The authors propose two types of recognizability metrics to evaluate sketches comprehensively:

Structure-level Recognizability: Measures whether a sketch retains the key structural features of the reference photo. The mean Object Keypoint Similarity (mOKS) metric is introduced, which uses pose estimation to predict keypoints in sketches and compare them to the reference photo.
Category-level Recognizability: Assesses the ability of a sketch to be identified as the category of the reference photo. This is measured using the average cosine similarity between the CLIP embeddings of class names and the images.
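
To make the two metrics above concrete, here is a minimal Python sketch. It assumes the keypoints have already been predicted by a pose estimator and the embeddings have already been produced by CLIP, so only the scoring step is shown. The OKS formula and the per-keypoint sigmas follow the standard COCO keypoint evaluation for the 17 body keypoints; the function names (oks, mean_oks, cosine_similarity) are illustrative and not taken from the paper, whose exact normalization may differ.

```python
import numpy as np

# Per-keypoint sigmas from the COCO keypoint evaluation (17 body keypoints).
# Other keypoint layouts (face, animal) would need their own constants.
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0


def oks(pred_xy, ref_xy, visible, area, sigmas=COCO_SIGMAS):
    """Object Keypoint Similarity between sketch and reference keypoints.

    pred_xy, ref_xy: (K, 2) arrays of pixel coordinates.
    visible: (K,) flags marking which reference keypoints are annotated.
    area: object area in pixels, used as the scale term.
    """
    pred_xy = np.asarray(pred_xy, dtype=float)
    ref_xy = np.asarray(ref_xy, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        return 0.0
    d2 = np.sum((pred_xy - ref_xy) ** 2, axis=1)   # squared distances
    k2 = (2.0 * np.asarray(sigmas)) ** 2           # per-keypoint constants
    e = d2 / (2.0 * area * k2 + np.finfo(float).eps)
    return float(np.exp(-e)[visible].mean())


def mean_oks(samples):
    """Average OKS over an iterable of (pred_xy, ref_xy, visible, area)."""
    return float(np.mean([oks(*s) for s in samples]))


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

A sketch whose predicted keypoints coincide with the reference keypoints scores an OKS of 1, and the score decays toward 0 as the keypoints drift apart relative to the object's size.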

Recognizability Constrained by Simplicity

Sketching involves a tradeoff between simplicity and recognizability. To ensure fair evaluation, the authors introduce a method to measure simplicity using the relative Simplicity Ratio (SR), which compares the complexity of the sketch to its reference photo. Recognizability metrics are then calculated constrained by simplicity levels, ensuring fair comparison across sketches with varying levels of detail.
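
One way to compare the complexity of a sketch with that of its reference photo is to compare how well each image compresses. The Python sketch below does this with zlib. This is an assumption made for illustration: this summary does not give the paper's exact SR formula, and the names compressed_size and relative_complexity are hypothetical.

```python
import zlib

import numpy as np


def compressed_size(image):
    """Number of bytes after lossless zlib compression of an 8-bit image."""
    data = np.ascontiguousarray(image, dtype=np.uint8).tobytes()
    return len(zlib.compress(data, level=9))


def relative_complexity(sketch, photo):
    """Compressed size of the sketch divided by that of its photo.

    Lower values mean the sketch is simpler relative to the photo.
    This is an illustrative proxy, not the paper's SR definition.
    """
    return compressed_size(sketch) / compressed_size(photo)
```

A sparse line drawing on a blank background compresses far better than a textured photo, so its ratio falls well below 1.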

Collecting Human Assessment

To validate the proposed evaluation methods, the authors conducted a user experiment involving 198 art enthusiasts. Participants rated the recognizability and simplicity of synthesized sketches, providing human judgments against which the proposed metrics can be compared.

Experiment

Experimental Setups

Datasets: SketchRef’s four sub-datasets are used for evaluation.
Evaluation Metrics: mRS@α and mRC@α are used to quantify structure-level and category-level recognizability at fixed simplification levels.
Sketch Synthesis Methods: Eight methods are evaluated: CLIPasso, Contour, Anime, OpenSketch, PhotoSketch, and three UPDG styles.
Pose Estimation Models: RTMPose is chosen for its generality, and ten other models are compared for alignment with human perception.
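
The idea of reporting recognizability at a fixed simplification level α can be sketched as follows. The example assumes that each sketch already has a recognizability score and a simplicity value, and that a score "at α" is the average over sketches whose simplicity lies within a small tolerance of α. Both the tolerance band and the function name mean_recognizability_at are assumptions for illustration; the paper may bin or interpolate differently.

```python
import numpy as np


def mean_recognizability_at(scores, simplicity, alpha, tol=0.05):
    """Average recognizability over sketches with simplicity near alpha.

    scores: per-sketch recognizability (structure- or category-level).
    simplicity: per-sketch simplicity values on the same scale as alpha.
    tol: half-width of the band around alpha (a hypothetical parameter).
    Returns NaN when no sketch falls inside the band.
    """
    scores = np.asarray(scores, dtype=float)
    simplicity = np.asarray(simplicity, dtype=float)
    mask = np.abs(simplicity - alpha) <= tol
    if not mask.any():
        return float("nan")
    return float(scores[mask].mean())
```

Comparing methods only within the same simplicity band keeps a highly detailed sketch from winning on recognizability simply because it retains more of the photo.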

Quantitative Results

The authors evaluate the mean Recognizability on Structure (mRS) and mean Recognizability on Category (mRC) of the eight sketch methods across the datasets. Results show that methods like Anime and CLIPasso perform well in terms of recognizability, while PhotoSketch tends to have lower recognizability due to its coarse outlines.

Human Assessment

The alignment of various metrics with human perception is analyzed, showing that mOKS obtained from pose estimation models correlates well with user-assessed recognizability. The Compression Ratio method for measuring simplicity also shows high correlation with human perception.
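
Alignment between a metric and human ratings is commonly summarized with a correlation coefficient. This summary does not say which coefficient the authors used, so the Python sketch below shows both Pearson and Spearman correlation as assumptions; the function names are illustrative.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


def ranks(values):
    """Zero-based ranks; ties are broken by order of appearance.

    A full Spearman implementation would assign average ranks to ties.
    """
    order = np.argsort(np.asarray(values), kind="stable")
    r = np.empty(len(order), dtype=float)
    r[order] = np.arange(len(order))
    return r


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(ranks(x), ranks(y))
```

For example, calling spearman(metric_scores, human_ratings) on per-sketch values gives a number close to 1 when the metric orders sketches the same way the participants did.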

Visualization

The proposed metrics are applied to evaluate individual sketches, demonstrating their effectiveness in capturing both structure-level and category-level recognizability. For example, Anime sketches exhibit high structure-level recognizability, while PhotoSketch shows high simplicity but low recognizability.

Conclusion

SketchRef provides a comprehensive benchmark for evaluating sketch synthesis methods, addressing the current gaps in structural and simplicity-aware evaluation. The proposed metrics align well with human perception, offering a fair and detailed assessment of sketch quality. This benchmark is expected to guide future research in sketch synthesis and understanding.

Datasets:

MS COCO, Sketch
