Author: Hazel King

Authors: Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler
Paper: https://arxiv.org/abs/2408.11048
Introduction
Empowering robots with human-level dexterity has been a long-standing challenge in robotics. This challenge becomes even more complex when considering tasks that require both dynamic and manipulation skills, such as piano playing. Robot piano playing involves coordinating multiple fingers to press keys accurately and dynamically, making it a high-dimensional control task. While reinforcement learning (RL) has shown promise in single-task performance, it often struggles in multi-task settings. This paper introduces the Robot Piano 1 Million (RP1M) dataset, which aims to bridge this gap by enabling imitation learning…


Authors: Rui Yang, Jiahao Zhu, Jianping Man, Li Fang, Yi Zhou
Paper: https://arxiv.org/abs/2408.10819
Introduction
Knowledge Graphs (KGs) are structured semantic knowledge bases that organize entities, concepts, attributes, and their relationships in a graph format. They are pivotal in various applications such as semantic search, recommendation systems, and natural language processing (NLP). However, due to limited annotation resources and technical constraints, existing KGs often have missing key entities or relationships, limiting their functionality. Knowledge Graph Completion (KGC) aims to infer and fill in these missing entities and relationships, thereby enhancing the value and effectiveness of KGs. Traditional KGC methods, such as link prediction and instance…


Authors: Huafeng Chen, Dian Shao, Guangqian Guo, Shan Gao
Paper: https://arxiv.org/abs/2408.10777
Introduction
Camouflaged Object Detection (COD) is a challenging task in computer vision that involves identifying objects that blend seamlessly into their surroundings. The task is difficult not only for models but also for human annotators, because it requires precise pixel-wise annotations, which are labor-intensive and time-consuming. Traditional methods demand extensive annotation efforts, often taking up to 60 minutes per image. To address this issue, the authors propose a novel approach that leverages point-based supervision, significantly reducing the annotation burden while maintaining high detection accuracy.
Related Work
Weakly Supervised Camouflaged Detection
Recent research…


Authors: Haoyu Wang, Bingzhe Wu, Yatao Bian, Yongzhe Chang, Xueqian Wang, Peilin Zhao
Paper: https://arxiv.org/abs/2408.10668
Introduction
The rapid advancement of Large Language Models (LLMs) such as GPT-4 has significantly impacted various aspects of daily life, providing intelligent assistance in numerous domains. However, alongside these benefits, there are growing concerns about the potential misuse of these models, particularly their ability to generate harmful or dangerous content. Ensuring the safety and reliability of LLMs is crucial for fostering public trust and promoting the responsible use of AI technology. This study addresses the hidden vulnerabilities in LLMs that may persist even after safety alignment. The authors propose a…


Authors: Xucheng Wan, Naijun Zheng, Kai Liu, Huan Zhou
Paper: https://arxiv.org/abs/2408.10524
XCB: An Effective Contextual Biasing Approach to Bias Cross-Lingual Phrases in Speech Recognition
Introduction
In recent years, End-to-End (E2E) Automatic Speech Recognition (ASR) models have made significant strides in improving speech recognition accuracy. Models such as Transformer, Transducer, and Conformer have set new benchmarks in various speech recognition tasks. However, these models often struggle with recognizing rare words, such as jargon or unique named entities, especially in real-world applications. One popular solution to this problem is contextualized ASR, which integrates contextual information from a predefined list of rare words to enhance recognition…


Authors: Ik Jun Moon, Junho Moon, Ikbeom Jang
Paper: https://arxiv.org/abs/2408.09952
Introduction
With the increasing interest in skin diseases and aesthetics, the ability to predict and analyze facial wrinkles has become crucial. Facial wrinkles are significant indicators of aging and can be useful in assessing skin conditions, skin care, and early diagnosis of skin diseases. However, manually analyzing extensive collections of images for facial wrinkles is resource-intensive and subjective, leading to variability in research findings. To address these challenges, this study proposes a deep learning-based approach to automatically segment facial wrinkles. By combining wrinkle data labeled by multiple annotators and leveraging transfer learning,…


Authors: Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, Yunchao Wei
Paper: https://arxiv.org/abs/2408.08295
Unleashing the Power of Sequential Fine-tuning for Continual Learning with Pre-training: An In-depth Look at SLCA++
Continual learning (CL) has long been a challenging problem in machine learning, primarily due to the issue of catastrophic forgetting. The advent of pre-trained models (PTMs) has revolutionized this field, offering new avenues for knowledge transfer and robustness. However, the progressive overfitting of pre-trained knowledge into specific downstream tasks remains a significant hurdle. This blog delves into the paper “SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training,” which introduces the Slow…
