Author: Ben Cooper
Authors: Jian Li, Weiheng Lu
Paper: https://arxiv.org/abs/2408.08632
Introduction
Multimodal Large Language Models (MLLMs) have become a focal point in both academia and industry due to their exceptional performance in various applications such as visual question answering, visual perception, understanding, and reasoning. This paper provides a comprehensive review of 180 benchmarks and evaluations for MLLMs, focusing on five key areas: perception and understanding, cognition and reasoning, specific domains, key capabilities, and other modalities. The paper also discusses the limitations of current evaluation methods and explores promising future directions.
Preliminaries
The paper compares several common MLLMs, including GPT-4, Gemini, LLaVA, Qwen-VL, Claude, InstructBLIP,…
Authors: Tomasz Prytuła
Paper: https://arxiv.org/abs/2408.08336
Graph Representations of 3D Data for Machine Learning
Abstract
This paper provides an overview of combinatorial methods to represent 3D data, such as graphs and meshes, from the perspective of their suitability for analysis using machine learning algorithms. It highlights the advantages and disadvantages of various representations and discusses methods for generating and switching between these representations. The paper also presents two concrete applications in life science and industry, emphasizing the practical challenges and potential solutions.
Introduction
3D data is prevalent in various scientific and industrial domains, including bioimaging, molecular chemistry, and 3D modeling. While…
Authors: Qiming Xia, Hongwei Lin, Wei Ye, Hai Wu, Yadan Luo, Shijia Zhao, Xin Li, Chenglu Wen
Paper: https://arxiv.org/abs/2408.08092
Introduction
In recent years, LiDAR-based 3D object detection has seen significant advancements. However, the process of annotating 3D bounding boxes in LiDAR point clouds is labor-intensive and time-consuming. This paper introduces OC3D, a weakly supervised method that requires only coarse clicks on the bird’s eye view (BEV) of the 3D point cloud, significantly reducing annotation costs. OC3D employs a two-stage strategy to generate box-level and mask-level pseudo-labels from these coarse clicks, achieving state-of-the-art performance on the KITTI and nuScenes datasets.
Related Work
LiDAR-based 3D Object Detection
Fully-supervised 3D…
Authors: Anders Gjølbye, Lina Skerath, William Lehn-Schiøler, Nicolas Langer, Lars Kai Hansen
Paper: https://arxiv.org/abs/2408.08065
Introduction
Electroencephalography (EEG) research has traditionally focused on narrowly defined tasks, but recent advancements are leveraging unlabeled data within larger models for broader applications. This shift addresses a critical challenge in EEG research: managing high noise levels in EEG data. Kostas et al. (2021) demonstrated that self-supervised learning (SSL) outperforms traditional supervised methods. However, current preprocessing methods often fail to efficiently handle the large data volumes required for SSL due to their lack of optimization and reliance on subjective manual corrections. This paper introduces SPEED, a Python-based EEG preprocessing pipeline…
Authors: Eunhae Lee, Pat Pataranutaporn, Judith Amores, Pattie Maes
Paper: https://arxiv.org/abs/2408.06602
Introduction
The rapid advancements in artificial intelligence (AI) have sparked a range of public perceptions, from utopian to dystopian visions. This study investigates the psychological factors influencing belief in AI predictions about personal behavior, comparing it to belief in astrology and personality-based predictions. The research aims to understand how cognitive style, paranormal beliefs, AI attitudes, personality traits, and other factors affect the perceived validity, reliability, usefulness, and personalization of predictions from different sources.
Results
People who are more likely to believe in astrology and personality-based predictions are more likely to believe in…