Author: Nora Brooks

Authors: Michał Bortkiewicz, Władek Pałucki, Vivek Myers, Tadeusz Dziarmaga, Tomasz Arczewski, Łukasz Kuciński, Benjamin Eysenbach
Paper: https://arxiv.org/abs/2408.11052
Introduction: Self-supervised learning has revolutionized several domains of machine learning, including natural language processing and computer vision. Its application in reinforcement learning (RL), however, has not seen similar success. This paper addresses the challenges faced by self-supervised goal-conditioned reinforcement learning (GCRL) methods, particularly a lack of data from slow environments and a lack of stable algorithms. The authors introduce JaxGCRL, a high-performance codebase and benchmark for self-supervised GCRL, which enables researchers to train agents for millions of environment steps in minutes on a single GPU. This paper aims to provide a…

Authors: Shyam K Sateesh, Sparsh BK, Uma D
Paper: https://arxiv.org/abs/2408.10328
Introduction: Emotion recognition from electroencephalogram (EEG) signals is a burgeoning field, particularly in neuroscience and Human-Computer Interaction (HCI). EEG signals provide a descriptive temporal view of brain activity, making them indispensable for understanding complex human emotional states. This study aims to enhance the predictive accuracy of emotional state classification by applying Long Short-Term Memory (LSTM) networks to analyze EEG signals. Using the DEAP dataset, which contains multi-channel EEG recordings, the study leverages LSTM networks’ ability to handle temporal dependencies within EEG data. The results demonstrate significant improvements in emotion recognition, achieving accuracies…

Authors: Zhonghang Li, Long Xia, Lei Shi, Yong Xu, Dawei Yin, Chao Huang
Paper: https://arxiv.org/abs/2408.10269
Introduction: Urban transportation systems are the backbone of modern cities, facilitating the movement of people and goods. Accurate traffic forecasting is essential for effective urban planning and transportation management, enabling efficient resource allocation and enhanced travel experiences. However, existing traffic prediction models often struggle to generalize, particularly in zero-shot prediction for unseen regions and cities and in long-term forecasting. This is due to the inherent difficulty of handling the spatial and temporal heterogeneity of traffic data and the significant distribution shifts across time and space. In this study, we introduce OpenCity,…

Authors: Yanjie Dong, Xiaoyi Fan, Fangxin Wang, Chengming Li, Victor C. M. Leung, Xiping Hu
Paper: https://arxiv.org/abs/2408.10691
Introduction: Since the introduction of GPT-2 in 2019, large language models (LLMs) have evolved from specialized tools to versatile foundation models. These models exhibit impressive zero-shot capabilities, enabling them to perform tasks such as text generation, machine translation, and question answering without task-specific training. However, fine-tuning these models on local datasets and deploying them efficiently remains a significant challenge because of their substantial computational and storage requirements. Traditional fine-tuning techniques that use first-order optimizers demand substantial GPU memory, often exceeding the capacity of mainstream hardware.…

Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu
Paper: https://arxiv.org/abs/2408.10195
Introduction: 3D object reconstruction is a critical task with applications in various fields such as augmented reality, virtual reality, and robotics. Traditional methods often require dense view inputs, which are not always feasible in practical scenarios. Recent advancements in single-image-to-3D methods have shown promise but often lack controllability and produce hallucinated regions that may not align with user expectations. This paper introduces SpaRP, a novel method designed to reconstruct 3D textured meshes and estimate camera poses from sparse, unposed 2D images. SpaRP leverages 2D diffusion models to infer 3D…

Authors: Yash Bhalgat, Vadim Tschernezki, Iro Laina, João F. Henriques, Andrea Vedaldi, Andrew Zisserman
Paper: https://arxiv.org/abs/2408.09860
Introduction: Egocentric videos, which capture the world from a first-person perspective, are gaining significant attention in computer vision due to their applications in augmented reality, robotics, and more. However, these videos present unique challenges for 3D scene understanding, including rapid camera motion, frequent object occlusions, and limited object visibility. Traditional 2D video object segmentation (VOS) methods struggle with these challenges, often resulting in fragmented and incomplete object tracks. This paper introduces a novel approach to instance segmentation and tracking in egocentric videos that leverages 3D awareness to overcome these…

Authors: Eashan Adhikarla, Kai Zhang, John Nicholson, Brian D. Davison
Paper: https://arxiv.org/abs/2408.09650
Introduction: Low-light image enhancement is a critical task in computer vision, with applications ranging from consumer devices such as phone cameras to sophisticated surveillance systems. Traditional techniques often struggle to balance processing speed against output quality, especially with high-resolution images. This leads to issues like noise and color distortion in scenarios requiring quick processing, such as mobile photography and real-time video streaming. Recent advancements in foundation models, such as transformers and diffusion models, have shown promise in various domains, including low-light image enhancement. However, these models are often limited by their computational…

Authors: Pengfei Cai, Yan Song, Kang Li, Haoyu Song, Ian McLoughlin
Paper: https://arxiv.org/abs/2408.08673
Introduction: Sound event detection (SED) aims to identify not only the types of events occurring in an audio signal but also their temporal locations. This technology has garnered significant interest due to its applications in smart homes, smart cities, and surveillance systems. Traditional SED systems often rely on a combination of convolutional neural networks (CNNs) for feature extraction and recurrent neural networks (RNNs) for modeling temporal dependencies. However, the scarcity of labeled data poses a significant challenge for these systems. Recent advancements have seen the rise of Transformer-based SED models, inspired…

Authors: Vibhor Agarwal, Yulong Pei, Salwa Alamir, Xiaomo Liu
Paper: https://arxiv.org/abs/2408.08333
Introduction: Large Language Models (LLMs) have demonstrated significant capabilities in natural language generation and program generation. However, these models are prone to generating hallucinations: text that sounds plausible but is incorrect. This phenomenon is not limited to natural language; it extends to code generation as well. The generated code can contain syntactic or logical errors, security vulnerabilities, memory leaks, and other issues. Given the increasing adoption of LLMs in code generation, it is crucial to investigate these hallucinations. This paper introduces the concept of code hallucinations, provides a comprehensive taxonomy of hallucination types,…
