Author: Violet Hughes
Authors: Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu
Paper: https://arxiv.org/abs/2408.10631
Introduction
Large language models (LLMs) have revolutionized natural language processing (NLP) with their impressive performance across various tasks. However, the increasing size and complexity of these models pose significant challenges in terms of computational and storage demands. For instance, models like GPT-175B, with 175 billion parameters, require vast resources, making them impractical for many applications. Efficient model compression strategies, such as quantization and pruning, are crucial for deploying these powerful models in practical scenarios. Traditional pruning methods, such as magnitude-based pruning, directly trim weights based on their absolute…
Authors: Junming Wang, Dong Huang, Xiuxian Guan, Zekai Sun, Tianxiang Shen, Fangming Liu, Heming Cui
Paper: https://arxiv.org/abs/2408.10618
Introduction
Air-ground robots (AGRs) have become increasingly significant in applications such as surveillance and disaster response due to their dual capabilities of flying and driving. However, navigating these robots in dynamic environments, such as crowded areas, poses significant challenges. Traditional navigation systems, which rely on 3D semantic occupancy networks and Euclidean Signed Distance Field (ESDF) maps, struggle with low prediction accuracy and high computational overhead in such scenarios. To address these challenges, the paper introduces OMEGA, a novel navigation system for AGRs that integrates OccMamba, a 3D semantic occupancy…
Authors: Xiao Wang, Chao Wang, Shiao Wang, Xixi Wang, Zhicheng Zhao, Lin Zhu, Bo Jiang
Paper: https://arxiv.org/abs/2408.10487
Introduction
Event camera-based Visual Object Tracking (VOT) has garnered significant attention due to its unique imaging principles and advantages, such as low energy consumption, high dynamic range, and dense temporal resolution. Traditional event-based tracking algorithms face performance bottlenecks because they rely on vision Transformers and static templates for target localization. This paper introduces MambaEVT, a novel visual tracking framework that uses a state space model with linear complexity as its backbone network. The framework integrates a dynamic template update strategy using the Memory Mamba network, aiming to…
Authors: Zhenyu Lu, Lakshay Sethi
Paper: https://arxiv.org/abs/2408.10383
Introduction
Audio-image matching is a complex task that involves associating spoken descriptions with corresponding images. Unlike text-image matching, audio-image matching is less explored due to the intricacies of modeling audio and the limited availability of paired audio-image data. Speech carries rich information such as tone, timbre, stress patterns, and contextual cues, which can vary significantly among speakers and languages. This complexity, however, also means that speech can convey more information than text alone. Recent advancements in multi-modal models have shown significant benefits for retrieval tasks. However, existing audio-image models have not achieved the same…
Authors: Richard H. Moulton, Gary A. McCully, John D. Hastings
Paper: https://arxiv.org/abs/2405.18753
Introduction
In the rapidly evolving field of cybersecurity, the reliability and integrity of AI-driven research are paramount. As AI systems are increasingly deployed to protect critical infrastructure, analyze network traffic, and detect advanced persistent threats (APTs), a significant challenge has emerged: the reproducibility crisis. This crisis, where many studies’ results cannot be reliably reproduced or replicated, threatens the foundation of scientific progress and the practical deployment of robust models in real-world applications. Adversarial robustness, the study of ensuring that deep neural networks (DNNs) maintain functionality in the face of intentional…
Authors: Vishal S. Ngairangbam, Michael Spannowsky
Paper: https://arxiv.org/abs/2408.08823
Introduction
Equivariant neural networks, which leverage symmetries inherent in data, have shown significant promise in enhancing classification accuracy, data efficiency, and convergence speed. However, selecting an appropriate group and defining its actions across the various layers of the network remains a complex task, particularly for applications requiring adherence to specific symmetries. This research aims to establish a foundational framework for designing equivariant neural network architectures by utilizing stabilizer groups. In the context of equivariant function approximation, a critical insight is that the preimage of a target element in the output space can be…
Authors: Carlos Toxtli, Christopher Curtis, Saiph Savage
Paper: https://arxiv.org/abs/2408.07838
Introduction
Crowdworkers play a crucial role in enhancing AI services, yet they often face poor working conditions, especially those from non-US/European backgrounds. This disparity arises from the assumption that crowdworkers are a homogeneous group, leading to standardized interfaces that neglect cultural diversity. This paper proposes creating culturally-aware workplace tools, specifically designed to adapt to monochronic and polychronic work styles. The proposed tool, “CultureFit,” aims to improve the well-being and productivity of crowdworkers by integrating cultural dimensions into its design.
Related Work
Universal Design
Universal design aims to create products that provide equivalent experiences…
Authors: Shunyu Yao, Mitchy Lee
Paper: https://arxiv.org/abs/2408.07945
Introduction
The Rubik’s Cube, a 3 × 3 × 3 single-player combination puzzle, has garnered significant interest within the reinforcement learning community. Despite its seemingly simple structure, the Rubik’s Cube presents a complex challenge due to its vast state space, consisting of approximately 4.325 × 10^19 possible states, and a small, unconstrained action space with only twelve possible actions. This complexity makes it difficult to find the shortest solution to a scrambled Rubik’s Cube using limited computational resources. Previous research has introduced various methods to address this challenge. One notable method is DeepCubeA, which…
Authors: Lukas Strack, Mahmoud Safari, Frank Hutter
Paper: https://arxiv.org/abs/2408.06820
Introduction
Activation functions are a critical component of deep neural networks, influencing both training dynamics and final performance. While the Rectified Linear Unit (ReLU) is widely used due to its simplicity and effectiveness, other activation functions have been proposed to address specific issues like the dying ReLU problem. However, manually designing optimal activation functions for specific tasks remains challenging. This paper leverages recent advancements in gradient-based search techniques to efficiently identify high-performing activation functions tailored to specific applications.
Related Work
Previous research has explored automated activation function design using gradient descent and black-box…