Author: Lincoln Bennett

Authors: Róbert Csordás, Christopher Potts, Christopher D. Manning, Atticus Geiger
Paper: https://arxiv.org/abs/2408.10920

Introduction

The Linear Representation Hypothesis (LRH) posits that neural networks encode concepts as linear directions in their activation space. This hypothesis has been a cornerstone of neural network interpretability research. However, the paper “Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations” by Róbert Csordás, Christopher Potts, Christopher D. Manning, and Atticus Geiger challenges the strong interpretation of the LRH. The authors present a counterexample demonstrating that gated recurrent neural networks (RNNs) can represent tokens using magnitudes rather than directions, leading to non-linear, layered representations termed “onion representations.”…

Authors: Yuxiao Wang, Qiwei Xiong, Yu Lei, Weiying Xue, Qi Liu, Zhenao Wei
Paper: https://arxiv.org/abs/2408.10641

Introduction

With the rapid increase in image data, understanding and analyzing the content of these images has become a significant challenge. Human-object interaction (HOI) detection has emerged as a crucial technology in computer vision. It aims to accurately locate humans and objects in images or videos and to recognize the corresponding interaction categories, in order to better understand human activities. Specifically, HOI detection outputs a series of (human, object, interaction) triplets from an image or video. This technology is widely used in various applications such as autonomous driving, action recognition, human-computer interaction, social network analysis, emotion…

Authors: Ananya Pandey, Dinesh Kumar Vishwakarma
Paper: https://arxiv.org/abs/2408.10248

Introduction

In the digital age, social media platforms like Twitter, Instagram, and Facebook are inundated with multimodal content, combining text and images. Analyzing this content to understand public sentiment towards specific topics is crucial for various applications, from marketing to public opinion analysis. Traditional Aspect-Based Sentiment Analysis (ABSA) focuses on determining sentiment polarity towards specific attributes within text. However, with the rise of multimodal content, relying solely on text is insufficient. This study introduces a novel approach called the Visual-to-Emotional-Caption Translation Network (VECTN) to enhance Target-Dependent Multimodal Sentiment Analysis (TDMSA) by incorporating visual…

Authors: Delma Nieves-Rivera, Christopher Archibald
Paper: https://arxiv.org/abs/2408.10512

Introduction

In many real-world continuous action domains, human agents must decide which actions to attempt and then execute those actions to the best of their ability. However, humans cannot execute actions without error. Human performance in these domains can potentially be improved by the use of AI to aid in decision-making. One requirement for an AI to correctly reason about what actions a human agent should attempt is a correct model of that human’s execution error, or skill. Recent work has demonstrated successful techniques for estimating this execution error with various types of agents…

Authors: Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He
Paper: https://arxiv.org/abs/2408.10159

Introduction

Sequential recommendation systems aim to predict a user’s next item of interest by analyzing their past interactions, tailoring recommendations to individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have applied LLMs to sequential recommendation through language generation paradigms. These methods convert user behavior sequences into prompts for LLM fine-tuning, utilizing Low-Rank Adaptation (LoRA) modules to refine recommendations. However, the uniform application of LoRA across diverse user behaviors sometimes fails to capture individual variability, leading to suboptimal performance and negative…

Authors: Silvia Seidlitz, Katharina Hölzl, Ayca von Garrel, Jan Sellner, Stephan Katzenschlager, Tobias Hölle, Dania Fischer, Maik von der Forst, Felix C.F. Schmitt, Markus A. Weigand, Lena Maier-Hein, Maximilian Dietrich
Paper: https://arxiv.org/abs/2408.09873

New Spectral Imaging Biomarkers for Sepsis and Mortality in Intensive Care

Introduction

Sepsis is a life-threatening condition characterized by organ dysfunction due to a dysregulated response to infection. It remains a leading cause of mortality worldwide, accounting for an estimated 19.7% of global deaths in 2017. Early identification of sepsis and high-risk patients is crucial for improving outcomes, but current diagnostic methods often identify sepsis only at advanced stages, leading to delays in treatment and increased mortality. This study explores…

Authors: Qifei Li, Yingming Gao, Yuhua Wen, Cong Wang, Ya Li
Paper: https://arxiv.org/abs/2408.09438

Introduction

Emotion recognition is a crucial aspect of human-computer interaction (HCI), significantly enhancing the interaction experience by accurately interpreting human emotions. Multimodal emotion recognition (MER) leverages various data modalities, such as audio, video, and text, to improve recognition performance. However, the fusion of inter-modal information presents significant challenges, including the need for improved feature representation, effective model structures, and robust fusion methods to handle missing modal information. To address these challenges, the paper “Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition” introduces a novel MER framework called…

Authors: Yunxiao Shi, Wujiang Wu, Mingyu Jin, Haimin Zhang, Qiang Wu, Yongfeng Zhang, Min Xu
Paper: https://arxiv.org/abs/2408.08713

Introduction

In the dynamic field of digital advertising and recommendation systems, Click-Through Rate (CTR) prediction is crucial for optimizing user engagement and revenue. Traditional methods for CTR prediction often struggle with modeling high-order feature interactions due to computational costs and the need for predefined interaction orders. To address these challenges, the paper introduces the Kolmogorov-Arnold Represented Sparse Efficient Interaction Network (KarSein), which aims to enhance predictive accuracy and computational efficiency.

Preliminaries

Problem Formulation for CTR

CTR prediction involves estimating the probability that a user will click on an item.…

Authors: Zeyu Gao, Hao Wang, Yuanda Wang, Chao Zhang
Paper: https://arxiv.org/abs/2408.06385

Introduction

Assembly code search is a crucial task for reverse engineers, enabling them to quickly locate specific functions within extensive binary files using natural language queries. Traditional methods, such as searching for unique strings or constants, are often inefficient and time-consuming. This paper introduces a novel approach that uses a Large Language Model (LLM) to emulate a general compiler, termed the Virtual Compiler (ViC), to facilitate assembly code search.

Background and Related Works

Assembly Code Analysis

Compilation transforms high-level source code into assembly code, a human-readable form of the machine instructions that a CPU executes. This process…

Authors: Yuankun Xie, Xiaopeng Wang, Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Haonan Cheng, Long Ye
Paper: https://arxiv.org/abs/2408.06922

Introduction

The rapid advancement of text-to-speech (TTS) and voice conversion (VC) technologies has led to a significant increase in deepfake speech, making it challenging for humans to discern real from fake. The ASVspoof challenge series aims to foster the development of countermeasures (CMs) to discriminate between genuine and spoofed speech utterances. The fifth edition, ASVspoof5, focuses on deepfake detection and is divided into two tracks: the deepfake detection track and the SASV task. This paper addresses the problem of open-domain audio deepfake detection in ASVspoof5 Track 1 open condition,…
