Author: Zoe Gray

Authors: Kazi Hasan Ibn Arif, JinYi Yoon, Dimitrios S. Nikolopoulos, Hans Vandierendonck, Deepu John, Bo Ji
Paper: https://arxiv.org/abs/2408.10945

Introduction
Vision-Language Models (VLMs) have become essential tools in multimodal tasks, leveraging both visual and textual data to enhance accuracy. However, high-resolution VLMs, which encode detailed image information, often generate an excessive number of visual tokens. This token surplus poses significant computational challenges, especially in resource-constrained environments with limited GPU capabilities. To address this issue, the paper introduces High-Resolution Early Dropping (HiRED), a token-dropping scheme designed to operate within a fixed token budget before the Large Language Model (LLM) stage. HiRED aims to maintain superior accuracy while…
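
To make the idea of a fixed token budget concrete, here is a minimal sketch of generic budgeted token dropping in Python with NumPy. It is not HiRED's algorithm: the excerpt above does not say how HiRED scores or allocates tokens, so the sketch simply assumes each visual token already has an importance score (for instance, an attention value) and keeps the `budget` highest-scoring tokens in their original order. The function name `drop_tokens_to_budget` is hypothetical.

```python
import numpy as np


def drop_tokens_to_budget(tokens: np.ndarray, scores: np.ndarray, budget: int) -> np.ndarray:
    """Keep at most `budget` visual tokens, chosen by score, in their original order.

    tokens: (num_tokens, dim) array of visual token embeddings.
    scores: (num_tokens,) array of per-token importance scores (assumed to be given).
    """
    if tokens.shape[0] != scores.shape[0]:
        raise ValueError("tokens and scores must have the same length")
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if budget == 0:
        return tokens[:0]  # empty selection with the same embedding width
    if budget >= tokens.shape[0]:
        return tokens  # already within budget; nothing to drop
    # argpartition puts the `budget` largest scores in the last `budget` slots (unordered).
    top = np.argpartition(scores, -budget)[-budget:]
    # Sorting the kept indices preserves the tokens' original (spatial) order.
    return tokens[np.sort(top)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(1000, 64))  # toy stand-in for high-resolution visual tokens
    scores = rng.random(1000)
    print(drop_tokens_to_budget(tokens, scores, budget=100).shape)  # (100, 64)
```

Because the output never holds more than `budget` tokens, the number of visual tokens handed to the LLM stays bounded no matter how many tokens the image encoder produces.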

Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao, Xiaoyun Liu, Changlin Song
Paper: https://arxiv.org/abs/2408.10566

Introduction
Continual learning (CL) is a critical area in machine learning that focuses on enabling models to learn continuously from a stream of data. This is particularly important in dynamic environments such as autonomous vehicles, healthcare, and smart cities, where models need to adapt to new data while retaining previously learned knowledge. However, a significant challenge in CL is the degradation of performance on previously learned tasks when the model’s capacity is increased to accommodate new data. This phenomenon, termed growth-induced forgetting (GIFt), is especially problematic in task-agnostic CL settings where…

Authors: Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu
Paper: https://arxiv.org/abs/2408.10479

Introduction
Order-dispatching, the process of assigning passenger orders to available drivers in real time, is a critical function in ride-hailing platforms. This process significantly impacts the service experience for both drivers and passengers. There are two primary research scopes in this domain: macro-view and micro-view order-dispatching (MICOD). The macro-view focuses on long-term, city-level efficiency optimization, while MICOD deals with localized spatiotemporal scenarios characterized by high stochasticity. MICOD involves matching an unspecified number of drivers and orders within each decision window, optimizing goals such as driver income and pickup distance over…
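
The per-window matching problem can be illustrated with a simple baseline. The Python sketch below is not the paper's method and considers only one of the goals mentioned (pickup distance): it greedily pairs the closest remaining driver and order within a single decision window, using straight-line distance on toy coordinates. The driver and order IDs, the coordinates, and the name `greedy_dispatch` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def greedy_dispatch(
    drivers: Dict[str, Point], orders: Dict[str, Point]
) -> List[Tuple[str, str, float]]:
    """Greedily match drivers to orders in one decision window by pickup distance.

    Returns (driver_id, order_id, pickup_distance) triples. The numbers of drivers
    and orders may differ; whatever is left over stays unmatched in this window.
    """
    # Every candidate driver-order pair, sorted by straight-line pickup distance.
    candidates = sorted(
        (math.dist(d_pos, o_pos), d_id, o_id)
        for d_id, d_pos in drivers.items()
        for o_id, o_pos in orders.items()
    )
    used_drivers, used_orders = set(), set()
    assignment = []
    for dist, d_id, o_id in candidates:
        if d_id in used_drivers or o_id in used_orders:
            continue  # each driver and each order is matched at most once
        used_drivers.add(d_id)
        used_orders.add(o_id)
        assignment.append((d_id, o_id, dist))
    return assignment


if __name__ == "__main__":
    drivers = {"d1": (0.0, 0.0), "d2": (10.0, 0.0)}
    orders = {"o1": (1.0, 0.0), "o2": (9.0, 0.0), "o3": (50.0, 0.0)}
    for driver, order, dist in greedy_dispatch(drivers, orders):
        print(driver, order, dist)
    # d1 o1 1.0
    # d2 o2 1.0   (o3 stays unmatched in this window)
```

Greedy matching is fast but not guaranteed to minimize total distance; an exact assignment for one window can be computed with the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`, which also accepts rectangular cost matrices when drivers and orders differ in number).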

Authors: Yuan An, Samarth Kolanupaka, Jacob An, Matthew Ma, Unnat Chhatwal, Alex Kalinowski, Michelle Rogers, Brian Smith
Paper: https://arxiv.org/abs/2408.10492

Introduction
Engaging students and ensuring the retention of knowledge are critical aspects of effective teaching. However, many lectures fail to achieve these goals. Despite extensive research in cognitive science and neuroscience suggesting effective teaching strategies, their application in real classrooms remains limited. The advent of artificial intelligence (AI) offers a promising avenue to bridge this gap. This paper introduces a novel knowledge graph-supported intelligent lecturing assistant (ILA) system designed to help teachers enhance student learning by integrating insights from cognitive science, neuroscience, and established pedagogical best practices. By…

Authors: Ka Hei Carrie Lau, Efe Bozkir, Hong Gao, Enkelejda Kasneci
Paper: https://arxiv.org/abs/2408.09285

Introduction
In recent years, the integration of digital technologies in education has revolutionized learning paradigms, making them more immersive, interactive, and personalized. Virtual Reality (VR) has demonstrated its capacity to create highly immersive environments that simulate real-life experiences or historical events. Similarly, advancements in Large Language Models (LLMs) enable the generation of human-like responses, providing interactive learning experiences. However, the combination of VR and LLMs to preserve cultural heritage remains largely unexplored. This paper aims to evaluate user perception and the educational impact of integrating these technologies in virtual environments…

Authors: Junlin Chen, Chengcheng Xu, Yangfan Xu, Jian Yang, Jun Li, Zhiping Shi
Paper: https://arxiv.org/abs/2408.09220

Introduction
Video action recognition has been a pivotal task in the realm of video understanding, attracting significant attention from researchers. Traditional methods often involve converting videos into three-dimensional data to capture both spatial and temporal information. These methods typically adapt image understanding models to handle the spatiotemporal nature of video data. However, this approach presents several challenges, including the need for model architecture adjustments and the high computational cost associated with processing high-dimensional data. To address these issues, the authors propose a novel video representation architecture called Flatten. This architecture…

Authors: Jiawei Zhao, Kejiang Chen, Xiaojian Yuan, Weiming Zhang
Paper: https://arxiv.org/abs/2408.08924

Introduction
In recent years, large language models (LLMs) such as ChatGPT, Gemini, and Llama have demonstrated exceptional performance across various natural language processing (NLP) tasks. However, these models are vulnerable to jailbreak attacks, where adversaries can induce the generation of harmful content through meticulously crafted prompts. This vulnerability poses significant challenges to the secure use and promotion of LLMs. Existing defense methods offer protection from different perspectives but often suffer from insufficient effectiveness or a significant impact on the model’s capabilities. In this paper, the authors propose a novel, plug-and-play, and easy-to-deploy…

Authors: Weijia Zhang, Chenlong Yin, Hao Liu, Hui Xiong
Paper: https://arxiv.org/abs/2408.08328

Introduction
Irregularly Sampled Time Series (ISTS) are prevalent in various domains such as healthcare, biology, climate science, astronomy, physics, and finance. Despite the significant advancements in Pre-trained Language Models (PLMs) like ChatGPT for natural language processing, their application to time series analysis, particularly ISTS, remains under-explored. This paper addresses this gap by investigating the potential of PLMs for ISTS analysis and proposing a unified PLM-based framework, ISTS-PLM, which integrates time-aware and variable-aware PLMs for comprehensive intra- and inter-time series modeling.

Related Works
Irregularly Sampled Time Series Analysis
Existing research on ISTS primarily…
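
To illustrate what "irregularly sampled" means in practice, the sketch below shows one common way to lay out such data: each variable keeps its own unevenly spaced (timestamp, value) observations, and the variables are aligned on the union of all timestamps together with an observation mask. This generic layout is an assumption for illustration, not the input format of ISTS-PLM; the function name `to_union_grid` and the example variables are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def to_union_grid(
    series: Dict[str, List[Tuple[float, float]]]
) -> Tuple[List[str], List[float], List[List[Optional[float]]], List[List[int]]]:
    """Align irregularly sampled variables on the union of their observation times.

    `series` maps a variable name to its (timestamp, value) observations; each
    variable may be sampled at different, unevenly spaced times. Returns the sorted
    variable names, the sorted union of timestamps, a values grid (None where a
    variable was not observed), and a 0/1 observation mask of the same shape.
    """
    names = sorted(series)
    times = sorted({t for obs in series.values() for t, _ in obs})
    row_of = {t: i for i, t in enumerate(times)}
    values: List[List[Optional[float]]] = [[None] * len(names) for _ in times]
    mask = [[0] * len(names) for _ in times]
    for col, name in enumerate(names):
        for t, v in series[name]:
            values[row_of[t]][col] = v  # a repeated timestamp keeps the last value
            mask[row_of[t]][col] = 1
    return names, times, values, mask


if __name__ == "__main__":
    obs = {
        "heart_rate": [(0.0, 72.0), (2.5, 75.0)],
        "temperature": [(1.0, 36.8)],
    }
    names, times, values, mask = to_union_grid(obs)
    print(times)   # [0.0, 1.0, 2.5]
    print(values)  # [[72.0, None], [None, 36.8], [75.0, None]]
    print(mask)    # [[1, 0], [0, 1], [1, 0]]
```

The mask is what separates "not observed" from an actual value. The grid also grows with every distinct timestamp and is mostly empty when variables are sampled at different times, which is why per-variable views of the same data can be useful as well.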

Authors: Federico Belotti, Fabio Dadda, Marco Cremaschi, Roberto Avogadro, Riccardo Pozzi, Matteo Palmonari
Paper: https://arxiv.org/abs/2408.06423

Introduction
Tables are essential tools for organizing and sharing information in various fields, including business and science. However, understanding the meaning of table contents can be challenging. Semantic Table Interpretation (STI) aims to address this by annotating tabular data to disambiguate their meaning. This involves tasks such as Cell-Entity Annotation (CEA), Column-Type Annotation (CTA), and Column-Property Annotation (CPA). The goal is to match table cells with entities from a background Knowledge Graph (KG), transforming tables into Knowledge Graphs or enriching them with additional information.

Related Work
Entity Disambiguation (ED) in…
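
As a rough illustration of the candidate step in Cell-Entity Annotation, the sketch below ranks knowledge-graph entities for a single cell by label string similarity, using Python's standard `difflib`. It is a deliberately naive baseline assumed for illustration, not the paper's approach; the entity IDs and labels are hypothetical. The ambiguous result it produces is exactly what entity disambiguation has to resolve.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def annotate_cell(
    cell: str, kg_labels: Dict[str, str], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank KG entities for one table cell by label string similarity.

    `kg_labels` maps a (hypothetical) entity ID to its label. Returns up to `top_k`
    (entity_id, score) pairs, best first, with scores in [0, 1]. Real STI systems
    also use row and column context (types, properties) to disambiguate.
    """
    query = cell.strip().lower()
    scored = [
        (ent_id, SequenceMatcher(None, query, label.lower()).ratio())
        for ent_id, label in kg_labels.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    kg = {
        "ent:paris_city": "Paris",
        "ent:paris_hilton": "Paris Hilton",
        "ent:berlin_city": "Berlin",
    }
    for ent_id, score in annotate_cell("Paris", kg, top_k=2):
        print(ent_id, round(score, 2))
    # ent:paris_city 1.0
    # ent:paris_hilton 0.59
```

String similarity alone cannot tell whether "Paris" in a table of cities and "Paris" in a table of celebrities refer to different entities; that is the gap the column-level tasks (CTA, CPA) and disambiguation models are meant to close.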

Authors: Max Nelson, Shannon Wotherspoon, Francis Keith, William Hartmann, Matthew Snover
Paper: https://arxiv.org/abs/2408.06484

Introduction
Cross-lingual conversational speech summarization is a challenging task due to the scarcity of resources. While transcriptions exist for many languages, translated conversational speech is rare, and datasets containing summaries are non-existent. This paper builds upon the Fisher and Callhome Spanish-English Speech Translation corpus by supplementing the translations with summaries generated using GPT-4. The goal is to generate similar summaries despite transcription and translation errors. The paper presents a baseline cascade-based system using open-source speech recognition and machine translation models, tests a range of LLMs for summarization, and analyzes the impact…
