Author: Noah Davis
Authors: Tanmana Sadhu, Ali Pesaranghader, Yanan Chen, Dong Hoon Yi
Paper: https://arxiv.org/abs/2408.11021

Introduction
The rapid advancements in large language models (LLMs) have enabled the development of autonomous agents capable of performing complex tasks and making decisions with a high degree of autonomy. These agents can understand high-level instructions, interact with their environments, and utilize various tools to execute tasks. However, as their capabilities expand, ensuring their safety and trustworthiness becomes increasingly critical. This study introduces the ATHENA framework, which leverages verbal contrastive learning to guide autonomous agents towards safer interactions while fulfilling tasks. The framework also incorporates a critiquing mechanism to prevent risky…
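The excerpt cuts off before describing how ATHENA's critic works. As a rough illustration of the general gating pattern (an actor proposes an action, a critic reviews it, and a rejected action is re-proposed with the critique fed back as text), here is a minimal sketch. It assumes a hypothetical keyword-based critic standing in for an LLM critic; `RISKY_KEYWORDS`, `critique`, `gated_step`, and `toy_actor` are illustrative names, not the paper's implementation, and the sketch does not model verbal contrastive learning.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for an LLM critic: flags actions containing risky keywords.
RISKY_KEYWORDS = ("rm -rf", "transfer_all_funds", "disable_firewall")


def critique(action: str) -> Optional[str]:
    """Return a textual critique if the action looks risky, else None."""
    for keyword in RISKY_KEYWORDS:
        if keyword in action:
            return f"Action uses '{keyword}', which may cause irreversible harm."
    return None


def gated_step(actor: Callable[[List[str]], str], max_attempts: int = 3) -> Optional[str]:
    """Ask the actor for an action and re-prompt it with critiques until one is approved.

    Returns the first approved action, or None if every attempt was rejected.
    """
    feedback: List[str] = []
    for _ in range(max_attempts):
        action = actor(feedback)      # the actor sees all critiques so far
        problem = critique(action)
        if problem is None:
            return action             # critic approved: safe to execute
        feedback.append(problem)      # verbal feedback for the next attempt
    return None                       # refuse rather than run a risky action


def toy_actor(feedback: List[str]) -> str:
    """Hypothetical actor: proposes a destructive command first, a safer one after critique."""
    return "rm -rf /tmp/cache" if not feedback else "ls /tmp/cache"


print(gated_step(toy_actor))  # ls /tmp/cache
```

Returning `None` when attempts run out is a deliberate choice in this sketch: a safety gate should fail closed rather than execute the last rejected action.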
Authors: Yonggan Wu, Ling-Chao Meng, Yuan Zichao, Sixian Chan, Hong-Qiang Wang
Paper: https://arxiv.org/abs/2408.10624

Introduction
Person re-identification (ReID) is a critical task in surveillance systems, aiming to match images of individuals captured by different cameras. Traditional visible-light person ReID methods have achieved significant success, but they struggle under poor lighting conditions. To address this, infrared cameras are increasingly used, leading to the development of visible-infrared person re-identification (VI-ReID). VI-ReID is challenging due to the significant modality discrepancies between visible and infrared images. Existing methods often fail to fully mine modality-invariant information, focusing on either spatial or channel dimensions but not both. This study introduces the…
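At query time, ReID is usually treated as retrieval: a query image from one modality is ranked against a gallery from the other by feature similarity. The sketch below assumes features have already been extracted into a shared embedding space by some unspecified encoder; the toy vectors and identity names are made up, and this is not the paper's method. The modality discrepancy the paper targets is precisely why raw visible and infrared features would rarely line up this neatly.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_gallery(query: Sequence[float], gallery: Dict[str, Sequence[float]]) -> List[str]:
    """Return gallery identities sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda pid: cosine(query, gallery[pid]), reverse=True)


# Toy embeddings (made up): a visible-light query against infrared gallery images.
visible_query = [0.9, 0.1, 0.3]
infrared_gallery = {
    "person_A": [0.8, 0.2, 0.35],
    "person_B": [0.1, 0.9, 0.2],
    "person_C": [0.3, 0.3, 0.9],
}
print(rank_gallery(visible_query, infrared_gallery))  # ['person_A', 'person_C', 'person_B']
```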
Authors: Ritwik Mishra, Sreeram Vennam, Rajiv Ratn Shah, Ponnurangam Kumaraguru
Paper: https://arxiv.org/abs/2408.10604

Introduction
In the realm of Question Answering (QA) systems, most existing datasets focus on factoid-based short-context questions, predominantly in high-resource languages. However, there is a significant gap when it comes to non-factoid questions, especially in low-resource languages. This study introduces MuNfQuAD, a multilingual QA dataset designed to address this gap by focusing on non-factoid questions. The dataset leverages interrogative subheadings from BBC news articles as questions and the corresponding paragraphs as silver answers, encompassing over 370K QA pairs across 38 languages.

Related Work
Existing QA Datasets
Several QA datasets have been…
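The construction idea described above, using interrogative subheadings as questions and the paragraphs beneath them as silver answers, can be sketched as follows. This assumes each article has already been parsed into (subheading, paragraphs) sections and approximates "interrogative" by a trailing question mark; both are assumptions, and the paper's actual parsing and multilingual heuristics may differ. `is_interrogative`, `extract_qa_pairs`, and the sample article are hypothetical.

```python
from typing import List, Tuple

# A parsed article: list of (subheading, paragraphs under that subheading).
Section = Tuple[str, List[str]]


def is_interrogative(heading: str) -> bool:
    """Crude heuristic (assumption): a heading ending in a question mark is a question."""
    # Also accept the full-width and Arabic question marks used in some scripts.
    return heading.strip().endswith(("?", "？", "؟"))


def extract_qa_pairs(sections: List[Section]) -> List[Tuple[str, str]]:
    """Pair each interrogative subheading with the text that follows it as a silver answer."""
    pairs = []
    for heading, paragraphs in sections:
        if is_interrogative(heading) and paragraphs:
            pairs.append((heading.strip(), " ".join(p.strip() for p in paragraphs)))
    return pairs


article = [
    ("Background", ["The council met on Monday."]),
    ("Why was the vote delayed?", ["Members asked for more time.", "A new date is expected soon."]),
    ("What happens next?", []),  # no answer text, so it is skipped
]
print(extract_qa_pairs(article))
# [('Why was the vote delayed?', 'Members asked for more time. A new date is expected soon.')]
```

The answers are "silver" because nothing checks that the paragraph actually answers the heading; the pairing is assumed from document layout alone.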
Authors: Hongyin Zhu
Paper: https://arxiv.org/abs/2408.09416

Abstract
This paper by Hongyin Zhu examines the multifaceted challenges and responses associated with the practice of large language models (LLMs). It spans various dimensions including industry trends, academic research, technological innovation, and business applications. The paper systematically categorizes these challenges and responses into five core dimensions: computing power infrastructure, software architecture, data resources, application scenarios, and brain science. The aim is to provide a comprehensive AI knowledge framework to stimulate innovative thinking and promote industrial progress.

1. Computing Power Infrastructure
Cloud-Edge-End Collaborative Architecture
The cloud-edge-end collaborative architecture is a distributed system designed…
Authors: Vladimir Araujo, Marie-Francine Moens, Tinne Tuytelaars
Paper: https://arxiv.org/abs/2408.09053

Introduction
Continual Learning (CL) aims to enable models to learn new tasks sequentially without forgetting previously acquired knowledge. This is particularly challenging in Natural Language Processing (NLP) due to the diverse and evolving nature of language tasks. Recent advancements in Pre-trained Language Models (PLMs) and Parameter-Efficient Fine-Tuning (PEFT) methods have shown promise in addressing these challenges. PEFT methods, such as prompts or adapters, allow for efficient fine-tuning of PLMs for specific tasks while keeping the main model frozen. However, these methods face significant limitations, including interference between modules and suboptimal routing during module…
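To make "keeping the main model frozen" concrete, here is a minimal sketch of a generic bottleneck adapter, a common adapter design and not necessarily the one used in this paper. It is written in pure Python with biases omitted, and the hidden size (768) and bottleneck size (16) are illustrative assumptions. Only the adapter's two small matrices would be trained; the frozen PLM layer's output `h` passes through a residual connection.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


class BottleneckAdapter:
    """Residual adapter: h + W_up * relu(W_down * h). Only these weights are trained."""

    def __init__(self, hidden: int, bottleneck: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Down-projection: hidden -> bottleneck, small random init.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(bottleneck)]
        # Up-projection: bottleneck -> hidden, zero init so the adapter starts as the identity.
        self.w_up = [[0.0] * bottleneck for _ in range(hidden)]

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, v) for v in matvec(self.w_down, h)]  # ReLU in the bottleneck
        delta = matvec(self.w_up, z)
        return [hv + dv for hv, dv in zip(h, delta)]       # residual connection

    def num_params(self) -> int:
        return sum(len(r) for r in self.w_down) + sum(len(r) for r in self.w_up)


hidden, bottleneck = 768, 16
adapter = BottleneckAdapter(hidden, bottleneck)
h = [0.5] * hidden               # output of a frozen PLM layer (toy values)
print(adapter(h) == h)           # True: the zero-initialised up-projection adds nothing yet
print(adapter.num_params())      # 24576, versus 589824 for a single 768x768 weight matrix
```

Training one such adapter per task is where the module interference and routing questions raised above come from: at inference time the system must decide which task's adapter to apply.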
Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee
Paper: https://arxiv.org/abs/2408.08019

Introduction
The paper “Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization” introduces PeriodWave-Turbo, a high-fidelity and efficient waveform generation model. This model leverages adversarial flow matching optimization to enhance the performance of pre-trained Conditional Flow Matching (CFM) generative models. The primary goal is to address the limitations of existing models, such as the need for numerous Ordinary Differential Equation (ODE) steps and the lack of high-frequency information in generated samples.

Related Works
Accelerating Methods for Few-Step Generator
Diffusion-based generative models have shown impressive performance but suffer from slow inference speeds due to…
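Why "numerous ODE steps" is costly: a flow matching model generates a sample by integrating dx/dt = v(x, t) from noise at t = 0 to data at t = 1, and every solver step calls the velocity network once. The sketch below uses a fixed-step Euler solver with toy one-dimensional velocity fields standing in for a trained network (an assumption for illustration; it does not describe PeriodWave-Turbo's solver or its adversarial fine-tuning). A straight trajectory is solved exactly in one step, while a curved one needs many evaluations to get close.

```python
import math
from typing import Callable, Tuple

Velocity = Callable[[float, float], float]  # v(x, t) for a 1-D toy state


def euler_sample(v: Velocity, x0: float, steps: int) -> Tuple[float, int]:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler.

    Returns the final state and how many times the velocity field was evaluated.
    """
    x, dt, evals = x0, 1.0 / steps, 0
    for i in range(steps):
        x += dt * v(x, i * dt)  # one "network" call per step
        evals += 1
    return x, evals


def straight(x: float, t: float) -> float:
    """Constant velocity: a straight path from 0 to 2."""
    return 2.0


def curved(x: float, t: float) -> float:
    """dx/dt = x, whose exact solution from x(0) = 1 is x(1) = e."""
    return x


print(euler_sample(straight, 0.0, 1))   # (2.0, 1): one step is already exact
print(euler_sample(curved, 1.0, 1))     # (2.0, 1): far from e = 2.718...
print(euler_sample(curved, 1.0, 100))   # about (2.7048, 100): closer, but 100 evaluations
print(math.e)
```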
Authors: Junlin Guo, Siqi Lu, Can Cui, Ruining Deng, Tianyuan Yao, Zhewen Tao, Yizhe Lin, Marilyn Lionts, Quan Liu, Juming Xiong, Catie Chang, Mitchell Wilkes, Mengmeng Yin, Haichun Yang, Yuankai Huo
Paper: https://arxiv.org/abs/2408.06381

Introduction
Cell nuclei instance segmentation is a fundamental task in digital pathology, particularly for accurate disease diagnosis and treatment planning. However, whether current methods generalize to diverse, large-scale datasets remains a significant open question. This study evaluates the performance of state-of-the-art (SOTA) cell nuclei foundation models in kidney pathology, focusing on three widely used models: Cellpose, StarDist, and CellViT.

Methods
Diverse Large-scale Dataset
A diverse evaluation dataset was created, consisting of 2,542 kidney whole slide images (WSIs) from both…
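The excerpt does not say which metrics were used to score Cellpose, StarDist, and CellViT. As an illustrative assumption only, one common way to score instance segmentation is to match predicted nuclei to ground-truth nuclei by intersection over union (IoU) and report F1 at an IoU threshold. The sketch represents each nucleus as a set of pixel coordinates and uses greedy matching; `iou`, `f1_at_threshold`, and `square` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Mask = Set[Tuple[int, int]]  # a nucleus instance as a set of (row, col) pixels


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two instance masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def f1_at_threshold(preds: List[Mask], truths: List[Mask], thr: float = 0.5) -> float:
    """Greedy one-to-one matching: a prediction is a true positive if its best IoU
    with a not-yet-matched ground-truth nucleus is at least thr."""
    matched: Set[int] = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, t in enumerate(truths):
            if j not in matched:
                score = iou(p, t)
                if score > best_iou:
                    best_j, best_iou = j, score
        if best_j >= 0 and best_iou >= thr:
            matched.add(best_j)
            tp += 1
    fp, fn = len(preds) - tp, len(truths) - tp
    return 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0


def square(r: int, c: int, size: int) -> Mask:
    """Toy square nucleus with top-left corner (r, c)."""
    return {(r + i, c + j) for i in range(size) for j in range(size)}


truths = [square(0, 0, 4), square(10, 10, 4)]
preds = [square(0, 1, 4), square(20, 20, 4)]  # one shifted hit, one false positive
print(iou(truths[0], preds[0]))               # 0.6 (12 shared pixels / 20 in the union)
print(f1_at_threshold(preds, truths))         # 0.5 (1 TP, 1 FP, 1 FN)
```

Greedy matching depends on prediction order; an optimal assignment (for example, the Hungarian algorithm) is a common alternative when instances overlap heavily.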