Author: Asher Collins

Authors: Eito Ikuta, Yohan Lee, Akihiro Iohara, Yu Saito, Toshiyuki Tanaka
Paper: https://arxiv.org/abs/2408.10846
Introduction
In the realm of computer vision, image harmonization is a critical task that involves seamlessly integrating a foreground object from one image into the background of another to produce a cohesive composite. Traditional methods have primarily focused on color and illumination adjustments to achieve visual harmony. However, the selective transfer of geometrical features such as holes, cracks, droplets, and dents from one material to another, independently of material-specific surface texture, remains a complex challenge. This study introduces “Harmonizing Attention,” a novel training-free approach leveraging diffusion models for texture-aware geometry transfer.…

Authors: Karl El Hajal, Ajinkya Kulkarni, Enno Hermann, Mathew Magimai.-Doss
Paper: https://arxiv.org/abs/2408.10771
Introduction
In recent years, neural text-to-speech (TTS) synthesis has made significant strides, achieving a level of naturalness that closely mimics human speech. This progress has enabled a wide range of expressive outputs. However, the development of zero-shot multi-speaker TTS systems, which can synthesize speech in an unseen speaker’s voice based on short reference samples, remains a challenging task. Traditional approaches often require extensive transcribed speech datasets from numerous speakers and complex training pipelines. Self-supervised learning (SSL) speech features have emerged as effective intermediate representations for TTS, enabling straightforward and robust voice…

Authors: D Alqattan, R Sun, H Liang, G Nicosia, V Snasel, R Ranjan, V Ojha
Paper: https://arxiv.org/abs/2408.10752
Introduction
Federated Learning (FL) has emerged as a promising solution to the challenges posed by Centralized Machine Learning (CML), such as data storage, computation, and privacy concerns. FL enables collaborative training of a global model across numerous clients while preserving data decentralization. However, traditional FL, which employs a two-level node design, faces limitations in terms of latency, network efficiency, and server capacity. Hierarchical Federated Learning (HFL) addresses these challenges by employing multiple aggregator servers at edge and cloud levels, forming a hierarchical structure that enhances scalability and reduces latency.…

Authors: Baekryun Seong, Jieung Kim, Sang-Ki Ko
Paper: https://arxiv.org/abs/2408.10900
Introduction
Artificial Intelligence (AI) research has recently been dominated by large language models (LLMs), which have demonstrated significant improvements in performance with increased parameters. However, this scaling comes with a substantial increase in power consumption, leading to environmental concerns such as carbon emissions and climate change. Spiking Neural Networks (SNNs) offer a promising alternative due to their event-driven nature, mimicking the human brain and significantly reducing power consumption compared to traditional artificial neural networks (ANNs). Despite their potential, SNNs face challenges in terms of reliability and robustness, particularly against adversarial attacks. Current methods…

Authors: Arya Hadizadeh Moghaddam, Mohsen Nayebi Kerdabadi, Cuncong Zhong, Zijun Yao
Paper: https://arxiv.org/abs/2408.09635
Introduction
Background
Lung cancer remains one of the leading causes of death worldwide. According to the CDC, in 2020, there were 47 new cases of lung cancer and 32 related deaths per 100,000 individuals in the United States. Early detection is crucial for improving survival rates, and DNA microarray technology has emerged as a powerful tool for this purpose. DNA microarrays can measure the activity of tens of thousands of genes simultaneously, providing valuable insights into the aberrant gene expression profiles of cancer cells.
Problem Statement
Despite the potential of…

Authors: Siqi Ouyang, Xi Xu, Chinmay Dandekar, Lei Li
Paper: https://arxiv.org/abs/2408.09430
Introduction
Simultaneous speech translation (SST) is a challenging task that involves translating streaming speech input into text in another language in real time. This technology is crucial for applications such as multilingual conferences and live streaming. Traditional SST methods often struggle with high latency due to the need for recomputation of input representations or fall short in translation quality compared to offline speech translation (ST) models. In this context, the paper introduces FASST, a novel method leveraging large language models (LLMs) to achieve efficient and high-quality simultaneous speech translation. FASST employs blockwise-causal speech…

Authors: Yankai Chen, Yixiang Fang, Yifei Zhang, Chenhao Ma, Yang Hong, Irwin King
Paper: https://arxiv.org/abs/2408.09239
Introduction
Bipartite graphs are a fundamental structure in various real-world applications, such as recommendation systems, database retrieval, and document querying. These graphs consist of two disjoint sets of nodes, with edges only between nodes of different sets. The task of Top-N search in bipartite graphs involves selecting the best-matched nodes for a given query node, which is crucial for effective information filtering. Traditional approaches rely on similarity matching in continuous Euclidean space using vectorized node embeddings. However, these methods face challenges in terms of computation latency and memory overhead, especially…

Authors: Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar
Paper: https://arxiv.org/abs/2408.09125
Introduction
Imitation learning (IL) has emerged as a powerful tool for robotic tasks where direct programming or defining optimal control costs is challenging. Traditional IL methods often rely on environmental interactions during learning, supplementary datasets, or knowledge of transition dynamics. However, these requirements are not always feasible in real-world scenarios, such as autonomous driving or healthcare, where safety and cost concerns limit direct interactions. This study introduces a novel approach to IL that operates under strictly batch conditions, leveraging the Markov balance equation and conditional density estimation to improve performance without additional…

Authors: Yang Nan, Huichi Zhou, Xiaodan Xing, Guang Yang
Paper: https://arxiv.org/abs/2408.08704
Beyond the Hype: A Dispassionate Look at Vision-Language Models in Medical Scenarios
Introduction
Recent advancements in Large Vision-Language Models (LVLMs) have showcased their impressive capabilities across various tasks. However, their performance and reliability in specialized domains such as medicine remain underexplored. This study introduces RadVUQA, a novel Radiological Visual Understanding and Question Answering benchmark, to comprehensively evaluate existing LVLMs. RadVUQA assesses LVLMs across five dimensions: anatomical understanding, multimodal comprehension, quantitative and spatial reasoning, physiological knowledge, and robustness. The findings reveal significant deficiencies in both generalized and medical-specific LVLMs, highlighting the need for…

Authors: Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu
Paper: https://arxiv.org/abs/2408.08422
Introduction
Large Language Models (LLMs) have shown remarkable capabilities in various domains, including medical research. However, their performance in diagnosing rare diseases remains uncertain. Rare diseases, despite affecting a small portion of the population, collectively impose significant public health burdens. Diagnosing these conditions is challenging due to their complex genetic origins and unpredictable clinical manifestations. This study aims to assess the diagnostic performance of LLMs in rare diseases and explore methods to enhance their effectiveness.
Preliminaries
Large Language Models for Rare Disease Diagnosis
LLMs have demonstrated…
