Author: Chloe Baker

Authors: Yunzhe Xu, Yiyuan Pan, Zhe Liu, Hesheng Wang
Paper: https://arxiv.org/abs/2408.11051
Introduction
Background
Large Language Models (LLMs) have significantly advanced the field of embodied intelligence, particularly in Vision-and-Language Navigation (VLN) tasks. VLN involves navigating to a goal based on human instructions in either indoor or outdoor environments. This task requires a sophisticated understanding of instructions, environmental perception, and decision-making capabilities. While LLMs have shown promise in general conversational scenarios, their application in specialized navigation tasks has been limited and often suboptimal compared to specialized VLN models.
Problem Statement
The primary challenge lies in the inherent limitations of general-purpose LLMs when applied to navigation-specific…

Authors: Yuhang Zhang, Xiuqi Zheng, Chenyi Liang, Jiani Hu, Weihong Deng
Paper: https://arxiv.org/abs/2408.10614
Generalizable Facial Expression Recognition: A Comprehensive Overview
Facial expression recognition (FER) is a crucial aspect of human-computer interaction, enabling machines to understand human emotions. However, current state-of-the-art (SOTA) FER methods often fail when applied to test sets with domain gaps from the training set. This blog delves into a novel approach to enhance the zero-shot generalization ability of FER methods, ensuring robust performance across diverse, unseen test sets.
1. Introduction
Background and Problem Statement
Facial expression recognition (FER) is essential for applications in human-computer interaction, security, and healthcare. Despite significant advancements…

Authors: Michelle Han, Junyao Chen
Paper: https://arxiv.org/abs/2408.10532
Introduction
In recent years, the integration of artificial intelligence (AI) into various aspects of daily life has revolutionized how we interact with technology. One significant area of impact is health and nutrition, where AI-powered applications are becoming increasingly popular. Despite the surge in diet and nutrition apps, a common drawback is the manual entry of food data, which is both time-consuming and tedious. This paper introduces NutrifyAI, a comprehensive system designed to address this issue by leveraging advanced computer vision techniques and nutritional analysis APIs to provide real-time food detection, nutritional analysis, and personalized…

Authors: Rasha Alshawi, Md Meftahul Ferdaus, Mahdi Abdelguerfi, Kendall Niles, Ken Pathak, Steve Sloan
Paper: https://arxiv.org/abs/2408.10181
Introduction
In the realm of infrastructure maintenance, the inspection of culverts and sewer pipes is crucial for ensuring the integrity and longevity of water management systems. Traditional inspection methods, such as manual video reviews, are time-consuming and prone to human error. Automated semantic segmentation techniques offer a promising alternative by enhancing inspection accuracy and efficiency. However, the challenge of imbalanced datasets, where certain defect types are underrepresented, poses a significant hurdle. This paper introduces the Enhanced Feature Pyramid Network (E-FPN), a deep learning model designed to address these challenges…

Authors: Chris Hyunchul Jo, Jiwoong Yang, Byunghwan Jeon, Hackjoon Shim, Ikbeom Jang
Paper: https://arxiv.org/abs/2408.09894
Introduction
Background
Rotator cuff tears are a common cause of shoulder pain and disability, often requiring surgical intervention. Traditionally, the diagnosis of rotator cuff tears relies heavily on magnetic resonance imaging (MRI) due to its high sensitivity and specificity for soft tissue injuries. However, MRI is expensive and not always readily available, leading to increased healthcare costs and delayed diagnosis.
Problem Statement
Initial evaluations using plain shoulder radiographs often fail to identify soft tissue injuries such as rotator cuff tears. This necessitates further imaging with more expensive MRI examinations. The…

Authors: Tianyi Liu, Zhaorui Tan, Muyin Chen, Xi Yang, Haochuan Jiang, Kaizhu Huang
Paper: https://arxiv.org/abs/2408.09465
Introduction
Brain tumors pose significant risks to human health, necessitating precise medical segmentation for effective treatment planning. Brain tumor segmentation typically relies on multiple magnetic resonance imaging (MRI) modalities, such as Fluid-Attenuated Inversion Recovery (FLAIR), contrast-enhanced T1-weighted (T1ce), T1-weighted (T1), and T2-weighted (T2). These modalities complement each other, providing a comprehensive understanding of the tumor’s physical structure and physiopathology. However, in clinical practice, certain MRI modalities may be missing due to data corruption or variations in scanning protocols, presenting a challenge for accurate segmentation. To address this issue, strategies…

Authors: Binbin Ding, Penghui Yang, Zeqing Ge, Shengjun Huang
Paper: https://arxiv.org/abs/2408.08655
Introduction
Federated Learning (FL) is a distributed machine learning framework that allows multiple clients to collaboratively train models while preserving data privacy. However, this decentralized nature also opens up vulnerabilities, particularly to backdoor attacks. These attacks embed malicious behaviors into the model, which remain dormant under normal conditions but activate when specific triggers are present. This paper introduces a novel defense mechanism called Flipping Weight Updates of Low-Activation Input Neurons (FLAIN) to mitigate such backdoor attacks.
Related Work
Backdoor Attacks in FL
Backdoor attacks manipulate models to make specific predictions by embedding…

Authors: Guhong Chen, Liyang Fan, Zihan Gong, Nan Xie, Zixuan Li, Ziqiang Liu, Chengming Li, Qiang Qu, Shiwen Ni, Min Yang
Paper: https://arxiv.org/abs/2408.08089
Introduction
Artificial intelligence (AI) technologies, particularly large language models (LLMs), are rapidly transforming the traditional legal industry. From automated text generation to interactive legal consulting, AI applications in the legal domain are becoming increasingly widespread. However, significant challenges remain in handling complex legal queries and simulating real court environments. Existing legal AI systems often struggle to comprehensively simulate the legal reasoning process and multi-party interactions. To address these limitations, the paper presents AgentCourt, an innovative LLM-based system designed for the simulation of civil courts. AgentCourt involves…

Authors: Yuqicheng Zhu, Nico Potyka, Jiarong Pan, Bo Xiong, Yunjie He, Evgeny Kharlamov, Steffen Staab
Paper: https://arxiv.org/abs/2408.08248
Introduction
Knowledge Graph Embeddings (KGE) are a powerful tool for mapping entities and predicates into numerical vectors, enabling non-classical reasoning capabilities by leveraging similarities and analogies between entities and relations. Typically, KGE models are evaluated through link prediction tasks, where the goal is to rank all potential answers to a query based on their plausibility scores. However, these rankings often lack a meaningful probabilistic interpretation, making it challenging to distinguish plausible from implausible answers, especially in high-stakes domains like medicine. To address this issue, the authors propose using the…

Authors: Yi Wu, Daryl Chang, Jennifer She, Zhe Zhao, Li Wei, Lukasz Heldt
Paper: https://arxiv.org/abs/2408.06512
Introduction and Related Work
In the realm of large video recommendation systems, the process typically involves several stages: candidate generation, multitask model scoring, ranking, and re-ranking. The primary focus of this paper is on the ranking stage, where user behavior predictions are combined to optimize long-term user satisfaction. Traditional approaches often rely on heuristic ranking functions optimized through hyperparameter tuning. However, these methods face challenges in scalability and adaptability. The authors propose a novel approach by formulating the problem as a slate optimization problem aimed at maximizing long-term user satisfaction.…
