Author: Tom Johnson

Authors: Natchapon Jongwiriyanurak, Zichao Zeng, June Moh Goo, Xinglei Wang, Ilya Ilyankou, Kerkritt Srirrongvikrai, Meihui Wang, James Haworth
Paper: https://arxiv.org/abs/2408.10872
Introduction
Road traffic crashes are a significant global issue, causing millions of deaths annually and imposing a substantial economic burden, particularly in low- and middle-income countries (LMICs). Traditional methods for road safety assessment, such as those employed by the International Road Assessment Programme (iRAP), involve extensive manual surveys and coding, which are costly and time-consuming. This paper introduces V-RoAst (Visual question answering for Road Assessment), a novel approach leveraging Vision Language Models (VLMs) to automate road safety assessments using crowdsourced imagery. This method aims to provide a…


Authors: Peng Zhou, Yongdong Liu, Lixun Ma, Weiye Zhang, Haohan Tan, Zhenguang Liu, Butian Huang
Paper: https://arxiv.org/abs/2408.10657
Introduction
The increasing adoption of encryption protocols has inadvertently provided a cover for malicious activities, making it challenging to detect such threats. Power grid systems, being critical infrastructure, are particularly vulnerable to these attacks. Traditional methods for detecting malicious encrypted traffic often rely on static pre-trained models, which are not well-suited for dynamic environments like blockchain-based power grid systems. These methods struggle to adapt to new types of encrypted attacks, leading to significant performance drops. To address these challenges, the authors propose ETGuard, a novel framework designed to automatically…


Authors: Thanh Thi Nguyen, Campbell Wilson, Janis Dalins
Paper: https://arxiv.org/abs/2408.10503
Introduction
Biometric systems have become an integral part of modern security and identification systems. Among various biometric features, hand images are particularly valuable due to their unique and stable characteristics, such as vein patterns, fingerprints, and hand geometry. This paper explores the application of Vision Transformers (ViTs) for the classification of hand images, leveraging their advanced capabilities in image processing. The study also introduces adaptive knowledge distillation methods to enhance the performance of ViTs across different domains, addressing the challenge of catastrophic forgetting.
Related Work
Traditional Methods in Hand Image Classification
Previous…


Authors: Khadija Iddrisu, Waseem Shariff, Noel E. O'Connor, Joseph Lemley, Suzanne Little
Paper: https://arxiv.org/abs/2408.10395
Introduction
Face and eye tracking are pivotal tasks in computer vision with significant applications in healthcare, in-cabin monitoring, attention estimation, and human-computer interaction. Traditional frame-based cameras often struggle with issues such as under-sampling fast-moving objects, scale variations, and motion-induced shape deformations. Event cameras (ECs), also known as neuromorphic sensors, offer a promising alternative by capturing changes in local light intensity at the pixel level, producing asynchronously generated data termed "events." This study evaluates the integration of conventional algorithms with event-based data transformed into a frame format, preserving the unique benefits of…


Authors: Florentina Voboril, Vaidyanathan Peruvemba Ramaswamy, Stefan Szeider
Paper: https://arxiv.org/abs/2408.10268
Introduction
Constraint programming (CP) is a powerful methodology for solving combinatorial problems by specifying constraints declaratively. Streamliners are constraints added to a constraint model to reduce the search space, thereby improving the feasibility and speed of finding solutions to complex constraint satisfaction problems. Traditionally, streamliners were crafted manually or generated through systematic testing of atomic constraints, which is a high-effort offline task. This paper introduces StreamLLM, a novel method that leverages Large Language Models (LLMs) to generate streamliners in real-time, significantly enhancing the efficiency of solving constraint satisfaction problems.
Related Work
Constraint…


Authors: Jiangbin Zheng, Han Zhang, Qianqing Xu, An-Ping Zeng, Stan Z. Li
Paper: https://arxiv.org/abs/2408.10247
Introduction
Enzymes, specialized proteins that act as biological catalysts, are pivotal in various industrial and biological processes due to their ability to expedite chemical reactions under mild conditions. Despite their significance, computational enzyme design remains in its infancy within the broader protein domain. This is primarily due to the scarcity of comprehensive enzyme data and the complexity of enzyme design tasks, which hinder systematic research and model generalization. To address these challenges, the study introduces MetaEnzyme, a unified enzyme design framework that leverages a cross-modal structure-to-sequence transformation architecture. This framework…


Authors: Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley
Paper: https://arxiv.org/abs/2408.10970
Introduction
An enduring challenge in artificial intelligence is flexibly learning discrete abstractions that are useful for solving inherently continuous problems. The human brain excels at distilling discrete concepts from continuous sensory data, enabling us to specify abstract sub-goals during planning and transfer this knowledge across new tasks. This capability is highly desirable in the design of autonomous systems. However, translating continuous problems into discrete space for decision-making remains a complex task. This study explores the potential of recurrent switching linear dynamical systems…


Authors: Xukun Zhou, Fengxin Li, Ziqiao Peng, Kejian Wu, Jun He, Biao Qin, Zhaoxin Fan, Hongyan Liu
Paper: https://arxiv.org/abs/2408.09357
Introduction
Background
Audio-driven 3D talking face animation has become increasingly prevalent in various sectors, including gaming, live streaming, and animation production. These applications leverage advanced technologies such as 3D parametric models, Neural Radiance Fields, and Gaussian splatting to achieve accurate lip synchronization and facial emotions. Despite significant progress, the intricate relationship between facial expressions and accompanying audio still needs to be explored, particularly in the context of speaking style adaptation.
Problem Statement
Most existing approaches for audio-driven 3D face animation are designed for specific individuals with predefined speaking…


Authors: Sher Badshah, Hassan Sajjad
Paper: https://arxiv.org/abs/2408.09235
Introduction
In the realm of natural language processing (NLP), the evaluation of free-form text remains a challenging task. Traditional methods often rely on human evaluators to judge the quality and accuracy of text generated by language models. However, this approach is not scalable and can be subjective. The study titled "Reference-Guided Verdict: LLMs-as-Judges in Automatic Evaluation of Free-Form Text" explores an innovative solution to this problem by leveraging large language models (LLMs) as automated judges. This approach aims to provide a more scalable and objective method for evaluating free-form text.
Related Work
Human Evaluation…


Authors: Xiongtao Sun, Gan Liu, Zhipeng He, Hui Li, Xiaoguang Li
Paper: https://arxiv.org/abs/2408.08930
Introduction
The advent of Large Language Models (LLMs) such as GPT-4 has revolutionized the field of Natural Language Processing (NLP), enabling applications ranging from intelligent assistants to customized content generation. A critical component in interacting with these models is the use of prompts, which guide the models to perform specific tasks and generate desired outputs. However, the use of precise prompts can inadvertently lead to the leakage of Personally Identifiable Information (PII), posing significant privacy risks. To address this issue, the paper proposes DePrompt, a framework designed to desensitize and evaluate…
