Author: Mason King

scholar

EdgeNAT: Transformer for Efficient Edge Detection

By Mason King, August 26, 2024

Authors: Jinghuai Jie, Yan Guo, Guixing Wu, Junmin Wu, Baojian Hua

Paper: https://arxiv.org/abs/2408.10527

Introduction

Edge detection is a fundamental task in computer vision, crucial for various applications such as object recognition, image segmentation, and scene understanding. Traditional methods primarily rely on local features like color and texture variations, while more recent deep learning approaches leverage convolutional neural networks (CNNs) to capture global and semantic features. However, CNNs often struggle to preserve intricate local details. This paper introduces EdgeNAT, a one-stage transformer-based edge detector that utilizes the Dilated Neighborhood Attention Transformer (DiNAT) as its encoder. EdgeNAT aims to efficiently and accurately extract object boundaries and…

Integrating Multi-Modal Input Token Mixer Into Mamba-Based Decision Models: Decision MetaMamba

By Mason King, August 26, 2024

Authors: Wall Kim

Paper: https://arxiv.org/abs/2408.10517

Introduction

Background

Offline reinforcement learning (RL) has been a significant area of research due to its potential to learn optimal policies from pre-collected datasets without additional environment interactions. This is particularly crucial in scenarios where interactions are costly or risky. Return-Conditioned Transformer Decision Models (RCTDM) have shown promise in enhancing transformer performance in offline RL by using returns-to-go instead of rewards in the input sequence. However, RCTDM faces challenges in learning optimal policies from limited suboptimal trajectories.

Problem Statement

The primary challenges with using transformers as decision models in offline RL are:

1. Handling Trajectory…

LeCov: Multi-level Testing Criteria for Large Language Models

By Mason King, August 26, 2024

Authors: Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma

Paper: https://arxiv.org/abs/2408.10474

Introduction

Large Language Models (LLMs) have revolutionized various domains, including natural language processing, code generation, and robotic system control. Despite their impressive capabilities, concerns about their trustworthiness persist, particularly regarding issues like hallucination and toxicity. Recent research has focused on developing testing methods to uncover these untrustworthy behaviors before deployment. However, a systematic and formalized approach to measure the sufficiency and coverage of LLM testing is still lacking. To address this gap, the authors propose LeCov, a set of multi-level testing criteria for LLMs, which considers three crucial internal…

Webcam-based Pupil Diameter Prediction Benefits from Upscaling

By Mason King, August 26, 2024

Authors: Vijul Shah, Brian B. Moser, Ko Watanabe, Andreas Dengel

Paper: https://arxiv.org/abs/2408.10397

Introduction

The ability to accurately measure pupil diameter is crucial for assessing various psychological and physiological states, such as stress levels and cognitive load. However, the low resolution of images in many eye-tracking datasets often hampers precise measurement. This study investigates the impact of various upscaling methods on pupil diameter predictions from webcam images. By comparing several pre-trained super-resolution (SR) methods, the study aims to determine how upscaling can enhance the accuracy of pupil diameter prediction models.

Related Work

Super-Resolution as Pre-Processing

Image super-resolution (SR) is the process of converting low-resolution…

Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

By Mason King, August 23, 2024

Authors: Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu, Wei Zhang

Paper: https://arxiv.org/abs/2408.09439

In the ever-evolving landscape of search engines, relevance modeling plays a pivotal role in enhancing user experience by accurately identifying items that align with users’ queries. Traditional models often fall short by relying solely on semantic congruence, which is insufficient for capturing the full spectrum of relevance. This blog delves into a novel approach that leverages user interactions and advanced prompting techniques to boost relevance modeling driven by Large Language Models (LLMs).

Introduction

Background

Search engines are indispensable tools for navigating the vast expanse of online content. The…

Language Models Show Stable Value Orientations Across Diverse Role-Plays

By Mason King, August 23, 2024

Authors: Bruce W. Lee, Yeongheon Lee, Hyunsoo Cho

Paper: https://arxiv.org/abs/2408.09049

Introduction

Recent advancements in large language models (LLMs) have significantly enhanced their capabilities, making them integral to various real-world applications. However, a primary concern is their non-deterministic nature, which allows them to generate diverse responses to the same input. This variability stems from the vast and heterogeneous datasets they consume, enabling them to capture complex probability distributions for a single topic and encompass multiple viewpoints. Despite this flexibility, LLMs exhibit a tendency towards specific phrasings, tones, or content types, indicating a central tendency within their outputs. To systematically explore this phenomenon, the…

RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

By Mason King, August 23, 2024

Authors: Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang

Paper: https://arxiv.org/abs/2408.08933

Introduction

Approximate Nearest Neighbor Search (ANNS) is a critical component in various applications, such as recommendation systems and large language model-based applications. With the rise of multimodal neural models, cross-modal ANNS has become essential for retrieving similar items across different modalities (e.g., using text to find similar images). However, existing ANNS approaches struggle with cross-modal queries due to the inherent distribution gap between embeddings from different modalities. This paper introduces RoarGraph, a projected bipartite graph designed to address these inefficiencies and significantly improve cross-modal ANNS performance.

Related Work

Background on…

ARMADA: Attribute-Based Multimodal Data Augmentation

By Mason King, August 23, 2024

Authors: Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, Heng Ji

Paper: https://arxiv.org/abs/2408.10086

Introduction

Multimodal Language Models (MLMs) have shown remarkable capabilities in understanding and integrating various modalities, including text, images, and videos. However, the process of manually annotating high-quality image-text pair data for fine-tuning and alignment is both costly and time-consuming. Existing multimodal data augmentation frameworks often face challenges such as semantic inconsistency between texts and images or the generation of unrealistic images, leading to a knowledge gap with real-world examples. To address these issues, the authors propose ARMADA (Attribute-Based Multimodal Data Augmentation), a novel method that leverages knowledge-guided manipulation of…

Generative Dataset Distillation Based on Diffusion Model

By Mason King, August 22, 2024

Authors: Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

Paper: https://arxiv.org/abs/2408.08610

Introduction

In this blog post, we delve into the paper titled “Generative Dataset Distillation Based on Diffusion Model,” which presents a novel approach to dataset distillation using the SDXL-Turbo diffusion model. This method was developed for the generative track of The First Dataset Distillation Challenge at ECCV 2024. The authors propose a technique that leverages the high-speed and high-quality image generation capabilities of the SDXL-Turbo model to achieve impressive results in dataset distillation.

Background…

Efficient Data-Sketches and Fine-Tuning for Early Detection of Distributional Drift in Medical Imaging

By Mason King, August 21, 2024

Authors: Yusen Wu, Hao Chen, Alex Pissinou Makki, Phuong Nguyen, Yelena Yesha

Paper: https://arxiv.org/abs/2408.08456

Introduction

Distributional drift, also known as dataset drift, in medical imaging refers to changes in data distribution over time, which can significantly affect the performance of machine learning models used for diagnostic purposes. This drift may result from various factors, including alterations in imaging equipment, differences in imaging protocols, variations in patient demographics, or updates in image preprocessing techniques. Detecting and managing drift is critical in the medical field to ensure that models remain accurate and reliable. Ignoring drift can lead to incorrect diagnoses or suboptimal treatment recommendations, thereby potentially…
