Author: Ryan Miller

NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices

By Ryan Miller, August 26, 2024

Authors: Zhiyong Zhang, Aniket Gupta, Huaizu Jiang, Hanumant Singh

Paper: https://arxiv.org/abs/2408.10161

Introduction

Optical flow estimation is a critical task in computer vision, enabling applications such as motion detection, object tracking, and video analysis. Traditional methods like Lucas-Kanade and SIFT have been surpassed by learning-based approaches, which offer higher accuracy at the cost of increased computational demands. NeuFlow v2 aims to address this trade-off with a highly efficient optical flow estimation method that maintains high accuracy while significantly reducing computational cost. This paper introduces NeuFlow v2, which builds upon its predecessor, NeuFlow v1, by incorporating a lightweight backbone and a fast refinement…

Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

By Ryan Miller, August 23, 2024

Authors: Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong

Paper: https://arxiv.org/abs/2408.09698

Introduction

Rapid advances in Large Language Models (LLMs) have significantly strengthened Recommendation Systems (RSs). These models are proficient at understanding and summarizing complex user preferences, which improves the personalization and accuracy of recommendations. However, the traditional LLM-based recommendation paradigm relies primarily on textual data, which limits its effectiveness in multimodal contexts where data from images, text, and other sources are integrated. This paper introduces the Multimodal Large Language Model-enhanced Multimodal Sequential Recommendation (MLLM-MSR) model, which aims to address these challenges by leveraging…

Neural Reward Machines

By Ryan Miller, August 22, 2024

Authors: Elena Umili, Francesco Argenziano, Roberto Capobianco

Paper: https://arxiv.org/abs/2408.08677

Neural Reward Machines: A Neurosymbolic Framework for Non-Markovian Reinforcement Learning

Introduction

Reinforcement Learning (RL) tasks are traditionally modeled as Markov Decision Processes (MDPs), where task feedback depends solely on the last state and action. However, many decision problems are inherently non-Markovian or temporally extended, requiring agents to consider the entire history of state-action pairs to act rationally. Current research addresses non-Markovianity by expanding the state space with features that encode the environment history and then solving the augmented-state problem with established RL algorithms. This paper introduces Neural Reward Machines (NRMs), an automata-based neurosymbolic framework…

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

By Ryan Miller, August 21, 2024

Authors: Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Aviv Regev, Sergey Levine, Masatoshi Uehara

Paper: https://arxiv.org/abs/2408.08252

Introduction

Diffusion models have emerged as powerful generative models capable of capturing the natural design spaces of various domains, including images, molecules, DNA, RNA, and protein sequences. The challenge lies in optimizing downstream reward functions while preserving the naturalness of these design spaces. Existing methods often require differentiable proxy models or computationally expensive fine-tuning of diffusion models. This paper addresses these challenges by proposing an iterative sampling method that integrates soft value functions into the standard inference procedure of pre-trained…

SpectralGaussians: Semantic, spectral 3D Gaussian splatting for multi-spectral scene representation, visualization and analysis

By Ryan Miller, August 16, 2024

Authors: Saptarshi Neil Sinha, Holger Graf, Michael Weinmann

Paper: https://arxiv.org/abs/2408.06975

Introduction

Accurate scene representation is crucial for various applications, including architecture, the automotive industry, advertising, and design. Traditional RGB color channels often fail to capture the full spectrum of light, which limits scene reproduction. Multi-spectral scene capture and representation overcome these limitations by providing higher-resolution light and reflectance spectra. This is particularly important for applications such as virtual prototyping, predictive rendering, and spectral scene understanding. The paper introduces a novel cross-spectral rendering framework based on 3D Gaussian Splatting (3DGS). The approach generates realistic and semantically meaningful splats from…

Evaluating Research Quality with Large Language Models: An Analysis of ChatGPT’s Effectiveness with Different Settings and Inputs

By Ryan Miller, August 16, 2024

Authors: Mike Thelwall

Paper: https://arxiv.org/abs/2408.06752

Introduction

Evaluating the quality of academic research is a critical yet time-consuming task, essential for national research evaluation exercises, appointments, promotions, and tenure decisions. This paper investigates whether Large Language Models (LLMs), specifically ChatGPT, can assist in this process. The study examines which inputs (full text without tables, figures, and references; title and abstract; title only) produce better quality score estimates and how different ChatGPT models and system prompts affect these scores.

Research Questions

The study aims to answer the following research questions:

1. What is the optimal input for ChatGPT post-publication research quality assessment:…
