Author: John Johnson
Authors: Antonio Rago, Maria Vanina Martinez
Paper: https://arxiv.org/abs/2408.06875

Introduction
As AI models become increasingly complex and integrated into daily life, the need for interactive explainable AI (XAI) methods grows. This paper proposes using belief change theory as a formal foundation for incorporating user feedback into logical representations of data-driven classifiers. This approach aims to develop interactive explanations that promote transparency, interpretability, and accountability in human-machine interactions.

Related Work
Belief revision has been explored in various contexts. Notable contributions include:
– Falappa, Kern-Isberner, and Simari (2002): Proposed a non-prioritized revision operator using explanations by deduction.
– Coste-Marquis and Marquis (2021): Introduced a…
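The feedback-incorporation idea can be illustrated with a minimal Levi-style revision over sets of propositional literals. This is a generic textbook operator (contract by the negation, then expand), not the specific non-prioritized operator the paper builds on:

```python
# Toy belief revision on sets of propositional literals (Levi identity).
# Illustrative only; the paper works with richer logical representations.

def negate(lit: str) -> str:
    """Return the negation of a literal, e.g. 'p' <-> '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def revise(beliefs: set, new_info: str) -> set:
    """Contract the negation of new_info, then expand by new_info."""
    contracted = {b for b in beliefs if b != negate(new_info)}
    return contracted | {new_info}

# A classifier's logical approximation holds "approve" for this case;
# user feedback asserts the classifier should not approve it.
beliefs = {"high_income", "approve"}
revised = revise(beliefs, "~approve")
print(sorted(revised))  # ['high_income', '~approve']
```

The contraction step is what makes the operation a genuine revision rather than a blind expansion: the conflicting belief is retracted before the feedback is adopted.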
Authors: Jayanta Mandi, Marco Foschini, Daniel Holler, Sylvie Thiebaux, Jorg Hoffmann, Tias Guns
Paper: https://arxiv.org/abs/2408.06876

Introduction
Automated planning is a critical component in many applications, such as transportation logistics, where determining the optimal route is essential. However, specifying action costs, like travel time, can be challenging due to varying factors like weather conditions. This paper explores the use of Decision-Focused Learning (DFL) to predict these action costs based on input features, aiming to optimize the overall planning solution rather than just the prediction accuracy.

Predict-then-Optimize Problem Formulation
In traditional approaches, prediction and planning are treated as separate tasks. However, DFL integrates these tasks, training the…
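The gap between prediction accuracy and plan quality that motivates DFL can be sketched numerically (toy routes and costs, not the paper's planner): predictions with low error on most edges can still steer the planner to the wrong route, and DFL trains against the resulting *regret* instead of the prediction error.

```python
# Toy predict-then-optimize setup: two candidate routes over edges with
# uncertain travel times. Regret = extra true cost incurred by planning
# with predicted costs. All numbers are assumed for illustration.

def route_cost(edge_costs, route):
    return sum(edge_costs[e] for e in route)

def regret(pred_costs, true_costs, routes):
    """True-cost penalty of the route chosen under predicted costs."""
    chosen = min(routes, key=lambda r: route_cost(pred_costs, r))
    best = min(routes, key=lambda r: route_cost(true_costs, r))
    return route_cost(true_costs, chosen) - route_cost(true_costs, best)

true_costs = {"a": 2.0, "b": 3.0, "c": 6.0}
pred_costs = {"a": 2.5, "b": 3.4, "c": 1.0}  # accurate on a, b; wrong on c
routes = [["a", "b"], ["c"]]
print(regret(pred_costs, true_costs, routes))  # 1.0: predictions pick ["c"]
```

Here the squared prediction error is dominated by a single edge, yet that one edge flips the planner's decision; a regret-style loss penalizes exactly this.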
Authors: Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang, Yunfeng Shao, Chao Wu
Paper: https://arxiv.org/abs/2408.06920

Introduction
Generative Flow Networks (GFlowNets) have emerged as a promising alternative to traditional reinforcement learning (RL) for exploratory control tasks. Unlike RL, which focuses on maximizing cumulative rewards for a single optimal sequence, GFlowNets generate diverse trajectories with probabilities proportional to their rewards. However, the individual-flow matching constraint in GFlowNets limits their application in multi-agent systems, particularly in continuous joint-control problems. This paper introduces a novel method called Multi-Agent generative Continuous Flow Networks (MACFN) to address this limitation.

Related Work
Generative Flow Networks
GFlowNets aim to generate diverse candidates in…
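The flow-matching constraint mentioned above can be checked numerically on a toy discrete DAG (assumed flow values; MACFN's continuous joint-control setting is far more general): at every intermediate state, total inflow must equal total outflow.

```python
# Minimal numeric illustration of the GFlowNet flow-matching constraint
# on a toy DAG. Edge flows are assumed numbers, chosen to satisfy the
# constraint at both intermediate states.

# edges: (parent, child) -> flow
flows = {
    ("s0", "s1"): 3.0, ("s0", "s2"): 1.0,
    ("s1", "t1"): 2.0, ("s1", "t2"): 1.0,
    ("s2", "t2"): 1.0,
}

def inflow(state):
    return sum(f for (_, c), f in flows.items() if c == state)

def outflow(state):
    return sum(f for (p, _), f in flows.items() if p == state)

for s in ("s1", "s2"):
    print(s, inflow(s), outflow(s))  # inflow == outflow at each state
```

Terminal states (here t1, t2) instead receive flow equal to their reward, which is what makes sampled trajectories proportional to reward.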
Authors: Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
ArXiv: https://arxiv.org/abs/2204.08387

Introduction
In the realm of Document AI, self-supervised pre-training techniques have significantly advanced document understanding tasks. LayoutLMv3, a novel approach, aims to unify text and image masking for pre-training multimodal Transformers. This model addresses the discrepancy in pre-training objectives between text and image modalities, facilitating better multimodal representation learning. LayoutLMv3 is designed to be a general-purpose pre-trained model for both text-centric and image-centric Document AI tasks, achieving state-of-the-art performance across various benchmarks.

Model Architecture Overview
LayoutLMv3 employs a unified text-image multimodal Transformer to learn cross-modal representations. The model architecture consists of…
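The unified mask-and-predict objective can be sketched as follows (toy sequences, mask tokens, and ratios; not the actual LayoutLMv3 pre-training code): the same masking routine is applied to word tokens and to image-patch tokens before both enter one Transformer.

```python
# Toy sketch of unified masking across modalities: one routine masks a
# fraction of positions and records the reconstruction targets, applied
# identically to words and image patches. Shapes and ratios are assumed.

import random

def mask_sequence(tokens, mask_token, ratio, rng):
    """Mask a fraction of positions; return masked sequence and targets."""
    n_mask = max(1, int(len(tokens) * ratio))
    positions = rng.sample(range(len(tokens)), n_mask)
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = mask_token
    return masked, targets

rng = random.Random(0)
words = ["invoice", "total", "42", "USD"]
patches = ["patch_%d" % i for i in range(8)]
masked_words, word_targets = mask_sequence(words, "[MASK]", 0.3, rng)
masked_patches, patch_targets = mask_sequence(patches, "[MASKPATCH]", 0.4, rng)
print(masked_words.count("[MASK]"), len(patch_targets))
```

The point of the unification is that both modalities share this predict-the-masked-target objective, removing the text/image pre-training discrepancy the paper highlights.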
Authors: Ruining Li, Chuanxia Zheng, Christian Rupprecht, Andrea Vedaldi
ArXiv: http://arxiv.org/abs/2408.04631v1

Abstract
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics. At test time, given a single image and a sparse set of motion trajectories (i.e., drags), Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions. This is achieved by fine-tuning a large-scale pre-trained video diffusion model, for which we propose a new conditioning architecture to inject the dragging control effectively. More importantly, we introduce the all-to-first attention mechanism, a drop-in replacement for the widely adopted spatial attention modules, which…
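The all-to-first idea, restricting every frame's queries to the first frame's keys and values, can be sketched with plain NumPy (assumed shapes and no heads/projections; not the model's actual module):

```python
# Sketch of an "all-to-first" attention step: queries from every frame
# attend only to frame 0's keys/values, instead of each frame attending
# within itself as in standard spatial attention. Shapes are toy values.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def all_to_first_attention(q, k, v):
    """q, k, v: (frames, tokens, dim). Attend to frame 0 only."""
    k0, v0 = k[0], v[0]                        # (tokens, dim)
    scores = q @ k0.T / np.sqrt(q.shape[-1])   # (frames, tokens, tokens)
    return softmax(scores) @ v0                # (frames, tokens, dim)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 5, 8))  # 4 frames, 5 tokens, dim 8
k = rng.normal(size=(4, 5, 8))
v = rng.normal(size=(4, 5, 8))
out = all_to_first_attention(q, k, v)
print(out.shape)  # (4, 5, 8)
```

Because every frame reads from the same reference frame, appearance information from the input image propagates directly to all generated frames.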
1. On Discrete Prompt Optimization for Diffusion Models
This paper introduces the first gradient-based framework for prompt optimization in text-to-image diffusion models. We formulate prompt engineering as a discrete optimization problem over the language space. Two major challenges arise in efficiently finding a solution to this problem:
(1) Enormous Domain Space: Setting the domain to the entire language space poses significant difficulty to the optimization process.
(2) Text Gradient: Efficiently computing the text gradient is challenging, as it requires backpropagating through the inference steps of the diffusion model and a non-differentiable embedding lookup table.
Beyond the problem formulation, our main technical…
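The non-differentiable lookup in challenge (2) is commonly handled by relaxing the discrete token choice into a soft mixture over the embedding table; a generic sketch of that relaxation (toy vocabulary and embeddings, not the paper's algorithm):

```python
# Generic sketch of making a token lookup differentiable: replace the hard
# emb[argmax(logits)] with a softmax-weighted mixture of embedding rows,
# so gradients can flow into the logits. Vocabulary and table are toys.

import numpy as np

vocab = ["photo", "painting", "sketch"]
emb = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # toy embedding table

def soft_lookup(logits):
    """Differentiable surrogate for emb[argmax(logits)]."""
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return probs @ emb  # convex combination of embedding rows

logits = np.array([2.0, 0.1, 0.1])  # optimization variable over the vocab
soft = soft_lookup(logits)
hard = emb[int(np.argmax(logits))]
print(soft, hard)
```

As the logits sharpen, the soft mixture converges to the hard lookup, which is what lets a gradient-based optimizer operate over a discrete language domain.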
Authors: Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Yueqian Lin, Qing Yu, Go Irie, Shafiq Joty, Yixuan Li, Hai Li, Ziwei Liu, Toshihiko Yamasaki, Kiyoharu Aizawa
Category: Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
ArXiv: http://arxiv.org/abs/2407.21794v1

Abstract
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine learning systems and has shaped the field of OOD detection. Meanwhile, several other problems are closely related to OOD detection, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD). To unify these problems, a generalized OOD detection framework was proposed, taxonomically categorizing these five problems. However, Vision Language Models (VLMs) such as CLIP have significantly changed the paradigm…
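A CLIP-style zero-shot OOD score can be sketched as the maximum cosine similarity between an image feature and the in-distribution class-name embeddings (toy 2-D vectors, not real CLIP features):

```python
# Toy sketch of VLM-based zero-shot OOD scoring: an input is scored by
# its best cosine match against in-distribution class embeddings, and
# low-scoring inputs are flagged as OOD. All vectors are assumed toys.

import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ood_score(image_feat, class_feats):
    """Higher score = more in-distribution."""
    return max(cosine(image_feat, c) for c in class_feats)

class_feats = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # e.g. "cat", "dog"
in_dist = np.array([0.9, 0.1])
out_dist = np.array([-0.7, -0.7])
print(ood_score(in_dist, class_feats) > 0.5)   # True
print(ood_score(out_dist, class_feats) > 0.5)  # False
```

This is the kind of text-aligned scoring that, per the abstract, shifts OOD detection away from purely visual, label-space-bound formulations.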
1. Abstract
This paper presents TRAIT, a task-oriented in-domain data augmentation framework for continual pre-training of large language models (LLMs). TRAIT addresses the challenges of data scarcity and lack of task awareness in domain-specific LLM adaptation. The framework consists of two main components: in-domain data selection and task-oriented synthetic passage generation. The in-domain data selection strategy identifies and selects relevant data from general corpora, significantly expanding the training dataset and enriching domain knowledge. The task-oriented synthetic passage generation strategy generates passages containing problem-specific and enlightenment paragraphs that guide the model on using domain knowledge to solve downstream tasks. The proposed…
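The in-domain selection component can be sketched with a simple lexical-overlap scorer (an illustrative heuristic only; TRAIT's actual selector is more sophisticated): passages from a general corpus are ranked by overlap with a small domain seed and only the top-scoring ones are kept.

```python
# Toy in-domain data selection: score general-corpus passages by token
# overlap with a seed domain vocabulary and keep those above a threshold.
# Seed text, corpus, and threshold are all assumed for illustration.

def overlap_score(passage, domain_vocab):
    tokens = passage.lower().split()
    return sum(t in domain_vocab for t in tokens) / max(len(tokens), 1)

domain_seed = "option pricing volatility hedging portfolio risk"
domain_vocab = set(domain_seed.split())

corpus = [
    "hedging a portfolio against volatility risk",
    "the cat sat on the mat",
    "implied volatility and option pricing models",
]
selected = [p for p in corpus if overlap_score(p, domain_vocab) > 0.4]
print(selected)
```

Even this crude scorer shows the mechanism: the domain-relevant passages survive while unrelated text is filtered out, expanding the in-domain training set.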
1. Abstract
This paper investigates the effectiveness of ChatGPT in machine translation tasks by exploring various prompting strategies. The authors propose several translation prompts that include information about the translation task, context domain, and Part-of-Speech (POS) tags. Experiments demonstrate that these prompts significantly enhance ChatGPT’s translation performance, often surpassing commercial translation systems. The study also explores few-shot learning approaches and evaluates ChatGPT’s ability to handle multi-domain translation tasks.

2. Quick Read
a. Research Methodology and Innovation
The paper employs a black-box approach, treating ChatGPT as an unmodified system. The authors focus on designing and evaluating different translation prompts to improve ChatGPT’s…
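The kind of information-enriched prompt the paper evaluates can be sketched as a template (the wording, domain label, and tags below are illustrative, not the paper's exact prompts):

```python
# Toy builder for a translation prompt enriched with task, domain, and
# POS-tag information, in the spirit of the strategies the paper studies.
# Template wording and the example sentence are assumptions.

def build_prompt(src, src_lang, tgt_lang, domain=None, pos_tags=None):
    parts = [f"Translate the following {src_lang} sentence into {tgt_lang}."]
    if domain:
        parts.append(f"The sentence is from the {domain} domain.")
    if pos_tags:
        tagged = " ".join(f"{w}/{t}" for w, t in pos_tags)
        parts.append(f"POS tags: {tagged}")
    parts.append(f"Sentence: {src}")
    return "\n".join(parts)

prompt = build_prompt(
    "The patient presented with acute symptoms.",
    "English", "German",
    domain="medical",
    pos_tags=[("patient", "NOUN"), ("presented", "VERB"), ("acute", "ADJ")],
)
print(prompt)
```

Because the approach is black-box, all of the leverage comes from prompt content like this; nothing about the underlying model is changed.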
1. Full Summary
This paper introduces KOSMOS-2.5, a multimodal literate model designed for machine reading of text-intensive images. It addresses the limitations of existing models by incorporating two transcription tasks into a unified architecture: generating spatially-aware text blocks and producing structured markdown-formatted text. KOSMOS-2.5 leverages a shared Transformer architecture, task-specific prompts, and flexible text representations to achieve this. The model is pre-trained on a diverse corpus of text-intensive images, including scanned documents, academic papers, PowerPoint slides, and web screenshots. This data is processed and filtered using various techniques to ensure quality and diversity. KOSMOS-2.5 demonstrates strong performance on text recognition and image-to-markdown…
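Routing the two transcription tasks through one shared model via task-specific prompts can be sketched as follows (the prompt strings and input format below are assumptions, not KOSMOS-2.5's real interface):

```python
# Toy sketch of prompt-based task routing: one shared model, two tasks,
# selected purely by a task-specific prompt prefix. Prompt tokens and the
# input encoding are illustrative placeholders.

TASK_PROMPTS = {
    "ocr": "<ocr>",       # spatially-aware text blocks: (bbox, text) pairs
    "markdown": "<md>",   # structured markdown-formatted output
}

def build_input(task, image_id):
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task}")
    return f"{TASK_PROMPTS[task]} image:{image_id}"

print(build_input("ocr", "doc_001"))       # <ocr> image:doc_001
print(build_input("markdown", "doc_001"))  # <md> image:doc_001
```

The shared backbone never changes between tasks; only the prompt prefix tells it which output representation to produce, which is what makes the architecture unified.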