Author: James Smith
Authors: Thao Nguyen, Jeffrey Li, Sewoong Oh, Ludwig Schmidt, Jason Weston, Luke Zettlemoyer, Xian Li
ArXiv: http://arxiv.org/abs/2408.04614v1
Abstract: We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also…
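The pipeline the abstract describes — backtranslate an instruction from a document, curate, then rewrite the response — can be sketched as below. The three `llm_*` helpers are hypothetical stand-ins for real LLM calls, not the paper's actual models or prompts:

```python
# Minimal sketch of the (backtranslated instruction, rewritten response)
# pipeline. All llm_* functions are illustrative stubs standing in for
# calls to an instruction-following LLM.

def llm_backtranslate(document: str) -> str:
    """Stub: generate an instruction to which `document` is a plausible answer."""
    return f"Explain the following topic: {document[:40]}"

def llm_score(instruction: str, response: str) -> float:
    """Stub: curation step scoring instruction/response fit (0-1)."""
    return 1.0 if instruction and response else 0.0

def llm_rewrite(document: str, instruction: str) -> str:
    """Stub: rewrite the source document into a direct, high-quality response."""
    return f"Answer to '{instruction}': {document}"

def build_pairs(corpus, threshold=0.5):
    pairs = []
    for doc in corpus:
        instruction = llm_backtranslate(doc)          # step 1: backtranslate
        if llm_score(instruction, doc) >= threshold:  # step 2: curate
            response = llm_rewrite(doc, instruction)  # step 3: rewrite
            pairs.append((instruction, response))
    return pairs

pairs = build_pairs(["Photosynthesis converts light into chemical energy."])
```

The resulting pairs would then be used as fine-tuning data in place of datasets like ShareGPT or Self-instruct.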
1. Abstract This paper conducts a comprehensive survey on data quality (DQ) evaluation and improvement tools for machine learning (ML). It emphasizes the critical role of high-quality data in ML model performance, fairness, robustness, safety, and scalability. The paper introduces four DQ dimensions (intrinsic, contextual, representational, and accessibility) and twelve metrics specific to ML, providing definitions and examples. The survey reviews seventeen open-source DQ tools released in the past five years, analyzing their strengths and limitations based on the DQ dimensions and metrics. It proposes a roadmap for developing new DQ tools, highlighting the importance of integrating automation, monitoring, and AI…
Abstract This paper evaluates the effectiveness of large language models (LLMs), specifically ChatGPT, in performing machine translation (MT) tasks across a diverse range of languages. Using the FLORES-200 benchmark, the authors compare ChatGPT’s performance with traditional MT models like NLLB, as well as commercial systems like Google Translate and the more advanced GPT-4. The results reveal that while ChatGPT demonstrates competitive performance for high-resource languages (HRLs), it consistently falls short for low-resource languages (LRLs), underperforming traditional MT models in 84.1% of the languages evaluated. The study also highlights the limited effectiveness of few-shot prompts for improving LLM MT performance and…
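The headline "underperforms traditional MT models in 84.1% of the languages" boils down to a per-language comparison across the benchmark. A hedged sketch of that computation, using made-up scores rather than the paper's actual FLORES-200 results:

```python
# Count the fraction of languages where the LLM scores below the baseline
# MT system. Scores are illustrative placeholders (e.g. chrF-like values),
# not figures from the paper.

def fraction_underperforming(llm_scores, baseline_scores):
    """Share of languages where the LLM's score is below the baseline's."""
    worse = sum(1 for lang in llm_scores if llm_scores[lang] < baseline_scores[lang])
    return worse / len(llm_scores)

llm  = {"eng": 62.1, "fra": 58.4, "swh": 31.2, "quy": 12.5}  # hypothetical
nllb = {"eng": 60.0, "fra": 57.9, "swh": 40.3, "quy": 25.1}  # hypothetical
frac = fraction_underperforming(llm, nllb)  # 2 of 4 languages → 0.5
```

In this toy example the LLM trails on the two lower-resource languages (`swh`, `quy`), mirroring the HRL/LRL split the abstract reports.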
1. Abstract The paper explores the complexities and challenges of engineering Retrieval Augmented Generation (RAG) systems, which leverage large language models (LLMs) to generate answers by retrieving relevant information from a data store. By presenting a mixed-methods approach that combines empirical experiments and case studies, the authors provide valuable insights into the limitations and failure points of RAG systems. They identify seven key failure points, including missing content, missed top-ranked documents, context limitations, extraction failures, format errors, incorrect specificity, and incomplete answers. The paper emphasizes the importance of continuous calibration, configuration, and testing for RAG systems, highlighting their evolving nature.…
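The seven failure points listed in the abstract lend themselves to an explicit taxonomy, e.g. for tagging failures during RAG system testing. The one-line glosses below are paraphrases for illustration, not the paper's exact definitions:

```python
# Taxonomy of the seven RAG failure points named in the abstract.
# Glosses are illustrative paraphrases.
from enum import Enum

class RAGFailurePoint(Enum):
    MISSING_CONTENT = "question is not answerable from the data store"
    MISSED_TOP_RANKED = "answer exists but is not in the retrieved top-k"
    CONTEXT_LIMITATION = "retrieved but dropped when assembling the context"
    EXTRACTION_FAILURE = "in context, but the LLM fails to extract it"
    FORMAT_ERROR = "answer ignores the requested format (table, list, ...)"
    INCORRECT_SPECIFICITY = "answer is too general or too specific"
    INCOMPLETE_ANSWER = "partial answer despite available information"

n_failure_points = len(RAGFailurePoint)  # 7
```

Labeling observed failures against such a taxonomy supports the continuous calibration and testing the paper calls for.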
1. Abstract Objective: This study investigates the potential of voice analysis as a tool for prescreening or monitoring Type 2 Diabetes Mellitus (T2DM) by examining differences in voice recordings between non-diabetic and T2DM individuals. Methods: A total of 267 participants (79 women and 113 men non-diabetic, 18 women and 57 men T2DM) were recruited in India. Using a smartphone application, participants recorded a fixed phrase up to 6 times daily for 2 weeks, resulting in 18,465 recordings. Fourteen acoustic features were extracted from each recording to analyze differences between groups and create prediction models for T2DM status. Results: Significant differences were found…
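The study's analysis pattern — extract acoustic features per recording, then compare group statistics — can be sketched minimally. The feature values below are synthetic placeholders, not data from the study's 18,465 recordings:

```python
# Illustrative sketch of a between-group comparison on one acoustic feature.
# Values are synthetic; a real analysis would extract fourteen features
# (pitch, jitter, shimmer, etc.) from each recording.
from statistics import mean

def group_mean_difference(group_a, group_b):
    """Difference in group means for a single acoustic feature."""
    return mean(group_a) - mean(group_b)

# Hypothetical mean fundamental-frequency (Hz) samples per group.
non_diabetic_f0 = [118.0, 121.5, 119.2]
t2dm_f0 = [112.3, 110.8, 114.1]
diff = group_mean_difference(non_diabetic_f0, t2dm_f0)
```

A full replication would add a significance test per feature and a classifier over all fourteen features, as the abstract's "prediction models" suggest.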