MSDiagnosis: An EMR-based Dataset for Clinical Multi-Step Diagnosis

Authors:

Ruihui Hou, Shencheng Chen, Yongqi Fan, Lifeng Zhu, Jing Sun, Jingping Liu, Tong Ruan

Introduction

Clinical diagnosis is a critical aspect of medical practice, involving a continuous and evolving process that includes primary diagnosis, differential diagnosis, and final diagnosis. However, most existing clinical diagnostic tasks are single-step processes, which do not align with the complex multi-step diagnostic procedures found in real-world clinical settings. This paper introduces a multi-step diagnostic task and annotates a clinical diagnostic dataset called MSDiagnosis. The dataset includes primary diagnosis, differential diagnosis, and final diagnosis questions. Additionally, a novel framework combining forward inference, backward inference, reflection, and refinement is proposed to enable large language models (LLMs) to self-evaluate and adjust their diagnostic results.

Related Work

Existing Diagnostic Datasets

Most existing diagnostic datasets focus on single-step processes, where a diagnosis is made directly based on the patient’s medical history, chief complaint, and examination results. Examples include:

DDx-basic and DDx-advanced: Examination papers focusing on multiple-choice questions (MCQA).
CMExam: Medical examination dataset with multiple tasks, including diagnosis.
AgentClinic-MedQA: Combines GPT-4 with MedQA for open-ended questions.
medikal: An open QA dataset sourced from medical websites.
CMB-Clin: Based on medical textbooks.
RJUA-QA: Synthetic dataset for open QA.

These datasets do not capture the multi-step diagnostic process typically used in clinical practice.

Multi-Step Diagnostic Process

The multi-step diagnostic process involves generating a primary diagnosis based on the patient’s medical history, chief complaint, and other relevant information. A differential diagnosis is then made to narrow down the possible diseases. Finally, the final diagnosis is determined by combining the hospital course, primary diagnosis, and differential diagnosis.
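
The chain of dependencies between the three steps can be sketched as a small pipeline. This is only an illustration of the data flow described above, not the authors' code: the `EMR` and `DiagnosisTrace` types, their field names, and the three step functions passed in are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class EMR:
    # Hypothetical field names; the real dataset schema may differ.
    chief_complaint: str
    medical_history: str
    hospital_course: str = ""


@dataclass
class DiagnosisTrace:
    primary: List[str] = field(default_factory=list)
    differential: List[str] = field(default_factory=list)
    final: List[str] = field(default_factory=list)


def run_multi_step(
    emr: EMR,
    diagnose_primary: Callable[[EMR], List[str]],
    diagnose_differential: Callable[[EMR, List[str]], List[str]],
    diagnose_final: Callable[[EMR, List[str], List[str]], List[str]],
) -> DiagnosisTrace:
    """Run the three steps in order; each step sees the earlier results."""
    trace = DiagnosisTrace()
    # Step 1: primary diagnosis from history and chief complaint.
    trace.primary = diagnose_primary(emr)
    # Step 2: differential diagnosis narrows the candidate diseases.
    trace.differential = diagnose_differential(emr, trace.primary)
    # Step 3: final diagnosis combines hospital course with steps 1 and 2.
    trace.final = diagnose_final(emr, trace.primary, trace.differential)
    return trace
```

In practice each step function would wrap an LLM call; keeping them as plain callables makes it easy to swap in stubs when testing the flow.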

Research Methodology

Proposed Framework

The proposed framework consists of two main stages:

Forward Inference: This stage involves diagnosing the patient using LLMs and similar in-context learning (ICL) examples. The LLM is guided by similar EMRs to make accurate diagnoses.
Backward Inference and Reflection: This stage involves validating the diagnostic criteria against the facts derived from the diagnostic results. The LLM uses designed reflection rules to refine the diagnostic outcomes.
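
A minimal sketch of the two stages, under loudly stated assumptions: similarity between EMRs is approximated here with token-level Jaccard overlap, the LLM is any callable that maps a prompt to a semicolon-separated string, and the reflection rule is a toy one that drops any diagnosis with an unsupported criterion. None of these choices, names, or prompt formats come from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity (an assumed stand-in for EMR similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve_similar(
    query: str, pool: List[Tuple[str, str]], k: int = 2
) -> List[Tuple[str, str]]:
    """Pick the k (EMR text, diagnosis) pairs most similar to the query."""
    return sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)[:k]


def build_prompt(query: str, examples: List[Tuple[str, str]]) -> str:
    """Format retrieved pairs as in-context examples (hypothetical format)."""
    shots = "\n\n".join(f"EMR: {e}\nDiagnosis: {d}" for e, d in examples)
    return f"{shots}\n\nEMR: {query}\nDiagnosis:"


def forward_inference(
    query: str, pool: List[Tuple[str, str]], llm: Callable[[str], str], k: int = 2
) -> List[str]:
    """Stage 1: diagnose with the LLM, guided by similar EMRs."""
    prompt = build_prompt(query, retrieve_similar(query, pool, k))
    return [d.strip() for d in llm(prompt).split(";") if d.strip()]


def backward_check(
    diagnoses: List[str], criteria: Dict[str, Set[str]], facts: Set[str]
) -> Dict[str, Set[str]]:
    """Stage 2a: for each diagnosis, list criteria the EMR facts do not support."""
    return {d: criteria.get(d, set()) - facts for d in diagnoses}


def reflect(diagnoses: List[str], missing: Dict[str, Set[str]]) -> List[str]:
    """Stage 2b: toy reflection rule that keeps only fully supported diagnoses."""
    return [d for d in diagnoses if not missing[d]]
```

A real system would ask the LLM itself to produce the diagnostic criteria and to revise its answer; the set difference above only makes the "criteria versus facts" check concrete.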

Data Construction

Data Collection and Pre-processing

The dataset is sourced from a Chinese medical website, resulting in 11,900 EMRs. After de-identifying and pre-processing the data, 3,501 high-quality EMRs are obtained.

Data Annotation

Five diagnostic questions are manually constructed and expanded using GPT-4. The answers are annotated by a professional team, ensuring high-quality data. The dataset includes 2,225 medical records covering 12 departments.

Experimental Design

Experimental Setup

The experiments involve several baseline methods, including open-source medical and general LLMs, closed-source LLMs, and other methods. Various evaluation metrics are used, such as entity F1, Rouge-L, BLEU-1, and Macro-Recall.
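
To make two of these metrics concrete, here is a small sketch of entity-level F1 and Rouge-L. It assumes entities are compared as exact-match sets and that Rouge-L is the balanced F-measure over the longest common subsequence of pre-tokenized text; the paper's exact definitions, tokenization, and weighting may differ.

```python
from typing import List, Set


def entity_f1(pred: Set[str], gold: Set[str]) -> float:
    """F1 over exact-match entity sets (assumed matching rule)."""
    if not pred or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def rouge_l(pred: List[str], ref: List[str]) -> float:
    """Rouge-L as the balanced F-measure of LCS precision and recall."""
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For Chinese EMR text the token lists would typically be characters or the output of a word segmenter rather than whitespace-split words.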

Implementation Details

Default hyperparameters are used for all open-source models. Detailed implementation settings are provided in the appendix.

Results and Analysis

Main Results

The proposed method outperforms all baselines on the Macro-Recall, Rouge-L, and BLEU-1 metrics, demonstrating its effectiveness. The best-performing model achieves an F1 score of 38.78% on the primary diagnosis and 35.00% on the final diagnosis.

Detailed Analysis

Ablation Study

Two ablation experiments are conducted, one on the multi-stage diagnostic process and one on the framework. The results show that the quality of the primary diagnosis significantly influences performance on the final diagnosis, and that the multi-step diagnostic process improves interpretability.

Case Study

An analysis of 100 error samples reveals four main types of errors: lack of domain knowledge, confusion between symptoms and diseases, diagnostic basis inconsistent with facts, and other errors.

Overall Conclusion

The MSDiagnosis dataset addresses the limitations of existing single-step diagnostic datasets by incorporating a multi-step diagnostic process. The proposed framework effectively combines forward and backward inference, reflection, and refinement to improve diagnostic accuracy and interpretability. Extensive experiments demonstrate the framework’s effectiveness, although the dataset’s uneven distribution across departments remains a limitation. Future work can address this issue through machine learning methods and data sampling strategies.

This blog post provides a comprehensive overview of the MSDiagnosis dataset and the proposed framework for multi-step clinical diagnosis. The detailed analysis and experimental results highlight the framework’s effectiveness and the dataset’s potential for further research in clinical diagnostic tasks.
