Authors:
Ruihui Hou, Shencheng Chen, Yongqi Fan, Lifeng Zhu, Jing Sun, Jingping Liu, Tong Ruan
Paper:
https://arxiv.org/abs/2408.10039
Introduction
Clinical diagnosis is a critical aspect of medical practice, involving a continuous and evolving process that includes primary diagnosis, differential diagnosis, and final diagnosis. However, most existing clinical diagnostic tasks are single-step processes, which do not align with the complex multi-step diagnostic procedures found in real-world clinical settings. This paper introduces a multi-step diagnostic task and annotates a clinical diagnostic dataset called MSDiagnosis. The dataset includes primary diagnosis, differential diagnosis, and final diagnosis questions. Additionally, a novel framework combining forward inference, backward inference, reflection, and refinement is proposed to enable large language models (LLMs) to self-evaluate and adjust their diagnostic results.
Related Work
Existing Diagnostic Datasets
Most existing diagnostic datasets focus on single-step processes, where a diagnosis is made directly based on the patient’s medical history, chief complaint, and examination results. Examples include:
- DDx-basic and DDx-advanced: Exam-style multiple-choice question answering (MCQA) datasets.
- CMExam: Medical examination dataset with multiple tasks, including diagnosis.
- AgentClinic-MedQA: Combines GPT-4 with MedQA for open-ended questions.
- medikal: An open QA dataset sourced from medical websites.
- CMB-Clin: Based on medical textbooks.
- RJUA-QA: Synthetic dataset for open QA.
These datasets do not capture the multi-step diagnostic process typically used in clinical practice.
Multi-Step Diagnostic Process
The multi-step diagnostic process involves generating a primary diagnosis based on the patient’s medical history, chief complaint, and other relevant information. A differential diagnosis is then made to narrow down the possible diseases. Finally, the final diagnosis is determined by combining the hospital course, primary diagnosis, and differential diagnosis.
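The three steps above form a chain in which each output feeds the next prompt. The sketch below is a hypothetical illustration, not the authors' code: `multi_step_diagnosis`, the EMR field names, and the prompt wording are assumptions, and `llm` stands for any function that maps a prompt string to a completion.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any callable that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class DiagnosisTrace:
    primary: str
    differential: str
    final: str


def multi_step_diagnosis(emr: dict, llm: LLM) -> DiagnosisTrace:
    """Run primary -> differential -> final diagnosis, passing each result forward."""
    # Step 1: primary diagnosis from chief complaint, history, and examinations.
    primary = llm(
        f"Chief complaint: {emr['chief_complaint']}\n"
        f"History: {emr['history']}\n"
        f"Examinations: {emr['exams']}\n"
        "Give the primary diagnosis."
    )
    # Step 2: differential diagnosis narrows down the candidate diseases.
    differential = llm(
        f"Primary diagnosis: {primary}\nHistory: {emr['history']}\n"
        "List the differential diagnoses and how each can be ruled out."
    )
    # Step 3: final diagnosis combines the hospital course with both earlier steps.
    final = llm(
        f"Hospital course: {emr['hospital_course']}\n"
        f"Primary diagnosis: {primary}\n"
        f"Differential diagnosis: {differential}\n"
        "Give the final diagnosis."
    )
    return DiagnosisTrace(primary, differential, final)
```

Passing a stub `llm` that records its prompts is a quick way to confirm that the final step actually sees both the primary and the differential diagnosis.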
Research Methodology
Proposed Framework
The proposed framework consists of two main stages:
- Forward Inference: The LLM diagnoses the patient using in-context learning (ICL) examples drawn from similar cases, so that similar electronic medical records (EMRs) guide it toward more accurate diagnoses.
- Backward Inference and Reflection: This stage validates the diagnostic results by checking the diagnostic criteria against the facts derived from those results. The LLM then applies designed reflection rules to refine the diagnostic outcomes.
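The two stages can be sketched as a forward-then-verify loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the retriever here uses character-bigram Jaccard similarity (chosen because it works on Chinese text without word segmentation), and the prompts, the `UNSUPPORTED` marker, and the stopping rule are hypothetical stand-ins for the paper's reflection rules.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical prompt -> completion interface


def char_bigrams(text: str) -> set[str]:
    # Character bigrams avoid the need for a Chinese word segmenter.
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a: str, b: str) -> float:
    sa, sb = char_bigrams(a), char_bigrams(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve_examples(query: str, pool: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    """Return the k (EMR, diagnosis) pairs most similar to the query EMR."""
    return sorted(pool, key=lambda ex: jaccard(query, ex[0]), reverse=True)[:k]


def forward_inference(emr: str, pool: list[tuple[str, str]], llm: LLM, k: int = 2) -> str:
    # Build an ICL prompt from the most similar annotated cases.
    shots = "\n\n".join(f"EMR: {e}\nDiagnosis: {d}" for e, d in retrieve_examples(emr, pool, k))
    return llm(f"{shots}\n\nEMR: {emr}\nDiagnosis:")


def backward_check(emr: str, diagnosis: str, llm: LLM) -> str:
    # Backward inference: list the criteria the diagnosis implies and check them against the EMR.
    return llm(
        f"Diagnosis: {diagnosis}\nEMR: {emr}\n"
        "List the diagnostic criteria for this diagnosis and mark each "
        "as SUPPORTED or UNSUPPORTED by the EMR."
    )


def diagnose(emr: str, pool: list[tuple[str, str]], llm: LLM, max_rounds: int = 2) -> str:
    diagnosis = forward_inference(emr, pool, llm)
    for _ in range(max_rounds):
        verdict = backward_check(emr, diagnosis, llm)
        # Hypothetical reflection rule: accept once no criterion is unsupported.
        if "UNSUPPORTED" not in verdict:
            break
        # Refinement: revise the diagnosis in light of the failed criteria.
        diagnosis = llm(
            f"EMR: {emr}\nPrevious diagnosis: {diagnosis}\n"
            f"Check result: {verdict}\nRevise the diagnosis."
        )
    return diagnosis
```

Capping the loop with `max_rounds` is a design choice that bounds cost; without it, a model that never satisfies its own check would loop indefinitely.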
Data Construction
Data Collection and Pre-processing
The raw data, 11,900 EMRs, are collected from a Chinese medical website. After de-identification and pre-processing, 3,501 high-quality EMRs remain.
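The de-identification procedure is not detailed here. As a hypothetical illustration only, a rule-based pass might mask mainland-China resident ID numbers and mobile phone numbers with regular expressions. The patterns below are assumptions, and a real pipeline would need a vetted rule set or an NER model (names, for example, are not handled).

```python
import re

# Illustrative patterns only (assumed, not from the paper). Order matters:
# IDs are masked first so their digits are never partially matched as phones.
PATTERNS = [
    (re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"), "[ID]"),    # 18-char resident ID number
    (re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"), "[PHONE]"),  # 11-digit mobile number
]


def deidentify(text: str) -> str:
    """Replace each matched identifier with a placeholder tag."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

The digit lookarounds keep the patterns from matching fragments of longer numbers, such as lab values or record IDs.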
Data Annotation
Five diagnostic questions are constructed manually and then expanded using GPT-4. A professional team annotates the answers to ensure data quality. The dataset includes 2,225 medical records covering 12 departments.
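One way to picture an annotated record is as an EMR paired with question-answer items tagged by diagnostic stage. The schema below is a hypothetical sketch; the field names and the completeness check are assumptions and do not reflect the released data format.

```python
from dataclasses import dataclass, field
from typing import Literal

Stage = Literal["primary", "differential", "final"]


@dataclass
class QAItem:
    stage: Stage    # which diagnostic step the question targets
    question: str   # a diagnostic question (wording assumed)
    answer: str     # expert-annotated answer


@dataclass
class AnnotatedRecord:
    record_id: str
    department: str  # one of the 12 departments
    emr_text: str
    qa: list[QAItem] = field(default_factory=list)

    def answers_for(self, stage: Stage) -> list[str]:
        return [item.answer for item in self.qa if item.stage == stage]

    def is_complete(self) -> bool:
        # Usable for multi-step evaluation only if every stage has an answer.
        return all(self.answers_for(s) for s in ("primary", "differential", "final"))
```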
Experimental Design
Experimental Setup
The experiments compare several baselines: open-source medical and general LLMs, closed-source LLMs, and other methods. Evaluation metrics include entity F1, ROUGE-L, BLEU-1, and Macro-Recall.
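For the text-overlap metrics, a minimal sketch of the standard definitions follows: ROUGE-L as an LCS-based F-measure and BLEU-1 as clipped unigram precision times a brevity penalty. Tokenization is an assumption; character-level tokens (`list(text)`) are a common choice for Chinese, but the paper's exact settings may differ.

```python
import math
from collections import Counter


def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic program over prefixes, O(len(a) * len(b)) time, O(len(b)) memory.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: list[str], reference: list[str], beta: float = 1.0) -> float:
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return (1 + beta**2) * p * r / (r + beta**2 * p)


def bleu_1(candidate: list[str], reference: list[str]) -> float:
    if not candidate:
        return 0.0
    # Clipped unigram matches: each reference token can be matched at most as often as it occurs.
    overlap = Counter(candidate) & Counter(reference)
    precision = sum(overlap.values()) / len(candidate)
    # Brevity penalty discourages short candidates that trivially score high precision.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * precision
```

For example, `rouge_l(list("abcd"), list("abed"))` has an LCS of 3 over 4 tokens on each side, giving 0.75.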
Implementation Details
Default hyperparameters are used for all open-source models. Detailed implementation settings are provided in the appendix.
Results and Analysis
Main Results
The proposed method outperforms all baselines on Macro-Recall, ROUGE-L, and BLEU-1, demonstrating its effectiveness. The best-performing model achieves an F1 score of 38.78% on primary diagnosis and 35.00% on final diagnosis.
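The F1 and Macro-Recall figures compare sets of predicted and gold diagnosis entities. The exact definitions are not given here, so the sketch below assumes two common choices: entity F1 is micro-averaged over all samples, and Macro-Recall is the mean of per-sample recall. The light string normalization is also an assumption.

```python
def normalize(entities: set[str]) -> set[str]:
    # Assumed normalization: trim whitespace and lowercase before matching.
    return {e.strip().lower() for e in entities}


def entity_f1(preds: list[set[str]], golds: list[set[str]]) -> float:
    """Micro-averaged F1: pool true positives, predictions, and gold entities over samples."""
    pairs = [(normalize(p), normalize(g)) for p, g in zip(preds, golds)]
    tp = sum(len(p & g) for p, g in pairs)
    n_pred = sum(len(p) for p, _ in pairs)
    n_gold = sum(len(g) for _, g in pairs)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def macro_recall(preds: list[set[str]], golds: list[set[str]]) -> float:
    """Mean per-sample recall (assumed definition); samples with no gold entities are skipped."""
    recalls = [
        len(normalize(p) & normalize(g)) / len(normalize(g))
        for p, g in zip(preds, golds)
        if g
    ]
    return sum(recalls) / len(recalls) if recalls else 0.0
```

With predictions `[{"pneumonia", "COPD"}, {"asthma"}]` against gold `[{"pneumonia"}, {"asthma", "bronchitis"}]`, two of three predicted and two of three gold entities match, so entity F1 is 2/3, while Macro-Recall averages 1.0 and 0.5 to give 0.75.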
Detailed Analysis
Ablation Study
Two ablation experiments analyze the multi-stage diagnostic process and the framework. The results show that the primary diagnosis significantly influences performance on the final diagnosis, and that the multi-step diagnostic process improves interpretability.
Case Study
An analysis of 100 incorrectly diagnosed samples reveals four main error types: lack of domain knowledge, confusion between symptoms and diseases, diagnostic basis inconsistent with the facts, and other errors.
Overall Conclusion
The MSDiagnosis dataset addresses the limitations of existing single-step diagnostic datasets by incorporating a multi-step diagnostic process. The proposed framework effectively combines forward and backward inference, reflection, and refinement to improve diagnostic accuracy and interpretability. Extensive experiments demonstrate the framework’s effectiveness, although the dataset’s uneven distribution across departments remains a limitation. Future work can address this issue through machine learning methods and data sampling strategies.
This blog post provides a comprehensive overview of the MSDiagnosis dataset and the proposed framework for multi-step clinical diagnosis. The detailed analysis and experimental results highlight the framework’s effectiveness and the dataset’s potential for further research in clinical diagnostic tasks.