2025

RAR^2: Retrieval-Augmented Medical Reasoning via Thought-Driven Retrieval

Kaishuai Xu, Wenjun Hou*, Yi Cheng*, Wenjie Li (* equal contribution)

EMNLP 2025 Findings

In this work, we propose RAR^2, a joint learning framework that improves both Reasoning-Augmented Retrieval and Retrieval-Augmented Reasoning. Moreover, we design two test-time scaling strategies to explore the boundaries of our framework.

LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling

Yang Xiao, Jiashuo Wang, Ruifeng Yuan, Chunpu Xu, Kaishuai Xu, Wenjie Li, Pengfei Liu

NeurIPS 2025 Poster

LIMOPro introduces PIR (Perplexity-based Importance Refinement), a novel framework that systematically refines reasoning chains to optimize the balance between efficiency and effectiveness.

Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing

Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

ACL 2025 Main Conference

In this work, we propose a novel preference learning framework called eRror-Injected Self-Editing (RISE), which injects predefined subtle errors into pivotal tokens in reasoning or computation steps to construct hard pairs for error mitigation. Compared with other preference learning methods, RISE further refines the training objective without requiring fine-grained sampling or preference annotation.
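
A toy sketch of how such an error-injected preference pair might be assembled is given below. Identifying truly pivotal tokens and making the edits subtle is where the actual method does its work; the numeric perturbation and the helper name make_error_injected_pair here are only illustrative assumptions.

```python
import random
import re

def make_error_injected_pair(question, correct_solution):
    """Toy construction of a hard preference pair: the original reasoning chain
    is the chosen response and a subtly corrupted copy is the rejected one.
    Here we simply nudge one number; selecting pivotal tokens is the part the
    actual method is responsible for."""
    matches = list(re.finditer(r"\d+", correct_solution))
    if not matches:
        return None
    m = random.choice(matches)
    corrupted = (correct_solution[:m.start()]
                 + str(int(m.group()) + random.choice([-1, 1]))
                 + correct_solution[m.end():])
    return {"prompt": question, "chosen": correct_solution, "rejected": corrupted}
```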

RADAR: Enhancing Radiology Report Generation with Supplementary Knowledge Injection

Wenjun Hou, Yi Cheng*, Kaishuai Xu*, Heng Li, Yan Hu, Wenjie Li, Jiang Liu (* equal contribution)

ACL 2025 Main Conference

We propose Radar, a framework for enhancing radiology report generation with supplementary knowledge injection. Radar improves report generation by systematically leveraging both the internal knowledge of an LLM and externally retrieved information.

Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework

Kaishuai Xu, Tiezheng Yu, Yi Cheng, Wenjun Hou, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li

ACL 2025 Findings

We propose a novel evaluation framework, ARJudge, that adaptively formulates evaluation criteria and synthesizes both text-based and code-driven analyses to evaluate LLM responses.

Large language models for disease diagnosis: a scoping review

Shuang Zhou#, Zidu Xu, Mian Zhang, Chunpu Xu, Yawen Guo, Zaifu Zhan, Yi Fang, Sirui Ding, Jiashuo Wang, Kaishuai Xu, Liqiao Xia, Jeremy Yeung, Daochen Zha, Dongming Cai, Genevieve B. Melton, Mingquan Lin, Rui Zhang (# corresponding author)

npj Artificial Intelligence 1, Article number: 9 (2025)

In this article, we perform a comprehensive review of LLM-based methods for disease diagnosis. Our review examines the existing literature across various dimensions, including disease types and associated clinical specialties, clinical data, LLM techniques, and evaluation methods.

Integrative Decoding: Improving Factuality via Implicit Self-consistency

Yi Cheng, Xiao Liang, Yeyun Gong, Wen Xiao, Song Wang, Yuji Zhang, Wenjun Hou, Kaishuai Xu, Wenge Liu, Wenjie Li, Jian Jiao, Qi Chen, Peng Cheng, Wayne Xiong

ICLR 2025 Poster

In this paper, we present Integrative Decoding (ID) to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processing them concurrently, with the next token at each decoding step selected by aggregating all of their corresponding predictions. In essence, this simple approach implicitly incorporates self-consistency into the decoding objective.
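
A minimal sketch of this decoding loop is given below, assuming a generic next_token_logprobs wrapper around any autoregressive LM; the function names and the mean aggregation are illustrative choices, not the paper's exact implementation.

```python
import numpy as np

def integrative_decode(contexts, next_token_logprobs, eos_id, max_new_tokens=256):
    """Sketch of Integrative Decoding under an assumed LM interface.

    contexts: token-id lists, each a previously sampled response prepended
              to the original prompt (one input per sampled response).
    next_token_logprobs(ids): assumed helper returning a vocab-sized array
              of log-probabilities for the next token.
    """
    output = []
    for _ in range(max_new_tokens):
        # Aggregate the predictions from all contexts at this decoding step
        # (a simple mean over log-probabilities; other aggregations are possible).
        step = np.mean([next_token_logprobs(ctx + output) for ctx in contexts], axis=0)
        token = int(np.argmax(step))
        if token == eos_id:
            break
        output.append(token)  # the selected token is appended to every branch
    return output
```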

2024

Memory-Augmented Multimodal LLMs for Surgical VQA via Self-Contained Inquiry

Wenjun Hou, Yi Cheng*, Kaishuai Xu*, Yan Hu, Wenjie Li, Jiang Liu (* equal contribution)

arXiv preprint 2024

We propose SCAN, a simple yet effective memory-augmented framework that leverages Multimodal LLMs to improve surgical context comprehension via Self-Contained Inquiry.

ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation

Wenjun Hou, Yi Cheng*, Kaishuai Xu*, Yan Hu, Wenjie Li, Jiang Liu (* equal contribution)

EMNLP 2024 Findings

We propose ICON, which improves the inter-report consistency of radiology report generation. To enhance the system's ability to capture similarities between semantically equivalent lesions, our approach first extracts lesions from the input images and examines their characteristics. We then introduce a lesion-aware mixup technique that aligns the representations of semantically equivalent lesions with the same attributes through a linear combination during training. Extensive experiments on three publicly available chest X-ray datasets verify the effectiveness of our approach in improving both the consistency and the accuracy of the generated reports.
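
As a rough illustration of the linear-combination step, the snippet below mixes the feature representations of two lesions assumed to be semantically equivalent; the upstream lesion extraction and matching are not shown, and lesion_aware_mixup is a hypothetical helper name.

```python
import torch

def lesion_aware_mixup(feat_a, feat_b, alpha=0.4):
    # Convex combination of two lesion representations assumed to be
    # semantically equivalent; lambda follows a Beta prior as in standard mixup.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    return lam * feat_a + (1.0 - lam) * feat_b
```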

When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection

Xiangyu Zhang, Hexin Liu, Kaishuai Xu, Qiquan Zhang, Daijiao Liu, Beena Ahmed, Julien Epps

EMNLP 2024 Main Conference

We investigate an efficient method for depression detection by integrating speech signals into LLMs via acoustic landmarks. Because acoustic landmarks are specific to the pronunciation of spoken words, incorporating them adds critical dimensions to text transcripts. This integration also provides insights into individuals' unique speech patterns, revealing their potential mental states.

Reasoning Like a Doctor: Improving Medical Dialogue Systems via Diagnostic Reasoning Process Alignment

Kaishuai Xu, Yi Cheng*, Wenjun Hou*, Qiaoyu Tan, Wenjie Li (* equal contribution)

ACL 2024 Findings

We propose a novel framework, Emulation, designed to generate an appropriate response that relies on abductive and deductive diagnostic reasoning analyses and aligns with clinician preferences through thought process modeling. Experimental results on two datasets confirm the efficacy of Emulation. Crucially, our framework furnishes clear explanations for the generated responses, enhancing its transparency in medical consultations.

AutoPal: Autonomous Adaptation to Users for Personal AI Companionship

Yi Cheng, Wenge Liu, Kaishuai Xu, Wenjun Hou, Yi Ouyang, Chak Tou Leong, Wenjie Li, Xian Wu, Yefeng Zheng

arXiv preprint 2024

We devise a hierarchical framework, AutoPal, that enables controllable and authentic adjustments to the agent's persona based on user interactions. A persona-matching dataset is constructed to facilitate the learning of optimal persona adaptations.

No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks

Chak Tou Leong, Yi Cheng, Kaishuai Xu, Jian Wang, Hanlin Wang, Wenjie Li

arXiv preprint 2024

We break down the safeguarding process of an LLM facing harmful instructions into three stages: (1) recognizing the harmful instruction, (2) generating an initial refusing tone, and (3) completing the refusal response. Accordingly, we investigate whether and how different attack strategies influence each stage of this safeguarding process.

Medical Dialogue Generation via Intuitive-then-Analytical Differential Diagnosis

Kaishuai Xu, Wenjun Hou, Yi Cheng, Jian Wang, Wenjie Li

arXiv preprint 2024

We propose a medical dialogue generation framework with the Intuitive-then-Analytic Differential Diagnosis (IADDx). Our method starts with a differential diagnosis via retrieval-based intuitive association and subsequently refines it through a graph-enhanced analytic procedure. The resulting differential diagnosis is then used to retrieve medical knowledge and guide response generation.

2023

RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning

Wenjun Hou, Yi Cheng*, Kaishuai Xu*, Wenjie Li, Jiang Liu (* equal contribution)

EMNLP 2023 Findings

We propose RECAP, which generates precise and accurate radiology reports via dynamic disease progression reasoning. Specifically, RECAP first predicts the observations and progressions (i.e., spatiotemporal information) given two consecutive radiographs. It then combines the historical records, spatiotemporal information, and radiographs for report generation, where a disease progression graph and dynamic progression reasoning mechanism are devised to accurately select the attributes of each observation and progression. Extensive experiments on two publicly available datasets demonstrate the effectiveness of our model.

ORGAN: Observation-Guided Radiology Report Generation via Tree Reasoning

Wenjun Hou, Kaishuai Xu*, Yi Cheng*, Wenjie Li, Jiang Liu (* equal contribution)

ACL 2023 Main Proceedings

In this paper, we propose an Observation-guided radiology Report Generation framework (ORGan). It first produces an observation plan and then feeds both the plan and the radiographs into report generation, where an observation graph and a tree reasoning mechanism are adopted to precisely enrich the plan information by capturing the multiple formats of each observation. Experimental results demonstrate that our framework outperforms previous state-of-the-art methods in terms of text quality and clinical efficacy.

Medical Dialogue Generation via Dual Flow Modeling

Kaishuai Xu, Wenjun Hou*, Yi Cheng*, Jian Wang, Wenjie Li (* equal contribution)

ACL 2023 Findings

We propose a Dual Flow enhanced Medical (DFMed) dialogue generation framework. It extracts the medical entities and dialogue acts used in the dialogue history and models their transitions with an entity-centric graph flow and a sequential act flow, respectively. We employ two sequential models to encode them and devise an interweaving component to enhance their interactions. Experiments on two datasets demonstrate that our method exceeds baselines in both automatic and manual evaluations.
