Preprint
Concept Paper

This version is not peer-reviewed.

Traditional Chinese Medicinal Materials Knowledge System: A Study on Fine-Tuning and Validation of Language Models for Traditional Chinese Medicinal Materials Knowledge

Submitted: 24 April 2026

Posted: 27 April 2026


Abstract
This study aims to construct a knowledge system for traditional Chinese medicinal materials centered on large language models and to evaluate the practical feasibility of supervised fine-tuning (SFT) in the professional Traditional Chinese Medicine (TCM) domain. The study adopts Biancang-Qwen2.5-7B-Instruct as the base model and employs a knowledge internalization strategy to reduce reliance on external retrieval mechanisms, with supervised fine-tuning conducted using LoRA. The evaluation dataset consists of 1,751 text-only multiple-choice questions on Chinese medicinal materials drawn from national TCM practitioner examinations held between 2005 and 2025. Multiple-choice question (MCQ) accuracy was used as the primary evaluation metric, and Lingdan-13B-Base was included as a baseline model for comparison. The experimental pipeline covered dataset preprocessing and a comparative analysis of inference results across multiple model variants, with the McNemar test applied to examine the statistical significance of the performance difference before and after fine-tuning. The results indicate that, without any external retrieval mechanism, the accuracy of the model increased from 58.08% to 72.76%, a gain of 14.68 percentage points, demonstrating substantial improvements in both knowledge comprehension and answer stability regarding Chinese medicinal materials. These findings support the effectiveness of the proposed internalized knowledge fine-tuning strategy for the national TCM examination task. Finally, this study delivers an integrated TCM medicinal materials knowledge system that incorporates the fine-tuned model, provides functions for national examination practice and TCM knowledge querying, and validates its feasibility and stability in real-world application scenarios.
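The paired comparison described in the abstract can be illustrated with a minimal sketch. It assumes each question is scored as correct or incorrect for both the base and the fine-tuned model, and it uses the exact (binomial) form of the McNemar test; the abstract does not state which variant the authors used. The function names and the ten-question toy data are hypothetical and are not results from the study.

```python
from math import comb


def accuracy(flags):
    """Fraction of questions answered correctly (flags are 0/1)."""
    return sum(flags) / len(flags)


def mcnemar_exact(before, after):
    """Two-sided exact McNemar test on paired correctness flags.

    Returns (b, c, p_value), where
      b = questions correct before fine-tuning but wrong after,
      c = questions wrong before fine-tuning but correct after.
    Only these discordant pairs carry information about a change.
    """
    if len(before) != len(after):
        raise ValueError("both runs must score the same questions")
    b = sum(1 for x, y in zip(before, after) if x and not y)
    c = sum(1 for x, y in zip(before, after) if not x and y)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs, no evidence of change
    # Under H0 each discordant pair is equally likely to fall in b or c,
    # so the smaller count follows Binomial(n, 0.5).
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Hypothetical toy data: 1 = correct, 0 = incorrect, one entry per question.
before = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
after = [1, 1, 1, 1, 1, 0, 0, 1, 1, 1]

b, c, p = mcnemar_exact(before, after)
print(f"accuracy before={accuracy(before):.2f}, after={accuracy(after):.2f}")
print(f"b={b}, c={c}, p={p:.4f}")
```

Because both models answer the same 1,751 questions, a paired test of this kind is more appropriate than comparing two independent proportions: questions that both models get right or both get wrong do not affect the result.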
Copyright: This open access article is published under a Creative Commons CC BY 4.0 license, which permits free download, distribution, and reuse, provided that the author and preprint are cited in any reuse.
Preprints.org is a free preprint server supported by MDPI in Basel, Switzerland.


© 2026 MDPI (Basel, Switzerland) unless otherwise stated