ERIC Number: EJ1462357
Record Type: Journal
Publication Date: 2025-Dec
Pages: 10
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: EISSN-2731-5525
Available Date: 2025-03-13
Assessment of Large Language Models' Performances and Hallucinations for Chinese Postgraduate Medical Entrance Examination
Hongfei Ye1,2; Jian Xu3; Danqing Huang4; Meng Xie1; Jinming Guo5; Junrui Yang6; Haiwei Bao7; Mingzhi Zhang5; Ce Zheng1,2
Discover Education, v4 Article 59 2025
This study evaluates the performance of large language models (LLMs) on the Chinese Postgraduate Medical Entrance Examination (CPGMEE) and the hallucinations they produce, and investigates the implications for medical education. We curated 10 mock CPGMEE trials to evaluate four LLMs (GPT-4.0, ChatGPT, QWen 2.1 and Ernie 4.0). Each question was entered into the LLMs, and three experienced graders independently rated the accuracy of each response on a three-tier scale (poor, borderline, good). The hallucination rates of the LLMs' responses were also evaluated. We selected GPT-4.0 and Ernie 4.0 for further analysis because they performed best among the four. Ernie 4.0 outperformed GPT-4.0 in overall accuracy (76.2% vs. 69.1%, p < 0.0001), with a higher proportion of 'good' ratings (70.0% vs. 64.6%, p < 0.01) and a lower proportion of 'poor' ratings (25.2% vs. 32.3%, p < 0.01). Factuality hallucination was the most prevalent type of hallucination (9.7% and 14.7% for GPT-4.0 and Ernie 4.0, respectively). Ernie 4.0 exhibited lower rates of factual fabrication (6.0% vs. 7.8%, p = 0.033), instruction inconsistency (2.3% vs. 5.4%, p < 0.0001) and logical inconsistency (3.7% vs. 5.7%, p = 0.005) than GPT-4.0. Our results underscore the promising potential of both GPT-4.0 and Ernie 4.0 in assisting CPGMEE preparation and enhancing postgraduate medical education programs.
Descriptors: College Entrance Examinations, Foreign Countries, Computational Linguistics, Graduate Medical Education, Artificial Intelligence, Item Analysis, Test Items, Accuracy, Computer Software, Test Preparation, Technology Uses in Education
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link-springer-com.bibliotheek.ehb.be/
Publication Type: Journal Articles; Reports - Research
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: China
Grant or Contract Numbers: N/A
Author Affiliations: 1Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Department of Ophthalmology, Shanghai, China; 2Shanghai Jiao Tong University, Institute of Hospital Development Strategy, China Hospital Development Institute, Shanghai, China; 3Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Big Data Center, Shanghai, China; 4Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Discipline Inspection & Supervision Office, Shanghai, China; 5Shantou University Medical College, Joint Shantou International Eye Center of Shantou University and the Chinese University of Hong Kong, Shantou, China; 6The 74th Army Group Hospital, Department of Ophthalmology, Guangzhou, China; 7Tongcheng Digital Technology Co., Ltd., Shanghai, China