ERIC Number: EJ1448574
Record Type: Journal
Publication Date: 2024
Pages: 21
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: 2731-5525
Available Date: N/A
The Intent of ChatGPT Usage and Its Robustness in Medical Proficiency Exams: A Systematic Review
Tatiana Chaiban; Zeinab Nahle; Ghaith Assi; Michelle Cherfane
Discover Education, v3, Article 232, 2024
Background: Since its launch, ChatGPT, a Large Language Model (LLM), has been widely used across disciplines, particularly in the medical field. Objective: The main aim of this review is to thoroughly assess the performance of distinct versions of ChatGPT on subspecialty written medical proficiency exams and the factors that impact it. Methods: Several online databases were searched for articles that fit the intended objectives of the study: PubMed, CINAHL, and Web of Science. A group of reviewers was assembled to create an appropriate methodological framework for the articles to be included. Results: 16 articles were included in this review that assessed the performance of different ChatGPT versions across subspecialty written examinations, including surgery, neurology, orthopedics, trauma and orthopedics, core cardiology, family medicine, and dermatology. The studies reported different passing grades and rankings, with accuracy rates ranging from 35.8% to 91% across datasets and subspecialties. The factors highlighted as impacting its correctness were: (1) distinct ChatGPT versions; (2) medical subspecialties; (3) types of questions; (4) language; and (5) comparators. Conclusions: This review characterizes ChatGPT's performance on different medical specialty examinations and suggests future research to investigate whether ChatGPT can enhance learning and support medical students preparing for a range of medical specialty exams. However, to avoid misuse and any detrimental effects on real-world medicine, it is crucial to be aware of its limitations and to continue the ongoing evaluation of this AI tool.
Descriptors: Medical Education, Accuracy, Artificial Intelligence, Computer Software, Technology Uses in Education, Scores, Computational Linguistics, Databases, Surgery, Neurology, Medicine, Specialization, Test Construction, Medical Students, Test Items, Trauma, Test Use
Springer. Available from: Springer Nature. One New York Plaza, Suite 4600, New York, NY 10004. Tel: 800-777-4643; Tel: 212-460-1500; Fax: 212-460-1700; e-mail: customerservice@springernature.com; Web site: https://link.springer.com/
Publication Type: Journal Articles; Information Analyses
Education Level: Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A