Peer reviewed
ERIC Number: EJ1486230
Record Type: Journal
Publication Date: 2025-Oct
Pages: 12
Abstractor: As Provided
ISBN: N/A
ISSN: 1935-9772
EISSN: 1935-9780
Available Date: 2025-07-11
AI's Ability to Interpret Unlabeled Anatomy Images and Supplement Educational Research as an AI Rater
Lord J. Hyeamang (1); Tejas C. Sekhar (1); Emily Rush (2); Amy C. Beresheim (3); Colleen M. Cheverko (3); William S. Brooks (4); Abbey C. M. Breckling (5); M. Nazmul Karim (6); Christopher Ferrigno (3); Adam B. Wilson (3)
Anatomical Sciences Education, v18 n10 p1102-1113 2025
Evidence suggests custom chatbots are superior to commercial generative artificial intelligence (GenAI) systems for text-based anatomy content inquiries. This study evaluates ChatGPT-4o's and Claude 3.5 Sonnet's capabilities to interpret unlabeled anatomical images. Secondarily, ChatGPT o1-preview was evaluated as an AI rater to grade AI-generated outputs using a rubric and was compared against human raters. Anatomical images (five musculoskeletal, five thoracic) representing diverse image-based media (e.g., illustrations, photographs, MRI) were annotated with identification markers (e.g., arrows, circles) and uploaded to each GenAI system for interpretation. Forty-five prompts (i.e., 15 first-order, 15 second-order, and 15 third-order questions) with associated images were submitted to both GenAI systems across two timepoints. Responses were graded by anatomy experts for factual accuracy and superfluity (the presence of excessive wording) on a three-point Likert scale. ChatGPT o1-preview was tested for agreement against human anatomy experts to determine its usefulness as an AI rater. Statistical analyses included inter-rater agreement, hierarchical linear modeling, and test-retest reliability. ChatGPT-4o's factual accuracy score across 45 outputs was 68.0% compared to Claude 3.5 Sonnet's score of 61.5% (p = 0.319). As an AI rater, ChatGPT o1-preview showed moderate to substantial agreement with human raters (Cohen's kappa = 0.545-0.755) for evaluating factual accuracy according to a rubric of textbook answers. Further improvements and evaluations are needed before commercial GenAI systems can be used as credible student resources in anatomy education. Similarly, ChatGPT o1-preview demonstrates promise as an AI assistant for educational research, though further investigation is warranted.
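The agreement statistic reported in the abstract, Cohen's kappa, compares the observed agreement between two raters with the agreement expected by chance. The following is a minimal sketch of that calculation for two raters scoring the same outputs on a three-point scale. The function name `cohen_kappa` and the rating lists `human` and `ai` are hypothetical and for illustration only; they are not the study's data, and the abstract does not say which software the authors used.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")

    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined,
        # so report perfect agreement by convention in this sketch.
        return 1.0

    return (observed - expected) / (1 - expected)


# Made-up ratings on a three-point scale (0, 1, 2); NOT data from the study.
human = [2, 2, 1, 0, 2, 1, 0, 2, 1, 2]
ai = [2, 2, 1, 0, 1, 1, 0, 2, 2, 2]

kappa = cohen_kappa(human, ai)
print(f"kappa = {kappa:.3f}")
```

This sketch treats the three scale points as unordered categories. A weighted kappa, which penalizes a two-point disagreement more than a one-point disagreement, is a common alternative for ordinal ratings; the abstract does not specify which variant was applied.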
Wiley. Available from: John Wiley & Sons, Inc. 111 River Street, Hoboken, NJ 07030. Tel: 800-835-6770; e-mail: cs-journals@wiley.com; Web site: https://www-wiley-com.bibliotheek.ehb.be/en-us
Publication Type: Journal Articles; Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: (1) Rush Medical College, Rush University, Chicago, Illinois, USA; (2) Academic Affairs, Rush University, Chicago, Illinois, USA; (3) Department of Anatomy and Cell Biology, Rush Medical College, Rush University, Chicago, Illinois, USA; (4) Department of Medical Education, Marnix E. Heersink School of Medicine, University of Alabama at Birmingham, Birmingham, Alabama, USA; (5) Department of Anatomy and Cell Biology, College of Medicine, University of Illinois at Chicago, Chicago, Illinois, USA; (6) School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria, Australia