ERIC Number: EJ1464000
Record Type: Journal
Publication Date: 2025
Pages: 21
Abstractor: As Provided
ISBN: N/A
ISSN: ISSN-1648-3898
EISSN: EISSN-2538-7138
Available Date: N/A
Graders of the Future: Comparing the Consistency and Accuracy of GPT4 and Pre-Service Teachers in Physics Essay Question Assessments
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu
Journal of Baltic Science Education, v24 n1 p187-207 2025
As large language models (LLMs) are increasingly developed and applied in physics education, the AI chatbot ChatGPT4 offers new opportunities for educational assessment, which makes it important to examine how such tools perform on real assessment tasks. This study compared the performance of ChatGPT4 and human graders in scoring upper-secondary physics essay questions. Responses from eighty upper-secondary students to two essay questions were scored by 30 pre-service teachers and by ChatGPT4. The analysis examined scoring consistency and accuracy through comparisons among the human graders, between ChatGPT4 gradings conducted at different times, and between human and ChatGPT4 graders, as well as through grading differences across cognitive categories. Consistency was assessed with the intraclass correlation coefficient (ICC), and accuracy with Pearson correlation coefficients between each grader's scores and expert scores. The findings show that ChatGPT4 scored more consistently, whereas the human graders were more accurate in most instances. These results highlight both the strengths and the limitations of using LLMs in educational assessment: the high consistency of LLMs can help standardize assessment across diverse educational contexts, while the nuanced understanding and flexibility of human graders remain irreplaceable for complex subjective evaluations.
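The two statistics named in the abstract can be illustrated with a short sketch. The abstract does not state which ICC form the authors used; the code below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater, as defined by Shrout and Fleiss). The function names `icc_2_1` and `pearson_r` and the five-response scores are hypothetical toy values for illustration, not data or code from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists (accuracy vs. expert)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    Assumed ICC form (not stated in the abstract).
    ratings[i][j] is the score rater j gave to response i.
    """
    n = len(ratings)      # number of responses (rows)
    k = len(ratings[0])   # number of raters or grading runs (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(ratings[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical toy scores for five essay responses (not from the study)
gpt_run_1 = [3, 5, 2, 4, 1]
gpt_run_2 = [3, 4, 2, 4, 1]
expert = [4, 5, 2, 3, 1]

# Consistency: two grading runs treated as two raters
print(icc_2_1(list(zip(gpt_run_1, gpt_run_2))))
# Accuracy: agreement of one run with the expert scores
print(pearson_r(gpt_run_1, expert))
```

Because ICC(2,1) penalizes systematic offsets between raters while Pearson r does not, the two measures can diverge: two raters whose scores differ by a constant one point ([1, 2, 3] vs. [2, 3, 4]) have r = 1 but ICC(2,1) = 2/3.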
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy, Evaluators, Computational Linguistics, Science Education, Educational Assessment, Secondary School Students, Essays, Writing Evaluation, Interrater Reliability, Correlation, Scores, Preservice Teachers, Comparative Analysis, Foreign Countries
Scientia Socialis Ltd. 29 K. Donelaicio Street, LT-78115 Siauliai, Republic of Lithuania. e-mail: scientia@scientiasocialis.lt; e-mail: mail.jbse@gmail.com; Web site: http://www.scientiasocialis.lt/jbse/
Publication Type: Journal Articles; Reports - Research
Education Level: Secondary Education; Higher Education; Postsecondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: China
Grant or Contract Numbers: N/A
Author Affiliations: N/A