ERIC - Search Results

Publication Date

In 2026	0
Since 2025	1
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	3
Since 2007 (last 20 years)	4

Descriptor

Computer Assisted Testing	4
Interrater Reliability	4
Reliability	4
Evaluators	3
Grading	3
Scoring	3
Accuracy	2
Artificial Intelligence	2
Comparative Analysis	2
Correlation	2
Essay Tests	2
College Faculty	1
Comparative Testing	1
Computational Linguistics	1
Computer Software	1
Educational Technology	1
English Literature	1
Error of Measurement	1
Essays	1
Ethics	1
Evaluation Methods	1
Evaluation Research	1
Examiners	1
Foreign Countries	1
Item Response Theory	1

Source

Advances in Physiology…	1
British Educational Research…	1
British Journal of…	1
International Journal of…	1

Author

Amanda Huee-Ping Wong	1
Bell, John F.	1
Engelhard, George, Jr.	1
Foltz, Peter	1
Ivan Cherh Chiet Low	1
Johnson, Martin	1
Jonas Flodén	1
Nadas, Rita	1
Nathasha Vihangi Luke	1
Rosenstein, Mark	1
Swapna Haresh Teckwani	1
Wind, Stefanie A.	1
Wolfe, Edward W.	1

Publication Type

Journal Articles	4
Reports - Research	4

Education Level

Higher Education	2
Postsecondary Education	2

Location

Singapore

Showing all 4 results

Grading Exams Using Large Language Models: A Comparison between Human and AI Grading of Exams in Higher Education Using ChatGPT

Peer reviewed

Jonas Flodén – British Educational Research Journal, 2025

This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…

Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring

Accuracy and Reliability of Large Language Models in Assessing Learning Outcomes Achievement across Cognitive Domains

Peer reviewed

Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024

The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…

Descriptors: Accuracy, Reliability, Computational Linguistics, Standards

The Influence of Rater Effects in Training Sets on the Psychometric Quality of Automated Scoring for Writing Assessments

Peer reviewed

Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018

Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…

Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring

Marking Essays on Screen: An Investigation into the Reliability of Marking Extended Subjective Texts

Peer reviewed

Johnson, Martin; Nadas, Rita; Bell, John F. – British Journal of Educational Technology, 2010

There is a growing body of research literature that considers how the mode of assessment, either computer-based or paper-based, might affect candidates' performances. Despite this, there is a fairly narrow literature that shifts the focus of attention to those making assessment judgements and which considers issues of assessor consistency when…

Descriptors: English Literature, Examiners, Evaluation Research, Evaluators