Jonas Flodén – British Educational Research Journal, 2025
This study compares the performance of the generative AI (GenAI) large language model (LLM) ChatGPT with that of human teachers in grading university exams. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Marking Essays on Screen: An Investigation into the Reliability of Marking Extended Subjective Texts
Johnson, Martin; Nadas, Rita; Bell, John F. – British Journal of Educational Technology, 2010
There is a growing body of research literature that considers how the mode of assessment, either computer-based or paper-based, might affect candidates' performances. Despite this, there is a fairly narrow literature that shifts the focus of attention to those making assessment judgements and which considers issues of assessor consistency when…
Descriptors: English Literature, Examiners, Evaluation Research, Evaluators