Publication Date
| Date range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 2 |
| Since 2007 (last 20 years) | 3 |
Descriptor
| Descriptor | Results |
| --- | --- |
| Computer Assisted Testing | 3 |
| Natural Language Processing | 3 |
| Scoring | 3 |
| Accuracy | 2 |
| Automation | 2 |
| Educational Assessment | 2 |
| Scoring Rubrics | 2 |
| Artificial Intelligence | 1 |
| Comparative Analysis | 1 |
| Data | 1 |
| Essays | 1 |
Source
| Source | Results |
| --- | --- |
| Journal of Educational… | 3 |
Author
| Author | Results |
| --- | --- |
| Gyeonggeon Lee | 1 |
| Lottridge, Sue | 1 |
| Luyang Fang | 1 |
| Mark Wilson | 1 |
| Mayfield, Elijah | 1 |
| Mingfeng Xue | 1 |
| Shermis, Mark D. | 1 |
| Xiaoming Zhai | 1 |
| Xingyao Xiao | 1 |
| Yunting Liu | 1 |
Publication Type
| Publication Type | Results |
| --- | --- |
| Journal Articles | 3 |
| Reports - Research | 3 |
Luyang Fang; Gyeonggeon Lee; Xiaoming Zhai – Journal of Educational Measurement, 2025
Machine learning-based automatic scoring faces challenges with imbalanced student responses across scoring categories. To address this, we introduce a novel text data augmentation framework that leverages GPT-4, a generative large language model, specifically tailored for imbalanced datasets in automatic scoring. Our experimental dataset consisted…
Descriptors: Computer Assisted Testing, Artificial Intelligence, Automation, Scoring
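The Fang, Lee, and Zhai abstract does not spell out the augmentation pipeline, but the general idea, asking an LLM to paraphrase responses in under-represented score categories, can be sketched as below. This is a minimal illustration under assumptions of our own: the OpenAI chat API, the prompt wording, and the `augment_minority_responses` helper are not taken from the paper.

```python
# Illustrative sketch only: upsampling sparse score categories with LLM paraphrases.
# Assumes the OpenAI Python client (>= 1.0); prompt text and threshold are hypothetical.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def augment_minority_responses(responses, scores, min_per_category=50):
    """Generate paraphrases for responses in score categories with few examples."""
    counts = Counter(scores)
    augmented = []
    for response, score in zip(responses, scores):
        if counts[score] >= min_per_category:
            continue  # category already has enough examples
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": (
                    "Paraphrase this student response, preserving its scientific "
                    f"content and its level of correctness:\n\n{response}"
                ),
            }],
        )
        augmented.append((completion.choices[0].message.content, score))
    return augmented
```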
Mingfeng Xue; Yunting Liu; Xingyao Xiao; Mark Wilson – Journal of Educational Measurement, 2025
Prompts play a crucial role in eliciting accurate outputs from large language models (LLMs). This study examines the effectiveness of an automatic prompt engineering (APE) framework for automatic scoring in educational measurement. We collected constructed-response data from 930 students across 11 items and used human scores as the true labels. A…
Descriptors: Computer Assisted Testing, Prompting, Educational Assessment, Automation
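The excerpt from Xue, Liu, Xiao, and Wilson does not describe the APE framework itself; a rough sketch of the underlying selection loop, trying candidate scoring prompts and keeping the one that agrees best with human scores, is given below. The `score_with_prompt` stub and the use of quadratic-weighted kappa are assumptions for illustration, not the authors' procedure; weighted kappa is simply a common agreement metric for ordinal scores.

```python
# Rough sketch of prompt selection for LLM-based automatic scoring.
# `score_with_prompt` stands in for an LLM call; the metric choice is an assumption.
from sklearn.metrics import cohen_kappa_score

def score_with_prompt(prompt: str, response: str) -> int:
    """Hypothetical stub: send `prompt` plus `response` to an LLM and parse a score."""
    raise NotImplementedError

def select_best_prompt(candidate_prompts, responses, human_scores):
    """Keep the candidate prompt whose machine scores agree best with human scores."""
    best_prompt, best_kappa = None, -1.0
    for prompt in candidate_prompts:
        machine_scores = [score_with_prompt(prompt, r) for r in responses]
        kappa = cohen_kappa_score(human_scores, machine_scores, weights="quadratic")
        if kappa > best_kappa:
            best_prompt, best_kappa = prompt, kappa
    return best_prompt, best_kappa
```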
Shermis, Mark D.; Lottridge, Sue; Mayfield, Elijah – Journal of Educational Measurement, 2015
This study investigated the impact of anonymizing text on predicted scores made by two kinds of automated scoring engines: one that incorporates elements of natural language processing (NLP) and one that does not. Eight data sets (N = 22,029) were used to form both training and test sets in which the scoring engines had access to both text and…
Descriptors: Scoring, Essays, Computer Assisted Testing, Natural Language Processing
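The anonymization procedure used by Shermis, Lottridge, and Mayfield is not detailed in this excerpt; one common approach, masking person names with a named-entity tagger before the text reaches the scoring engine, might look like the sketch below. spaCy and its `en_core_web_sm` model are assumed tools here, not ones named by the authors.

```python
# Sketch of pre-scoring anonymization: replace detected person names with a token.
# spaCy NER is an assumed tool; the study's actual method may differ.
import spacy

nlp = spacy.load("en_core_web_sm")

def anonymize(essay: str, placeholder: str = "[NAME]") -> str:
    """Replace person names detected by NER with a neutral placeholder token."""
    doc = nlp(essay)
    redacted = essay
    # Work from the end of the text so earlier character offsets stay valid.
    for ent in sorted(doc.ents, key=lambda e: e.start_char, reverse=True):
        if ent.label_ == "PERSON":
            redacted = redacted[:ent.start_char] + placeholder + redacted[ent.end_char:]
    return redacted
```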
