| Publication Date | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 6 |
| Since 2022 (last 5 years) | 7 |
| Since 2017 (last 10 years) | 9 |
| Since 2007 (last 20 years) | 9 |
| Descriptor | Count |
| --- | --- |
| Accuracy | 9 |
| Computer Assisted Testing | 9 |
| Adaptive Testing | 6 |
| Test Items | 6 |
| Scoring | 3 |
| Artificial Intelligence | 2 |
| Automation | 2 |
| Educational Assessment | 2 |
| Error Patterns | 2 |
| Item Banks | 2 |
| Item Response Theory | 2 |
| Source | Count |
| --- | --- |
| Journal of Educational… | 9 |
| Author | Count |
| --- | --- |
| Alex J. Mechaber | 1 |
| Ben Van Dusen | 1 |
| Brian E. Clauser | 1 |
| Chen, Chia-Wen | 1 |
| Chen, Hanwei | 1 |
| Chiu, Ming Ming | 1 |
| Choe, Edison M. | 1 |
| Cui, Zhongmin | 1 |
| Gyeonggeon Lee | 1 |
| He, Yong | 1 |
| Hua Hua Chang | 1 |
| Publication Type | Count |
| --- | --- |
| Journal Articles | 9 |
| Reports - Research | 7 |
| Reports - Evaluative | 2 |
| Education Level | Count |
| --- | --- |
| Secondary Education | 1 |
| Assessments and Surveys | Count |
| --- | --- |
| Program for International… | 1 |
Luyang Fang; Gyeonggeon Lee; Xiaoming Zhai – Journal of Educational Measurement, 2025
Machine learning-based automatic scoring faces challenges when student responses are imbalanced across scoring categories. To address this, we introduce a novel text data augmentation framework that leverages GPT-4, a generative large language model, and is specifically tailored to imbalanced datasets in automatic scoring. Our experimental dataset consisted…
Descriptors: Computer Assisted Testing, Artificial Intelligence, Automation, Scoring
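To make the class-balancing idea above concrete, here is a minimal Python sketch. The `paraphrase()` helper is a hypothetical stand-in for the GPT-4 call, and the oversample-to-majority policy is an illustrative assumption, not the authors' framework.

```python
import random
from collections import Counter

def paraphrase(text: str) -> str:
    """Hypothetical stand-in for a GPT-4 call that produces a
    label-preserving rewrite of a student response."""
    return text  # identity placeholder; swap in a real LLM call

def augment_to_balance(responses, labels, seed=0):
    """Oversample minority scoring categories with paraphrased copies
    until every category matches the majority-category count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    by_label = {}
    for text, y in zip(responses, labels):
        by_label.setdefault(y, []).append(text)
    aug_x, aug_y = list(responses), list(labels)
    for y, texts in by_label.items():
        for _ in range(target - counts[y]):
            aug_x.append(paraphrase(rng.choice(texts)))
            aug_y.append(y)
    return aug_x, aug_y

# Category 1 has one response versus three in category 0, so one
# paraphrased copy is generated to even the counts.
x, y = augment_to_balance(["a", "b", "c", "d"], [0, 0, 0, 1])
```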
Jing Huang; Yuxiao Zhang; Jason W. Morphew; Jayson M. Nissen; Ben Van Dusen; Hua Hua Chang – Journal of Educational Measurement, 2025
Online calibration estimates new item parameters alongside previously calibrated items, supporting efficient item replenishment. However, most existing online calibration procedures for Cognitive Diagnostic Computerized Adaptive Testing (CD-CAT) lack mechanisms to ensure content balance during live testing. This limitation can lead to uneven…
Descriptors: Adaptive Testing, Computer Assisted Testing, Cognitive Measurement, Test Items
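One simple content-balancing mechanism of the kind the abstract alludes to is to draw the next item from the content area whose administered share lags its blueprint target the most. A sketch under that assumption; the area names and blueprint figures are invented for illustration:

```python
def pick_content_area(blueprint, administered):
    """Choose the content area with the largest deficit between its
    blueprint proportion and its administered share so far."""
    total = sum(administered.values()) or 1
    def deficit(area):
        return blueprint[area] - administered.get(area, 0) / total
    return max(blueprint, key=deficit)

# Blueprint: 40% algebra, 35% geometry, 25% statistics. After 10
# items, geometry's share (0.2) lags its target (0.35) the most.
blueprint = {"algebra": 0.40, "geometry": 0.35, "statistics": 0.25}
administered = {"algebra": 5, "geometry": 2, "statistics": 3}
print(pick_content_area(blueprint, administered))  # -> "geometry"
```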
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Kylie Gorney; Mark D. Reckase – Journal of Educational Measurement, 2025
In computerized adaptive testing, item exposure control methods are often used to provide a more balanced usage of the item pool. Many of the most popular methods, including the restricted method (Revuelta and Ponsoda), use a single maximum exposure rate to limit the proportion of times that each item is administered. However, Barrada et al.…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Item Banks
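A minimal sketch of the single maximum exposure rate idea: items whose empirical exposure rate has reached the cap are withheld from selection. This illustrates only the constraint itself, not Revuelta and Ponsoda's full restricted procedure; the rate of 0.2 is an arbitrary example.

```python
def eligible_items(pool, n_tests, exposures, r_max=0.2):
    """Return items whose exposure rate so far (administrations
    divided by tests delivered) stays below r_max; over-exposed
    items are withheld from the next selection."""
    if n_tests == 0:
        return list(pool)
    return [i for i in pool if exposures.get(i, 0) / n_tests < r_max]

# After 100 tests, item "A" (25 uses) exceeds r_max = 0.2 and is
# excluded; items "B" and "C" remain selectable.
print(eligible_items(["A", "B", "C"], 100, {"A": 25, "B": 10, "C": 0}))
```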
Mingfeng Xue; Yunting Liu; Xingyao Xiao; Mark Wilson – Journal of Educational Measurement, 2025
Prompts play a crucial role in eliciting accurate outputs from large language models (LLMs). This study examines the effectiveness of an automatic prompt engineering (APE) framework for automatic scoring in educational measurement. We collected constructed-response data from 930 students across 11 items and used human scores as the true labels. A…
Descriptors: Computer Assisted Testing, Prompting, Educational Assessment, Automation
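The core loop of an APE-style search can be pictured as scoring each candidate prompt against the human labels and keeping the winner. In this sketch, `llm_score()` is a hypothetical placeholder for the model call, and the exact-agreement criterion is an assumption rather than necessarily the study's metric.

```python
def llm_score(prompt: str, response: str) -> int:
    """Hypothetical stand-in for an LLM call that returns a score
    for one student response under the given prompt."""
    return 0  # placeholder; replace with a real model call

def best_prompt(candidates, responses, human_scores):
    """Keep the candidate prompt whose machine scores agree most
    often with the human scores (exact-match accuracy)."""
    def accuracy(prompt):
        preds = [llm_score(prompt, r) for r in responses]
        hits = sum(p == h for p, h in zip(preds, human_scores))
        return hits / len(human_scores)
    return max(candidates, key=accuracy)
```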
Lim, Hwanggyu; Choe, Edison M. – Journal of Educational Measurement, 2023
The residual differential item functioning (RDIF) detection framework was developed recently under a linear testing context. To explore the potential application of this framework to computerized adaptive testing (CAT), the present study investigated the utility of the RDIF_R statistic both as an index for detecting uniform DIF of…
Descriptors: Test Items, Computer Assisted Testing, Item Response Theory, Adaptive Testing
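The residual logic behind the framework can be illustrated by contrasting mean raw residuals (observed response minus model-predicted probability) between focal and reference groups. This is only a sketch of the idea under an assumed 2PL model; the published RDIF_R statistic standardizes this contrast, which is omitted here.

```python
import math

def p_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def raw_residual_contrast(thetas, responses, is_focal, a, b):
    """Mean (observed - expected) residual in the focal group minus
    the same mean in the reference group; values far from zero hint
    at uniform DIF. Sketch only: no standardization, no flagging rule."""
    res = [(u - p_2pl(t, a, b), f)
           for t, u, f in zip(thetas, responses, is_focal)]
    focal = [r for r, f in res if f]
    ref = [r for r, f in res if not f]
    return sum(focal) / len(focal) - sum(ref) / len(ref)
```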
Xiuxiu Tang; Yi Zheng; Tong Wu; Kit-Tai Hau; Hua-Hua Chang – Journal of Educational Measurement, 2025
Multistage adaptive testing (MST) has been recently adopted for international large-scale assessments such as the Programme for International Student Assessment (PISA). MST offers improved measurement efficiency over traditional nonadaptive tests and improved practical convenience over single-item-adaptive computerized adaptive testing (CAT). As a…
Descriptors: Reaction Time, Test Items, Achievement Tests, Foreign Countries
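The routing step that distinguishes MST from single-item CAT can be shown in a few lines: after the first-stage module, a provisional score routes the examinee to a second-stage module of matching difficulty. The cut scores and module names below are invented for illustration, not PISA's operational design.

```python
def route(stage1_correct, cut_low=3, cut_high=6):
    """Two-stage MST routing by number-correct score on the
    first-stage (routing) module."""
    if stage1_correct <= cut_low:
        return "easy second-stage module"
    if stage1_correct >= cut_high:
        return "hard second-stage module"
    return "medium second-stage module"

print(route(7))  # -> "hard second-stage module"
```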
Chen, Chia-Wen; Wang, Wen-Chung; Chiu, Ming Ming; Ro, Sage – Journal of Educational Measurement, 2020
The use of computerized adaptive testing algorithms for ranking items (e.g., college preferences, career choices) involves two major challenges: unacceptably high computation times (selecting from a large item pool with many dimensions) and biased results (enhanced preferences or intensified examinee responses because of repeated statements across…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection
Cui, Zhongmin; Liu, Chunyan; He, Yong; Chen, Hanwei – Journal of Educational Measurement, 2018
Allowing item review in computerized adaptive testing (CAT) is receiving growing attention in the educational measurement field as more and more testing programs adopt CAT. The research literature has shown that allowing item review in an educational test could result in more accurate estimates of examinees' abilities. The practice of item review in…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Test Wiseness

