Publication Date
| In 2026 | 2 |
| Since 2025 | 7 |
| Since 2022 (last 5 years) | 43 |
| Since 2017 (last 10 years) | 107 |
| Since 2007 (last 20 years) | 372 |
Descriptor
| Classification | 372 |
| Computer Assisted Testing | 138 |
| Hypothesis Testing | 93 |
| Foreign Countries | 83 |
| Testing | 72 |
| Statistical Analysis | 67 |
| Accuracy | 60 |
| Models | 58 |
| Scores | 57 |
| Comparative Analysis | 55 |
| Test Items | 50 |
Author
| Wang, Wen-Chung | 4 |
| Fields, Lanny | 3 |
| Kim, Jiseon | 3 |
| Sinharay, Sandip | 3 |
| Thompson, Nathan A. | 3 |
| Alonzo, Julie | 2 |
| Anderson, Daniel | 2 |
| Barnes, Tiffany, Ed. | 2 |
| Bell, Raoul | 2 |
| Buchner, Axel | 2 |
| Cai, Li | 2 |
Education Level
| Higher Education | 58 |
| Elementary Education | 43 |
| Elementary Secondary Education | 41 |
| Postsecondary Education | 39 |
| Secondary Education | 36 |
| Grade 4 | 25 |
| Middle Schools | 23 |
| Grade 3 | 22 |
| Grade 8 | 22 |
| Grade 6 | 19 |
| Grade 5 | 18 |
Location
| Australia | 12 |
| China | 8 |
| Germany | 7 |
| United States | 7 |
| New York | 6 |
| North Carolina | 6 |
| Spain | 6 |
| United Kingdom | 6 |
| United Kingdom (England) | 6 |
| California | 5 |
| Canada | 5 |
Laws, Policies, & Programs
| No Child Left Behind Act 2001 | 5 |
| Every Student Succeeds Act… | 1 |
| Individuals with Disabilities… | 1 |
Amanda A. Wolkowitz; Russell Smith – Practical Assessment, Research & Evaluation, 2024
A decision consistency (DC) index is an estimate of the consistency of a classification decision on an exam. More specifically, DC estimates the percentage of examinees that would have the same classification decision on an exam if they were to retake the same or a parallel form of the exam again without memory of taking the exam the first time.…
Descriptors: Testing, Test Reliability, Replication (Evaluation), Decision Making
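The definition in the abstract above (the share of examinees who would receive the same classification on a retake) can be illustrated with a minimal sketch. It assumes, purely for illustration, that scores from two parallel forms are actually observed and that classification is a simple pass/fail against a single cut score. The function name, the scores, and the cut score are all hypothetical and are not taken from the paper, which concerns estimated indices rather than this direct count.

```python
def decision_consistency(first_scores, second_scores, cut_score):
    """Proportion of examinees with the same pass/fail decision on both forms.

    Hypothetical helper: assumes both administrations are observed and that
    a score at or above cut_score counts as a pass.
    """
    if len(first_scores) != len(second_scores):
        raise ValueError("score lists must have the same length")
    if not first_scores:
        raise ValueError("at least one examinee is required")
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(first_scores, second_scores)
    )
    return same / len(first_scores)


# Made-up scores for five examinees on two parallel forms, cut score 70.
form_a = [82, 65, 71, 58, 90]
form_b = [78, 72, 69, 55, 88]
dc = decision_consistency(form_a, form_b, cut_score=70)
print(f"Decision consistency: {dc:.0%}")
```

With these made-up scores, examinees 2 and 4 of the five change classification between forms only in the case of examinees 2 and 3 (65 vs. 72, and 71 vs. 69), so three of five decisions agree.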
Jing Ma – ProQuest LLC, 2024
This study investigated the impact of scoring polytomous items later on measurement precision, classification accuracy, and test security in mixed-format adaptive testing. Using the shadow test approach, a simulation study was conducted across various test designs, test lengths, and numbers and locations of polytomous items. Results showed that while…
Descriptors: Scoring, Adaptive Testing, Test Items, Classification
V. N. Vimal Rao; Jeffrey K. Bye; Sashank Varma – Cognitive Research: Principles and Implications, 2024
The 0.05 boundary within Null Hypothesis Statistical Testing (NHST) "has made a lot of people very angry and been widely regarded as a bad move" (to quote Douglas Adams). Here, we move past meta-scientific arguments and ask an empirical question: What is the psychological standing of the 0.05 boundary for statistical significance? We…
Descriptors: Psychological Patterns, Statistical Analysis, Testing, Statistical Significance
Coggeshall, Whitney Smiley – Educational Measurement: Issues and Practice, 2021
The continuous testing framework, where both successful and unsuccessful examinees have to demonstrate continued proficiency at frequent prespecified intervals, is a framework that is used in noncognitive assessment and is gaining in popularity in cognitive assessment. Despite the rigorous advantages of this framework, this paper demonstrates that…
Descriptors: Classification, Accuracy, Testing, Failure
Kang, Yewon; Ha, Hyorim; Lee, Hee Seung – Educational Psychology Review, 2023
Natural category learning is important in science education. One strategy that has been empirically supported for enhancing category learning is testing, which facilitates not only the learning of previously studied information (backward testing effect) but also the learning of newly studied information (forward testing effect). However, in…
Descriptors: Science Education, Science Tests, Testing, Classification
Kayla V. Campaña; Benjamin G. Solomon – Assessment for Effective Intervention, 2025
The purpose of this study was to compare the classification accuracy of data produced by the previous year's end-of-year New York state assessment, a computer-adaptive diagnostic assessment ("i-Ready"), and the gating combination of both assessments to predict the rate of students passing the following year's end-of-year state assessment…
Descriptors: Accuracy, Classification, Diagnostic Tests, Adaptive Testing
Sinharay, Sandip – Educational and Psychological Measurement, 2022
Administrative problems such as computer malfunction and power outage occasionally lead to missing item scores and hence to incomplete data on mastery tests such as the AP and U.S. Medical Licensing examinations. Investigators are often interested in estimating the probabilities of passing of the examinees with incomplete data on mastery tests.…
Descriptors: Mastery Tests, Computer Assisted Testing, Probability, Test Wiseness
Peter Baldwin; Victoria Yaneva; Kai North; Le An Ha; Yiyun Zhou; Alex J. Mechaber; Brian E. Clauser – Journal of Educational Measurement, 2025
Recent developments in the use of large-language models have led to substantial improvements in the accuracy of content-based automated scoring of free-text responses. The reported accuracy levels suggest that automated systems could have widespread applicability in assessment. However, before they are used in operational testing, other aspects of…
Descriptors: Artificial Intelligence, Scoring, Computational Linguistics, Accuracy
Ramsey Lee Cardwell – ProQuest LLC, 2022
The emergence of digital-first assessments is prompting reconsideration of, and innovation in, aspects of psychometrics, test validation, and test use. Using the Duolingo English Test (DET) as an example, this three-paper series seeks to address issues concerning the estimation of classification consistency and the reporting of results for such…
Descriptors: Classification, Reliability, Language Proficiency, Computer Assisted Testing
Putnikovic, Marko; Jovanovic, Jelena – IEEE Transactions on Learning Technologies, 2023
Automatic grading of short answers is an important task in computer-assisted assessment (CAA). Recently, embeddings, as semantic-rich textual representations, have been increasingly used to represent short answers and predict the grade. Despite the recent trend of applying embeddings in automatic short answer grading (ASAG), there are no…
Descriptors: Automation, Computer Assisted Testing, Grading, Natural Language Processing
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite the widespread use of multiple measures, there is little research on their classification consistency and accuracy. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
Lim, Hwanggyu; Davey, Tim; Wells, Craig S. – Journal of Educational Measurement, 2021
This study proposed a recursion-based analytical approach to assess measurement precision of ability estimation and classification accuracy in multistage adaptive tests (MSTs). A simulation study was conducted to compare the proposed recursion-based analytical method with an analytical method proposed by Park, Kim, Chung, and Dodd and with the…
Descriptors: Adaptive Testing, Measurement, Accuracy, Classification
Jonathan Liu; Seth Poulsen; Erica Goodwin; Hongxuan Chen; Grace Williams; Yael Gertner; Diana Franklin – ACM Transactions on Computing Education, 2025
Algorithm design is a vital skill developed in most undergraduate Computer Science (CS) programs, but few research studies focus on pedagogy related to algorithms coursework. To understand the work that has been done in the area, we present a systematic survey and literature review of CS Education studies. We search for research that is both…
Descriptors: Teaching Methods, Algorithms, Design, Computer Science Education
Daniel Corral; Shana K. Carpenter – Cognitive Research: Principles and Implications, 2024
We report six experiments that examine how two essential components of a category-learning paradigm, training and feedback, can be manipulated to maximize learning and transfer of real-world, complex concepts. Some subjects learned through classification and were asked to classify hypothetical experiment scenarios as either true or non-true…
Descriptors: Concept Formation, Teaching Methods, Observational Learning, Classification
Yang Du; Susu Zhang – Journal of Educational and Behavioral Statistics, 2025
Item compromise has long posed challenges in educational measurement, jeopardizing both test validity and test security of continuous tests. Detecting compromised items is therefore crucial to address this concern. The present literature on compromised item detection reveals two notable gaps: First, the majority of existing methods are based upon…
Descriptors: Item Response Theory, Item Analysis, Bayesian Statistics, Educational Assessment

