Showing all 10 results
Peer reviewed
Park, Seohee; Kim, Kyung Yong; Lee, Won-Chan – Journal of Educational Measurement, 2023
Multiple measures, such as multiple content domains or multiple types of performance, are used in various testing programs to classify examinees for screening or selection. Despite their widespread use, there is little research on the classification consistency and accuracy of multiple measures. Accordingly, this study introduces an…
Descriptors: Testing, Computation, Classification, Accuracy
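The two quantities this abstract names have simple operational definitions: classification accuracy is the probability that an examinee's observed classification matches the one implied by their true score, and classification consistency is the probability that two parallel administrations classify the examinee the same way. A minimal simulation sketch under a classical true-score model (the normal model, reliability, and cut score are illustrative assumptions, not the study's design):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000          # simulated examinees (assumed)
    reliability = 0.85   # assumed score reliability
    cut = 0.5            # assumed cut score on the standardized scale

    # Classical model X = T + E, with Var(E) chosen so that
    # Var(T) / Var(X) equals the assumed reliability.
    true = rng.normal(0.0, 1.0, n)
    err_sd = np.sqrt(1.0 / reliability - 1.0)
    form_a = true + rng.normal(0.0, err_sd, n)
    form_b = true + rng.normal(0.0, err_sd, n)

    accuracy = np.mean((form_a >= cut) == (true >= cut))       # observed vs. true
    consistency = np.mean((form_a >= cut) == (form_b >= cut))  # form A vs. form B
    print(f"classification accuracy    ~ {accuracy:.3f}")
    print(f"classification consistency ~ {consistency:.3f}")

Extending this single-measure toy to multiple measures (e.g., requiring a pass on every content domain) is exactly where the complications the authors study arise.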
Peer reviewed
He, Yinhong – Journal of Educational Measurement, 2023
Back random responding (BRR) is a commonly observed careless response behavior, and detecting it accurately can improve test validity. Yu and Cheng (2019) showed that the change point analysis (CPA) procedure based on weighted residuals (CPA-WR) performs well in detecting BRR. Compared with the CPA procedure, the…
Descriptors: Test Validity, Item Response Theory, Measurement, Monte Carlo Methods
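The change-point idea behind CPA can be made concrete: scan every candidate position in an examinee's response sequence and test whether the residuals before and after it differ. The sketch below uses a plain mean-difference statistic on per-item residuals; it is a generic stand-in, not the weighted-residual (CPA-WR) statistic from Yu and Cheng (2019):

    import numpy as np

    def cpa_statistic(residuals):
        # For each candidate change point t, compute the standardized
        # difference between the mean residual before and after t.
        r = np.asarray(residuals, dtype=float)
        n = len(r)
        stats = np.zeros(n - 1)
        for t in range(1, n):
            se = np.sqrt(np.var(r) * (1.0 / t + 1.0 / (n - t)))
            stats[t - 1] = abs(r[:t].mean() - r[t:].mean()) / se
        return stats.argmax() + 1, stats.max()  # flagged point, max statistic

    # Toy usage: residuals drift negative after item 30 (simulated BRR onset).
    rng = np.random.default_rng(3)
    r = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(-1.5, 1.0, 20)])
    print(cpa_statistic(r))  # the flagged point should be near 30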
Peer reviewed
Wolkowitz, Amanda A. – Journal of Educational Measurement, 2021
Decision consistency (DC) is the reliability of a classification decision based on a test score. In professional credentialing, the decision is often a high-stakes pass/fail decision. The current methods for estimating DC are computationally complex. The purpose of this research is to provide a computationally and conceptually simple method for…
Descriptors: Decision Making, Reliability, Classification, Scores
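The abstract does not reproduce the method itself, but the target quantity is easy to state: under a classical model, two parallel administrations of a test with reliability ρ behave like a bivariate normal pair with correlation ρ, and DC is the probability that both scores land on the same side of the cut. A sketch of that baseline computation (this is the textbook bivariate-normal route, not Wolkowitz's simplified estimator; the reliability and cut values are assumed):

    import numpy as np
    from scipy.stats import norm, multivariate_normal

    reliability = 0.90   # assumed reliability of the pass/fail score
    cut_z = -0.5         # assumed cut score in standardized units

    # Two parallel administrations modeled as bivariate standard normal
    # with correlation equal to the reliability coefficient.
    mvn = multivariate_normal(mean=[0.0, 0.0],
                              cov=[[1.0, reliability],
                                   [reliability, 1.0]])

    both_fail = mvn.cdf([cut_z, cut_z])                  # P(X1 < c, X2 < c)
    both_pass = 1.0 - 2.0 * norm.cdf(cut_z) + both_fail  # P(X1 >= c, X2 >= c)
    print(f"decision consistency ~ {both_fail + both_pass:.3f}")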
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
The residual gain score has been of historical interest, and its percentile rank has been of interest more recently given its close correspondence to the popular Student Growth Percentile. However, these estimators suffer from low accuracy and systematic bias (bias conditional on prior latent achievement). This article explores three…
Descriptors: Accuracy, Student Evaluation, Measurement Techniques, Evaluation Methods
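The two estimators under discussion are straightforward to compute from data, which makes the bias the authors describe easy to probe by simulation. A sketch with simulated scores (the linear regression on the observed prior score is the standard residual-gain construction; the data-generating values here are arbitrary):

    import numpy as np
    from scipy.stats import rankdata

    rng = np.random.default_rng(1)
    n = 5_000
    prior = rng.normal(size=n)                              # prior-year scores
    current = 0.7 * prior + rng.normal(scale=0.6, size=n)   # current-year scores

    # Residual gain: the part of the current score not predicted by a
    # linear regression on the (error-prone) observed prior score.
    slope, intercept = np.polyfit(prior, current, 1)
    residual_gain = current - (intercept + slope * prior)

    # Its percentile rank, the analogue of a Student Growth Percentile.
    pct_rank = 100.0 * rankdata(residual_gain) / n
    print(residual_gain[:3].round(2), pct_rank[:3].round(1))

The conditional bias arises because the observed prior score stands in for prior latent achievement; regressing on a covariate measured with error under-corrects for it.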
Peer reviewed
Wind, Stefanie A.; Jones, Eli – Journal of Educational Measurement, 2019
Researchers have explored a variety of topics related to identifying and distinguishing among specific types of rater effects, as well as the implications of different types of incomplete data collection designs for rater-mediated assessments. In this study, we used simulated data to examine the sensitivity of latent trait model indicators of…
Descriptors: Rating Scales, Models, Evaluators, Data Collection
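At its simplest, a rater-severity effect is a rater-specific shift in scores, and even a crude indicator (each rater's mean deviation from the examinee-level mean rating) will pick up a strongly severe rater in a complete design. The sketch below simulates that baseline case; the latent trait model indicators and incomplete rating designs the study actually examines are more involved:

    import numpy as np

    rng = np.random.default_rng(2)
    n_examinees, n_raters = 200, 5
    ability = rng.normal(size=n_examinees)
    severity = np.array([0.0, 0.0, 0.0, 0.0, -0.8])   # rater 5 severe (assumed)

    # Toy model: rating = ability + rater severity + noise, complete design.
    ratings = (ability[:, None] + severity[None, :]
               + rng.normal(scale=0.5, size=(n_examinees, n_raters)))

    # Crude severity indicator: mean deviation from each examinee's mean rating.
    resid = ratings - ratings.mean(axis=1, keepdims=True)
    print(resid.mean(axis=0).round(2))   # rater 5 stands out negative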
Peer reviewed
Li, Xiaomin; Wang, Wen-Chung – Journal of Educational Measurement, 2015
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable for cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are…
Descriptors: Test Bias, Models, Cognitive Measurement, Evaluation Methods
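For contrast with the CDM setting, the classical procedures the abstract alludes to condition on an observed total score. The Mantel-Haenszel statistic is the canonical example, and its reliance on a unidimensional matching score is precisely what fails to transfer when the latent variable is a vector of discrete attributes. A minimal sketch of that classical statistic (the function name and the simple total-score matching are illustrative):

    import numpy as np

    def mantel_haenszel_dif(item, total, group):
        # item: 0/1 responses; total: matching total scores;
        # group: 0 = reference, 1 = focal.
        num = den = 0.0
        for k in np.unique(total):
            s = total == k
            a = np.sum(s & (group == 0) & (item == 1))   # reference correct
            b = np.sum(s & (group == 0) & (item == 0))   # reference incorrect
            c = np.sum(s & (group == 1) & (item == 1))   # focal correct
            d = np.sum(s & (group == 1) & (item == 0))   # focal incorrect
            n = a + b + c + d
            if n > 0:
                num += a * d / n
                den += b * c / n
        alpha = num / den                      # > 1 favors the reference group
        return alpha, -2.35 * np.log(alpha)    # ETS delta-DIF scale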
Peer reviewed
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
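One way to see what dropping the parametric form buys: with the observed item-response matrix alone, resampling can stand in for a score model. The sketch below bootstraps each examinee's items to approximate their pass probability and averages the chance that two independent replications agree; it is a generic resampling illustration, not the estimator developed in the article:

    import numpy as np

    def bootstrap_cc(responses, cut, n_boot=500, seed=0):
        # responses: examinees x items 0/1 matrix; cut: total-score cut.
        rng = np.random.default_rng(seed)
        n, j = responses.shape
        agree = 0.0
        for i in range(n):
            idx = rng.integers(0, j, size=(n_boot, j))   # resample items
            totals = responses[i, idx].sum(axis=1)
            p_pass = np.mean(totals >= cut)
            agree += p_pass**2 + (1.0 - p_pass)**2       # same side twice
        return agree / n                                 # CC-style estimate

The item bootstrap treats an examinee's items as exchangeable, which is itself an assumption; the point is only that nothing normal-shaped or IRT-shaped is required.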
Peer reviewed
Camilli, Gregory; Prowker, Adam; Dossey, John A.; Lindquist, Mary M.; Chiu, Ting-Wei; Vargas, Sadako; de la Torre, Jimmy – Journal of Educational Measurement, 2008
A new method for analyzing differential item functioning is proposed to investigate the relative strengths and weaknesses of multiple groups of examinees. Accordingly, the notion of a conditional measure of difference between two groups (Reference and Focal) is generalized to a conditional variance. The objective of this article is to present and…
Descriptors: Test Bias, National Competency Tests, Grade 4, Difficulty Level
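The generalization the abstract describes has a direct empirical analogue: at each matching score, compute each group's proportion correct on the item, then summarize the spread of those proportions across groups as a variance rather than a single reference-focal difference. A toy version of that computation (the stratum weighting and the use of raw proportions are simplifications of the article's model-based measure):

    import numpy as np

    def conditional_group_variance(item, total, group):
        # Variance across groups of item proportion-correct, computed
        # within each matching total-score stratum, then averaged over
        # strata with weights proportional to stratum size.
        variances, weights = [], []
        for k in np.unique(total):
            s = total == k
            props = [item[s & (group == g)].mean()
                     for g in np.unique(group) if np.any(s & (group == g))]
            if len(props) > 1:
                variances.append(np.var(props))
                weights.append(s.sum())
        return np.average(variances, weights=weights)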
Peer reviewed
Kim, Seonghoon – Journal of Educational Measurement, 2006
This article provides technical descriptions of five fixed parameter calibration (FPC) methods, all based on marginal maximum likelihood estimation via the EM algorithm, and evaluates them through simulation. The five FPC methods are distinguished from one another by how many times they update the prior ability distribution and by…
Descriptors: Comparative Analysis, Item Response Theory, Evaluation Methods, Computation
Peer reviewed
Briggs, Derek C.; Wilson, Mark – Journal of Educational Measurement, 2007
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random…
Descriptors: Markov Processes, Generalizability Theory, Item Response Theory, Computation