ERIC - Search Results

Publication Date

In 2025	0
Since 2024	2
Since 2021 (last 5 years)	3
Since 2016 (last 10 years)	9
Since 2006 (last 20 years)	19

Descriptor

Monte Carlo Methods	23
Item Response Theory	14
Comparative Analysis	9
Test Items	9
Simulation	7
Error of Measurement	6
Evaluation Methods	5
Accuracy	4
Computation	4
Regression (Statistics)	4
Reliability	4
Scores	4
Test Bias	4
Correlation	3
Data Analysis	3
Goodness of Fit	3
Maximum Likelihood Statistics	3
Measurement	3
Models	3
Nonparametric Statistics	3
Scaling	3
Statistical Analysis	3
Bayesian Statistics	2
Bias	2
Classification	2

Source

Applied Measurement in…

Publication Type

Journal Articles	23
Reports - Research	19
Reports - Evaluative	4

Education Level

Elementary Education	4
Early Childhood Education	2
Grade 1	2
Grade 2	2
Grade 3	2
Grade 8	2
Junior High Schools	2
Middle Schools	2
Primary Education	2
Secondary Education	2
Grade 4	1
Grade 5	1
Grade 6	1
Grade 7	1
Grade 9	1
Intermediate Grades	1

Audience

Researchers

Location

Tennessee	2
Colorado	1
Florida	1
New York	1
North Carolina	1
Texas	1

Assessments and Surveys

National Assessment of…	1
Program for International…	1
Progress in International…	1
Stanford Achievement Tests	1
Trends in International…	1

Showing 1 to 15 of 23 results

Multi-Group Generalizations of SIBTEST and Crossing-SIBTEST

Peer reviewed

Chalmers, R. Philip; Zheng, Guoguo – Applied Measurement in Education, 2023

This article presents generalizations of SIBTEST and crossing-SIBTEST statistics for differential item functioning (DIF) investigations involving more than two groups. After reviewing the original two-group setup for these statistics, a set of multigroup generalizations that support contrast matrices for joint tests of DIF are presented. To…

Descriptors: Test Bias, Test Items, Item Response Theory, Error of Measurement

IRT Characteristic Curve Linking Methods Weighted by Information for Mixed-Format Tests

Peer reviewed

Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024

To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…

Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement

Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning

Peer reviewed

Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024

Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…

Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development

A Comparison of Estimation Techniques for IRT Models with Small Samples

Peer reviewed

Finch, Holmes; French, Brian F. – Applied Measurement in Education, 2019

The usefulness of item response theory (IRT) models depends, in large part, on the accuracy of item and person parameter estimates. For the standard 3 parameter logistic model, for example, these parameters include the item parameters of difficulty, discrimination, and pseudo-chance, as well as the person ability parameter. Several factors impact…

Descriptors: Item Response Theory, Accuracy, Test Items, Difficulty Level

Detection of Differential Item Functioning for More than Two Groups: A Monte Carlo Comparison of Methods

Peer reviewed

Finch, W. Holmes – Applied Measurement in Education, 2016

Differential item functioning (DIF) assessment is a crucial component in test construction, serving as the primary way in which instrument developers ensure that measures perform in the same way for multiple groups within the population. When such is not the case, scores may not accurately reflect the trait of interest for all individuals in the…

Descriptors: Test Bias, Monte Carlo Methods, Comparative Analysis, Population Groups

Using the Bayes Factors to Evaluate Person Fit in the Item Response Theory

Peer reviewed

Pan, Tianshu; Yin, Yue – Applied Measurement in Education, 2017

In this article, we propose using the Bayes factors (BF) to evaluate person fit in item response theory models under the framework of Bayesian evaluation of an informative diagnostic hypothesis. We first discuss the theoretical foundation for this application and how to analyze person fit using BF. To demonstrate the feasibility of this approach,…

Descriptors: Bayesian Statistics, Goodness of Fit, Item Response Theory, Monte Carlo Methods

The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

Peer reviewed

Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017

Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…

Descriptors: Item Response Theory, Test Items, Bias, Computation

A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test

Peer reviewed

Liang, Tie; Wells, Craig S. – Applied Measurement in Education, 2015

Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three…

Descriptors: Nonparametric Statistics, Goodness of Fit, Item Response Theory, Test Format

Stability of Teacher Value-Added Rankings across Measurement Model and Scaling Conditions

Peer reviewed

Hawley, Leslie R.; Bovaird, James A.; Wu, ChaoRong – Applied Measurement in Education, 2017

Value-added assessment methods have been criticized by researchers and policy makers for a number of reasons. One issue includes the sensitivity of model results across different outcome measures. This study examined the utility of incorporating multivariate latent variable approaches within a traditional value-added framework. We evaluated the…

Descriptors: Value Added Models, Reliability, Multivariate Analysis, Scaling

Parameter Recovery and Classification Accuracy under Conditions of Testlet Dependency: A Comparison of the Traditional 2PL, Testlet, and Bi-Factor Models

Peer reviewed

Koziol, Natalie A. – Applied Measurement in Education, 2016

Testlets, or groups of related items, are commonly included in educational assessments due to their many logistical and conceptual advantages. Despite their advantages, testlets introduce complications into the theory and practice of educational measurement. Responses to items within a testlet tend to be correlated even after controlling for…

Descriptors: Classification, Accuracy, Comparative Analysis, Models

Sensitivity of Achievement Estimation to Conditioning Model Misclassification

Peer reviewed

Rutkowski, Leslie – Applied Measurement in Education, 2014

Large-scale assessment programs such as the National Assessment of Educational Progress (NAEP), Trends in International Mathematics and Science Study (TIMSS), and Programme for International Student Assessment (PISA) use a sophisticated assessment administration design called matrix sampling that minimizes the testing burden on individual…

Descriptors: Measurement, Testing, Item Sampling, Computation

A Comparison of Teacher Effectiveness Measures Calculated Using Three Multilevel Models for Raters Effects

Peer reviewed

Murphy, Daniel L.; Beretvas, S. Natasha – Applied Measurement in Education, 2015

This study examines the use of cross-classified random effects models (CCrem) and cross-classified multiple membership random effects models (CCMMrem) to model rater bias and estimate teacher effectiveness. Effect estimates are compared using CTT versus item response theory (IRT) scaling methods and three models (i.e., conventional multilevel…

Descriptors: Teacher Effectiveness, Comparative Analysis, Hierarchical Linear Modeling, Test Theory

Innovations in Measuring Rater Accuracy in Standard Setting: Assessing "Fit" to Item Characteristic Curves

Peer reviewed

Hurtz, Gregory M.; Jones, J. Patrick – Applied Measurement in Education, 2009

Standard setting methods such as the Angoff method rely on judgments of item characteristics; item response theory empirically estimates item characteristics and displays them in item characteristic curves (ICCs). This study evaluated several indexes of rater fit to ICCs as a method for judging rater accuracy in their estimates of expected item…

Descriptors: Standard Setting (Scoring), Item Response Theory, Reliability, Measurement

Comparison of Factor Simplicity Indices for Dichotomous Data: DETECT R, Bentler's Simplicity Index, and the Loading Simplicity Index

Peer reviewed

Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick – Applied Measurement in Education, 2008

A primary assumption underlying several of the common methods for modeling item response data is unidimensionality, that is, test items tap into only one latent trait. This assumption can be assessed several ways, using nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…

Descriptors: Test Items, Factor Analysis, Item Response Theory, Comparative Analysis

The Effects of the Number of Scale Points and Non-Normality on the Generalizability Coefficient: A Monte Carlo Study

Peer reviewed

Shumate, Steven R.; Surles, James; Johnson, Robert L.; Penny, Jim – Applied Measurement in Education, 2007

Increasingly, assessment practitioners use generalizability coefficients to estimate the reliability of scores from performance tasks. Little research, however, examines the relation between the estimation of generalizability coefficients and the number of rubric scale points and score distributions. The purpose of the present research is to…

Descriptors: Generalizability Theory, Monte Carlo Methods, Measures (Individuals), Program Effectiveness

Finch, Holmes	3
Bolt, Daniel M.	2
Monahan, Patrick	2
Wells, Craig S.	2
Beretvas, S. Natasha	1
Bovaird, James A.	1
Chalmers, R. Philip	1
Cho, Sun-Joo	1
Finch, W. Holmes	1
French, Brian F.	1
Hawley, Leslie R.	1
Hong, Sehee	1
Hurtz, Gregory M.	1
Ito, Kyoko	1
James S. Kim	1
Johnson, Robert L.	1
Jones, J. Patrick	1
Joshua B. Gilbert	1
Koziol, Natalie A.	1
Lee, Wooyeol	1
Liang, Tie	1
Lixin Yuan	1
Luke W. Miratrix	1
Minqiang Zhang	1
Murphy, Daniel L.	1