ERIC - Search Results

Showing 1 to 15 of 33 results

The Impact of Non-Effortful Responding on Item and Person Parameters in Item-Pool Scaling Linking

Peer reviewed

Yue Liu; Zhen Li; Hongyun Liu; Xiaofeng You – Applied Measurement in Education, 2024

Low test-taking effort of examinees has been considered a source of construct-irrelevant variance in item response modeling, leading to serious consequences on parameter estimation. This study aims to investigate how non-effortful response (NER) influences the estimation of item and person parameters in item-pool scale linking (IPSL) and whether…

Descriptors: Item Response Theory, Computation, Simulation, Responses

Impact of Violating Unidimensionality on Rasch Calibration for Mixed-Format Tests

Peer reviewed

Chunyan Liu; Raja Subhiyah; Richard A. Feinberg – Applied Measurement in Education, 2024

Mixed-format tests that include both multiple-choice (MC) and constructed-response (CR) items have become widely used in many large-scale assessments. When an item response theory (IRT) model is used to score a mixed-format test, the unidimensionality assumption may be violated if the CR items measure a different construct from that measured by MC…

Descriptors: Test Format, Response Style (Tests), Multiple Choice Tests, Item Response Theory

Combining Nonparametric and Parametric Item Response Theory to Explore Data Quality: Illustrations and a Simulation Study

Peer reviewed

Stefanie A. Wind; Benjamin Lugu – Applied Measurement in Education, 2024

Researchers who use measurement models for evaluation purposes often select models with stringent requirements, such as Rasch models, which are parametric. Mokken Scale Analysis (MSA) offers a theory-driven nonparametric modeling approach that may be more appropriate for some measurement applications. Researchers have discussed using MSA as a…

Descriptors: Item Response Theory, Data Analysis, Simulation, Nonparametric Statistics

A Method of Empirical Q-Matrix Validation for Multidimensional Item Response Theory

Peer reviewed

Marcelo Andrade da Silva; A. Corinne Huggins-Manley; Jorge Luis Bazán; Amber Benedict – Applied Measurement in Education, 2024

A Q-matrix is a binary matrix that defines the relationship between items and latent variables and is widely used in diagnostic classification models (DCMs), and can also be adopted in multidimensional item response theory (MIRT) models. The construction process of the Q-matrix is typically carried out by experts in the subject area of the items…

Descriptors: Q Methodology, Matrices, Item Response Theory, Educational Assessment

Comparison of Methods for Identifying Differential Step Functioning with Polytomous Item Response Data

Peer reviewed

Finch, Holmes – Applied Measurement in Education, 2022

Much research has been devoted to identification of differential item functioning (DIF), which occurs when the item responses for individuals from two groups differ after they are conditioned on the latent trait being measured by the scale. There has been less work examining differential step functioning (DSF), which is present for polytomous…

Descriptors: Comparative Analysis, Item Response Theory, Item Analysis, Simulation

Maintaining Score Scales over Time: A Comparison of Five Scoring Methods

Peer reviewed

Kim, Stella Yun; Lee, Won-Chan – Applied Measurement in Education, 2023

This study evaluates various scoring methods including number-correct scoring, IRT theta scoring, and hybrid scoring in terms of scale-score stability over time. A simulation study was conducted to examine the relative performance of five scoring methods in terms of preserving the first two moments of scale scores for a population in a chain of…

Descriptors: Scoring, Comparative Analysis, Item Response Theory, Simulation

Leveraging Item Parameter Drift to Assess Transfer Effects in Vocabulary Learning

Peer reviewed

Joshua B. Gilbert; James S. Kim; Luke W. Miratrix – Applied Measurement in Education, 2024

Longitudinal models typically emphasize between-person predictors of change but ignore how growth varies "within" persons because each person contributes only one data point at each time. In contrast, modeling growth with multi-item assessments allows evaluation of how relative item performance may shift over time. While traditionally…

Descriptors: Vocabulary Development, Item Response Theory, Test Items, Student Development

Impact of Item Parameter Drift on Rasch Scale Stability in Small Samples over Multiple Administrations

Peer reviewed

Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020

Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…

Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling

Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing

Peer reviewed

Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022

When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…

Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis

IRT Item Parameter Scaling for Developing New Item Pools

Peer reviewed

Kang, Hyeon-Ah; Lu, Ying; Chang, Hua-Hua – Applied Measurement in Education, 2017

Increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to achieve a common metric among field-tested items. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. The three scaling procedures are considered: (a) concurrent…

Descriptors: Item Response Theory, Accuracy, Educational Assessment, Test Items

Are the Nonparametric Person-Fit Statistics More Powerful than Their Parametric Counterparts? Revisiting the Simulations in Karabatsos (2003)

Peer reviewed

Sinharay, Sandip – Applied Measurement in Education, 2017

Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H^T" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…

Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis

The Consequences of Ignoring Item Parameter Drift in Longitudinal Item Response Models

Peer reviewed

Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017

Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…

Descriptors: Item Response Theory, Test Items, Bias, Computation

Bi-Factor MIRT Observed-Score Equating for Mixed-Format Tests

Peer reviewed

Lee, Guemin; Lee, Won-Chan – Applied Measurement in Education, 2016

The main purposes of this study were to develop bi-factor multidimensional item response theory (BF-MIRT) observed-score equating procedures for mixed-format tests and to investigate relative appropriateness of the proposed procedures. Using data from a large-scale testing program, three types of pseudo data sets were formulated: matched samples,…

Descriptors: Test Format, Multidimensional Scaling, Item Response Theory, Equated Scores

Effects of Population Heterogeneity on Accuracy of DIF Detection

Peer reviewed

Oliveri, María Elena; Ercikan, Kadriye; Zumbo, Bruno D. – Applied Measurement in Education, 2014

Heterogeneity within English language learners (ELLs) groups has been documented. Previous research on differential item functioning (DIF) analyses suggests that accurate DIF detection rates are reduced greatly when groups are heterogeneous. In this simulation study, we investigated the effects of heterogeneity within linguistic (ELL) groups on…

Descriptors: Test Bias, Accuracy, English Language Learners, Simulation

The Effect of Anchor Test Construction on Scale Drift

Peer reviewed

Antal, Judit; Proctor, Thomas P.; Melican, Gerald J. – Applied Measurement in Education, 2014

In common-item equating the anchor block is generally built to represent a miniature form of the total test in terms of content and statistical specifications. The statistical properties frequently reflect equal mean and spread of item difficulty. Sinharay and Holland (2007) suggested that the requirement for equal spread of difficulty may be too…

Descriptors: Test Items, Equated Scores, Difficulty Level, Item Response Theory
