ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	2
Since 2017 (last 10 years)	7
Since 2007 (last 20 years)	17

Descriptor

Evaluation Methods	26
Item Response Theory	23
Test Items	12
Equated Scores	7
Simulation	7
Error of Measurement	5
Measurement Techniques	5
Comparative Analysis	4
Research Methodology	4
Sample Size	4
College Students	3
Error Patterns	3
Measurement	3
Monte Carlo Methods	3
Psychometrics	3
Sampling	3
Standardized Tests	3
Statistical Analysis	3
Test Construction	3
Testing Programs	3
Achievement Tests	2
Computation	2
Cutting Scores	2
Data Analysis	2
Data Collection	2
More ▼

Source

Applied Measurement in…

Publication Type

Journal Articles	26
Reports - Research	17
Reports - Evaluative	9
Information Analyses	1

Education Level

Higher Education	2
Grade 10	1
Grade 11	1
High Schools	1
Postsecondary Education	1
Secondary Education	1

Audience

Researchers

Location

Florida	1
Turkey	1

Laws, Policies, & Programs

Assessments and Surveys

ACT Assessment	1
Florida Comprehensive…	1
SAT (College Admission Test)	1

What Works Clearinghouse Rating

Showing 1 to 15 of 26 results Save | Export

Combining Nonparametric and Parametric Item Response Theory to Explore Data Quality: Illustrations and a Simulation Study

Peer reviewed

Direct link

Stefanie A. Wind; Benjamin Lugu – Applied Measurement in Education, 2024

Researchers who use measurement models for evaluation purposes often select models with stringent requirements, such as Rasch models, which are parametric. Mokken Scale Analysis (MSA) offers a theory-driven nonparametric modeling approach that may be more appropriate for some measurement applications. Researchers have discussed using MSA as a…

Descriptors: Item Response Theory, Data Analysis, Simulation, Nonparametric Statistics

Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data

Peer reviewed

Direct link

Tony Albano; Brian F. French; Thao Thu Vo – Applied Measurement in Education, 2024

Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…

Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests

Validating Rubric Scoring Processes: An Application of an Item Response Tree Model

Peer reviewed

Direct link

Myers, Aaron J.; Ames, Allison J.; Leventhal, Brian C.; Holzman, Madison A. – Applied Measurement in Education, 2020

When rating performance assessments, raters may ascribe different scores for the same performance when rubric application does not align with the intended application of the scoring criteria. Given performance assessment score interpretation assumes raters apply rubrics as rubric developers intended, misalignment between raters' scoring processes…

Descriptors: Scoring Rubrics, Validity, Item Response Theory, Interrater Reliability

Some Methods and Evaluation for Linking and Equating with Small Samples

Peer reviewed

Direct link

Peabody, Michael R. – Applied Measurement in Education, 2020

The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required…

Descriptors: Evaluation Methods, Equated Scores, Sample Size, Item Response Theory

Investigating the Classification Accuracy of Rasch and Nominal Weights Mean Equating with Very Small Samples

Peer reviewed

Direct link

Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020

Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…

Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores

Identifying Disengaged Survey Responses: New Evidence Using Response Time Metadata

Peer reviewed

Direct link

Soland, James; Wise, Steven L.; Gao, Lingyun – Applied Measurement in Education, 2019

Disengaged responding is a phenomenon that often biases observed scores from achievement tests and surveys in practically and statistically significant ways. This problem has led to the development of methods to detect and correct for disengaged responses on both achievement test and survey scores. One major disadvantage when trying to detect…

Descriptors: Reaction Time, Metadata, Response Style (Tests), Student Surveys

Using Testlet Response Theory to Examine Local Dependence in C-Tests

Peer reviewed

Direct link

Eckes, Thomas; Baghaei, Purya – Applied Measurement in Education, 2015

C-tests are gap-filling tests widely used to assess general language proficiency for purposes of placement, screening, or provision of feedback to language learners. C-tests consist of several short texts in which parts of words are missing. We addressed the issue of local dependence in C-tests using an explicit modeling approach based on testlet…

Descriptors: Language Proficiency, Language Tests, Item Response Theory, Test Reliability

Is Teacher Value Added a Matter of Scale? The Practical Consequences of Treating an Ordinal Scale as Interval for Estimation of Teacher Effects

Peer reviewed

Direct link

Soland, James – Applied Measurement in Education, 2017

Research shows that assuming a test scale is equal-interval can be problematic, especially when the assessment is being used to achieve a policy aim like evaluating growth over time. However, little research considers whether teacher value added is sensitive to the underlying test scale, and in particular whether treating an ordinal scale as…

Descriptors: Intervals, Value Added Models, Teacher Evaluation, Teacher Effectiveness

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Direct link

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Two Approaches for Identifying Low-Motivated Students in a Low-Stakes Assessment Context

Peer reviewed

Direct link

Swerdzewski, Peter J.; Harmes, J. Christine; Finney, Sara J. – Applied Measurement in Education, 2011

Many universities rely on data gathered from tests that are low stakes for examinees but high stakes for the various programs being assessed. Given the lack of consequences associated with many collegiate assessments, the construct-irrelevant variance introduced by unmotivated students is potentially a serious threat to the validity of the…

Descriptors: Computer Assisted Testing, Student Motivation, Inferences, Universities

A Comparison of IRT Linking Procedures

Peer reviewed

Direct link

Lee, Won-Chan; Ban, Jae-Chun – Applied Measurement in Education, 2010

Various applications of item response theory often require linking to achieve a common scale for item parameter estimates obtained from different groups. This article used a simulation to examine the relative performance of four different item response theory (IRT) linking procedures in a random groups equating design: concurrent calibration with…

Descriptors: Item Response Theory, Simulation, Comparative Analysis, Measurement Techniques

Comparison of Factor Simplicity Indices for Dichotomous Data: DETECT R, Bentler's Simplicity Index, and the Loading Simplicity Index

Peer reviewed

Direct link

Finch, Holmes; Stage, Alan Kirk; Monahan, Patrick – Applied Measurement in Education, 2008

A primary assumption underlying several of the common methods for modeling item response data is unidimensionality, that is, test items tap into only one latent trait. This assumption can be assessed several ways, using nonlinear factor analysis and DETECT, a method based on the item conditional covariances. When multidimensionality is identified,…

Descriptors: Test Items, Factor Analysis, Item Response Theory, Comparative Analysis

Detecting and Correcting Scale Drift in Test Equating: An Illustration from a Large Scale Testing Program

Peer reviewed

Direct link

Puhan, Gautam – Applied Measurement in Education, 2009

The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…

Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory

Creating IRT-Based Parallel Test Forms Using the Genetic Algorithm Method

Peer reviewed

Direct link

Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008

In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, genetic algorithm (GA), to construct parallel…

Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory

Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory

Peer reviewed

Direct link

Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008

Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…

Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics

Previous Page | Next Page »

Pages: 1 | 2

Finch, Holmes	2
Monahan, Patrick	2
Soland, James	2
Ackerman, Terry A.	1
Ames, Allison J.	1
Baghaei, Purya	1
Ban, Jae-Chun	1
Benjamin Lugu	1
Berberoglu, Giray	1
Bolt, Daniel M.	1
Brian F. French	1
Briggs, Derek C.	1
Chen, Yu-Jen	1
Cheng, Chien-Fen	1
Clauser, Brian	1
Crouse, Jill D.	1
Dorans, Neil J.	1
Dwyer, Andrew C.	1
Eckes, Thomas	1
Finney, Sara J.	1
Furter, Robert T.	1
Gao, Lingyun	1
Goldberg, Gail Lynn	1
Harmes, J. Christine	1
More ▼