Showing 1 to 15 of 18 results
Peer reviewed
Chalmers, R. Philip – Journal of Educational Measurement, 2023
Several marginal effect size (ES) statistics suitable for quantifying the magnitude of differential item functioning (DIF) have been proposed in the area of item response theory; for instance, the Differential Functioning of Items and Tests (DFIT) statistics, signed and unsigned item difference in the sample statistics (SIDS, UIDS, NSIDS, and…
Descriptors: Test Bias, Item Response Theory, Definitions, Monte Carlo Methods
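As a point of reference for these acronyms, a generic formulation of the signed and unsigned item difference in the sample statistics (a sketch of the common definitions, not necessarily Chalmers's exact notation): with $S_i(\theta; \psi)$ the expected score on item $i$ under item parameters $\psi$, and $\hat\psi_R$, $\hat\psi_F$ the reference- and focal-group estimates,

$$\mathrm{SIDS}_i = \frac{1}{N_F}\sum_{j \in F}\Big[S_i(\hat\theta_j; \hat\psi_F) - S_i(\hat\theta_j; \hat\psi_R)\Big], \qquad \mathrm{UIDS}_i = \frac{1}{N_F}\sum_{j \in F}\Big|S_i(\hat\theta_j; \hat\psi_F) - S_i(\hat\theta_j; \hat\psi_R)\Big|.$$

The signed version can average out DIF that changes direction across the ability range, whereas the unsigned version cannot, which is why both are typically reported.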
Peer reviewed
Haberman, Shelby J. – Journal of Educational Measurement, 2020
Examples of the impact of statistical theory on assessment practice are provided from the perspective of a statistician trained in theoretical statistics who began to work on assessments. Goodness of fit of item-response models is examined in terms of restricted likelihood-ratio tests and generalized residuals. Minimum discriminant information…
Descriptors: Statistics, Goodness of Fit, Item Response Theory, Statistical Analysis
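The generalized-residual idea mentioned here can be summarized in one line (a generic sketch, not Haberman's full development): for any statistic $d$ of the response data, compare its observed value with its expectation under the fitted model, standardized by an estimate of the sampling variance of the difference,

$$t = \frac{d_{\mathrm{obs}} - \hat{E}(d)}{\sqrt{\widehat{\mathrm{Var}}\big(d_{\mathrm{obs}} - \hat{E}(d)\big)}},$$

so that under a correctly specified model $t$ is approximately standard normal and a large $|t|$ flags misfit with respect to that statistic.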
Peer reviewed
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
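A minimal sketch of the tree-based idea, assuming a two-node tree (a simplification, not the authors' full framework): the first node models whether item $i$ is answered at all, the second models correctness given a response, each with its own person parameter,

$$P(\text{answered}_{ij}) = \frac{\exp(\theta^{(m)}_j - \beta^{(m)}_i)}{1+\exp(\theta^{(m)}_j - \beta^{(m)}_i)}, \qquad P(X_{ij}=1 \mid \text{answered}) = \frac{\exp(\theta^{(a)}_j - \beta^{(a)}_i)}{1+\exp(\theta^{(a)}_j - \beta^{(a)}_i)},$$

where letting the omission-propensity dimension $\theta^{(m)}$ correlate with the ability dimension $\theta^{(a)}$ is what accommodates nonignorable missingness.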
Peer reviewed
Castellano, Katherine E.; McCaffrey, Daniel F. – Journal of Educational Measurement, 2020
The residual gain score has been of historical interest, and its percentile rank has been of interest more recently given its close correspondence to the popular Student Growth Percentile. However, these estimators suffer from low accuracy and systematic bias (bias conditional on prior latent achievement). This article explores three…
Descriptors: Accuracy, Student Evaluation, Measurement Techniques, Evaluation Methods
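For context, the residual gain score in its usual form (the common formulation, not necessarily the article's exact estimator): regress the current score $Y$ on the prior score $X$ and keep the residual,

$$g_j = Y_j - \big(\hat\beta_0 + \hat\beta_1 X_j\big),$$

and the percentile-rank version locates $g_j$ in the distribution of residuals, which is what ties it to the Student Growth Percentile.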
Peer reviewed
Guo, Hongwen; Robin, Frederic; Dorans, Neil – Journal of Educational Measurement, 2017
The early detection of item drift is an important issue for frequently administered testing programs because items are reused over time. Unfortunately, operational data tend to be very sparse and do not lend themselves to frequent monitoring analyses, particularly for on-demand testing. Building on existing residual analyses, the authors propose…
Descriptors: Testing, Test Items, Identification, Sample Size
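A rough sketch of a residual-based drift check of the kind this work builds on (illustrative Python; the function and threshold are hypothetical, not the authors' statistic):

```python
import numpy as np

def item_residual_z(responses, p_model):
    """Standardized item-level residual for one administration window.
    responses: 0/1 vector for one item across examinees.
    p_model: model-implied probabilities P_i(theta_hat_j), same examinees."""
    responses = np.asarray(responses, dtype=float)
    p_model = np.asarray(p_model, dtype=float)
    raw = np.sum(responses - p_model)            # aggregate residual
    var = np.sum(p_model * (1.0 - p_model))      # Bernoulli variance of the sum
    return raw / np.sqrt(var)                    # roughly N(0,1) absent drift

# An item whose |z| stays large across successive windows is a drift candidate:
# flagged = [abs(item_residual_z(r, p)) > 3.0 for r, p in windows]
```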
Peer reviewed
Belov, Dmitry I. – Journal of Educational Measurement, 2015
The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large-scale assessments and is now routinely performed at testing organizations. However, AC data carry uncertainty caused by technological or human factors. Therefore, existing statistics (e.g., number of wrong-to-right ACs) used to detect examinees…
Descriptors: Statistical Analysis, Robustness (Statistics), Identification, Test Items
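The wrong-to-right count referenced in the abstract is simple to state in code (a minimal sketch with a hypothetical input format):

```python
def count_wtr(initial, final, key):
    """Count wrong-to-right answer changes for one examinee.
    initial/final: first and last recorded answers per item; key: answer key."""
    wtr = 0
    for first, last, correct in zip(initial, final, key):
        if first != last and first != correct and last == correct:
            wtr += 1
    return wtr

# Item 2 changes from a wrong answer to the keyed answer, so the count is 1.
print(count_wtr(["A", "B", "C"], ["A", "D", "B"], ["A", "D", "C"]))  # 1
```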
Peer reviewed
Kim, Sooyeon; Moses, Tim; Yoo, Hanwook – Journal of Educational Measurement, 2015
This inquiry is an investigation of item response theory (IRT) proficiency estimators' accuracy under multistage testing (MST). We chose a two-stage MST design that includes four modules (one at Stage 1, three at Stage 2) and three difficulty paths (low, middle, high). We assembled various two-stage MST panels (i.e., forms) by manipulating two…
Descriptors: Comparative Analysis, Item Response Theory, Computation, Accuracy
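To make the two-stage design concrete, routing in such a panel typically works from a Stage 1 number-correct score (an illustrative Python sketch; the cutoffs are hypothetical, not taken from the study):

```python
def route_stage2(stage1_score, low_cut=7, high_cut=13):
    """Assign one of three Stage 2 difficulty paths from the Stage 1
    number-correct score (cutoffs are illustrative only)."""
    if stage1_score < low_cut:
        return "low"
    if stage1_score < high_cut:
        return "middle"
    return "high"

print(route_stage2(5), route_stage2(10), route_stage2(15))  # low middle high
```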
Peer reviewed
Suh, Youngsuk – Journal of Educational Measurement, 2016
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Descriptors: Effect Size, Goodness of Fit, Statistical Analysis, Statistical Significance
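A hedged sketch of what the two measures compute (a generic form; the article's exact weights and notation may differ): with $P_i(\boldsymbol\theta; \psi)$ the item response probability in the multidimensional model,

$$\mathrm{SWPD}_i = \sum_{j \in F} w_j \Big[P_i(\boldsymbol\theta_j; \hat\psi_R) - P_i(\boldsymbol\theta_j; \hat\psi_F)\Big], \qquad \mathrm{UWPD}_i = \sum_{j \in F} w_j \Big|P_i(\boldsymbol\theta_j; \hat\psi_R) - P_i(\boldsymbol\theta_j; \hat\psi_F)\Big|,$$

with weights $w_j$ (summing to one) taken over the focal group, so the signed version can cancel across regions of the latent space while the unsigned version cannot.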
Peer reviewed
Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015
With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis
Peer reviewed
Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012
In the presence of test speededness, the parameters of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article presents a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…
Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory
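The baseline 2PL model in this comparison has the standard form

$$P(X_{ij}=1 \mid \theta_j) = \frac{\exp\big[a_i(\theta_j - b_i)\big]}{1 + \exp\big[a_i(\theta_j - b_i)\big]},$$

with discrimination $a_i$ and difficulty $b_i$; speededness violates its local independence assumption for end-of-test items, which is what the alternative calibration procedures try to address.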
Peer reviewed
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
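One regression-style check consistent with the descriptors (a minimal Python sketch under assumed inputs, not necessarily the authors' procedure): fit a regression on uninterrupted examinees, then see whether interrupted examinees score below their predictions.

```python
import numpy as np

def interruption_residuals(x_ok, y_ok, x_int, y_int):
    """Fit y ~ x on uninterrupted examinees, then standardize interrupted
    examinees' observed-minus-predicted scores. x could be, e.g., the score
    on items administered before the interruption occurred."""
    slope, intercept = np.polyfit(x_ok, y_ok, 1)     # least-squares line
    fitted_ok = intercept + slope * np.asarray(x_ok)
    se = np.std(np.asarray(y_ok) - fitted_ok, ddof=2)
    pred = intercept + slope * np.asarray(x_int)
    return (np.asarray(y_int) - pred) / se           # large negative => impact
```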
Peer reviewed
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
In recent guidelines for fair educational testing, it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear, on the basis of the existing literature, which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
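The simplest statistic in this nonparametric family is the Guttman error count, which is easy to state in code (a minimal sketch):

```python
def guttman_errors(resp_sorted):
    """Count Guttman errors in a 0/1 response vector whose items are
    sorted from easiest to hardest (e.g., by proportion correct)."""
    errors = 0
    for e in range(len(resp_sorted)):
        for h in range(e + 1, len(resp_sorted)):
            # error: easier item missed while a harder item is answered correctly
            if resp_sorted[e] == 0 and resp_sorted[h] == 1:
                errors += 1
    return errors

print(guttman_errors([1, 1, 0, 1, 0]))  # 1: item 3 missed, harder item 4 correct
```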
Peer reviewed
Chen, Haiwen – Journal of Educational Measurement, 2012
In this article, linear item response theory (IRT) observed-score equating is compared under a generalized kernel equating framework with Levine observed-score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when…
Descriptors: Tests, Item Response Theory, Equated Scores, Statistical Analysis
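Both methods build on the basic linear observed-score equating function (the standard form; the Levine and kernel approaches differ in how the synthetic-population moments are estimated from the anchor test):

$$e_Y(x) = \mu_Y + \frac{\sigma_Y}{\sigma_X}\,(x - \mu_X),$$

which maps a score $x$ on form X onto the scale of form Y.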
Peer reviewed
Ranger, Jochen; Kuhn, Jorg-Tobias – Journal of Educational Measurement, 2012
The information matrix can equivalently be determined via the expectation of the Hessian matrix or the expectation of the outer product of the score vector. The identity of these two matrices, however, is only valid in the case of a correctly specified model. Therefore, differences between the two versions of the observed information matrix indicate…
Descriptors: Goodness of Fit, Item Response Theory, Models, Matrices
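The identity in question is the classical information matrix equality: for a log-likelihood $\ell(\theta)$ under a correctly specified model,

$$\mathbb{E}\left[-\frac{\partial^2 \ell(\theta)}{\partial \theta\,\partial \theta^{\top}}\right] = \mathbb{E}\left[\frac{\partial \ell(\theta)}{\partial \theta}\,\frac{\partial \ell(\theta)}{\partial \theta^{\top}}\right],$$

i.e., the expected negative Hessian equals the expected outer product of the score vector; under misspecification the two sides generally diverge, which is what makes their difference usable as a fit diagnostic.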
Peer reviewed
de la Torre, Jimmy; Hong, Yuan; Deng, Weiling – Journal of Educational Measurement, 2010
To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…
Descriptors: Classification, Computation, Models, Simulation
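For reference, the DINA item response function has the standard form: with attribute profile $\boldsymbol\alpha_j$, Q-matrix entries $q_{ik}$, and slip and guessing parameters $s_i$ and $g_i$,

$$\eta_{ij} = \prod_k \alpha_{jk}^{\,q_{ik}}, \qquad P(X_{ij}=1 \mid \boldsymbol\alpha_j) = (1 - s_i)^{\eta_{ij}}\, g_i^{\,1-\eta_{ij}},$$

so an examinee answers correctly with probability $1 - s_i$ when all required attributes are mastered ($\eta_{ij}=1$) and with probability $g_i$ otherwise.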