Publication Date
| Date range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 13 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Statistical Analysis | 16 |
| Simulation | 14 |
| Models | 8 |
| Item Response Theory | 7 |
| Test Items | 7 |
| Sample Size | 5 |
| Comparative Analysis | 4 |
| Regression (Statistics) | 4 |
| Scores | 4 |
| Achievement Tests | 3 |
| Computation | 3 |
Source
| Source | Count |
| --- | --- |
| Journal of Educational… | 16 |
Author
| Author | Count |
| --- | --- |
| Choi, Seung W. | 2 |
| Kim, Dong-In | 2 |
| Sinharay, Sandip | 2 |
| Suh, Youngsuk | 2 |
| Wan, Ping | 2 |
| de la Torre, Jimmy | 2 |
| Armstrong, Ronald D. | 1 |
| Beuchert, A. Kent | 1 |
| Cho, Sun-Joo | 1 |
| De Boeck, Paul | 1 |
| Debeer, Dries | 1 |
Publication Type
| Type | Count |
| --- | --- |
| Journal Articles | 16 |
| Reports - Research | 11 |
| Reports - Evaluative | 5 |
Education Level
| Level | Count |
| --- | --- |
| Secondary Education | 2 |
| Grade 10 | 1 |
| Grade 9 | 1 |
| High Schools | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Indiana Statewide Testing for… | 2 |
| Program for International… | 1 |
Suh, Youngsuk – Journal of Educational Measurement, 2016
This study adapted an effect size measure used for studying differential item functioning (DIF) in unidimensional tests and extended the measure to multidimensional tests. Two effect size measures were considered in a multidimensional item response theory model: signed weighted P-difference and unsigned weighted P-difference. The performance of…
Descriptors: Effect Size, Goodness of Fit, Statistical Analysis, Statistical Significance
Li, Zhushan – Journal of Educational Measurement, 2014
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
Descriptors: Test Bias, Sample Size, Statistical Analysis, Regression (Statistics)
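The logistic-regression approach to DIF that this abstract builds on can be sketched in a few lines: regress item correctness on a matching variable (e.g., total score) with and without a group term, and compare the two fits with a likelihood ratio test. The simulation below is only an illustration of that scheme, not the paper's procedure; the ability variable, group split, and DIF magnitude are assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def neg_loglik(beta, X, y):
    """Negative log-likelihood of a logistic regression model."""
    eta = X @ beta
    return np.sum(np.logaddexp(0.0, eta) - y * eta)  # stable log(1+exp)

def fit_logit(X, y):
    """Fit logistic regression by maximum likelihood; return max log-likelihood."""
    res = minimize(neg_loglik, np.zeros(X.shape[1]), args=(X, y), method="BFGS")
    return -res.fun

rng = np.random.default_rng(0)
n = 2000
ability = rng.normal(size=n)            # matching variable (e.g., total score)
group = rng.integers(0, 2, size=n)      # 0 = reference, 1 = focal
# Simulate uniform DIF: the item is 0.5 logits harder for the focal group
p = 1.0 / (1.0 + np.exp(-(ability - 0.5 * group)))
y = rng.binomial(1, p)

ones = np.ones(n)
ll_null = fit_logit(np.column_stack([ones, ability]), y)          # no DIF
ll_full = fit_logit(np.column_stack([ones, ability, group]), y)   # uniform DIF
lr = 2.0 * (ll_full - ll_null)
p_value = chi2.sf(lr, df=1)
print(f"LR = {lr:.2f}, p = {p_value:.4f}")
```

With 2,000 simulated examinees and a 0.5-logit effect, the likelihood ratio statistic comfortably exceeds the chi-square critical value, matching the abstract's framing of power as a function of sample size and effect size.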
Debeer, Dries; Janssen, Rianne; De Boeck, Paul – Journal of Educational Measurement, 2017
When dealing with missing responses, two types of omissions can be discerned: items can be skipped or not reached by the test taker. When the occurrence of these omissions is related to the proficiency process, the missingness is nonignorable. The purpose of this article is to present a tree-based IRT framework for modeling responses and omissions…
Descriptors: Item Response Theory, Test Items, Responses, Testing Problems
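The tree-based idea can be illustrated with the recoding step such frameworks start from: each observed response is split into pseudo-items, one modeling whether a response was given at all and one modeling correctness given a response. The coding scheme below (1/0/-1/NaN) is an assumption for illustration, not the article's notation.

```python
import numpy as np

# Observed codes: 1 = correct, 0 = incorrect, -1 = omitted (skipped);
# np.nan marks "not reached" (treated here as missing at both nodes).
responses = np.array([1.0, 0.0, -1.0, 1.0, -1.0, 0.0])

def irtree_recode(r):
    """Map one response to (node1, node2) pseudo-items:
    node1 = 1 if a response was given, 0 if the item was omitted;
    node2 = correctness, defined only when a response was given."""
    if np.isnan(r):
        return (np.nan, np.nan)   # not reached: missing at both nodes
    if r == -1:
        return (0.0, np.nan)      # omitted: node1 = 0, node2 undefined
    return (1.0, float(r))        # answered: node1 = 1, node2 = correctness

recoded = np.array([irtree_recode(r) for r in responses])
print(recoded)
```

The recoded pseudo-items can then be fed to ordinary IRT software, which is what makes the tree formulation convenient in practice.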
Sinharay, Sandip; Wan, Ping; Choi, Seung W.; Kim, Dong-In – Journal of Educational Measurement, 2015
With an increase in the number of online tests, the number of interruptions during testing due to unexpected technical issues seems to be on the rise. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. Researchers such as…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Statistical Analysis
Sinharay, Sandip; Wan, Ping; Whitaker, Mike; Kim, Dong-In; Zhang, Litong; Choi, Seung W. – Journal of Educational Measurement, 2014
With an increase in the number of online tests, interruptions during testing due to unexpected technical issues seem unavoidable. For example, interruptions occurred during several recent state tests. When interruptions occur, it is important to determine the extent of their impact on the examinees' scores. There is a lack of research on this…
Descriptors: Computer Assisted Testing, Testing Problems, Scores, Regression (Statistics)
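One simple regression-based check in the spirit of these two interruption studies (a sketch under assumed data, not the authors' exact method): regress scores on an unaffected measure using only uninterrupted examinees, then inspect the mean residual of the interrupted group.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
prior = rng.normal(50, 10, size=n)                  # score on an unaffected section
current = 0.8 * prior + rng.normal(10, 5, size=n)   # score on the affected section
interrupted = rng.random(n) < 0.1                   # 10% experienced an interruption
current = current - 2.0 * interrupted               # simulate a 2-point score impact

# Fit the regression on uninterrupted examinees only
slope, intercept = np.polyfit(prior[~interrupted], current[~interrupted], 1)
predicted = slope * prior + intercept

# Mean residual for interrupted examinees estimates the impact on their scores
impact = np.mean(current[interrupted] - predicted[interrupted])
print(f"estimated impact: {impact:.2f} points")
```

The estimate recovers the simulated two-point deficit up to sampling noise; a near-zero mean residual would suggest the interruption did not materially affect scores.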
Tendeiro, Jorge N.; Meijer, Rob R. – Journal of Educational Measurement, 2014
In recent guidelines for fair educational testing it is advised to check the validity of individual test scores through the use of person-fit statistics. For practitioners it is unclear on the basis of the existing literature which statistic to use. An overview of relatively simple existing nonparametric approaches to identify atypical response…
Descriptors: Educational Assessment, Test Validity, Scores, Statistical Analysis
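A typical example of the relatively simple nonparametric statistics surveyed here is the Guttman error count: the number of item pairs where an examinee misses an easier item but answers a harder one correctly. A minimal sketch (the item difficulty ordering is assumed given, e.g., from proportions correct):

```python
import numpy as np

def guttman_errors(responses, difficulty_order):
    """Count Guttman errors: item pairs where the harder item is answered
    correctly but the easier one is not. `difficulty_order` lists item
    indices from easiest to hardest."""
    r = np.asarray(responses)[np.asarray(difficulty_order)]
    errors = 0
    for i in range(len(r)):
        if r[i] == 0:                       # easier item missed...
            errors += int(r[i + 1:].sum())  # ...count harder items answered right
    return errors

print(guttman_errors([1, 1, 1, 0, 0], range(5)))  # Guttman-consistent: 0 errors
print(guttman_errors([0, 1, 1, 0, 1], range(5)))  # aberrant pattern: 4 errors
```

Large counts flag atypical response vectors whose total scores may deserve extra scrutiny before being interpreted.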
de la Torre, Jimmy; Lee, Young-Sun – Journal of Educational Measurement, 2013
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a…
Descriptors: Statistical Analysis, Test Items, Goodness of Fit, Error of Measurement
Ranger, Jochen; Kuhn, Jorg-Tobias – Journal of Educational Measurement, 2012
The information matrix can equivalently be determined via the expectation of the Hessian matrix or the expectation of the outer product of the score vector. The identity of these two matrices, however, is only valid in case of a correctly specified model. Therefore, differences between the two versions of the observed information matrix indicate…
Descriptors: Goodness of Fit, Item Response Theory, Models, Matrices
Shang, Yi – Journal of Educational Measurement, 2012
Growth models are used extensively in the context of educational accountability to evaluate student-, class-, and school-level growth. However, when error-prone test scores are used as independent variables or right-hand-side controls, the estimation of such growth models can be substantially biased. This article introduces a…
Descriptors: Error of Measurement, Statistical Analysis, Regression (Statistics), Simulation
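The bias this article addresses is easy to reproduce in simulation: adding measurement error to a right-hand-side score attenuates its estimated coefficient toward zero. The reliability value and slope below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
true_prior = rng.normal(0, 1, size=n)
growth = 0.9 * true_prior + rng.normal(0, 0.3, size=n)  # true slope = 0.9

# Observed prior score contains measurement error (reliability ~ 0.7)
observed_prior = true_prior + rng.normal(0, np.sqrt(1 / 0.7 - 1), size=n)

slope_true = np.polyfit(true_prior, growth, 1)[0]
slope_noisy = np.polyfit(observed_prior, growth, 1)[0]
print(f"slope with true scores:       {slope_true:.2f}")
print(f"slope with error-prone scores: {slope_noisy:.2f} (attenuated toward 0)")
```

With reliability 0.7, the naive slope shrinks to roughly 0.7 × 0.9 ≈ 0.63, which is the attenuation that error-aware growth-model estimators are designed to correct.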
Suh, Youngsuk; Cho, Sun-Joo; Wollack, James A. – Journal of Educational Measurement, 2012
In the presence of test speededness, the parameters of item response theory models can be poorly estimated due to conditional dependencies among items, particularly for end-of-test items (i.e., speeded items). This article conducted a systematic comparison of five item calibration procedures--a two-parameter logistic (2PL) model, a…
Descriptors: Response Style (Tests), Timed Tests, Test Items, Item Response Theory
de la Torre, Jimmy; Hong, Yuan; Deng, Weiling – Journal of Educational Measurement, 2010
To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…
Descriptors: Classification, Computation, Models, Simulation
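The DINA model itself is compact enough to simulate directly: an examinee answers item j correctly with probability 1 - s_j when they possess every attribute the Q-matrix requires (the "and" gate), and with guessing probability g_j otherwise. The Q-matrix and parameter values below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Q-matrix: which of 2 attributes each of 4 items requires (illustrative)
Q = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0]])
guess = np.array([0.20, 0.20, 0.10, 0.15])  # g_j: P(correct | gate off)
slip = np.array([0.10, 0.10, 0.20, 0.10])   # s_j: P(incorrect | gate on)

n = 500
alpha = rng.integers(0, 2, size=(n, 2))     # examinee attribute profiles

# eta = 1 iff the examinee has every attribute the item requires ("and" gate)
eta = np.all(alpha[:, None, :] >= Q[None, :, :], axis=2).astype(float)
p_correct = eta * (1 - slip) + (1 - eta) * guess
X = rng.binomial(1, p_correct)
print(X.shape, X.mean(axis=0))
```

Simulated data of this kind is the starting point for studies like this one that compare estimation approaches (e.g., fully Bayes) on parameter recovery and classification accuracy.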
Moses, Tim; Holland, Paul W. – Journal of Educational Measurement, 2009
In this study, we compared 12 statistical strategies proposed for selecting loglinear models for smoothing univariate test score distributions and for enhancing the stability of equipercentile equating functions. The major focus was on evaluating the effects of the selection strategies on equating function accuracy. Selection strategies' influence…
Descriptors: Equated Scores, Selection, Statistical Analysis, Models
Armstrong, Ronald D.; Shi, Min – Journal of Educational Measurement, 2009
This article demonstrates the use of a new class of model-free cumulative sum (CUSUM) statistics to detect person fit given the responses to a linear test. The fundamental statistic being accumulated is the likelihood ratio of two probabilities. The detection performance of this CUSUM scheme is compared to other model-free person-fit statistics…
Descriptors: Probability, Simulation, Models, Psychometrics
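The CUSUM scheme accumulates item-level log-likelihood ratios and resets at zero, flagging examinees whose statistic drifts high. The sketch below uses fixed response probabilities rather than the article's model-free formulation, so it illustrates only the accumulation mechanism.

```python
import numpy as np

def cusum_person_fit(responses, p_model, p_alt):
    """Upper CUSUM of item-level log-likelihood ratios: each item adds the
    log LR of an alternative response probability against the model one,
    and the running statistic resets at zero. Large values flag misfit."""
    c, path = 0.0, []
    for r, pm, pa in zip(responses, p_model, p_alt):
        z = np.log(pa / pm) if r == 1 else np.log((1 - pa) / (1 - pm))
        c = max(0.0, c + z)
        path.append(c)
    return np.array(path)

# Model: the examinee answers correctly with p = 0.7 on every item;
# the alternative represents aberrant behavior with p = 0.3.
p_model, p_alt = [0.7] * 10, [0.3] * 10
normal = [1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
aberrant = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
print(cusum_person_fit(normal, p_model, p_alt).max(),
      cusum_person_fit(aberrant, p_model, p_alt).max())
```

The conforming pattern keeps the statistic near zero, while the run of unexpected misses drives it well above any plausible threshold.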
Reise, Steve P.; Yu, Jiayuan – Journal of Educational Measurement, 1990 (peer reviewed)
Parameter recovery in the graded-response model was investigated using the MULTILOG computer program under default conditions. Results from 36 simulated data sets suggest that at least 500 examinees are needed to achieve adequate calibration under the graded model. Sample size had little influence on the true ability parameter's recovery. (SLD)
Descriptors: Computer Assisted Testing, Computer Simulation, Computer Software, Estimation (Mathematics)
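Recovery studies like this one start by generating graded responses. Under Samejima's graded response model, the cumulative category probabilities are logistic in theta with ordered thresholds, and a single uniform draw per examinee selects the category. A minimal generator (parameter values are arbitrary; the MULTILOG calibration step itself is not shown):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_grm(theta, a, b, rng):
    """Simulate responses under Samejima's graded response model.
    `b` holds K ordered thresholds; returns scores in 0..K.
    P(X >= k) = logistic(a * (theta - b_k)); since these cumulative
    probabilities decrease in k, one uniform draw picks the category."""
    theta = np.asarray(theta)[:, None]                 # (n, 1)
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - b)))    # (n, K), decreasing in k
    u = rng.random((theta.shape[0], 1))
    return (u < p_star).sum(axis=1)

theta = rng.normal(size=500)  # the cited finding suggests >= 500 examinees
scores = simulate_grm(theta, a=1.5, b=np.array([-1.0, 0.0, 1.0]), rng=rng)
print(np.bincount(scores, minlength=4))
```

Repeating this generation and calibrating each data set is how recovery and required sample size are typically assessed.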
Beuchert, A. Kent; Mendoza, Jorge L. – Journal of Educational Measurement, 1979 (peer reviewed)
Ten item discrimination indices, across a variety of item analysis situations, were compared, based on the validities of tests constructed by using each of the indices to select 40 items from a 100-item pool. Item score data were generated by a computer program and included a simulation of guessing. (Author/CTM)
Descriptors: Item Analysis, Simulation, Statistical Analysis, Test Construction
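One of the classical indices such comparisons include is the upper-lower discrimination index D: the proportion correct among roughly the top 27% of total scorers minus that among the bottom 27%. A sketch on simulated data (the 0.27 split is conventional; the data-generating values are assumptions):

```python
import numpy as np

def discrimination_index(item_scores, total_scores, frac=0.27):
    """Classical upper-lower discrimination index D: proportion correct in
    the top `frac` of total scores minus that in the bottom `frac`."""
    order = np.argsort(total_scores)
    k = max(1, int(round(frac * len(total_scores))))
    lower, upper = order[:k], order[-k:]
    return item_scores[upper].mean() - item_scores[lower].mean()

rng = np.random.default_rng(5)
n = 300
ability = rng.normal(size=n)
item = rng.binomial(1, 1 / (1 + np.exp(-1.5 * ability)))  # discriminating item
total = ability + rng.normal(0, 0.3, size=n)              # proxy total score
print(round(discrimination_index(item, total), 2))
```

Selecting the 40 items with the highest index values from a 100-item pool, as in the study, is then a one-line sort over per-item D values.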
