| Publication Date | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 11 |
| Since 2017 (last 10 years) | 27 |
| Since 2007 (last 20 years) | 76 |
| Descriptor | Count |
| --- | --- |
| Simulation | 107 |
| Test Length | 107 |
| Item Response Theory | 65 |
| Test Items | 51 |
| Sample Size | 44 |
| Computer Assisted Testing | 28 |
| Comparative Analysis | 27 |
| Adaptive Testing | 23 |
| Computation | 19 |
| Goodness of Fit | 19 |
| Models | 19 |
| Author | Count |
| --- | --- |
| Cheng, Ying | 4 |
| Hambleton, Ronald K. | 4 |
| Wang, Wen-Chung | 4 |
| De Champlain, Andre | 3 |
| Drasgow, Fritz | 3 |
| Schumacker, Randall E. | 3 |
| Tay, Louis | 3 |
| Wells, Craig S. | 3 |
| Wang, Chun | 2 |
| Cliff, Norman | 2 |
| Cui, Ying | 2 |
| Publication Type | Count |
| --- | --- |
| Journal Articles | 70 |
| Reports - Research | 65 |
| Reports - Evaluative | 25 |
| Speeches/Meeting Papers | 15 |
| Dissertations/Theses -… | 14 |
| Reports - Descriptive | 2 |
| Information Analyses | 1 |
| Numerical/Quantitative Data | 1 |
| Reports - General | 1 |
| Location | Count |
| --- | --- |
| Netherlands | 1 |
| Taiwan | 1 |
| Turkey | 1 |
Foley, Brett Patrick – ProQuest LLC, 2010
The three-parameter logistic (3PL) model is a flexible and widely used tool in assessment; however, its need for large calibration samples limits its use. This study introduces and evaluates the efficacy of a new sample size augmentation technique called Duplicate, Erase, and Replace (DupER) Augmentation through a simulation study. Data are augmented using…
Descriptors: Test Length, Sample Size, Simulation, Item Response Theory
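For context, the 3PL item response function behind the large-sample requirement is standard; the pseudo-guessing parameter c_j in particular is difficult to estimate well without large samples:

```latex
% 3PL item response function: probability that an examinee with
% ability \theta answers item j correctly, given discrimination a_j,
% difficulty b_j, and pseudo-guessing lower asymptote c_j.
P_j(\theta) = c_j + (1 - c_j)\,
  \frac{\exp\bigl(a_j(\theta - b_j)\bigr)}{1 + \exp\bigl(a_j(\theta - b_j)\bigr)}
```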
Deng, Nina – ProQuest LLC, 2011
Three decision consistency and accuracy (DC/DA) methods were evaluated: the Livingston and Lewis (LL) method, the Lee method, and the Hambleton and Han (HH) method. The purposes of the study were (1) to evaluate the accuracy and robustness of these methods, especially when their assumptions were not well satisfied, and (2) to investigate the "true"…
Descriptors: Item Response Theory, Test Theory, Computation, Classification
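As background (generic definitions, not specific to any one of the three methods), decision consistency and decision accuracy are agreement rates over K classification categories:

```latex
% Decision consistency \phi: probability of the same classification on two
% (hypothetical) parallel administrations W_1, W_2.
% Decision accuracy \gamma: agreement between observed (W) and true (T) status.
\phi = \sum_{k=1}^{K} P(W_1 = k,\ W_2 = k), \qquad
\gamma = \sum_{k=1}^{K} P(W = k,\ T = k)
```

Roughly speaking, the LL, Lee, and HH methods differ in the measurement model used to estimate these probabilities from a single test administration.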
Wells, Craig S.; Cohen, Allan S.; Patton, Jeffrey – International Journal of Testing, 2009
A primary concern with testing differential item functioning (DIF) using a traditional point-null hypothesis is that a statistically significant result does not imply that the magnitude of DIF is of practical interest. Similarly, for a given sample size, a non-significant result does not allow the researcher to conclude that the item is free of DIF. To…
Descriptors: Test Bias, Test Items, Statistical Analysis, Hypothesis Testing
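The abstract is truncated before the proposed remedy, but one standard way to encode "DIF of practical magnitude" (a sketch, not necessarily the authors' exact formulation) is to replace the point null with an interval null on the between-group difference in item parameters:

```latex
% Point null vs. interval null for DIF on item j, where b_{jR} and b_{jF}
% are the item difficulties in the reference and focal groups and
% \delta > 0 is the smallest difference of practical interest.
H_0\colon\; b_{jR} - b_{jF} = 0
\quad\text{versus}\quad
H_0'\colon\; \lvert b_{jR} - b_{jF}\rvert \le \delta
```

Rejecting the interval null supports DIF large enough to matter; the point null, by contrast, can be rejected for trivially small DIF given a large enough sample.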
Guo, Jing; Tay, Louis; Drasgow, Fritz – International Journal of Testing, 2009
Test compromise is a concern in cognitive ability testing because such tests are widely used in employee selection and administered on a continuous basis. This study evaluated how well cognitive tests deployed in different test systems resist small-scale cheating conspiracies, in terms of the accuracy of ability estimation.…
Descriptors: Cheating, Cognitive Tests, Adaptive Testing, Computer Assisted Testing
Kim, Jihye – ProQuest LLC, 2010
In DIF studies, a Type I error refers to the mistake of identifying non-DIF items as DIF items, and the Type I error rate is the proportion of such errors in a simulation study. The possibility of making a Type I error in DIF studies is always present, and a high probability of making such an error can weaken the validity of the assessment.…
Descriptors: Test Bias, Test Length, Simulation, Testing
Furlow, Carolyn F.; Ross, Terris Raiford; Gagne, Phill – Applied Psychological Measurement, 2009
Douglas, Roussos, and Stout introduced the concept of differential bundle functioning (DBF) for identifying the underlying causes of differential item functioning (DIF). In this study, the reference group was simulated to have higher mean ability than the focal group on a nuisance dimension, resulting in DIF for each of the multidimensional items…
Descriptors: Test Bias, Test Items, Reference Groups, Simulation
Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey – Educational and Psychological Measurement, 2009
The purpose of this study was to assess the model fit of the two-parameter logistic (2PL) model through comparison with nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that the three nonparametric procedures implemented produced ICCs similar to those of the 2PL for items simulated to fit the 2PL. However, for misfitting items,…
Descriptors: Nonparametric Statistics, Item Response Theory, Test Items, Simulation
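The truncated abstract does not list the three nonparametric procedures, but kernel smoothing is the canonical choice; a minimal sketch (illustrative function name and bandwidth, not the study's code):

```python
import numpy as np

def kernel_icc(theta_hat, responses, grid, bandwidth=0.3):
    """Nonparametric ICC via Nadaraya-Watson kernel regression of 0/1
    item responses on ability estimates (in the spirit of Ramsay's TestGraf).

    theta_hat : ability estimates, shape (n_examinees,)
    responses : 0/1 responses to one item, shape (n_examinees,)
    grid      : ability values at which to evaluate the ICC
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    responses = np.asarray(responses, dtype=float)
    icc = np.empty(len(grid))
    for g, t0 in enumerate(grid):
        w = np.exp(-0.5 * ((theta_hat - t0) / bandwidth) ** 2)  # Gaussian kernel
        icc[g] = w @ responses / w.sum()  # weighted proportion correct
    return icc
```

Plotting this curve against the fitted 2PL ICC makes misfit visible as systematic divergence between the two curves.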
Finkelman, Matthew David – Applied Psychological Measurement, 2010
In sequential mastery testing (SMT), assessment via computer is used to classify examinees into one of two mutually exclusive categories. Unlike paper-and-pencil tests, SMT has the capability to use variable-length stopping rules. One approach to shortening variable-length tests is stochastic curtailment, which halts examination if the probability…
Descriptors: Mastery Tests, Computer Assisted Testing, Adaptive Testing, Test Length
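As a sketch of the curtailment idea (a binomial toy model, not the article's IRT-based procedure): with a fixed number-correct cutoff, testing can stop as soon as the final decision is already determined, and stochastic curtailment also stops when the decision is nearly determined. The function name, the per-item success probability `p_hat`, and the threshold `gamma` are all illustrative assumptions.

```python
from math import comb

def curtail_decision(k, n, N, C, p_hat, gamma=0.95):
    """Curtailment check for a fixed-cutoff mastery test.

    k: correct so far, n: items administered, N: maximum test length,
    C: total correct needed to pass, p_hat: provisional estimate of the
    examinee's per-item success probability, gamma: certainty threshold.
    Returns 'pass', 'fail', or 'continue'.
    """
    m = N - n                      # items remaining
    need = C - k                   # additional correct answers required
    if need <= 0:
        return "pass"              # passing is already guaranteed
    if need > m:
        return "fail"              # passing is already impossible
    # Stochastic curtailment: probability of at least `need` successes
    # in the m remaining Bernoulli(p_hat) trials.
    p_pass = sum(comb(m, j) * p_hat**j * (1 - p_hat)**(m - j)
                 for j in range(need, m + 1))
    if p_pass >= gamma:
        return "pass"
    if 1 - p_pass >= gamma:
        return "fail"
    return "continue"
```

The published procedures compute the stopping probabilities from IRT-based response models rather than a constant p, but the stopping logic has the same shape.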
de la Torre, Jimmy; Song, Hao – Applied Psychological Measurement, 2009
Assessments consisting of different domains (e.g., content areas, objectives) are typically multidimensional in nature but are commonly assumed to be unidimensional for estimation purposes. The different domains of these assessments are further treated as multi-unidimensional tests for the purpose of obtaining diagnostic information. However, when…
Descriptors: Ability, Tests, Item Response Theory, Data Analysis
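For readers unfamiliar with the term, a "multi-unidimensional" treatment fits a separate unidimensional model to each domain while letting the domain abilities correlate; a sketch in 2PL form (notation assumed, since the abstract is truncated):

```latex
% Domain d is modeled unidimensionally with its own ability \theta_{id}
% for examinee i; the D domain abilities correlate through a joint
% distribution with covariance matrix \Sigma.
P(x_{ijd} = 1 \mid \theta_{id}) =
  \frac{\exp\bigl(a_{jd}(\theta_{id} - b_{jd})\bigr)}
       {1 + \exp\bigl(a_{jd}(\theta_{id} - b_{jd})\bigr)},
\qquad
\boldsymbol{\theta}_i = (\theta_{i1},\dots,\theta_{iD}) \sim
  N(\mathbf{0},\, \boldsymbol{\Sigma})
```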
Finkelman, Matthew – Journal of Educational and Behavioral Statistics, 2008
Sequential mastery testing (SMT) has been researched as an efficient alternative to paper-and-pencil testing for pass/fail examinations. One popular method for determining when to cease examination in SMT is the truncated sequential probability ratio test (TSPRT). This article introduces the application of stochastic curtailment in SMT to shorten…
Descriptors: Mastery Tests, Sequential Approach, Computer Assisted Testing, Adaptive Testing
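The TSPRT mentioned here is Wald's sequential probability ratio test with a forced decision at the maximum test length: after n items, the likelihood ratio between a mastery ability \theta_1 and a non-mastery ability \theta_0 is compared against fixed bounds determined by the target error rates.

```latex
% Wald SPRT with nominal error rates \alpha and \beta: continue testing
% while the likelihood ratio stays between the bounds; truncation forces
% a decision once the maximum test length is reached.
LR_n = \prod_{i=1}^{n}
  \frac{P(x_i \mid \theta_1)}{P(x_i \mid \theta_0)}, \qquad
\text{decide mastery if } LR_n \ge \frac{1-\beta}{\alpha}, \quad
\text{non-mastery if } LR_n \le \frac{\beta}{1-\alpha}
```

Stochastic curtailment adds a further stop: end early whenever the eventual TSPRT decision is already nearly certain.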
Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Descriptors: Test Length, Test Content, Simulation, Computation
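As a sketch of the nonparametric variant (function names and interface are assumptions; the parametric variant instead resamples from a fitted, e.g. log-linear smoothed, score distribution rather than from the raw data):

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_se(scores_x, scores_y, equate, B=1000):
    """Nonparametric bootstrap SE of an equating statistic: resample
    examinees with replacement, re-equate, and take the SD across
    replications. `equate` maps (sample_x, sample_y) to the scalar
    statistic of interest, e.g. the equated score at one raw-score point.
    """
    stats = []
    for _ in range(B):
        bx = rng.choice(scores_x, size=len(scores_x), replace=True)
        by = rng.choice(scores_y, size=len(scores_y), replace=True)
        stats.append(equate(bx, by))
    return np.std(stats, ddof=1)
```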
Wei, Youhua – ProQuest LLC, 2008
Scale linking is the process of developing the connection between scales of two or more sets of parameter estimates obtained from separate test calibrations. It is the prerequisite for many applications of IRT, such as test equating and differential item functioning analysis. Unidimensional scale linking methods have been studied and applied…
Descriptors: Test Length, Test Items, Sample Size, Simulation
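In the unidimensional case that this dissertation generalizes, scale linking reduces to finding a linear transformation of the ability metric; the item parameters then move with it:

```latex
% Linear linking from scale Y to scale X with slope A and intercept B:
\theta^{X} = A\,\theta^{Y} + B, \qquad
a^{X}_j = \frac{a^{Y}_j}{A}, \qquad
b^{X}_j = A\,b^{Y}_j + B
```

Methods such as mean/mean, mean/sigma, Haebara, and Stocking-Lord differ in how they estimate A and B.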
Woods, Carol M. – Applied Psychological Measurement, 2008
In Ramsay-curve item response theory (RC-IRT), the latent variable distribution is estimated simultaneously with the item parameters of a unidimensional item response model using marginal maximum likelihood estimation. This study evaluates RC-IRT for the three-parameter logistic (3PL) model with comparisons to the normal model and to the empirical…
Descriptors: Test Length, Computation, Item Response Theory, Maximum Likelihood Statistics
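The estimation problem RC-IRT addresses is visible in the marginal likelihood: standard marginal maximum likelihood fixes the latent density g(\theta) to the standard normal, whereas RC-IRT represents g with a Ramsay curve (a spline-based density) and estimates it jointly with the item parameters:

```latex
% Marginal likelihood over N examinees with response patterns x_i and
% item parameters \xi; RC-IRT estimates g(\theta) rather than fixing it.
L(\xi, g) = \prod_{i=1}^{N} \int
  P(\mathbf{x}_i \mid \theta, \xi)\, g(\theta)\, d\theta
```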
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
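Douglas and Cohen's approach compares the parametric and nonparametric curves directly; a common form of the discrepancy index (sketched here; the exact weighting is a detail the abstract omits) is an integrated squared difference:

```latex
% Misfit index for item j: squared distance between the fitted 2PL ICC
% \hat{P}_j and the kernel-smoothed nonparametric ICC \tilde{P}_j,
% integrated over the ability density f.
T_j = \int \bigl(\hat{P}_j(\theta) - \tilde{P}_j(\theta)\bigr)^{2}
  f(\theta)\, d\theta
```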
Cui, Ying; Leighton, Jacqueline P. – Journal of Educational Measurement, 2009
In this article, we introduce a person-fit statistic called the hierarchy consistency index (HCI) to help detect misfitting item response vectors for tests developed and analyzed based on a cognitive model. The HCI ranges from -1.0 to 1.0, with values close to -1.0 indicating that students respond unexpectedly or differently from the responses…
Descriptors: Test Length, Simulation, Correlation, Research Methodology
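The stated range follows from the index's construction: an agreement index of the form below lies in [-1, 1], where m of M model-based comparisons are contradicted by the response vector (the comparison set itself is defined by the cognitive model):

```latex
% m = 0 gives 1 (perfect consistency with the model hierarchy);
% m = M gives -1 (complete misfit).
\mathrm{HCI} = 1 - \frac{2m}{M}
```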
