Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 12 |
| Since 2017 (last 10 years) | 27 |
| Since 2007 (last 20 years) | 50 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Error of Measurement | 78 |
| Test Length | 78 |
| Test Items | 41 |
| Item Response Theory | 36 |
| Sample Size | 30 |
| Test Reliability | 20 |
| Models | 18 |
| Comparative Analysis | 17 |
| Simulation | 17 |
| Scores | 16 |
| Monte Carlo Methods | 15 |
Author
| Author | Records |
| --- | --- |
| Sijtsma, Klaas | 3 |
| Wang, Wen-Chung | 3 |
| DeMars, Christine E. | 2 |
| Emons, Wilco H. M. | 2 |
| Finch, Holmes | 2 |
| Gu, Lixiong | 2 |
| Kilic, Abdullah Faruk | 2 |
| Lee, Won-Chan | 2 |
| Lee, Yi-Hsuan | 2 |
| Livingston, Samuel A. | 2 |
| Stark, Stephen | 2 |
Publication Type
| Publication type | Records |
| --- | --- |
| Journal Articles | 58 |
| Reports - Research | 53 |
| Reports - Evaluative | 16 |
| Dissertations/Theses -… | 4 |
| Speeches/Meeting Papers | 4 |
| Reports - Descriptive | 2 |
Education Level
| Education level | Records |
| --- | --- |
| Grade 3 | 2 |
| Higher Education | 2 |
| Postsecondary Education | 2 |
| Secondary Education | 2 |
| Early Childhood Education | 1 |
| Elementary Education | 1 |
| Elementary Secondary Education | 1 |
| High Schools | 1 |
| Primary Education | 1 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 1 |
Kristof, Walter – Psychometrika, 1971 (peer reviewed)
Descriptors: Cognitive Measurement, Error of Measurement, Mathematical Models, Psychological Testing
Van Der Linden, Wim J. – Educational and Psychological Measurement, 1983 (peer reviewed)
This paper focuses on mixtures of two binomials with one known success parameter. It is shown how moment estimators can be obtained for the remaining unknown parameters of such mixtures, and results are presented from a Monte Carlo study carried out to explore the statistical properties of these estimators. (PN)
Descriptors: Educational Testing, Error of Measurement, Estimation (Mathematics), Guessing (Tests)
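To make the idea of moment estimators for such a mixture concrete, here is a minimal sketch of one method-of-moments approach; it is not necessarily the estimator derived in the paper. It assumes each examinee's number-correct score X on n items follows pi * Binomial(n, p_known) + (1 - pi) * Binomial(n, p2), where p_known might be, for example, a chance-success rate under guessing. The function name `binomial_mixture_moments` is hypothetical. It uses the first two factorial moments, since E[X(X-1)] = n(n-1)p^2 for a binomial.

```python
def binomial_mixture_moments(scores, n, p_known):
    """Method-of-moments estimates (pi_hat, p2_hat) for the mixture
    pi * Binomial(n, p_known) + (1 - pi) * Binomial(n, p2).

    Hypothetical helper for illustration; assumes every score is a
    number-correct count out of the same n items.
    """
    if n < 2:
        raise ValueError("need at least two items for the second factorial moment")
    if not scores:
        raise ValueError("scores must be non-empty")
    count = len(scores)
    # Scaled factorial moments:
    #   m1 = pi * p_known    + (1 - pi) * p2
    #   m2 = pi * p_known**2 + (1 - pi) * p2**2
    m1 = sum(scores) / (count * n)
    m2 = sum(x * (x - 1) for x in scores) / (count * n * (n - 1))
    gap = m1 - p_known
    if gap == 0:
        raise ValueError("mean matches the known component; mixture not identified")
    # Dividing (m2 - p_known**2) by (m1 - p_known) gives p2 + p_known.
    p2 = (m2 - p_known ** 2) / gap - p_known
    if p2 == p_known:
        raise ValueError("components coincide; mixing weight not identified")
    # (1 - pi) = (m1 - p_known) / (p2 - p_known)
    weight_unknown = gap / (p2 - p_known)
    return 1.0 - weight_unknown, p2
```

In finite samples these estimates can fall outside [0, 1], which is one reason a Monte Carlo study of their statistical properties is useful.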
de la Torre, Jimmy; Stark, Stephen; Chernyshenko, Oleksandr S. – Applied Psychological Measurement, 2006
The authors present a Markov Chain Monte Carlo (MCMC) parameter estimation procedure for the generalized graded unfolding model (GGUM) and compare it to the marginal maximum likelihood (MML) approach implemented in the GGUM2000 computer program, using simulated and real personality data. In the simulation study, test length, number of response…
Descriptors: Computation, Monte Carlo Methods, Markov Processes, Item Response Theory
Gilmer, Jerry S.; Feldt, Leonard S. – 1982 (PDF pending restoration)
The Feldt-Gilmer congeneric reliability coefficients make it possible to estimate the reliability of a test composed of parts of unequal, unknown length. The approximate standard errors of the Feldt-Gilmer coefficients are derived via a method using the multivariate Taylor's expansion. Monte Carlo simulation is employed to corroborate the…
Descriptors: Educational Testing, Error of Measurement, Mathematical Formulas, Mathematical Models
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
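The classical standard error of measurement mentioned in this abstract is SEM = SD * sqrt(1 - reliability), and confidence limits for an observed score are score ± z * SEM. The sketch below shows only that textbook calculation (the function names are hypothetical); it does not implement the single-number alternative the abstract goes on to propose. Note that the SEM, like the reliability coefficient, is computed from the spread of scores in a particular group.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: observed-score SD times sqrt(1 - reliability)."""
    if sd < 0 or not 0.0 <= reliability <= 1.0:
        raise ValueError("sd must be >= 0 and reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def score_band(score, sd, reliability, z=1.96):
    """Approximate confidence limits score +/- z * SEM (z = 1.96 for about 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - z * sem, score + z * sem
```

For example, with a score SD of 10 and a reliability of .91, the SEM is 3 points, so an approximate 95% band around an observed score of 70 runs from about 64.1 to 75.9.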
Haladyna, Tom; Roid, Gale – 1981
Two approaches to criterion-referenced test construction are compared. Classical test theory is based on the practice of random sampling from a well-defined domain of test items; latent trait theory suggests that the difficulty of the items should be matched to the achievement level of the student. In addition to these two methods of test…
Descriptors: Criterion Referenced Tests, Error of Measurement, Latent Trait Theory, Test Construction
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Finch, Holmes – Applied Psychological Measurement, 2005
This study compares the ability of the multiple indicators, multiple causes (MIMIC) confirmatory factor analysis model to correctly identify cases of differential item functioning (DIF) with more established methods. Although the MIMIC model might have application in identifying DIF for multiple grouping variables, there has been little…
Descriptors: Identification, Factor Analysis, Test Bias, Models
Wang, Wen-Chung; Su, Ya-Hui – Applied Psychological Measurement, 2004
Eight independent variables (differential item functioning [DIF] detection method, purification procedure, item response model, mean latent trait difference between groups, test length, DIF pattern, magnitude of DIF, and percentage of DIF items) were manipulated, and two dependent variables (Type I error and power) were assessed through…
Descriptors: Test Length, Test Bias, Simulation, Item Response Theory
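In DIF simulation studies of this kind, Type I error is usually estimated as the proportion of truly DIF-free items that a method flags, and power as the proportion of truly DIF items that it flags, pooled over replications. A minimal sketch of that bookkeeping, assuming the flags and the true DIF status are given as parallel lists of booleans (the function name is hypothetical):

```python
def type_i_error_and_power(flagged, truly_dif):
    """Return (type_i_error, power) from parallel boolean lists.

    flagged[k]   -- True if the DIF method flagged item-replication k
    truly_dif[k] -- True if DIF was simulated for item-replication k
    """
    if len(flagged) != len(truly_dif):
        raise ValueError("lists must have the same length")
    null_flags = [f for f, d in zip(flagged, truly_dif) if not d]
    dif_flags = [f for f, d in zip(flagged, truly_dif) if d]
    if not null_flags or not dif_flags:
        raise ValueError("need both DIF-free and DIF items")
    # False positives among DIF-free items; true positives among DIF items.
    return sum(null_flags) / len(null_flags), sum(dif_flags) / len(dif_flags)
```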
Wang, Wen-Chung; Chen, Cheng-Te – Educational and Psychological Measurement, 2005
This study investigates item parameter recovery, standard error estimates, and fit statistics yielded by the WINSTEPS program under the Rasch model and the rating scale model through Monte Carlo simulations. The independent variables were item response model, test length, and sample size. WINSTEPS yielded practically unbiased estimates for the…
Descriptors: Statistics, Test Length, Rating Scales, Item Response Theory
Wang, Wen-Chung; Chen, Hsueh-Chu – Educational and Psychological Measurement, 2004
As item response theory (IRT) becomes popular in educational and psychological testing, there is a need to report IRT-based effect size measures. In this study, we show how the standardized mean difference can be generalized into such a measure. A disattenuation procedure based on the IRT test reliability is proposed to correct the attenuation…
Descriptors: Test Reliability, Rating Scales, Sample Size, Error of Measurement
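The general idea of disattenuating a standardized mean difference can be illustrated with the classical test theory correction; this is an assumption for illustration and not necessarily the IRT-based procedure the authors propose. Because measurement error inflates the observed SD, and true-score variance equals reliability times observed variance, the corrected effect size is d / sqrt(reliability), assuming the reliability applies within groups and error does not shift group means. Function names are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Mean difference divided by the pooled within-group SD."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def disattenuate(d_observed, reliability):
    """Classical correction: true-score SD = sqrt(reliability) * observed SD."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return d_observed / math.sqrt(reliability)
```

Under this correction, an observed d of 0.50 with a reliability of .81 becomes about 0.56.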
Mills, Craig N.; Simon, Robert – 1981
When criterion-referenced tests are used to assign examinees to states reflecting their performance level on a test, the better-known methods for determining test length, which consider relationships among domain scores and errors of measurement, have their limitations. The purpose of this paper is to present a computer system named TESTLEN, which…
Descriptors: Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores, Error of Measurement
Misanchuk, Earl R. – 1978 (PDF pending restoration)
Multiple matrix sampling of three subscales of the California Psychological Inventory was used to investigate the effects of four variables on error estimates of the mean (EEM) and variance (EEV). The four variables were examinee population size (600, 450, 300, 150, 100, and 75); number of subtests (2, 3, 4, 5, 6, and 7), hence the number of…
Descriptors: Adults, Analysis of Variance, Error of Measurement, Item Sampling
De Ayala, R. J. – Educational and Psychological Measurement, 1992 (peer reviewed)
Effects of dimensionality on ability estimation of an adaptive test were examined using generated data in Bayesian computerized adaptive testing (CAT) simulations. Generally, increasing interdimensional difficulty association produced a slight decrease in test length and an increase in accuracy of ability estimation as assessed by root mean square…
Descriptors: Adaptive Testing, Bayesian Statistics, Computer Assisted Testing, Computer Simulation
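Root mean square error, the accuracy criterion named in this abstract, is the square root of the mean squared difference between estimated and generating (true) abilities across simulees. A minimal sketch, with a hypothetical function name:

```python
import math


def rmse(estimates, true_values):
    """Root mean square error between ability estimates and true abilities."""
    if not estimates or len(estimates) != len(true_values):
        raise ValueError("need two non-empty lists of equal length")
    squared = [(e - t) ** 2 for e, t in zip(estimates, true_values)]
    return math.sqrt(sum(squared) / len(squared))
```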
Wingersky, Marilyn S.; Lord, Frederic M. – 1983
The sampling errors of maximum likelihood estimates of item-response theory parameters are studied in the case where both people and item parameters are estimated simultaneously. A check on the validity of the standard error formulas is carried out. The effect of varying sample size, test length, and the shape of the ability distribution is…
Descriptors: Error of Measurement, Estimation (Mathematics), Item Banks, Latent Trait Theory
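The link between test length and the sampling error of an ability estimate can be illustrated with the asymptotic standard error of the maximum likelihood ability estimate under the Rasch model, SE(theta) = 1 / sqrt(I(theta)), where the test information is I(theta) = sum of P_i(1 - P_i). This sketch assumes the item difficulties are known; it does not cover the simultaneous estimation of person and item parameters studied by Wingersky and Lord. Function names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ability_standard_error(theta, difficulties):
    """Asymptotic SE of the ML ability estimate, item difficulties treated as known."""
    if not difficulties:
        raise ValueError("need at least one item")
    probs = [rasch_probability(theta, b) for b in difficulties]
    information = sum(p * (1.0 - p) for p in probs)
    return 1.0 / math.sqrt(information)
```

With 20 items whose difficulties equal the examinee's ability, the SE is about 0.45; quadrupling the test to 80 such items halves it to about 0.22.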

