Peer reviewed | Berk, Ronald A. – Educational and Psychological Measurement, 1978
Three formulae developed to correct item-total correlations for spuriousness were evaluated. Relationships among corrected, uncorrected, and item-remainder correlations were determined by computing sets of mean, minimum, and maximum deviation coefficients and Spearman rank correlations for nine test lengths. (Author/JKS)
Descriptors: Correlation, Intermediate Grades, Item Analysis, Test Construction
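The spuriousness being corrected is the item's own contribution to the total score; the item-rest correlation removes it. A minimal generic sketch follows — it illustrates the problem, and is not a reproduction of the three specific formulae Berk evaluated.

```python
import math

def _corr(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def item_total_and_rest(responses):
    """For each item in an examinees-by-items 0/1 matrix, return the
    uncorrected item-total correlation and the item-rest correlation
    (item vs. total minus the item), which removes the spurious part."""
    totals = [sum(row) for row in responses]
    result = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [t - x for t, x in zip(totals, item)]
        result.append((_corr(item, totals), _corr(item, rest)))
    return result
```

Algebraically, the item-rest value equals (r_it·s_t − s_i) / sqrt(s_t² + s_i² − 2·r_it·s_i·s_t), one common closed-form correction.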
Chen, Shu-Ying; Ankenmann, Robert D.; Spray, Judith A. – 1999
This paper presents a derivation of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CAT). This relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. Implications for practice as well as future research are also…
Descriptors: Adaptive Testing, Computer Assisted Testing, Item Banks, Test Items
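The kind of relationship the paper derives can be sketched as follows; this is a standard form of the exposure-overlap link for fixed-length CAT, and the exact published expression should be checked against the paper itself.

```python
def average_test_overlap(exposure_rates, test_length):
    """Average between-test overlap for a fixed-length CAT as a
    function of the item exposure rates er_i, which sum to the test
    length L over the N pool items:

        overlap = (N / L) * Var(er) + L / N

    Uniform exposure (Var = 0) gives the minimum possible overlap L/N.
    """
    n_pool = len(exposure_rates)
    mean = sum(exposure_rates) / n_pool
    var = sum((e - mean) ** 2 for e in exposure_rates) / n_pool
    return (n_pool / test_length) * var + test_length / n_pool
```

The formula makes the simultaneous-control point concrete: squeezing exposure rates toward uniformity drives overlap down toward its floor of L/N.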
Peer reviewed | Conger, Anthony J. – Educational and Psychological Measurement, 1983
A paradoxical phenomenon of decreases in reliability as the number of elements averaged over increases is shown to be possible in multifacet reliability procedures (intraclass correlations or generalizability coefficients). Conditions governing this phenomenon are presented along with implications and cautions. (Author)
Descriptors: Generalizability Theory, Test Construction, Test Items, Test Length
Emons, Wilco H. M.; Sijtsma, Klaas; Meijer, Rob R. – Psychological Methods, 2007
Short tests containing at most 15 items are used in clinical and health psychology, medicine, and psychiatry for making decisions about patients. Because short tests have large measurement error, the authors ask whether they are reliable enough for classifying patients into a treatment and a nontreatment group. For a given certainty level,…
Descriptors: Psychiatry, Patients, Error of Measurement, Test Length
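The length-reliability tradeoff behind this question is captured by the classical Spearman-Brown prophecy formula — a standard result, not the authors' specific classification procedure:

```python
def spearman_brown(rho, k):
    """Reliability of a test whose length is changed by factor k,
    given reliability rho of the original test (Spearman-Brown)."""
    return k * rho / (1.0 + (k - 1.0) * rho)
```

Halving a test with reliability .80 (k = 0.5) drops it to about .67, which is why tests of 15 items or fewer carry large measurement error.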
Peer reviewed | Sunathong, Surintorn; Schumacker, Randall E.; Beyerlein, Michael M. – Journal of Applied Measurement, 2000
Studied five factors that can affect the equating of scores from two tests onto a common score scale through the simulation and equating of 4,860 item data sets. Findings indicate three statistically significant two-way interactions for common item length and test length, item difficulty standard deviation and item distribution type, and item…
Descriptors: Difficulty Level, Equated Scores, Interaction, Item Response Theory
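For context, the simplest transformation that places scores from two forms onto a common scale is linear (mean-sigma) equating, sketched here as a generic illustration rather than the authors' simulation design:

```python
def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Mean-sigma linear equating: map a Form X raw score onto the
    Form Y scale so that the two forms' means and SDs coincide."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)
```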
Wiberg, Marie – International Journal of Testing, 2006
A simulation study of a sequential computerized mastery test is carried out with items modeled by the three-parameter logistic item response theory model. The examinees' responses are either identically distributed, not identically distributed, or not identically distributed together with estimation errors in the item characteristics. The…
Descriptors: Test Length, Computer Simulation, Mastery Tests, Item Response Theory
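The three-parameter logistic model referred to here has the standard form below, written without the optional D = 1.7 scaling constant:

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response: lower asymptote c
    (guessing), discrimination a, difficulty b."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

At theta = b the probability is (1 + c) / 2, midway between the guessing floor and 1.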
Neustel, Sandra – 2001
As a continuing part of its validity studies, the Association of American Medical Colleges commissioned a study of the speediness of the Medical College Admission Test (MCAT). If speed is a hidden part of the test, it is a threat to its construct validity. As a general rule, the criterion used to indicate lack of speediness is that 80% of the…
Descriptors: College Applicants, College Entrance Examinations, Higher Education, Medical Education
Ito, Kyoko; Sykes, Robert C. – 2000
This study investigated the practice of weighting a type of test item, such as constructed response, more than other types of items, such as selected response, to compute student scores for a mixed-item type of test. The study used data from statewide writing field tests in grades 3, 5, and 8 and considered two contexts, that in which a single…
Descriptors: Constructed Response, Elementary Education, Essay Tests, Test Construction
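The weighting practice under study amounts to a composite of the form below; this is a minimal sketch, and the actual weights and scoring rules belong to the field tests themselves.

```python
def composite_score(selected_response, constructed_response, cr_weight):
    """Mixed-format composite: selected-response item scores count
    once; each constructed-response score is multiplied by cr_weight."""
    return sum(selected_response) + cr_weight * sum(constructed_response)
```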
Peer reviewed | Berk, Ronald A. – Journal of Experimental Education, 1980
A sampling methodology is proposed for determining lengths of tests designed to assess the comprehension of written discourse. It is based on Bormuth's transformational analysis, within a domain-referenced framework. Guidelines are provided for computing sample size and selecting sentences to which the transformational rules can be applied.…
Descriptors: Reading Comprehension, Reading Tests, Sampling, Test Construction
Peer reviewed | De Ayala, R. J. – Applied Psychological Measurement, 1994
Previous work on the effects of dimensionality on parameter estimation for dichotomous models is extended to the graded response model. Datasets are generated that differ in the number of latent factors as well as their interdimensional association, number of test items, and sample size. (SLD)
Descriptors: Estimation (Mathematics), Item Response Theory, Maximum Likelihood Statistics, Sample Size
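Samejima's graded response model extends the dichotomous case by fitting a 2PL curve at each category threshold; a minimal sketch of the category probabilities:

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.
    thresholds must be increasing; P(X >= k) is a 2PL curve at each
    threshold, and category probabilities are the differences of
    adjacent cumulative curves."""
    cum = [1.0]  # P(X >= lowest category) = 1 by definition
    for b in thresholds:
        cum.append(1.0 / (1.0 + math.exp(-a * (theta - b))))
    cum.append(0.0)  # P(X >= one past the top category) = 0
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]
```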
Peer reviewed | Livingston, Samuel A.; Lewis, Charles – Journal of Educational Measurement, 1995
A method is presented for estimating the accuracy and consistency of classifications based on test scores. The reliability of the score is used to estimate effective test length in terms of discrete items. The true-score distribution is estimated by fitting a four-parameter beta model. (SLD)
Descriptors: Classification, Estimation (Mathematics), Scores, Statistical Distributions
Peer reviewed | Qualls, Audrey L. – Applied Measurement in Education, 1995
Classically parallel, tau-equivalently parallel, and congenerically parallel models representing various degrees of part-test parallelism and their appropriateness for tests composed of multiple item formats are discussed. An appropriate reliability estimate for a test with multiple item formats is presented and illustrated. (SLD)
Descriptors: Achievement Tests, Estimation (Mathematics), Measurement Techniques, Test Format
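One classical reliability estimate for tests built from distinct item-format subtests is stratified coefficient alpha, given here as a related standard estimate; the congeneric estimate Qualls presents is its own derivation.

```python
def stratified_alpha(part_variances, part_alphas, total_variance):
    """Stratified alpha for a test whose parts (e.g. multiple-choice
    and essay sections) have their own score variances and
    reliabilities:  1 - sum_i var_i * (1 - alpha_i) / var_total."""
    error = sum(v * (1.0 - a) for v, a in zip(part_variances, part_alphas))
    return 1.0 - error / total_variance
```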
Wingersky, Marilyn S. – 1989
In a variable-length adaptive test with a stopping rule that relied on the asymptotic standard error of measurement of the examinee's estimated true score, M. S. Stocking (1987) discovered that it was sufficient to know the examinee's true score and the number of items administered to predict with some accuracy whether an examinee's true score was…
Descriptors: Adaptive Testing, Bayesian Statistics, Error of Measurement, Estimation (Mathematics)
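A stopping rule of the kind described, driven by the asymptotic standard error 1/sqrt(test information), can be sketched as follows (a generic illustration, not Stocking's or Wingersky's exact procedure):

```python
import math

def should_stop(item_informations, se_target):
    """Variable-length CAT stopping rule: stop once the asymptotic
    standard error of the ability estimate, 1 / sqrt(sum of item
    information at the current estimate), reaches the target."""
    total_info = sum(item_informations)
    if total_info <= 0.0:
        return False  # no information yet: keep administering items
    return 1.0 / math.sqrt(total_info) <= se_target
```

Because information accumulates additively over administered items, the number of items needed is predictable from the information a typical item contributes near the examinee's true score.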
Peer reviewed | Owen, Steven V.; Froman, Robin D. – Educational and Psychological Measurement, 1987
To test further for efficacy of three-option achievement items, parallel three- and five-option item tests were distributed randomly to college students. Results showed no differences in mean item difficulty, mean discrimination or total test score, but a substantial reduction in time spent on three-option items. (Author/BS)
Descriptors: Achievement Tests, Higher Education, Multiple Choice Tests, Test Format
Schulz, E. Matthew; Wang, Lin – 2001
In this study, items were drawn from a full-length test of 30 items in order to construct shorter tests for the purpose of making accurate pass/fail classifications with regard to a specific criterion point on the latent ability metric. A three-parameter item response theory (IRT) framework was used. The criterion point on the latent ability…
Descriptors: Ability, Classification, Item Response Theory, Pass Fail Grading
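Building a short classification test of this kind typically means selecting the items that carry the most Fisher information at the cut point; a sketch under the 3PL model (a generic greedy selection, not necessarily the authors' exact procedure):

```python
import math

def info_3pl(theta, a, b, c):
    """3PL item information at ability theta:
    I = a^2 * ((1 - P) / P) * ((P - c) / (1 - c))^2,
    which reduces to a^2 * P * (1 - P) when c = 0."""
    p = c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def most_informative(pool, theta_cut, n_items):
    """Pick the n_items (a, b, c) triples from the pool with the
    greatest information at the criterion point theta_cut."""
    return sorted(pool, key=lambda abc: info_3pl(theta_cut, *abc),
                  reverse=True)[:n_items]
```

With equal discriminations and no guessing, the rule simply prefers items whose difficulty sits closest to the cut score.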
