Publication Date
| In 2026 | 0 |
| Since 2025 | 15 |
| Since 2022 (last 5 years) | 63 |
| Since 2017 (last 10 years) | 162 |
| Since 2007 (last 20 years) | 321 |
Descriptor
Source
Author
| Hambleton, Ronald K. | 15 |
| Wang, Wen-Chung | 9 |
| Livingston, Samuel A. | 6 |
| Sijtsma, Klaas | 6 |
| Wainer, Howard | 6 |
| Weiss, David J. | 6 |
| Wilcox, Rand R. | 6 |
| Cheng, Ying | 5 |
| Gessaroli, Marc E. | 5 |
| Lee, Won-Chan | 5 |
| Lewis, Charles | 5 |
| More ▼ | |
Publication Type
Education Level
Location
| Turkey | 8 |
| Australia | 7 |
| Canada | 7 |
| China | 5 |
| Netherlands | 5 |
| Japan | 4 |
| Taiwan | 4 |
| United Kingdom | 4 |
| Germany | 3 |
| Michigan | 3 |
| Singapore | 3 |
| More ▼ | |
Laws, Policies, & Programs
| Americans with Disabilities… | 1 |
| Equal Access | 1 |
| Job Training Partnership Act… | 1 |
| Race to the Top | 1 |
| Rehabilitation Act 1973… | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Peer reviewedYen, Wendy M.; Candell, Gregory L. – Applied Measurement in Education, 1991
Empirical reliabilities of scores based on item-pattern scoring, using 3-parameter item-response theory and number-correct scoring, were compared within each of 5 score metrics for at least 900 elementary school students for 5 content areas. Average increases in reliability were produced by item-pattern scoring. (SLD)
Descriptors: Elementary Education, Elementary School Students, Grade Equivalent Scores, Item Response Theory
Peer reviewedSindhu, R. S.; Sharma, Reeta – Science Education International, 1999
Finds that the time required to attempt all the test items of each question paper in a four-paper sample was inversely proportional to the percentage of students who attempted all the test items of that paper. Extrapolates results to give guidelines for determining the feasibility of newly-developed exam papers. (WRM)
Descriptors: Science Tests, Secondary Education, Test Construction, Test Length
Peer reviewedWard, L. Charles; Ryan, Joseph J. – Psychological Assessment, 1996
Validity and reliability were calculated from data in the standardization sample of the Wechsler Adult Intelligence Scale--Revised for 565 proposed short forms. Time saved in comparison with use of the long form was estimated. The most efficient combinations were generally those composed of subtests that were quick to administer. (SLD)
Descriptors: Cost Effectiveness, Intelligence Tests, Selection, Test Format
Peer reviewedAxelrod, Bradley N.; And Others – Psychological Assessment, 1996
The calculations of D. Schretlen, R. H. B. Benedict, and J. H. Bobholz for the reliabilities of a short form of the Wechsler Adult Intelligence Scale--Revised (WAIS-R) (1994) consistently overestimated the values. More accurate values are provided for the WAIS--R and a seven-subtest short form. (SLD)
Descriptors: Error Correction, Error of Measurement, Estimation (Mathematics), Intelligence Tests
Peer reviewedWhitmore, Marjorie L.; Schumacker, Randall E. – Educational and Psychological Measurement, 1999
Compared differential item functioning detection rates for logistic regression and analysis of variance for dichotomously scored items using simulated data and varying test length, sample size, discrimination rate, and underlying ability. Explains why the logistic regression method is recommended for most applications. (SLD)
Descriptors: Ability, Analysis of Variance, Comparative Analysis, Item Bias
Hendrawan, Irene; Glas, Cees A. W.; Meijer, Rob R. – Applied Psychological Measurement, 2005
The effect of person misfit to an item response theory model on a mastery/nonmastery decision was investigated. Furthermore, it was investigated whether the classification precision can be improved by identifying misfitting respondents using person-fit statistics. A simulation study was conducted to investigate the probability of a correct…
Descriptors: Probability, Statistics, Test Length, Simulation
De Champlain, Andre – 1996
The usefulness of a goodness-of-fit index proposed by R. P. McDonald (1989) was investigated with regard to assessing the dimensionality of item response matrices. The m subscript k index, which is based on an estimate of the noncentrality parameter of the noncentral chi-square distribution, possesses several advantages over traditional tests of…
Descriptors: Chi Square, Cutting Scores, Goodness of Fit, Item Response Theory
Yamamoto, Kentaro – 1995
The traditional indicator of test speededness, missing responses, clearly indicates a lack of time to respond (thereby indicating the speededness of the test), but it is inadequate for evaluating speededness in a multiple-choice test scored as number correct, and it underestimates test speededness. Conventional item response theory (IRT) parameter…
Descriptors: Ability, Estimation (Mathematics), Item Response Theory, Multiple Choice Tests
Samejima, Fumiko; Changas, Paul S. – 1981
The methods and approaches for estimating the operating characteristics of the discrete item responses without assuming any mathematical form have been developed and expanded. It has been made possible that, even if the test information function of a given test is not constant for the interval of ability of interest, it is used as the Old Test.…
Descriptors: Adaptive Testing, Latent Trait Theory, Mathematical Models, Methods
Peer reviewedRowley, Glenn – Journal of Educational Measurement, 1978
The reliabilities of various observational measures were determined, and the influence of both the number and the length of the observation periods on reliability was examined, both separately and jointly. A single simplifying assumption leads to a variant of the Spearman-Brown formula, which may have wider application. (Author/CTM)
Descriptors: Career Development, Classroom Observation Techniques, Observation, Reliability
Peer reviewedHambleton, Ronald K. – Educational and Psychological Measurement, 1987
This paper presents an algorithm for determining the number of items to measure each objective in a criterion-referenced test when testing time is fixed and when the objectives vary in their levels of importance, reliability, and validity. Results of four special applications of the algorithm are presented. (BS)
Descriptors: Algorithms, Behavioral Objectives, Criterion Referenced Tests, Test Construction
De Champlain, Andre F. – 1999
The purpose of this study was to examine empirical Type I error rates and rejection rates for three dimensionality assessment procedures with data sets simulated to reflect short tests and small samples. The TESTFACT G superscript 2 difference test suffered from an inflated Type I error rate with unidimensional data sets, while the approximate chi…
Descriptors: Admission (School), College Entrance Examinations, Item Response Theory, Law Schools
Peer reviewedSpineti, John P.; Hambleton, Ronald K. – Educational and Psychological Measurement, 1977
The effectiveness of various tailored testing strategies for use in objective based instructional programs was investigated. The three factors of a tailored testing strategy under study with various hypothetical distributions of abilities across two learning hierarchies were test length, mastery cutting score, and starting point. (Author/JKS)
Descriptors: Adaptive Testing, Computer Assisted Testing, Criterion Referenced Tests, Cutting Scores
Peer reviewedKim, Seock-Ho; And Others – Psychometrika, 1994
Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item and ability parameters through two joint and two marginal Bayesian procedures. Marginal procedures yielded smaller root mean square differences for item and ability, but results for larger sample size and test length were similar.…
Descriptors: Ability, Bayesian Statistics, Computer Simulation, Estimation (Mathematics)
Peer reviewedSmith, Renee L.; And Others – Psychological Assessment, 1995
The clinical utility of using fewer than 12 trials of the Selective Reminding Test, a task to assess verbal memory, was studied with 100 cardiac patients and 100 brain injury patients. Results suggest that as few as 6 trials might be adequate, providing information consistent with that from 12 trials. (SLD)
Descriptors: Clinical Diagnosis, Diagnostic Tests, Head Injuries, Memory

Direct link
