Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016
Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strength and weakness in recent large-scale implementations, there is no simple answer to the question of which design is better because different…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
Dwyer, Andrew C. – Journal of Educational Measurement, 2016
This study examines the effectiveness of three approaches for maintaining equivalent performance standards across test forms with small samples: (1) common-item equating, (2) resetting the standard, and (3) rescaling the standard. Rescaling the standard (i.e., applying common-item equating methodology to standard setting ratings to account for…
Descriptors: Cutting Scores, Equivalency Tests, Test Format, Academic Standards
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Journal of Educational Measurement, 2008
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically,…
Descriptors: Equated Scores, Sample Size, Test Reliability, Comparative Analysis
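
Note on the record above: the abstract describes the synthetic function as a weighted average of the identity function and a traditional equating function. A minimal Python sketch of that idea follows. The names `synthetic_function`, `linear_equating`, and the weight `w` are hypothetical, the convention that `w` is the weight on the equating function is an assumption, and the generic linear function is only a stand-in for the chained linear equating function named in the abstract. How the study chooses the weight is not stated in the truncated abstract and is not modeled here.

```python
from typing import Callable


def synthetic_function(x: float, equate: Callable[[float], float], w: float) -> float:
    """Weighted average of an equating function and the identity function.

    w is the (assumed) weight on the equating function; 1 - w goes to the
    identity function, i.e. to leaving the raw score unchanged.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * equate(x) + (1.0 - w) * x


def linear_equating(slope: float, intercept: float) -> Callable[[float], float]:
    """Return a generic linear conversion, used here only as a stand-in."""
    return lambda x: slope * x + intercept


# Illustrative values only; they are not taken from the study.
equate = linear_equating(slope=0.5, intercept=20.0)
for w in (0.0, 0.5, 1.0):
    print(w, synthetic_function(30.0, equate, w))
```

With `w = 0` the score is returned unchanged (identity), with `w = 1` the equating function is applied in full, and intermediate weights move the result between the two.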

Simon, Alan J.; Joiner, Lee M. – Journal of Educational Measurement, 1976
The purpose of this study was to determine whether a Mexican version of the Peabody Picture Vocabulary Test could be improved by directly translating both forms of the American test, then using decision procedures to select the better item of each pair. The reliability of the simple translations suffered. (Author/BW)
Descriptors: Early Childhood Education, Spanish, Test Construction, Test Format

Berk, Ronald A. – Journal of Educational Measurement, 1980
A dozen different approaches that yield 13 reliability indices for criterion-referenced tests were identified and grouped into three categories: threshold loss function, squared-error loss function, and domain score estimation. Indices were evaluated within each category. (Author/RL)
Descriptors: Classification, Criterion Referenced Tests, Cutting Scores, Evaluation Methods

Norcini, John J. – Journal of Educational Measurement, 1987
Answer keys for physician and teacher licensing examinations were studied. The impact of variability on total errors of measurement was examined for answer keys constructed using the aggregate method. Results indicated that, in some cases, scorers contributed to a sizable reduction in measurement error. (Author/GDC)
Descriptors: Adults, Answer Keys, Error of Measurement, Evaluators

Askegaard, Lewis D.; Umila, Benwardo V. – Journal of Educational Measurement, 1982
Multiple matrix sampling of items and examinees was applied to an 18-item rank order instrument administered to a randomly assigned group and compared to the ordering and ranking of all items by control subjects. High correlations between ranks suggest the methodology may viably reduce respondent effort on long rank ordering tasks. (Author/CM)
Descriptors: Evaluation Methods, Item Sampling, Junior High Schools, Student Reaction

Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The extent of superiority of free response tests was sufficiently small so that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)

Frisbie, David A.; Sweeney, Daryl C. – Journal of Educational Measurement, 1982
A 100-item five-choice multiple choice (MC) biology final exam was converted to multiple choice true-false (MTF) form to yield two content-parallel test forms comprised of the two item types. Students found the MTF items easier and preferred MTF over MC; the MTF subtests were more reliable. (Author/GK)
Descriptors: Biology, College Science, Comparative Analysis, Difficulty Level