Showing all 6 results
Peer reviewed
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Peer reviewed
Alt, Mary; Meyers, Christina; Figueroa, Cecilia – Journal of Speech, Language, and Hearing Research, 2013
Purpose: The purpose of this study was to determine whether children exposed to 2 languages would benefit from the phonotactic probability cues of a single language in the same way as monolingual peers and to determine whether crosslinguistic influence would be present in a fast-mapping task. Method: Two groups of typically developing children…
Descriptors: Regression (Statistics), Spanish, Cues, Task Analysis
Peer reviewed
Blackman, Nicole J-M.; Koval, John J. – Applied Psychological Measurement, 1993
Four indexes of agreement between ratings of a person that correct for chance and are interpretable as intraclass correlation coefficients for different analysis of variance models are investigated. Relationships among the estimators are established for finite samples, and the equivalence of these estimators in large samples is demonstrated. (SLD)
Descriptors: Analysis of Variance, Equations (Mathematics), Estimation (Mathematics), Interrater Reliability
Peer reviewed
Fehrmann, Melinda L.; And Others – Educational and Psychological Measurement, 1991
Two frame-of-reference rater training approaches were compared for effects on reliability and accuracy of cutoff scores generated by 21 raters using Angoff methods on tests taken by 155 undergraduates. Both approaches result in higher interrater reliability and more accuracy than does a non-frame-of-reference method. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Higher Education
O'Neill, Thomas R.; Lunz, Mary E. – 1996
To generalize test results beyond the particular test administration, an examinee's ability estimate must be independent of the particular items attempted, and the item difficulty calibrations must be independent of the particular sample of people attempting the items. This stability is a key concept of the Rasch model, a latent trait model of…
Descriptors: Ability, Benchmarking, Comparative Analysis, Difficulty Level
Rock, D. A.; And Others – 1980
An experiment was designed that varied cutting score procedures, instructions, and types of judges in order to address the following questions concerning the Real Estate Licensing Examination: (1) Will the cutting score levels produced by groups of judges from differing backgrounds (academicians vs. practitioners vs. lawyers) using the same method…
Descriptors: Competence, Content Analysis, Criterion Referenced Tests, Cutting Scores