Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 2 |
Descriptor
| Evaluators | 4 |
| Interrater Reliability | 4 |
| Probability | 4 |
| Accuracy | 1 |
| Adults | 1 |
| Attitudes | 1 |
| Classification | 1 |
| Clinical Diagnosis | 1 |
| Construct Validity | 1 |
| Correlation | 1 |
| Cutting Scores | 1 |
Author
| Conger, Anthony J. | 1 |
| Fehrmann, Melinda L. | 1 |
| Grove, Will | 1 |
| Isaacs, Talia | 1 |
| Thomson, Ron I. | 1 |
| Uebersax, John | 1 |
Publication Type
| Journal Articles | 3 |
| Reports - Research | 3 |
| Collected Works - Serials | 1 |
| Reports - Evaluative | 1 |
Education Level
| Adult Education | 1 |
Audience
| Practitioners | 1 |
| Researchers | 1 |
Location
| Canada | 1 |
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
Isaacs, Talia; Thomson, Ron I. – Language Assessment Quarterly, 2013
This mixed-methods study examines the effects of rating scale length and rater experience on listeners' judgments of second-language (L2) speech. Twenty experienced and 20 novice raters, who were randomly assigned to 5-point or 9-point rating scale conditions, judged speech samples of 38 newcomers to Canada on numerical rating scales for…
Descriptors: Foreign Countries, Adults, Second Language Learning, English (Second Language)
Uebersax, John; Grove, Will – 1989
Methods of probability modeling to analyze rater agreement are described, emphasizing their basic similarities and viewing them as variants of a common methodology. Statistical techniques for analyzing agreement data are described to address questions such as how many opinions are required to make a medical diagnosis with necessary accuracy. Kappa…
Descriptors: Clinical Diagnosis, Correlation, Estimation (Mathematics), Evaluation Methods
Fehrmann, Melinda L.; And Others – Educational and Psychological Measurement, 1991 (peer reviewed)
Two frame-of-reference rater training approaches were compared for effects on reliability and accuracy of cutoff scores generated by 21 raters using Angoff methods on tests taken by 155 undergraduates. Both approaches resulted in higher interrater reliability and greater accuracy than did a non-frame-of-reference method. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Higher Education

