ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	1
Since 2007 (last 20 years)	2

Descriptor

Evaluators	4
Interrater Reliability	4
Probability	4
Accuracy	1
Adults	1
Attitudes	1
Classification	1
Clinical Diagnosis	1
Construct Validity	1
Correlation	1
Cutting Scores	1
English	1
English (Second Language)	1
Error of Measurement	1
Estimation (Mathematics)	1
Evaluation Methods	1
Experience	1
Foreign Countries	1
Generalizability Theory	1
Higher Education	1
Item Response Theory	1
Language Proficiency	1
Language Research	1
Language Tests	1
Mathematical Formulas	1

Source

Educational and Psychological…	2
Language Assessment Quarterly	1

Author

Conger, Anthony J.	1
Fehrmann, Melinda L.	1
Grove, Will	1
Isaacs, Talia	1
Thomson, Ron I.	1
Uebersax, John	1

Publication Type

Journal Articles	3
Reports - Research	3
Collected Works - Serials	1
Reports - Evaluative	1

Education Level

Adult Education

Audience

Practitioners	1
Researchers	1

Location

Canada

Showing all 4 results

Kappa and Rater Accuracy: Paradigms and Parameters

Peer reviewed

Conger, Anthony J. – Educational and Psychological Measurement, 2017

Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…

Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
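The abstract's point about marginal frequencies can be illustrated with the standard Cohen's kappa formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the chance agreement computed from the two raters' marginal proportions. This is a minimal sketch, not code from the article; the helper name `cohens_kappa` and both 2×2 tables are hypothetical. Both tables show 90% observed agreement, but the skewed marginals in the second raise p_e and pull kappa down.

```python
def cohens_kappa(table):
    """Cohen's kappa for two raters (hypothetical helper for illustration).

    table[i][j] counts cases that rater A placed in category i
    and rater B placed in category j.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Marginal proportions for each rater (rows = A, columns = B).
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    # Chance agreement implied by the marginals.
    p_e = sum(r * c for r, c in zip(row_marg, col_marg))
    return (p_o - p_e) / (1 - p_e)


balanced = [[45, 5], [5, 45]]  # hypothetical: 50/50 marginals, 90% agreement
skewed = [[85, 5], [5, 5]]     # hypothetical: 90/10 marginals, 90% agreement

print(round(cohens_kappa(balanced), 2))  # → 0.8
print(round(cohens_kappa(skewed), 2))    # → 0.44
```

With balanced marginals p_e is 0.50 and kappa is 0.80; with 90/10 marginals p_e rises to 0.82 and kappa falls to about 0.44, even though the raters agree on the same 90% of cases.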

Rater Experience, Rating Scale Length, and Judgments of L2 Pronunciation: Revisiting Research Conventions

Peer reviewed

Isaacs, Talia; Thomson, Ron I. – Language Assessment Quarterly, 2013

This mixed-methods study examines the effects of rating scale length and rater experience on listeners' judgments of second-language (L2) speech. Twenty experienced and 20 novice raters, who were randomly assigned to 5-point or 9-point rating scale conditions, judged speech samples of 38 newcomers to Canada on numerical rating scales for…

Descriptors: Foreign Countries, Adults, Second Language Learning, English (Second Language)

Latent Structure Agreement Analysis. A RAND Note.

Uebersax, John; Grove, Will – 1989

Methods of probability modeling for analyzing rater agreement are described, emphasizing their basic similarities and viewing them as variants of a common methodology. Statistical techniques for analyzing agreement data are described to address questions such as how many opinions are required to make a medical diagnosis with the necessary accuracy. Kappa…

Descriptors: Clinical Diagnosis, Correlation, Estimation (Mathematics), Evaluation Methods
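The question of how many opinions are needed can be illustrated with a deliberately simplified model. Assume each rater independently gives the correct diagnosis with the same probability p, whatever the true status (conditional independence, as in basic latent class models, plus a further simplifying assumption of equal accuracy for positive and negative cases), and that the panel decides by majority vote. This is a hypothetical sketch, not the procedure in the RAND note; `majority_accuracy`, `raters_needed`, and the numbers used are illustrative assumptions.

```python
from math import comb


def majority_accuracy(n, p):
    """P(majority of n independent raters is correct), each correct with prob p.

    Hypothetical helper; n must be odd so that votes cannot tie.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of raters to avoid ties")
    # Binomial tail: more than half of the n raters are correct.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))


def raters_needed(p, target, max_n=99):
    """Smallest odd panel size whose majority vote reaches `target` accuracy.

    Returns None if no odd panel up to max_n reaches the target.
    """
    for n in range(1, max_n + 1, 2):
        if majority_accuracy(n, p) >= target:
            return n
    return None


print(raters_needed(0.8, 0.95))  # → 7
```

With p = 0.8, three raters reach about 0.896 and five about 0.942, so seven (about 0.967) are needed to pass 0.95. If p is 0.5 or lower, adding raters does not raise majority accuracy above p, so the function returns None for any higher target.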

The Angoff Cutoff Score Method: The Impact of Frame-of-Reference Rater Training.

Peer reviewed

Fehrmann, Melinda L.; And Others – Educational and Psychological Measurement, 1991

Two frame-of-reference rater training approaches were compared for their effects on the reliability and accuracy of cutoff scores generated by 21 raters using Angoff methods on tests taken by 155 undergraduates. Both approaches resulted in higher interrater reliability and greater accuracy than a non-frame-of-reference method.

Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Higher Education
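In the Angoff method as commonly described, each rater judges, item by item, the probability that a minimally competent examinee would answer correctly; a rater's cutoff is the sum of those probabilities, and the panel cutoff is typically the mean across raters. The sketch below assumes that basic procedure. The ratings are hypothetical, `angoff_cutoff` is an illustrative name, and the standard deviation of rater cutoffs is only a rough indicator of rater spread, not a substitute for the reliability analysis reported in the study.

```python
from statistics import mean, stdev


def angoff_cutoff(ratings):
    """Basic Angoff cutoff (hypothetical helper).

    ratings[r][i] is rater r's estimated probability that a minimally
    competent examinee answers item i correctly.
    Returns (panel cutoff, list of per-rater cutoffs).
    """
    # Each rater's cutoff: expected raw score of a minimally competent examinee.
    per_rater = [sum(item_probs) for item_probs in ratings]
    # Panel cutoff: mean of the rater cutoffs.
    return mean(per_rater), per_rater


# Hypothetical panel: 3 raters judging a 4-item test.
ratings = [
    [0.6, 0.7, 0.5, 0.8],
    [0.5, 0.8, 0.4, 0.9],
    [0.7, 0.6, 0.6, 0.9],
]
cutoff, per_rater = angoff_cutoff(ratings)
print(round(cutoff, 2))            # → 2.67 raw-score points out of 4
print(round(stdev(per_rater), 2))  # → 0.12 spread across rater cutoffs
```

Averaging rater sums rather than pooling item judgments first gives the same panel cutoff here, but keeping per-rater cutoffs makes disagreement between raters visible.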