Showing all 4 results
Peer reviewed
PDF on ERIC | Download full text
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
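The comparison the abstract alludes to is typically the proportion of exact agreement between the original Time A scores and the Time B rescores of the same papers. A minimal sketch of that statistic, assuming exact agreement is the truncated phrase's referent (the rubric scale, data, and function name are invented for illustration, not taken from the report):

```python
import numpy as np

def exact_agreement(time_a_scores, time_b_rescores):
    """Proportion of sampled papers receiving identical scores at both administrations."""
    a = np.asarray(time_a_scores)
    b = np.asarray(time_b_rescores)
    return float(np.mean(a == b))

# Hypothetical rescore data on a 0-4 rubric: Time A operational scores and
# Time B rescores of the same sampled papers.
time_a = [3, 2, 4, 1, 3, 2, 0, 4, 3, 2]
time_b = [3, 2, 3, 1, 3, 1, 0, 4, 3, 2]
print(f"Exact agreement: {exact_agreement(time_a, time_b):.2f}")  # 0.80
```

A drift check would compare this proportion against a baseline such as the Time A raters' own agreement rate; the report's actual criterion is not visible in the truncated abstract.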
Peer reviewed
PDF on ERIC | Download full text
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
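The divergence the abstract describes, where these metrics give differing perspectives on the same detector, is easy to reproduce. A sketch with fabricated predictions for a rare student state (the class balance and counts are invented for demonstration, not drawn from the study):

```python
from sklearn.metrics import cohen_kappa_score, precision_score, recall_score, f1_score

# Invented detector output for a rare state (1 = off-task), 10% base rate.
truth = [0] * 90 + [1] * 10
pred  = [0] * 88 + [1] * 2 + [1] * 4 + [0] * 6  # 2 false alarms, 4 hits, 6 misses

print("kappa    :", round(cohen_kappa_score(truth, pred), 3))  # 0.459
print("precision:", round(precision_score(truth, pred), 3))    # 0.667
print("recall   :", round(recall_score(truth, pred), 3))       # 0.4
print("F1       :", round(f1_score(truth, pred), 3))           # 0.5
```

Raw accuracy here is 0.92, yet kappa falls below 0.5 and recall is only 0.4: each metric foregrounds a different failure mode, which is the kind of disagreement the study examines.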
Peer reviewed
Direct link
West, Brady T.; Li, Dan – Sociological Methods & Research, 2019
In face-to-face surveys, interviewer observations are a cost-effective source of paradata for nonresponse adjustment of survey estimates and responsive survey designs. Unfortunately, recent studies have suggested that the accuracy of these observations can vary substantially among interviewers, even after controlling for household-, area-, and…
Descriptors: Observation, Interviews, Error of Measurement, Accuracy
Peer reviewed
Direct link
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
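The marginal-frequency effect this article demonstrates can be sketched numerically: two rater pairs with identical observed agreement but different category marginals yield very different kappas. A minimal illustration (the contingency tables are invented, and kappa_from_table is a hypothetical helper, not code from the article):

```python
import numpy as np

def kappa_from_table(table):
    """Cohen's kappa from a square two-rater contingency table."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    po = np.trace(t) / n                         # observed agreement
    pe = (t.sum(axis=0) @ t.sum(axis=1)) / n**2  # chance agreement from marginals
    return (po - pe) / (1 - pe)

# Both tables show 80% observed agreement, but the second has marginals
# skewed toward category 0, inflating chance agreement and deflating kappa.
balanced = [[40, 10],
            [10, 40]]
skewed   = [[75,  5],
            [15,  5]]
print(round(kappa_from_table(balanced), 3))  # 0.6
print(round(kappa_from_table(skewed), 3))    # 0.231
```

The same 80% agreement yields a kappa of 0.60 under balanced marginals but only about 0.23 under skewed ones, which is exactly the dependence on category marginal frequencies the article dissects.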