ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	3
Since 2017 (last 10 years)	5

Descriptor

Interrater Reliability	5
Testing	5
Scoring	4
Scores	3
Student Evaluation	3
Evaluation Methods	2
Psychometrics	2
Scoring Rubrics	2
Test Construction	2
Test Items	2
Test Reliability	2
Test Validity	2
Training	2
Action Research	1
Chemistry	1
Computation	1
Cultural Differences	1
Cultural Relevance	1
Culture Fair Tests	1
Educational Assessment	1
Essays	1
Evidence	1
Examiners	1
Foreign Countries	1
Graduate Students	1
More ▼

Source

Chemistry Education Research…	1
ETS Research Report Series	1
International Journal of…	1
Journal of Educational…	1
Online Submission	1

Author

Bunch, Michael B.	1
Donoghue, John R.	1
Hess, Melinda R.	1
Komperda, Regis	1
Lazenby, Katherine	1
Marcroft, Tina A.	1
McClellan, Catherine A.	1
Palermo, Corey	1
Ridge, Kirk	1
Saenz, David Arron	1
Tenney, Kristin	1
More ▼

Publication Type

Journal Articles	4
Reports - Research	4
Guides - General	1

Education Level

Higher Education	1
Postsecondary Education	1

Audience

Location

Laws, Policies, & Programs

Assessments and Surveys

What Works Clearinghouse Rating

Showing all 5 results Save | Export

Inter-Rater Reliability in Comprehensive Examination Scoring: The Case for Consistent and Collaborative Rater Training and Calibration

Download full text

Saenz, David Arron – Online Submission, 2023

There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability as research indicates several factors including frequency and timing play crucial roles towards ensuring inter-rater reliability. Additionally, increasing amounts research indicate possible links in…

Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics

Investigating Constructed-Response Scoring over Time: The Effects of Study Design on Trend Rescore Statistics. Research Report. ETS RR-22-15

Peer reviewed
PDF on ERIC

Download full text

Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022

When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…

Descriptors: Item Response Theory, Test Construction, Scoring, Testing

Practices in Instrument Use and Development in "Chemistry Education Research and Practice" 2010-2021

Peer reviewed

Direct link

Lazenby, Katherine; Tenney, Kristin; Marcroft, Tina A.; Komperda, Regis – Chemistry Education Research and Practice, 2023

Assessment instruments that generate quantitative data on attributes (cognitive, affective, behavioral, "etc.") of participants are commonly used in the chemistry education community to draw conclusions in research studies or inform practice. Recently, articles and editorials have stressed the importance of providing evidence for the…

Descriptors: Chemistry, Periodicals, Journal Articles, Science Education

Scoring Stability in a Large-Scale Assessment Program: A Longitudinal Analysis of Leniency/Severity Effects

Peer reviewed

Direct link

Palermo, Corey; Bunch, Michael B.; Ridge, Kirk – Journal of Educational Measurement, 2019

Although much attention has been given to rater effects in rater-mediated assessment contexts, little research has examined the overall stability of leniency and severity effects over time. This study examined longitudinal scoring data collected during three consecutive administrations of a large-scale, multi-state summative assessment program.…

Descriptors: Scoring, Interrater Reliability, Measurement, Summative Evaluation

ITC Guidelines for the Large-Scale Assessment of Linguistically and Culturally Diverse Populations

Peer reviewed

Direct link

International Journal of Testing, 2019

These guidelines describe considerations relevant to the assessment of test takers in or across countries or regions that are linguistically or culturally diverse. The guidelines were developed by a committee of experts to help inform test developers, psychometricians, test users, and test administrators about fairness issues in support of the…

Descriptors: Test Bias, Student Diversity, Cultural Differences, Language Usage