Showing 1 to 15 of 339 results
Peer reviewed
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Peer reviewed
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Peer reviewed
Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025
Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…
Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation
Peer reviewed
Danwei Cai; Ben Naismith; Maria Kostromitina; Zhongwei Teng; Kevin P. Yancey; Geoffrey T. LaFlair – Language Learning, 2025
Globalization and increases in the numbers of English language learners have led to a growing demand for English proficiency assessments of spoken language. In this paper, we describe the development of an automatic pronunciation scorer built on state-of-the-art deep neural network models. The model is trained on a bespoke human-rated dataset that…
Descriptors: Automation, Scoring, Pronunciation, Speech Tests
Peer reviewed
Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024
Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…
Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses
Peer reviewed
Abbas, Mohsin; van Rosmalen, Peter; Kalz, Marco – IEEE Transactions on Learning Technologies, 2023
For predicting and improving the quality of essays, text analytic metrics (surface, syntactic, morphological, and semantic features) can be used to provide formative feedback to students in higher education. In this study, the goal was to identify a sufficient number of features that exhibit a fair proxy of the scores given by the human raters…
Descriptors: Feedback (Response), Automation, Essays, Scoring
Peer reviewed
Mark White; Matt Ronfeldt – Educational Assessment, 2024
Standardized observation systems seek to reliably measure a specific conceptualization of teaching quality, managing rater error through mechanisms such as certification, calibration, validation, and double-scoring. These mechanisms both support high quality scoring and generate the empirical evidence used to support the scoring inference (i.e.,…
Descriptors: Interrater Reliability, Quality Control, Teacher Effectiveness, Error Patterns
Peer reviewed
Jonas Flodén – British Educational Research Journal, 2025
This study compares how the generative AI (GenAI) large language model (LLM) ChatGPT performs in grading university exams compared to human teachers. Aspects investigated include consistency, large discrepancies and length of answer. Implications for higher education, including the role of teachers and ethics, are also discussed. Three…
Descriptors: College Faculty, Artificial Intelligence, Comparative Testing, Scoring
Saenz, David Arron – Online Submission, 2023
There is a vast body of literature documenting the positive impacts that rater training and calibration sessions have on inter-rater reliability, as research indicates that several factors, including frequency and timing, play crucial roles in ensuring inter-rater reliability. Additionally, an increasing amount of research indicates possible links in…
Descriptors: Interrater Reliability, Scoring, Training, Scoring Rubrics
Lambert, Richard G.; Holcomb, T. Scott; Bottoms, Bryndle – Center for Educational Measurement and Evaluation, 2022
The validity of the Kappa coefficient of chance-corrected agreement has been questioned when the prevalence of specific rating scale categories is low and agreement between raters is high. The researchers proposed the Lambda Coefficient of Rater-Mediated Agreement as an alternative to Kappa to address these concerns. Lambda corrects for chance…
Descriptors: Interrater Reliability, Evaluators, Rating Scales, Teacher Evaluation
Peer reviewed
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed
Maestrales, Sarah; Zhai, Xiaoming; Touitou, Israel; Baker, Quinton; Schneider, Barbara; Krajcik, Joseph – Journal of Science Education and Technology, 2021
In response to the call for promoting three-dimensional science learning (NRC, 2012), researchers argue for developing assessment items that go beyond rote memorization tasks to ones that require deeper understanding and the use of reasoning that can improve science literacy. Such assessment items are usually performance-based constructed…
Descriptors: Artificial Intelligence, Scoring, Evaluation Methods, Chemistry
Peer reviewed
Hosseinali Gholami – Mathematics Teaching Research Journal, 2025
Scoring mathematics exam papers accurately is vital for fostering students' engagement and interest in the subject. Incorrect scoring practices can erode motivation and lead to the development of false self-confidence. Therefore, the implementation of appropriate scoring methods is essential for the success of mathematics education. This study…
Descriptors: Interrater Reliability, Mathematics Teachers, Scoring, Mathematics Tests
Peer reviewed
Louise Badham – Oxford Review of Education, 2025
Different sources of assessment evidence are reviewed during International Baccalaureate (IB) grade awarding to convert marks into grades and ensure fair results for students. Qualitative and quantitative evidence are analysed to determine grade boundaries, with statistical evidence weighed against examiner judgement and teachers' feedback on…
Descriptors: Advanced Placement Programs, Grading, Interrater Reliability, Evaluative Thinking
Peer reviewed
McCaffrey, Daniel F.; Casabianca, Jodi M.; Ricker-Pedley, Kathryn L.; Lawless, René R.; Wendler, Cathy – ETS Research Report Series, 2022
This document describes a set of best practices for developing, implementing, and maintaining the critical process of scoring constructed-response tasks. These practices address both the use of human raters and automated scoring systems as part of the scoring process and cover the scoring of written, spoken, performance, or multimodal responses.…
Descriptors: Best Practices, Scoring, Test Format, Computer Assisted Testing