Publication Date
  In 2025: 1
  Since 2024: 3
  Since 2021 (last 5 years): 13
  Since 2016 (last 10 years): 25
  Since 2006 (last 20 years): 290
Descriptor
  Interrater Reliability: 515
  Evaluation Methods: 133
  Test Reliability: 81
  Foreign Countries: 77
  Scoring: 72
  Test Validity: 72
  Correlation: 61
  Rating Scales: 60
  Measures (Individuals): 59
  Evaluators: 56
  Psychometrics: 56
Author
  Lunz, Mary E.: 6
  Baer, John: 3
  Baker, Eva L.: 3
  Coniam, David: 3
  Engelhard, George, Jr.: 3
  Epstein, Michael H.: 3
  Greatorex, Jackie: 3
  Jaeger, Richard M.: 3
  Kaufman, James C.: 3
  Knoch, Ute: 3
  Linn, Robert L.: 3
Audience
  Researchers: 12
  Practitioners: 10
  Teachers: 6
  Administrators: 5
Location
  United Kingdom: 12
  Australia: 11
  California: 6
  Taiwan: 6
  United Kingdom (England): 6
  Canada: 5
  Florida: 5
  Netherlands: 5
  Sweden: 5
  Pennsylvania: 4
  United States: 4
Laws, Policies, & Programs
  No Child Left Behind Act 2001: 3
  Individuals with Disabilities…: 2
  Race to the Top: 2
  Americans with Disabilities…: 1
  Improving America's Schools…: 1
  Rehabilitation Act 1973…: 1
What Works Clearinghouse Rating
  Meets WWC Standards without Reservations: 1
  Meets WWC Standards with or without Reservations: 1
Pearson, Terry – FORUM: for promoting 3-19 comprehensive education, 2023
Ofsted has frequently defended the judgements made during inspections by claiming that inspection ratings are reliable, as shown by the results from the collection of studies the inspectorate has conducted. I outline the inspectorate's view of reliability and problematise the studies that it has carried out, noting that these provide insufficient…
Descriptors: Inspection, Interrater Reliability, Decision Making, Value Judgment
Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…
Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training
Bonett, Douglas G. – Journal of Educational and Behavioral Statistics, 2022
The limitations of Cohen's κ are reviewed and an alternative G-index is recommended for assessing nominal-scale agreement. Maximum likelihood estimates, standard errors, and confidence intervals for a two-rater G-index are derived for one-group and two-group designs. A new G-index of agreement for multirater designs is proposed. Statistical…
Descriptors: Statistical Inference, Statistical Data, Interrater Reliability, Design
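As a rough illustration of the indices this abstract contrasts (a sketch over invented ratings, not code or data from the article): Cohen's κ estimates chance agreement from each rater's marginal proportions, while the G-index assumes uniform chance agreement of 1/k, giving G = (p_o − 1/k)/(1 − 1/k) for observed agreement p_o over k nominal categories.

```python
from collections import Counter

def agreement_indices(r1, r2, k):
    """Observed agreement, Cohen's kappa, and the G-index for two
    raters assigning each of n subjects to one of k nominal categories."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Kappa's chance agreement comes from the raters' marginal proportions
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(c1) | set(c2))
    kappa = (p_o - p_e) / (1 - p_e)
    # The G-index instead assumes uniform chance agreement of 1/k
    g = (p_o - 1 / k) / (1 - 1 / k)
    return p_o, kappa, g

# Invented example: 8 subjects, 3 categories, two raters
p_o, kappa, g = agreement_indices([0, 0, 1, 1, 2, 2, 0, 1],
                                  [0, 0, 1, 2, 2, 2, 0, 0], k=3)
# p_o = 0.75, kappa ~ 0.628, g = 0.625
```

The two indices diverge when the raters' marginals are uneven, which is one driver of the critiques of κ the article reviews.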
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Constructing a Roadmap to Measure the Quality of Business Assessments Aimed at Curriculum Management
Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023
Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers, a roadmap with which to gather evidence to improve the quality of open-ended assessments. Based on statistical…
Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Thorne, Casey Lee – Journal of Dance Education, 2022
The research outlined in this article offers a systematic training methodology for students and licensed Traditional Chinese Medicine (TCM) practitioners to learn the clinical art and science of pulsology through dance. One of the greatest hurdles in learning pulse palpation is a TCM practitioner's inability to feel the pulse with a degree of…
Descriptors: Dance Education, Metabolism, Medicine, Asian Culture
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
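Human–machine agreement in AES work is commonly benchmarked with quadratic weighted kappa; a minimal sketch follows (invented scores, not the paper's data or implementation).

```python
def quadratic_weighted_kappa(human, machine, k):
    """Quadratic weighted kappa between human and machine scores
    on an ordinal 0..k-1 scale, a common AES agreement benchmark."""
    n = len(human)
    # Observed counts of (human, machine) score pairs
    obs = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        obs[h][m] += 1
    hist_h = [human.count(i) for i in range(k)]
    hist_m = [machine.count(i) for i in range(k)]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * obs[i][j]
            den += w * hist_h[i] * hist_m[j] / n  # expected under independence
    return 1 - num / den

# Invented scores on a 0-3 scale; identical vectors give perfect agreement
qwk = quadratic_weighted_kappa([0, 1, 2, 3, 2, 1], [0, 1, 2, 3, 2, 1], k=4)
# qwk == 1.0
```

Unlike exact-match rates, the quadratic weights penalize large human–machine score gaps more heavily than near misses.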
McCarthy, Kathryn S.; Magliano, Joseph P.; Snyder, Jacob O.; Kenney, Elizabeth A.; Newton, Natalie N.; Perret, Cecile A.; Knezevic, Melanie; Allen, Laura K.; McNamara, Danielle S. – Grantee Submission, 2021
The objective in the current paper is to examine the processes of how our research team negotiated meaning using an iterative design approach as we established, developed, and refined a rubric to capture comprehension processes and strategies evident in students' verbal protocols. The overarching project comprises multiple data sets, multiple…
Descriptors: Scoring Rubrics, Interrater Reliability, Design, Learning Processes
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
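A single-facet persons-by-raters G-study of the kind the generalizability-theory literature describes can be sketched as follows: estimate variance components from two-way ANOVA mean squares, then form a relative G coefficient. This is an illustrative sketch with invented data, not the authors' method.

```python
def g_study(scores):
    """Single-facet (persons x raters) G-study for a fully crossed design.
    scores: one row per person, one column per rater."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]
    ms_p = n_r * sum((m - grand) ** 2 for m in p_means) / (n_p - 1)
    ms_r = n_p * sum((m - grand) ** 2 for m in r_means) / (n_r - 1)
    ss_res = sum((scores[p][r] - p_means[p] - r_means[r] + grand) ** 2
                 for p in range(n_p) for r in range(n_r))
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    # Variance components (negative estimates truncated to zero)
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    # Relative G coefficient for the observed number of raters
    g_rel = var_p / (var_p + var_res / n_r) if var_p + var_res > 0 else 0.0
    return var_p, var_r, var_res, g_rel

# Invented data: 3 examinees scored by 2 raters, purely additive effects
var_p, var_r, var_res, g_rel = g_study([[0, 1], [2, 3], [4, 5]])
# no person-by-rater residual, so the relative G coefficient is 1.0
```

Rater severity (var_r) inflates absolute error but not the relative coefficient, which is why relative and absolute G coefficients are reported separately in this literature.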
Marion Heron; Helen Donaghue; Kieran Balloo – Teaching in Higher Education, 2024
The aim of teaching observations and post observation feedback in higher education is to support teachers to reflect on and improve their teaching. Yet, our understanding of tutors' (observers') and teachers' (observees') capacities for capitalising on these feedback opportunities is limited and there is little empirically derived advice for…
Descriptors: Feedback (Response), Classroom Observation Techniques, Teacher Evaluation, Multiple Literacies
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2018
The Rasch facets model was developed to account for facet data, such as student essays graded by raters, but it accounts for only one kind of rater effect (severity). In practice, raters may exhibit various tendencies such as using middle or extreme scores in their ratings, which is referred to as the rater centrality/extremity response style. To…
Descriptors: Scoring, Models, Interrater Reliability, Computation
Gitomer, Drew H.; Martínez, José Felipe; Battey, Dan; Hyland, Nora E. – American Educational Research Journal, 2021
The Educative Teacher Performance Assessment (edTPA) is a system of standardized portfolio assessments of teaching performance mandated for use by educator preparation programs in 18 states, and approved in 21 others, as part of initial certification for preservice teachers. Because of the high stakes involved for examinees, it is critical that…
Descriptors: Evaluation, Performance Based Assessment, Test Reliability, Test Validity
Knoch, Ute; Chapelle, Carol A. – Language Testing, 2018
Argument-based validation requires test developers and researchers to specify what is entailed in test interpretation and use. Doing so has been shown to yield advantages (Chapelle, Enright, & Jamieson, 2010), but it also requires an analysis of how the concerns of language testers can be conceptualized in the terms used to construct a…
Descriptors: Test Validity, Language Tests, Evaluation Research, Rating Scales