ERIC - Search Results

Publication Date

In 2025	38
Since 2024	106
Since 2021 (last 5 years)	367
Since 2016 (last 10 years)	889
Since 2006 (last 20 years)	2102

Descriptor

Interrater Reliability	2102
Foreign Countries	550
Correlation	343
Test Reliability	307
Evaluation Methods	275
Test Validity	267
Scores	264
Validity	256
Measures (Individuals)	250
Comparative Analysis	236
Statistical Analysis	229
Psychometrics	197
Scoring Rubrics	197
Student Evaluation	197
Teaching Methods	194
Intervention	192
Scoring	191
Observation	190
Reliability	188
Evaluators	171
Elementary School Students	158
Rating Scales	158
English (Second Language)	153
Second Language Learning	153
Questionnaires	144

Education Level

Higher Education	548
Postsecondary Education	408
Elementary Education	274
Secondary Education	175
Early Childhood Education	140
Elementary Secondary Education	113
Middle Schools	105
High Schools	82
Preschool Education	69
Junior High Schools	64
Primary Education	54
Adult Education	52
Kindergarten	45
Intermediate Grades	39
Grade 5	38
Grade 4	36
Grade 1	33
Grade 6	33
Grade 3	31
Grade 8	30
Grade 7	27
Grade 2	23
Grade 10	13
Grade 9	11
Two Year Colleges	8

Audience

Researchers	9
Practitioners	7
Teachers	4
Administrators	3
Counselors	2
Policymakers	1

Location

Turkey	52
Australia	48
Canada	40
China	37
United Kingdom	37
Netherlands	35
California	31
United States	28
Germany	22
Taiwan	22
Florida	20
Iran	19
Japan	19
Sweden	18
North Carolina	17
Pennsylvania	17
United Kingdom (England)	17
South Korea	16
Texas	16
New Zealand	14
Hong Kong	13
Washington	13
Georgia	12
India	12
Spain	12

Laws, Policies, & Programs

No Child Left Behind Act 2001	13
Individuals with Disabilities…	6
Race to the Top	3
Elementary and Secondary…	2
American Recovery and…	1
Americans with Disabilities…	1
Education for All Handicapped…	1
Elementary and Secondary…	1
Improving Americas Schools…	1
Individuals with Disabilities…	1
Individuals with Disabilities…	1
Pell Grant Program	1
Rehabilitation Act 1973…	1
Stewart B McKinney Homeless…	1
Temporary Assistance for…	1

What Works Clearinghouse Rating

Meets WWC Standards without Reservations	3
Meets WWC Standards with or without Reservations	3
Does not meet standards	3

Showing 1 to 15 of 2,102 results

Automated Scoring in Learning Progression-Based Assessment: A Comparison of Researcher and Machine Interpretations

Peer reviewed

Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025

Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…

Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy

Evaluating the Correspondence between Expert Visual Analysis and Quantitative Methods

Peer reviewed

Alexandra M. Pierce; Lisa M. H. Sanetti; Melissa A. Collier-Meek; Austin H. Johnson – Grantee Submission, 2024

Visual analysis is the primary methodology used to determine treatment effects from graphed single-case design data. Previous studies have demonstrated mixed findings related to interrater agreement between both expert and novice visual analysts, which represents a critical limitation of visual analysis and supports calls for also presenting…

Descriptors: Graphs, Interrater Reliability, Statistical Analysis, Expertise

Chasing Rainbows? Ofsted's Quest for Inter-Inspector Reliability

Peer reviewed

Pearson, Terry – FORUM: for promoting 3-19 comprehensive education, 2023

Ofsted has frequently defended the judgements made during inspections by claiming that inspection ratings are reliable, as shown by the results from the collection of studies the inspectorate has conducted. I outline the inspectorate's view of reliability and problematise the studies that it has carried out, noting that these provide insufficient…

Descriptors: Inspection, Interrater Reliability, Decision Making, Value Judgment

A Systematic Review of Social Validation Procedures in Intervention Research with Transition-Age Autistic Youth

Peer reviewed

Kristen Bottema-Beutel; Shannon Crowley LaPoint; So Yoon Kim; Sarah Mohiuddin; Qun Yu; Rachael McKinnon – Exceptional Children, 2024

In this secondary analysis of a previously conducted systematic review, we analyze social validity assessments in intervention research for transition-age autistic youth. Social validity is concerned with the acceptability of the intervention goals, the acceptability and feasibility of the intervention procedures, and the perceived importance of…

Descriptors: Autism Spectrum Disorders, Intervention, Validity, Psychometrics

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement

Reliability of a Frequency Method for Assessing Vegetable Intake Using Photos among College Students: A Smart Phone Approach

Peer reviewed

Heena Suthar; Krisha Thiagarajah; Ibraheem Karaye; Zayra Teresa Lopez-Ixta; Trishnee Bhurosy – Journal of American College Health, 2025

Objective: To measure the interrater reliability of assessing the frequency of vegetable intake using mobile photos and descriptions. Design: Repeated measures design. Setting: A Midwestern university. Participants: Undergraduate students (N = 165). Measurable Outcome/Analysis: Number of times each of these vegetable subgroups were consumed daily:…

Descriptors: Interrater Reliability, Incidence, Food, Eating Habits

The Use of Annotations to Explain Labels: Comparing Results from a Human-Rater Approach to a Deep Learning Approach

Peer reviewed

Lottridge, Susan; Woolf, Sherri; Young, Mackenzie; Jafari, Amir; Ormerod, Chris – Journal of Computer Assisted Learning, 2023

Background: Deep learning methods, where models do not use explicit features and instead rely on implicit features estimated during model training, suffer from an explainability problem. In text classification, saliency maps that reflect the importance of words in prediction are one approach toward explainability. However, little is known about…

Descriptors: Documentation, Learning Strategies, Models, Prediction

Inconsistencies in Rater-Based Assessments Mainly Affect Borderline Candidates: But Using Simple Heuristics Might Improve Pass-Fail Decisions

Peer reviewed

Stefan K. Schauber; Anne O. Olsen; Erik L. Werner; Morten Magelssen – Advances in Health Sciences Education, 2024

Introduction: Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments. Here, disagreement between examiners can have far-reaching consequences. The literature suggests that…

Descriptors: Medical Students, Performance Based Assessment, Expertise, Interrater Reliability

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

Test-Retest and Inter-Rater Reliability for Selected Outcomes from a Wearable 3D Inertial Sensor over Different Stable and Unstable Postural Conditions: A Validation Study

Peer reviewed

Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025

The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…

Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques

Development of a Categorical Scoring Codebook for Entrepreneurial Mindset (EM) Concept Maps

Peer reviewed

Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025

Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…

Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation

Developing an Automatic Pronunciation Scorer: Aligning Speech Evaluation Models and Applied Linguistics Constructs

Peer reviewed

Danwei Cai; Ben Naismith; Maria Kostromitina; Zhongwei Teng; Kevin P. Yancey; Geoffrey T. LaFlair – Language Learning, 2025

Globalization and increases in the numbers of English language learners have led to a growing demand for English proficiency assessments of spoken language. In this paper, we describe the development of an automatic pronunciation scorer built on state-of-the-art deep neural network models. The model is trained on a bespoke human-rated dataset that…

Descriptors: Automation, Scoring, Pronunciation, Speech Tests

Human versus Machine: The Effectiveness of ChatGPT in Automated Essay Scoring

Peer reviewed

Jennifer Manning; Jeffrey Baldwin; Natasha Powell – Innovations in Education and Teaching International, 2025

As ChatGPT continues to reshape student engagement and instructional design, it is crucial to examine its practical implications. This study aims to evaluate the effectiveness of ChatGPT3.5 and ChatGPT4 as potential automated essay scoring (AES) systems. Fifty authentic, student-written annotated bibliographies were evaluated by three human raters…

Descriptors: Foreign Countries, Essays, Writing Evaluation, Artificial Intelligence

Using Bayesian Generalized Structural Equation Modeling to Analyze Latent Agreement

McCluskey, Sydne – ProQuest LLC, 2023

Rater comparison analysis is commonly necessary in the social sciences. Conventional approaches to the problem generally focus on calculation of agreement statistics, which provide useful but incomplete information about rater agreement. Importantly, one-number agreement statistics give no indication regarding the nature of disagreements, nor do…

Descriptors: Bayesian Statistics, Structural Equation Models, Interrater Reliability, Beliefs

An Experimental Study of Standard Setting Methods for Diagnostic Profiles

Feldberg, Zachary R. – ProQuest LLC, 2023

Cognitive diagnostic models (CDMs) provide pedagogically relevant information in the form of a student profile of multiple binary categorizations of students into mastery or nonmastery statuses on latent traits called attributes. Federal educational accountability requires accountability measures to designate students into one of at least three…

Descriptors: Accountability, Standards, Cutting Scores, Models

Author

Wind, Stefanie A.	10
Johnson, Evelyn S.	8
Moylan, Laura A.	7
McLeod, Bryce D.	6
Wyse, Adam E.	6
Zheng, Yuzhu	6
Barton, Erin E.	5
Crawford, Angela R.	5
Lecavalier, Luc	5
Ledford, Jennifer R.	5
Matson, Johnny L.	5
Test, David W.	5
Aman, Michael G.	4
Attali, Yigal	4
Coniam, David	4
Conroy, Maureen A.	4
Engelhard, George, Jr.	4
Epstein, Michael H.	4
Goldhaber, Dan	4
Horner, Robert H.	4
Kern, Lee	4
Knoch, Ute	4
McIntosh, Kent	4
Petscher, Yaacov	4
Pickles, Andrew	4

Publication Type

Journal Articles	1895
Reports - Research	1574
Reports - Evaluative	290
Tests/Questionnaires	128
Reports - Descriptive	98
Dissertations/Theses -…	89
Information Analyses	87
Opinion Papers	24
Speeches/Meeting Papers	23
Numerical/Quantitative Data	16
Books	4
Guides - Non-Classroom	4
Non-Print Media	3
Collected Works - General	2
Guides - General	2
Reports - General	2
Collected Works - Proceedings	1
Guides - Classroom - Teacher	1
Reference Materials - General	1
Reports -…	1

Assessments and Surveys

Test of English as a Foreign…	20
Autism Diagnostic Observation…	13
Child Behavior Checklist	13
Strengths and Difficulties…	11
Vineland Adaptive Behavior…	10
Peabody Picture Vocabulary…	9
Woodcock Johnson Tests of…	9
Behavior Assessment System…	8
Dynamic Indicators of Basic…	8
National Assessment of…	7
SAT (College Admission Test)	7
Early Childhood Environment…	6
Graduate Record Examinations	6
Wechsler Intelligence Scale…	6
Classroom Assessment Scoring…	5
Draw a Person Test	5
International English…	5
Raven Progressive Matrices	5
ACT Assessment	4
Battelle Developmental…	4
Behavioral and Emotional…	4
Mullen Scales of Early…	4
Preschool Language Scale	4
Program for International…	4
Social Skills Rating System	4

Source

ProQuest LLC	86
Journal of Speech, Language,…	53
Journal of Autism and…	47
Grantee Submission	40
International Journal of…	33
Language Testing	31
Online Submission	28
Research in Developmental…	27
ETS Research Report Series	25
Advances in Health Sciences…	24
Assessment for Effective…	24
Assessment & Evaluation in…	23
Educational and Psychological…	20
Measurement in Physical…	20
Educational Measurement:…	17
Applied Measurement in…	16
Language Assessment Quarterly	16
Topics in Early Childhood…	16
Autism: The International…	14
Remedial and Special Education	14
Journal of Positive Behavior…	13
Journal of Psychoeducational…	13
Psychological Assessment	13
Educational Assessment	12
Journal of Educational…	12