ERIC - Search Results

Publication Date

In 2025	38
Since 2024	106

Descriptor

Interrater Reliability	106
Foreign Countries	37
Evaluation Methods	25
Test Validity	20
Scoring Rubrics	19
Test Reliability	17
Artificial Intelligence	15
Evaluation Criteria	14
Evaluators	13
Scoring	13
Student Evaluation	13
Scores	12
Undergraduate Students	12
Accuracy	11
College Students	11
Error of Measurement	11
Psychometrics	10
Autism Spectrum Disorders	8
English (Second Language)	8
Measurement Techniques	8
Peer Evaluation	8
Writing Evaluation	8
Reliability	7
Second Language Learning	7
Video Technology	7

Publication Type

Journal Articles	98
Reports - Research	93
Information Analyses	7
Tests/Questionnaires	5
Dissertations/Theses -…	4
Reports - Descriptive	4
Reports - Evaluative	3

Education Level

Higher Education	40
Postsecondary Education	40
Secondary Education	10
Elementary Education	8
Middle Schools	5
Early Childhood Education	4
High Schools	4
Junior High Schools	4
Grade 4	2
Grade 7	2
Intermediate Grades	2
Adult Education	1
Grade 1	1
Grade 2	1
Grade 3	1
Grade 6	1
Preschool Education	1
Primary Education	1

Audience

Researchers

Location

China	7
Germany	3
Australia	2
Canada	2
Finland	2
Indonesia	2
South Korea	2
Texas	2
Virginia	2
Belgium	1
Colorado	1
District of Columbia	1
Egypt	1
Florida	1
Illinois (Urbana)	1
Ireland	1
Israel	1
Italy	1
Maryland	1
Massachusetts	1
New York (Syracuse)	1
New Zealand	1
North Carolina	1
Norway	1
Oregon	1

Assessments and Surveys

Classroom Assessment Scoring…	3
ACT Assessment	1
Autism Diagnostic Observation…	1
Behavior Assessment System…	1
Draw a Person Test	1
Early Childhood Environment…	1
Mullen Scales of Early…	1
SAT (College Admission Test)	1
Social Responsiveness Scale	1
Strengths and Difficulties…	1
Vineland Adaptive Behavior…	1

Showing 1 to 15 of 106 results

Automated Scoring in Learning Progression-Based Assessment: A Comparison of Researcher and Machine Interpretations

Peer reviewed

Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025

Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…

Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy

Evaluating the Correspondence between Expert Visual Analysis and Quantitative Methods

Peer reviewed

Alexandra M. Pierce; Lisa M. H. Sanetti; Melissa A. Collier-Meek; Austin H. Johnson – Grantee Submission, 2024

Visual analysis is the primary methodology used to determine treatment effects from graphed single-case design data. Previous studies have demonstrated mixed findings related to interrater agreement between both expert and novice visual analysts, which represents a critical limitation of visual analysis and supports calls for also presenting…

Descriptors: Graphs, Interrater Reliability, Statistical Analysis, Expertise

A Systematic Review of Social Validation Procedures in Intervention Research with Transition-Age Autistic Youth

Peer reviewed

Kristen Bottema-Beutel; Shannon Crowley LaPoint; So Yoon Kim; Sarah Mohiuddin; Qun Yu; Rachael McKinnon – Exceptional Children, 2024

In this secondary analysis of a previously conducted systematic review, we analyze social validity assessments in intervention research for transition-age autistic youth. Social validity is concerned with the acceptability of the intervention goals, the acceptability and feasibility of the intervention procedures, and the perceived importance of…

Descriptors: Autism Spectrum Disorders, Intervention, Validity, Psychometrics

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement

Reliability of a Frequency Method for Assessing Vegetable Intake Using Photos among College Students: A Smart Phone Approach

Peer reviewed

Heena Suthar; Krisha Thiagarajah; Ibraheem Karaye; Zayra Teresa Lopez-Ixta; Trishnee Bhurosy – Journal of American College Health, 2025

Objective: To measure the interrater reliability of assessing the frequency of vegetable intake using mobile photos and descriptions. Design: Repeated measures design. Setting: A Midwestern university. Participants: Undergraduate students (N = 165). Measurable Outcome/Analysis: Number of times each of these vegetable subgroups were consumed daily:…

Descriptors: Interrater Reliability, Incidence, Food, Eating Habits

Inconsistencies in Rater-Based Assessments Mainly Affect Borderline Candidates: But Using Simple Heuristics Might Improve Pass-Fail Decisions

Peer reviewed

Stefan K. Schauber; Anne O. Olsen; Erik L. Werner; Morten Magelssen – Advances in Health Sciences Education, 2024

Introduction: Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments. Here, disagreement between examiners can have far-reaching consequences. The literature suggests that…

Descriptors: Medical Students, Performance Based Assessment, Expertise, Interrater Reliability

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

Test-Retest and Inter-Rater Reliability for Selected Outcomes from a Wearable 3D Inertial Sensor over Different Stable and Unstable Postural Conditions: A Validation Study

Peer reviewed

Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025

The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…

Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques

Development of a Categorical Scoring Codebook for Entrepreneurial Mindset (EM) Concept Maps

Peer reviewed

Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025

Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…

Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation

Developing an Automatic Pronunciation Scorer: Aligning Speech Evaluation Models and Applied Linguistics Constructs

Peer reviewed

Danwei Cai; Ben Naismith; Maria Kostromitina; Zhongwei Teng; Kevin P. Yancey; Geoffrey T. LaFlair – Language Learning, 2025

Globalization and increases in the numbers of English language learners have led to a growing demand for English proficiency assessments of spoken language. In this paper, we describe the development of an automatic pronunciation scorer built on state-of-the-art deep neural network models. The model is trained on a bespoke human-rated dataset that…

Descriptors: Automation, Scoring, Pronunciation, Speech Tests

Human versus Machine: The Effectiveness of ChatGPT in Automated Essay Scoring

Peer reviewed

Jennifer Manning; Jeffrey Baldwin; Natasha Powell – Innovations in Education and Teaching International, 2025

As ChatGPT continues to reshape student engagement and instructional design, it is crucial to examine its practical implications. This study aims to evaluate the effectiveness of ChatGPT3.5 and ChatGPT4 as potential automated essay scoring (AES) systems. Fifty authentic, student-written annotated bibliographies were evaluated by three human raters…

Descriptors: Foreign Countries, Essays, Writing Evaluation, Artificial Intelligence

The Living Codebook: Documenting the Process of Qualitative Data Analysis

Peer reviewed

Victoria Reyes; Elizabeth Bogumil; Levin Elias Welch – Sociological Methods & Research, 2024

Transparency is once again a central issue of debate across types of qualitative research. Work on how to conduct qualitative data analysis, on the other hand, walks us through the step-by-step process on how to code and understand the data we've collected. Although there are a few exceptions, less focus is on transparency regarding…

Descriptors: Qualitative Research, Data Analysis, Guides, Databases

Detecting Rater Bias in Mixed-Format Assessments

Peer reviewed

Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024

Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…

Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses

Intercoder Reliability for Use in Qualitative Research and Evaluation

Peer reviewed

Monica L. Coleman; Moira Ragan; Tahani Dari – Measurement and Evaluation in Counseling and Development, 2024

Intercoder reliability can increase trustworthiness, accuracy, rigor, collaboration, and power sharing in qualitative research. Though not every qualitative design can utilize intercoder reliability, this article highlights how positivist qualitative research, community-based participatory research, and participatory evaluation all strengthen when…

Descriptors: Interrater Reliability, Qualitative Research, Counseling, Research

Procedural Fidelity Reporting in "The Analysis of Verbal Behavior" from 2007-2021

Peer reviewed

Elizabeth J. Preas; Mary E. Halbur; Regina A. Carroll – Analysis of Verbal Behavior, 2024

Procedural fidelity refers to the degree to which procedures for an assessment or intervention (i.e., independent variables) are implemented consistent with the prescribed protocols. Procedural fidelity is an important factor in demonstrating the internal validity of an experiment and clinical treatments. Previous reviews evaluating the inclusion…

Descriptors: Verbal Communication, Behavioral Science Research, Periodicals, Fidelity

Source

Journal of Speech, Language,…	5
Journal of Baltic Science…	4
ProQuest LLC	4
Assessment for Effective…	3
International Journal of…	3
Language Testing	3
Advances in Physiology…	2
Assessment & Evaluation in…	2
Assessment Update	2
Educational Assessment	2
Educational Measurement:…	2
Journal of Learning Analytics	2
Measurement in Physical…	2
Measurement:…	2
Sociological Methods &…	2
Action in Teacher Education	1
Active Learning in Higher…	1
Advances in Health Sciences…	1
American Journal on…	1
Analysis of Verbal Behavior	1
Applied Measurement in…	1
Asia-Pacific Education…	1
Autism: The International…	1
Behavioral Disorders	1
British Educational Research…	1

Author

Stefanie A. Wind	3
Iasonas Lamprianou	2
Mark White	2
Reeta Neittaanmäki	2
Yangmeng Xu	2
Aaron Olaf Batty	1
Aaron Zimmerman	1
Ahmed Alkhateeb	1
Ahmet Volkan Yüzüak	1
Aida Carballo-Fazanes	1
Aislinn Ganci	1
Akhila Shibu	1
Alessia Battisti	1
Alexander Naumann	1
Alexandra Jackson	1
Alexandra M. Pierce	1
Ali Kilicarslan	1
Alisha Demchak	1
Alison Cook-Sather	1
Alyssa M. Merbler	1
Amanda Barany	1
Amanda Huee-Ping Wong	1
Amelia Krysinski	1
Amy E. Ramage	1
Amy Jackson	1