ERIC - Search Results

Publication Date

In 2025	38
Since 2024	98
Since 2021 (last 5 years)	322
Since 2016 (last 10 years)	797
Since 2006 (last 20 years)	1895

Descriptor

Interrater Reliability	2537
Foreign Countries	602
Evaluation Methods	380
Test Reliability	374
Correlation	349
Test Validity	311
Measures (Individuals)	272
Validity	269
Comparative Analysis	262
Scores	256
Student Evaluation	236
Statistical Analysis	233
Psychometrics	218
Scoring	216
Rating Scales	205
Evaluators	203
Reliability	192
Observation	187
Intervention	185
Teaching Methods	183
Higher Education	169
Second Language Learning	168
Children	166
Scoring Rubrics	165
English (Second Language)	162

Publication Type

Journal Articles	2537
Reports - Research	1921
Reports - Evaluative	397
Reports - Descriptive	138
Tests/Questionnaires	119
Information Analyses	109
Opinion Papers	52
Speeches/Meeting Papers	9
Numerical/Quantitative Data	5
Guides - Classroom - Teacher	3
Book/Product Reviews	2
Guides - Non-Classroom	2
Reports - General	2
Guides - General	1

Education Level

Higher Education	516
Postsecondary Education	372
Elementary Education	237
Secondary Education	152
Early Childhood Education	117
Elementary Secondary Education	91
Middle Schools	90
High Schools	73
Preschool Education	65
Adult Education	55
Junior High Schools	53
Primary Education	39
Kindergarten	36
Grade 4	34
Grade 5	34
Intermediate Grades	33
Grade 6	30
Grade 1	29
Grade 8	25
Grade 3	23
Grade 2	21
Grade 7	21
Grade 10	11
Grade 9	10
Grade 11	7

Audience

Researchers	65
Practitioners	25
Teachers	18
Administrators	7
Counselors	2

Location

Australia	50
Turkey	50
United Kingdom	44
Canada	41
Netherlands	37
China	35
United States	27
California	25
United Kingdom (England)	23
Taiwan	22
Germany	21
Iran	19
Sweden	19
Japan	18
Hong Kong	17
Florida	15
South Korea	15
New Zealand	14
Pennsylvania	14
South Africa	14
Israel	13
Texas	13
Finland	11
North Carolina	11
Belgium	10

Laws, Policies, & Programs

No Child Left Behind Act 2001	8
Individuals with Disabilities…	5
Race to the Top	2
Americans with Disabilities…	1
Education for All Handicapped…	1
Elementary and Secondary…	1
Individuals with Disabilities…	1
Individuals with Disabilities…	1
Pell Grant Program	1
Rehabilitation Act 1973…	1
Stewart B McKinney Homeless…	1
Temporary Assistance for…	1

What Works Clearinghouse Rating

Meets WWC Standards without Reservations	1
Meets WWC Standards with or without Reservations	1
Does not meet standards	2

Showing 1 to 15 of 2,537 results

Automated Scoring in Learning Progression-Based Assessment: A Comparison of Researcher and Machine Interpretations

Peer reviewed

Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025

Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…

Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy

Chasing Rainbows? Ofsted's Quest for Inter-Inspector Reliability

Peer reviewed

Pearson, Terry – FORUM: for promoting 3-19 comprehensive education, 2023

Ofsted has frequently defended the judgements made during inspections by claiming that inspection ratings are reliable, as shown by the results from the collection of studies the inspectorate has conducted. I outline the inspectorate's view of reliability and problematise the studies that it has carried out, noting that these provide insufficient…

Descriptors: Inspection, Interrater Reliability, Decision Making, Value Judgment

A Systematic Review of Social Validation Procedures in Intervention Research with Transition-Age Autistic Youth

Peer reviewed

Kristen Bottema-Beutel; Shannon Crowley LaPoint; So Yoon Kim; Sarah Mohiuddin; Qun Yu; Rachael McKinnon – Exceptional Children, 2024

In this secondary analysis of a previously conducted systematic review, we analyze social validity assessments in intervention research for transition-age autistic youth. Social validity is concerned with the acceptability of the intervention goals, the acceptability and feasibility of the intervention procedures, and the perceived importance of…

Descriptors: Autism Spectrum Disorders, Intervention, Validity, Psychometrics

Technical Adequacy-Reliability

Peer reviewed

Susan K. Johnsen – Gifted Child Today, 2025

The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…

Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement

Reliability of a Frequency Method for Assessing Vegetable Intake Using Photos among College Students: A Smart Phone Approach

Peer reviewed

Heena Suthar; Krisha Thiagarajah; Ibraheem Karaye; Zayra Teresa Lopez-Ixta; Trishnee Bhurosy – Journal of American College Health, 2025

Objective: To measure the interrater reliability of assessing the frequency of vegetable intake using mobile photos and descriptions. Design: Repeated measures design. Setting: A Midwestern university. Participants: Undergraduate students (N = 165). Measurable Outcome/Analysis: Number of times each of these vegetable subgroups were consumed daily:…

Descriptors: Interrater Reliability, Incidence, Food, Eating Habits

The Use of Annotations to Explain Labels: Comparing Results from a Human-Rater Approach to a Deep Learning Approach

Peer reviewed

Lottridge, Susan; Woolf, Sherri; Young, Mackenzie; Jafari, Amir; Ormerod, Chris – Journal of Computer Assisted Learning, 2023

Background: Deep learning methods, where models do not use explicit features and instead rely on implicit features estimated during model training, suffer from an explainability problem. In text classification, saliency maps that reflect the importance of words in prediction are one approach toward explainability. However, little is known about…

Descriptors: Documentation, Learning Strategies, Models, Prediction

Inconsistencies in Rater-Based Assessments Mainly Affect Borderline Candidates: But Using Simple Heuristics Might Improve Pass-Fail Decisions

Peer reviewed

Stefan K. Schauber; Anne O. Olsen; Erik L. Werner; Morten Magelssen – Advances in Health Sciences Education, 2024

Introduction: Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments. Here, disagreement between examiners can have far-reaching consequences. The literature suggests that…

Descriptors: Medical Students, Performance Based Assessment, Expertise, Interrater Reliability

Examining the Psychometric Impact of Targeted and Random Double-Scoring in Mixed-Format Assessments

Peer reviewed

Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025

Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…

Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods

Test-Retest and Inter-Rater Reliability for Selected Outcomes from a Wearable 3D Inertial Sensor over Different Stable and Unstable Postural Conditions: A Validation Study

Peer reviewed

Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025

The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…

Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques

Development of a Categorical Scoring Codebook for Entrepreneurial Mindset (EM) Concept Maps

Peer reviewed

Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025

Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…

Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation

Developing an Automatic Pronunciation Scorer: Aligning Speech Evaluation Models and Applied Linguistics Constructs

Peer reviewed

Danwei Cai; Ben Naismith; Maria Kostromitina; Zhongwei Teng; Kevin P. Yancey; Geoffrey T. LaFlair – Language Learning, 2025

Globalization and increases in the numbers of English language learners have led to a growing demand for English proficiency assessments of spoken language. In this paper, we describe the development of an automatic pronunciation scorer built on state-of-the-art deep neural network models. The model is trained on a bespoke human-rated dataset that…

Descriptors: Automation, Scoring, Pronunciation, Speech Tests

Human versus Machine: The Effectiveness of ChatGPT in Automated Essay Scoring

Peer reviewed

Jennifer Manning; Jeffrey Baldwin; Natasha Powell – Innovations in Education and Teaching International, 2025

As ChatGPT continues to reshape student engagement and instructional design, it is crucial to examine its practical implications. This study aims to evaluate the effectiveness of ChatGPT3.5 and ChatGPT4 as potential automated essay scoring (AES) systems. Fifty authentic, student-written annotated bibliographies were evaluated by three human raters…

Descriptors: Foreign Countries, Essays, Writing Evaluation, Artificial Intelligence

"Rater Training" Re-Imagined for Work-Based Assessment in Medical Education

Peer reviewed

Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023

In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…

Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training

The Living Codebook: Documenting the Process of Qualitative Data Analysis

Peer reviewed

Victoria Reyes; Elizabeth Bogumil; Levin Elias Welch – Sociological Methods & Research, 2024

Transparency is once again a central issue of debate across types of qualitative research. Work on how to conduct qualitative data analysis, on the other hand, walks us through the step-by-step process on how to code and understand the data we've collected. Although there are a few exceptions, less focus is on transparency regarding…

Descriptors: Qualitative Research, Data Analysis, Guides, Databases

Detecting Rater Bias in Mixed-Format Assessments

Peer reviewed

Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024

Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…

Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses

Source

Educational and Psychological…	61
Journal of Speech, Language,…	61
Journal of Autism and…	56
Language Testing	37
Assessment & Evaluation in…	33
International Journal of…	33
Research in Developmental…	31
Applied Measurement in…	28
Assessment for Effective…	26
Advances in Health Sciences…	25
ETS Research Report Series	25
Journal of Educational…	24
Educational Measurement:…	23
Measurement in Physical…	20
Language Assessment Quarterly	19
Psychology in the Schools	19
Topics in Early Childhood…	19
Psychological Assessment	18
Educational Assessment	16
Grantee Submission	16
Autism: The International…	15
Journal of Consulting and…	15
Personnel Psychology	15
Journal of Intellectual…	14
Journal of Positive Behavior…	14

Author

Wind, Stefanie A.	10
Epstein, Michael H.	8
Ingham, Roger J.	8
Matson, Johnny L.	7
Cordes, Anne K.	6
Johnson, Robert L.	6
Lecavalier, Luc	6
McLeod, Bryce D.	6
Wyse, Adam E.	6
Aman, Michael G.	5
Barton, Erin E.	5
Coniam, David	5
Engelhard, George, Jr.	5
Greatorex, Jackie	5
Knoch, Ute	5
Ledford, Jennifer R.	5
Lunz, Mary E.	5
Tasse, Marc J.	5
Test, David W.	5
Attali, Yigal	4
Baer, John	4
Baird, Jo-Anne	4
Conroy, Maureen A.	4
Einfeld, S. L.	4

Assessments and Surveys

Test of English as a Foreign…	19
Child Behavior Checklist	17
Vineland Adaptive Behavior…	14
Autism Diagnostic Observation…	13
Strengths and Difficulties…	11
Woodcock Johnson Tests of…	9
Dynamic Indicators of Basic…	8
Peabody Picture Vocabulary…	8
Wechsler Intelligence Scale…	8
Graduate Record Examinations	6
National Assessment of…	6
SAT (College Admission Test)	6
Behavior Assessment System…	5
Childhood Autism Rating Scale	5
Conners Teacher Rating Scale	5
Draw a Person Test	5
Early Childhood Environment…	5
International English…	5
Raven Progressive Matrices	5
Battelle Developmental…	4
Behavioral and Emotional…	4
Classroom Assessment Scoring…	4
Mullen Scales of Early…	4
Preschool Language Scale	4
Social Skills Rating System	4