Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Ellie Renae Bowen – ProQuest LLC, 2023
The educative Teacher Performance Assessment (edTPA) has been adopted by many state legislatures and teacher preparation programs (TPP). These states require teacher candidates to pass the edTPA with a state-specific passing score to be recommended for licensure. In the 19 states where passing the edTPA has not been required as a condition of…
Descriptors: Interrater Reliability, Teacher Evaluation, Rating Scales, Performance Based Assessment
Nicole B. Wiggs; Linda A. Reddy; Ryan Kettler; Anh Hua; Christopher Dudek; Adam Lekwa; Briana Bronstein – Assessment for Effective Intervention, 2023
The Classroom Strategies Assessment System (CSAS) is a multi-rater, multi-method (direct observation and rating scale methodology) assessment of teachers' use of research-based instructional and behavior management strategies. The present study investigated the association between teacher self-report and school administrator ratings using the CSAS…
Descriptors: Measurement Techniques, Classroom Observation Techniques, Teacher Evaluation, Teaching Methods
Chao Han; Binghan Zheng; Mingqing Xie; Shirong Chen – Interpreter and Translator Trainer, 2024
Human raters' assessment of interpreting is a complex process. Previous researchers have mainly relied on verbal reports to examine this process. To advance our understanding, we conducted an empirical study, collecting raters' eye-movement and retrospection data in a computerised interpreting assessment in which three groups of raters (n = 35)…
Descriptors: Foreign Countries, College Students, College Graduates, Interrater Reliability
Alison Cook-Sather; Ruth L. Healey – Teaching & Learning Inquiry, 2024
Peer review is widely accepted as critical to legitimating scholarly publication, and yet, it runs the risk of reproducing inequities in publishing processes and products. Acknowledging at once the historical need to legitimize SoTL publications, the current danger of reproducing exclusive practices, and the aspirational goal to "practice…
Descriptors: Peer Evaluation, Academic Language, Writing (Composition), Interrater Reliability
Julia Brochey-Taylor; Joseph A. Taylor – Educational Research and Reviews, 2024
The purpose of this synthesis study was to assess the reliability and validity of the Draw-A-Scientist Test (DAST) and its variations across multiple studies, aiming to understand limitations and propose modifications for future application within and beyond the science domain. Given the existence of multiple DAST versions, this study quantified…
Descriptors: Cognitive Tests, Freehand Drawing, Personality Measures, Projective Measures
Jonathan K. Foster; Peter Youngs; Rachel van Aswegen; Samarth Singh; Ginger S. Watson; Scott T. Acton – Journal of Learning Analytics, 2024
Despite a tremendous increase in the use of video for conducting research in classrooms as well as preparing and evaluating teachers, there remain notable challenges to using classroom videos at scale, including time and financial costs. Recent advances in artificial intelligence could make the process of analyzing, scoring, and cataloguing videos…
Descriptors: Learning Analytics, Automation, Classification, Artificial Intelligence
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session board of examiner handling and analysis. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Wahyu Nanda Eka Saputra; Trikinasih Handayani; Prima Suci Rohmadheny; Rohmatus Naini; Dody Hartanto; Hardi Santosa; Dewi Afra Khairunnisa; Risma Risansyah; Hanan Riati; Faturrahman – Journal of Education and Learning (EduLearn), 2025
The students are urged to do something without expecting anything in return and only in the name of God. Every Islamic student becomes something ideal if they can internalize and implement sincerity. Many people are willing to do something because of an ulterior motive. The importance of sincerity in humans is the background for developing a…
Descriptors: Islam, Interrater Reliability, Prosocial Behavior, Muslims
Xiner Liu; Andres Felipe Zambrano; Ryan S. Baker; Amanda Barany; Jaclyn Ocumpaugh; Jiayi Zhang; Maciej Pankiewicz; Nidhi Nasiar; Zhanlan Wei – Journal of Learning Analytics, 2025
This study explores the potential of the large language model GPT-4 as an automated tool for qualitative data analysis by educational researchers, exploring which techniques are most successful for different types of constructs. Specifically, we assess three different prompt engineering strategies -- Zero-shot, Few-shot, and Few-shot with…
Descriptors: Coding, Artificial Intelligence, Automation, Data Analysis
Junfei Li; Jinyan Huang; Thomas Sheeran – SAGE Open, 2025
This study investigated the role of ChatGPT4o as an AI peer assessor in English-as-a-foreign-language (EFL) speaking classrooms, with a focus on its scoring reliability and the effectiveness of its feedback. The research involved 40 first-year English major students from two parallel classes at a Chinese university. Twenty from one class served as…
Descriptors: Artificial Intelligence, Technology Uses in Education, Peer Evaluation, English (Second Language)
Taylor, Tessa; Lanovaz, Marc J. – Journal of Applied Behavior Analysis, 2022
Behavior analysts typically rely on visual inspection of single-case experimental designs to make treatment decisions. However, visual inspection is subjective, which has led to the development of supplemental objective methods such as the conservative dual-criteria method. To replicate and extend a study conducted by Wolfe et al. (2018) on the…
Descriptors: Visual Perception, Artificial Intelligence, Decision Making, Evaluators
Hug, Sven E.; Ochsner, Michael – Research Evaluation, 2022
This study examines a basic assumption of peer review, namely, the idea that there is a consensus on evaluation criteria among peers, which is a necessary condition for the reliability of peer judgements. Empirical evidence indicating that there is no consensus or more than one consensus would offer an explanation for the "disagreement…
Descriptors: Peer Evaluation, Grants, Evaluation Criteria, Interrater Reliability
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Anthony, Christopher J.; Styck, Kara M.; Volpe, Robert J.; Robert, Christopher R. – School Psychology, 2023
Although originally conceived of as a marriage of direct behavioral observation and indirect behavior rating scales, recent research has indicated that Direct Behavior Ratings (DBRs) are affected by rater idiosyncrasies (rater effects) similar to other indirect forms of behavioral assessment. Most of this research has been conducted using…
Descriptors: Item Response Theory, Generalizability Theory, Interrater Reliability, Behavior Rating Scales
Bolton, Tiffany; Stevenson, Brittney; Janes, William – Journal of Occupational Therapy, Schools & Early Intervention, 2023
Researchers utilized a cross-sectional secondary analysis of data within an ongoing non-randomized controlled trial study design to establish the reliability and internal consistency of a novel handwriting assessment for preschoolers, the Just Write! (JW), written by the authors. Seventy-eight children from an area preschool participated in the…
Descriptors: Handwriting, Writing Skills, Writing Evaluation, Preschool Children

