Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Strelan, Peter – Teaching of Psychology, 2022
Background: The concept of reliability is central to conducting--and understanding--research in Psychology. Students' understanding of concepts is strengthened when they learn by applying those concepts. Objective: This article describes initial evidence for an activity for teaching reliability. Method: Students watched a short video of a staged bank…
Descriptors: Learning Activities, Psychology, Recall (Psychology), Crime
Nnamdi Chika Ezike – ProQuest LLC, 2022
Fitting wrongly specified models to observed data may lead to invalid inferences about the model parameters of interest. The current study investigated the performance of the posterior predictive model checking (PPMC) approach in detecting model-data misfit of the hierarchical rater model (HRM). The HRM is a rater-mediated model that incorporates…
Descriptors: Prediction, Models, Interrater Reliability, Item Response Theory
Mark White – Practical Assessment, Research & Evaluation, 2025
Systematized, observational approaches to measuring teaching quality are an important tool in research and practice. Termed observation systems, these approaches include a rubric that operationalizes a set of teaching quality constructs and structures to support rater training and monitoring. Scores from observation systems, through their…
Descriptors: Scores, Teacher Effectiveness, Teacher Evaluation, Observation
Breanne J. Byiers; Alyssa M. Merbler; Chantel C. Burkitt; Frank J. Symons – American Journal on Intellectual and Developmental Disabilities, 2025
Sleep problems are common in Rett syndrome and other neurogenetic syndromes. Actigraphy is a cost-effective, objective method for measuring sleep. Current guidelines require caregiver-reported bed and wake times to facilitate actigraphy data scoring. The current study examined missingness and consistency of caregiver-reported bed and wake times…
Descriptors: Sleep, Neurodevelopmental Disorders, Psychomotor Skills, Genetic Disorders
Kathryn J. Greenslade; Julia K. Bushell; Emily F. Dillon; Amy E. Ramage – International Journal of Language & Communication Disorders, 2025
Background: Pragmatic communication difficulties encompass many distinct behaviours, including the use of vague and/or insufficient language, a common characteristic following traumatic brain injury (TBI) that negatively impacts psychosocial outcomes. Existing assessments evaluate pragmatic communication broadly, often with only one or two items…
Descriptors: Neurological Impairments, Head Injuries, Language Impairments, Language Tests
Wen Xin Zhang; John J. H. Lin; Ying-Shao Hsu – Journal of Computer Assisted Learning, 2025
Background Study: Assessing learners' inquiry-based skills is challenging as social, political, and technological dimensions must be considered. The advanced development of artificial intelligence (AI) makes it possible to address these challenges and shape the next generation of science education. Objectives: The present study evaluated the SSI…
Descriptors: Artificial Intelligence, Computer Assisted Testing, Inquiry, Active Learning
Orraporn Tubtimsri; Prasong Saihong; Thanyathip Boonyiam – Journal of Education and Learning, 2025
This research aimed to develop an instructional coaching model to enhance early childhood educators' abilities in designing learning experiences for brain management. This research employed a mixed-methods approach, combining quantitative and qualitative methods and consisted of two phases: 1) developing an instructional coaching framework, and 2)…
Descriptors: Training, Curriculum Development, Learning Experience, Executive Function
Lewis, Carly A.; Myers, Carl L. – Contemporary School Psychology, 2021
Behavior rating scales are frequently used to assess social-emotional behaviors of children. While broadband behavior rating scales often measure similarly named constructs, it is unclear how consistently different instruments measure those constructs. Head Start teachers completed the preschool versions of the Behavior Assessment System for…
Descriptors: Preschool Teachers, Interrater Reliability, Child Behavior, Behavior Rating Scales
Huscroft-D'Angelo, Jacqueline; Wery, Jessica; Martin, Jodie Diane; Pierce, Corey; Crawford, Lindy – Behavioral Disorders, 2021
"The Scales for Assessing Emotional Disturbance--Third Edition Rating Scale" (SAED-3 RS; Epstein et al.) is a standardized, norm-referenced measure designed to aid in the identification process by providing useful data to professionals determining eligibility of students with an emotional disturbance (ED). Three studies are reported to…
Descriptors: Measures (Individuals), Emotional Disturbances, Test Reliability, Interrater Reliability
Belur, Jyoti; Tompson, Lisa; Thornton, Amy; Simon, Miranda – Sociological Methods & Research, 2021
A methodologically sound systematic review is characterized by transparency, replicability, and a clear inclusion criterion. However, little attention has been paid to reporting the details of interrater reliability (IRR) when multiple coders are used to make decisions at various points in the screening and data extraction stages of a study. Prior…
Descriptors: Interrater Reliability, Decision Making, Accuracy, Coding
Todaro, Francesca; Pizzorni, Nicole; Scarponi, Letizia; Ronzoni, Clara; Huckabee, Maggie-Lee; Schindler, Antonio – International Journal of Language & Communication Disorders, 2021
Background: The Test of Masticating and Swallowing Solids (TOMASS) is an international standardized swallowing assessment tool. However, its psychometric characteristics have not been analysed in patients with dysphagia. Aims: To analyse TOMASS's (1) inter- and intra-rater reliability in a clinical population of patients with dysphagia, (2)…
Descriptors: Physical Disabilities, Test Reliability, Test Validity, Standardized Tests
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Solano-Flores, Guillermo – Educational Measurement: Issues and Practice, 2021
This article proposes a Boolean approach to representing and analyzing interobserver agreement in dichotomous coding. Building on the notion that observations are samples of a universe of observations, it submits that coding can be viewed as a process in which observers sample pieces of evidence on constructs. It distinguishes between formal and…
Descriptors: Online Searching, Coding, Interrater Reliability, Evidence
Kapsner-Smith, Mara R.; Opuszynski, Amanda; Stepp, Cara E.; Eadie, Tanya L. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: The reliability of auditory-perceptual judgments between listeners is a long-standing problem in the assessment of voice disorders. The purpose of this study was to determine whether a relatively novel experimental scaling method, called visual sort and rate (VSR), yielded stronger reliability than the more frequently used method of…
Descriptors: Voice Disorders, Interrater Reliability, Rating Scales, Severity (of Disability)
Palmer, Melanie; Tarver, Joanne; Carter Leno, Virginia; Paris Perez, Juan; Frayne, Margot; Slonims, Vicky; Pickles, Andrew; Scott, Stephen; Charman, Tony; Simonoff, Emily – Journal of Autism and Developmental Disorders, 2023
Emotional and behavioral problems (EBPs) frequently occur in young autistic children. Discrepancies between parents and other informants are common but can lead to uncertainty in formulation, diagnosis and care planning. This study aimed to explore which child and informant characteristics are associated with reported child EBPs across settings.…
Descriptors: Observation, Emotional Disturbances, Behavior Problems, Autism Spectrum Disorders

