Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Moeyaert, Mariola; Yang, Panpan; Xu, Xinyun; Kim, Esther – Grantee Submission, 2021
Hierarchical linear modeling (HLM) has been recommended as a meta-analytic technique for the quantitative synthesis of single-case experimental design (SCED) studies. The HLM approach is flexible and can model a variety of different SCED data complexities, such as intervention heterogeneity. A major advantage of using HLM is that participant…
Descriptors: Meta Analysis, Case Studies, Research Design, Hierarchical Linear Modeling
Kelvin Terrell Pompey – ProQuest LLC, 2021
Many methods are used to measure interrater reliability for studies where each target receives ratings by a different set of judges. The purpose of this study is to explore the use of hierarchical modeling for estimating interrater reliability using the intraclass correlation coefficient. This study provides a description of how the ICC can be…
Descriptors: Interrater Reliability, Evaluation Methods, Test Reliability, Correlation
Margaret H. Sibley; Lourdes M. Rodriguez; Melissa Lopez; Erika M. Brochu; Fabiana V. Bracho; Mercedes Ortiz; Jasmine Hashimoto – Journal of Attention Disorders, 2025
Objective: Many treatment engagement challenges are documented for adolescents with ADHD. Across contexts, helping professionals (i.e., therapists, prescribers, educators, coaches) might benefit from an engagement strategy toolbox to facilitate work with adolescents with ADHD and their families. Method: The current study describes the development…
Descriptors: Secondary School Students, Attention Deficit Hyperactivity Disorder, Parents, Parent Education
Evaluating an Explicit Instruction Teacher Observation Protocol through a Validity Argument Approach
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Journal of Experimental Education, 2022
In this study, we examined the scoring and generalizability assumptions of an explicit instruction (EI) special education teacher observation protocol using many-faceted Rasch measurement (MFRM). Video observations of classroom instruction from 48 special education teachers across four states were collected. External raters (n = 20) were trained…
Descriptors: Direct Instruction, Teacher Education, Classroom Observation Techniques, Validity
Unal, Zafer – Journal of Interactive Learning Research, 2022
Despite over fifteen years of flipped classroom implementation, current literature does not provide any reliable, standardized rubric as a guideline to create or evaluate flipped classroom lessons based on effective flipped classroom design principles. In fact, at the time of this study, when an internet search for existing rubrics was conducted,…
Descriptors: Flipped Classroom, Lesson Plans, Scoring Rubrics, Graduate Students
Research & Practice in Assessment, 2022
Meta-assessment is a useful strategy to document assessment practices and guide efforts to improve the culture of assessment at an institution. In this study, a meta-assessment of undergraduate and graduate academic program assessment reports evaluated the maturity of assessment work. Assessment reports submitted in the first year (75…
Descriptors: Program Evaluation, Educational Assessment, Meta Analysis, Undergraduate Study
Wind, Stefanie A. – Measurement: Interdisciplinary Research and Perspectives, 2022
In many performance assessments, one or two raters from the complete rater pool score each performance, resulting in a sparse rating design, where there are limited observations of each rater relative to the complete sample of students. Although sparse rating designs can be constructed to facilitate estimation of student achievement, the…
Descriptors: Evaluators, Bias, Identification, Performance Based Assessment
Bottoms, Bryndle Laine – ProQuest LLC, 2022
Teacher evaluations are routinely conducted across the United States for licensure and professional development supports. However, there is limited research on the interrater reliability of these evaluation assessment systems, despite federal recommendations (Graham et al., 2012). This research explores the systematic approach to interrater…
Descriptors: Interrater Reliability, Early Childhood Teachers, Teacher Evaluation, Performance Based Assessment
João M. Santos – Research Evaluation, 2024
The allocation of scientific funding through grant programs is crucial for research advancement. While independent peer panels typically handle evaluations, their decisions can lean on personal preferences that go beyond the stated criteria, leading to inconsistencies and potential biases. Given these concerns, our study employs a novel method,…
Descriptors: Grants, Program Proposals, Funding Formulas, Scientific Research
Erin West; Shani Dettman – Language, Speech, and Hearing Services in Schools, 2024
Purpose: There are well-established guidelines for the recording, transcription, and analysis of spontaneous oral language samples by researchers, educators, and speech pathologists. In contrast, there is presently no consensus regarding methods for the written documentation of sign language samples. The Handshape Analysis Recording Tool (HART) is…
Descriptors: Documentation, Sign Language, Bilingual Education, Biculturalism
Emma Healy – ProQuest LLC, 2024
The shortage of autism specialists and lack of culturally sensitive autism assessment tools are helping to perpetuate racial and ethnic disparities in autism identification and treatment. Using DisCrit as a framework, this quantitative study examined the utility of one autism assessment tool, the Social Responsiveness Scale, second edition (SRS-2)…
Descriptors: Autism Spectrum Disorders, Student Evaluation, Diagnostic Tests, Disability Identification
Zoe Stephenson; Amy Jackson; Victoria Wilkes – Assessment & Evaluation in Higher Education, 2024
The closed-door PhD and doctoral viva voce--the approach adopted in the United Kingdom--is esteemed by some as being a valuable academic tradition. However, an increasing body of literature and research has raised concerns about the quality, transparency, reliability and validity of this viva format. This systematic literature review aims to…
Descriptors: Foreign Countries, Doctoral Students, Doctoral Dissertations, Persuasive Discourse
Ayse Oguz Unver; Hasan Zuhtu Okulu; Onur Bektas; Yasemin Ozdem Yilmaz; Nilay Muslu; Burcu Senler; Sertac Arabacioglu – School Science and Mathematics, 2024
Several observation protocols in different theoretical frameworks and components have been designed and validated by teacher trainers and professional development providers to capture and categorize observational data on the characteristics and level of inquiry in science practices. However, certain constraints limit their wide use, such as the…
Descriptors: Faculty Development, Classroom Observation Techniques, Science Instruction, Teaching Methods
Hunter, Seth B. – Journal of Education Human Resources, 2023
Teacher performance scores inform education leaders' management of teacher human resources. However, prior research has implied that different interpretations of performance criteria between teachers and their evaluators suppress teacher development. Although research has examined teacher perceptions of performance scores and compared teacher…
Descriptors: Teacher Evaluation, Teacher Effectiveness, Self Evaluation (Individuals), Interrater Reliability
Seedhouse, Paul; Satar, Müge – Classroom Discourse, 2023
The same L2 speaking performance may be analysed and evaluated in very different ways by different teachers or raters. We present a new, technology-assisted research design which opens up to investigation the trajectories of convergence and divergence between raters. We tracked and recorded what different raters noticed when, whilst grading a…
Descriptors: Language Tests, English (Second Language), Second Language Learning, Oral Language

