Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Gresse Von Wangenheim, Christiane; Da Cruz Alves, Nathalia; Rauber, Marcelo F.; Hauck, Jean C. R.; Yeter, Ibrahim H. – Informatics in Education, 2022
Although Machine Learning (ML) is used already in our daily lives, few are familiar with the technology. This poses new challenges for students to understand ML, its potential, and limitations as well as to empower them to become creators of intelligent solutions. To effectively guide the learning of ML, this article proposes a scoring rubric for…
Descriptors: Performance Based Assessment, Artificial Intelligence, Learning Processes, Scoring Rubrics
Caspari-Sadeghi, Sima; Mille, Elena; Epperlein, Hella; Forster-Heinlein, Brigitte – Mathematics Teaching Research Journal, 2022
This collaborative action research highlights the need for developing students' evaluative competence and self-reflection by embedding self-and-peer assessment into online instruction. Over the course of a semester in an online master program in mathematics and computer sciences, students conducted research on assigned topics, held presentations,…
Descriptors: Graduate Students, Masters Programs, College Mathematics, Mathematics Education
Thai, Thuy; Sheehan, Susan – Language Education & Assessment, 2022
In language performance tests, raters are important as their scoring decisions determine which aspects of performance the scores represent; however, raters are considered as one of the potential sources contributing to unwanted variability in scores (Davis, 2012). Although a great number of studies have been conducted to unpack how rater…
Descriptors: Rating Scales, Speech Communication, Second Language Learning, Second Language Instruction
McIver, Kerry L.; Brown, William H.; Pfeiffer, Karin A.; Dowda, Marsha; Pate, Russell R. – Research Quarterly for Exercise and Sport, 2016
Purpose: This study describes the development and pilot testing of the Observational System for Recording Physical Activity-Elementary School (OSRAC-E) Version. Method: This system was developed to observe and document the levels and types of physical activity and physical and social contexts of physical activity in elementary school students…
Descriptors: Elementary School Students, Physical Activities, Observation, Test Construction
Pentimonti, Jill M.; Bowles, Ryan P.; Zucker, Tricia A.; Tambyraja, Sherine R.; Justice, Laura M. – Grantee Submission, 2021
Measuring the quality of classroom-based interactive shared book reading within the early childhood classroom represents a specific dimension of teacher-child interactions that is of great interest to researchers. This interest reflects decades of research demonstrating the benefit of reading to young children in both the home and the classroom.…
Descriptors: Standardized Tests, Test Construction, Construct Validity, Predictive Validity
Barnoux, Magali; Alexander, Regi; Bhaumik, Sabyasachi; Devapriam, John; Duggan, Connor; Shepstone, Lee; Staufenberg, Ekkehart; Turner, David; Tyler, Nichola; Viding, Essi; Langdon, Peter E. – Autism: The International Journal of Research and Practice, 2020
Autistic adults who have a history of committing crimes present a major problem for providers of services in terms of legal disposal options and possible interventions, and greater understanding of this group and their associated needs is required. For this reason, we aimed to investigate the face validity of a proposed sub-typology of autistic…
Descriptors: Autism, Pervasive Developmental Disorders, Crime, Intervention
Pham, Nhung Thi Tuyet – Quality Assurance in Education: An International Perspective, 2020
Purpose: The purpose of this study is to share quality process experience from a US comprehensive university to use both direct (participation rate and assessment quality) and indirect assessment measures (assessment survey) to evaluate the quality process. Design/methodology/approach: A mixed method design was used to evaluate the quality…
Descriptors: Educational Quality, Organizational Effectiveness, Feedback (Response), Institutional Evaluation
Johnson, Evelyn S.; Zheng, Yuzhu; Crawford, Angela R.; Moylan, Laura A. – Grantee Submission, 2020
In this study, we examined the relationship of special education teachers' performance on the RESET Explicit Instruction observation protocol with student growth on academic measures. Special education teachers provided video recorded observations of three instructional lessons along with data from standardized, curriculum-based academic measures…
Descriptors: Special Education Teachers, Teacher Effectiveness, Teacher Evaluation, Direct Instruction
Bosch, Nigel; Crues, R. Wes; Shaik, Najmuddin; Paquette, Luc – Grantee Submission, 2020
Online courses often include discussion forums, which provide a rich source of data to better understand and improve students' learning experiences. However, forum messages frequently contain private information that prevents researchers from analyzing these data. We present a method for discovering and redacting private information including…
Descriptors: Privacy, Discussion Groups, Asynchronous Communication, Methods
Wind, Stefanie A.; Wolfe, Edward W.; Engelhard, George, Jr.; Foltz, Peter; Rosenstein, Mark – International Journal of Testing, 2018
Automated essay scoring engines (AESEs) are becoming increasingly popular as an efficient method for performance assessments in writing, including many language assessments that are used worldwide. Before they can be used operationally, AESEs must be "trained" using machine-learning techniques that incorporate human ratings. However, the…
Descriptors: Computer Assisted Testing, Essay Tests, Writing Evaluation, Scoring
Dempster, Edith R.; Kirby, Nicola F. – Perspectives in Education, 2018
Taxonomies of cognitive demand are frequently used to ensure that assessment tasks include questions ranging from low to high cognitive demand. This paper investigates inter-rater agreement among four evaluators on the cognitive demand of the South African National Senior Certificate Life Sciences examinations after training, practice and…
Descriptors: Interrater Reliability, Biological Sciences, Cognitive Processes, Test Items
Bijani, Houman – Cogent Education, 2018
Rater variability has always been identified as an important source of measurement error in performance assessment, especially for oral proficiency tests. Rater training is commonly used as a means for compensating various sources of rater variability and adjusting their assessment quality. However, there is little research regarding the nature of…
Descriptors: Evaluators, Training, Verbal Tests, Interrater Reliability
Wyse, Adam E. – Practical Assessment, Research & Evaluation, 2018
One common modification to the Angoff standard-setting method is to have panelists round their ratings to the nearest 0.05 or 0.10 instead of 0.01. Several reasons have been offered as to why it may make sense to have panelists round their ratings to the nearest 0.05 or 0.10. In this article, we examine one reason that has been suggested, which is…
Descriptors: Interrater Reliability, Evaluation Criteria, Scoring Formulas, Achievement Rating
Lawson, Janelle E.; Cruz, Rebecca A. – Assessment for Effective Intervention, 2018
Classroom observations remain the predominant data source used in teacher evaluations, but little is known about how rater characteristics may affect teachers' scores. For special educators, whose instructional practice requires specialized knowledge and skills, school administrators (i.e., the raters) without experience in special education…
Descriptors: Special Education Teachers, Teacher Evaluation, Interrater Reliability, Administrators
Collier-Meek, Melissa A.; Johnson, Austin H.; Farrell, Anne F. – Assessment for Effective Intervention, 2018
Implementation of research-based, Tier 1 behavior management strategies can be monitored to provide data-driven feedback and in support of integrity. The "Measure of Active Supervision and Interaction" (MASI) was developed to measure four behavior management practices (i.e., Praise, Correction, References to Behavior Expectations, Active…
Descriptors: Behavior Modification, Test Reliability, Test Validity, Interrater Reliability