Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Wind, Stefanie A.; Jones, Eli – Educational Researcher, 2019
Teacher evaluation systems often include classroom observations in which raters use rating scales to evaluate teachers' effectiveness. Recently, researchers have promoted the use of multifaceted approaches to investigating reliability using Generalizability theory, instead of rater reliability statistics. Generalizability theory allows analysts to…
Descriptors: Teacher Evaluation, Observation, Generalizability Theory, Item Response Theory
Zepeda, Sally J.; Jimenez, Albert M. – Journal of Educational Supervision, 2019
Using a newly created teacher evaluation instrument, Inter-rater Reliability (IRR) analyses were conducted on four teacher videos as a means to establish instrument reliability. Raters included 42 principals and assistant principals in a southern US school district. The videos used spanned the teacher quality spectrum and the IRR findings across…
Descriptors: Teacher Evaluation, Interrater Reliability, Classroom Observation Techniques, Validity
Tülübas, Tijen; Demirkol, Murat; Ozdemir, Tuncay Yavuz; Polat, Hakan; Karakose, Turgut; Yirci, Ramazan – Educational Process: International Journal, 2023
Background/purpose: ChatGPT, a recent form of AI-based language model, has garnered interest among people from diverse backgrounds with its immersive capabilities. Using ChatGPT to support or generate scientific research has also created an ongoing debate over its advantages versus risks. The present study aimed to conduct an AI-enabled research…
Descriptors: Artificial Intelligence, Emergency Programs, Distance Education, COVID-19
Yan, Xun; Chuang, Ping-Lin – Language Testing, 2023
This study employed a mixed-methods approach to examine how rater performance develops during a semester-long rater certification program for an English as a Second Language (ESL) writing placement test at a large US university. From 2016 to 2018, we tracked three groups of novice raters (n = 30) across four rounds in the certification program.…
Descriptors: Evaluators, Interrater Reliability, Item Response Theory, Certification
Kevin Hirschi; Okim Kang – Language Teaching Research Quarterly, 2023
This paper extends the use of Generalizability Theory to the measurement of extemporaneous L2 speech through the lens of speech perception. Using six datasets of previous studies, it reports on "G studies"--a method of breaking down measurement variance--and "D studies"--a predictive study of the impact on reliability when…
Descriptors: Evaluators, Generalization, Evaluation Methods, Speech Communication
Amanda Simpfenderfer; Peter Knox; Bernice Garnett; Lance Smith; Colby Kervick; Mika Moore; Karyn Vogel – Society for Research on Educational Effectiveness, 2023
Background/Context: Recently there has been increased emphasis on disaggregating students' experiences collected through surveys based on student identities (Arredondo, 2016). Yet demographic data collection is problematic by nature, often reducing the complexity of an individual's racial, gender, or sexual identity to a single category. Racial…
Descriptors: Elementary School Students, High School Students, Middle School Students, Self Concept
Nassrallah, Flora; Tang, Ken; Whittingham, JoAnne; Sun, Huidan; Fitzpatrick, Elizabeth M. – Journal of Deaf Studies and Deaf Education, 2020
This study explored the impact of mild bilateral or unilateral hearing loss on auditory, social, and behavior skills in early school-aged children. Thirty-two children (aged 5-9 years) were evaluated with parent and teacher questionnaires. Most outcomes were within the range of expected scores. However, functional auditory skills were below…
Descriptors: Hearing Impairments, Mild Disabilities, Children, Interpersonal Competence
Kocakulah, Aysel – Participatory Educational Research, 2022
The aim of this study is to develop and apply a rubric to evaluate the solutions that second-year university pre-service teachers proposed to questions about electromagnetic induction. In this study, which has a pretest-posttest quasi-experimental design with a control group, teaching of the topic of electromagnetic induction was applied to…
Descriptors: Scoring Rubrics, Student Evaluation, Undergraduate Students, Problem Solving
Hitt, Sara Beth; Kwiatek, Stephen; Voggt, Ashley; Chang, Wen-hsuan; Gadd, Sonja; Test, David W. – Journal of Special Education, 2022
Because many websites claim to provide information about evidence-based practices (EBPs), consumers must know the information and practices are based upon quality research. Practitioners may intend to locate trustworthy online sources providing EBPs, but if those sources are not easy to navigate and lack implementation resources (i.e., are…
Descriptors: Web Sites, Information Sources, Evidence Based Practice, Credibility
York, Wesley Ralph – ProQuest LLC, 2022
The purpose of this study is to develop and validate instrument-specific rating scales to evaluate the classroom performances of middle and high school students. The study is guided by the following research questions: 1. What does Rasch measurement analysis reveal about the psychometric properties (i.e., validity and reliability) of items,…
Descriptors: Test Construction, Test Validity, Musical Instruments, Rating Scales
Semenzin, Chiara; Hamrick, Lisa; Seidl, Amanda; Kelleher, Bridgette L.; Cristia, Alejandrina – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Recording young children's vocalizations through wearables is a promising method to assess language development. However, accurately and rapidly annotating these files remains challenging. Online crowdsourcing with the collaboration of citizen scientists could be a feasible solution. In this article, we assess the extent to which citizen…
Descriptors: Young Children, Audio Equipment, Documentation, Speech
Li, Hongxia; Zhao, ChengLing; Long, Taotao; Huang, Yan; Shu, Fengfang – British Journal of Educational Technology, 2021
As an innovative evaluation tool, peer assessment is essential in Massive Open Online Courses (MOOCs). In both formative and summative peer assessments in MOOCs, providing reliable feedback is crucial in enhancing learning outcomes. Peer assessment has been highlighted as a reliable tool in both traditional classrooms and small-scale online…
Descriptors: Peer Evaluation, Online Courses, Open Education, Feedback (Response)
Dankiw, Kylie A.; Baldock, Katherine L.; Kumar, Saravana; Tsiros, Margarita D. – Australasian Journal of Early Childhood, 2021
Identifying and describing children's play behaviours is an important component of evaluating child development. The Behaviour Mapping Schedule is a direct observational tool which aims to describe and quantify children's play behaviours but is yet to undergo reliability testing. This study aimed to determine the intra- and inter-rater reliability…
Descriptors: Interrater Reliability, Classification, Child Behavior, Play
Barron, Becky F.; Paliliunas, Dana; Dixon, Mark R. – Journal of Behavioral Education, 2021
The purpose of the present study was to examine the relationship between the PEAK Direct Training Pre-assessment (PEAK-DT-PA) and the PEAK Generalization Pre-assessment (PEAK-G-PA) with the corresponding indirect assessments based on parent and therapist reports. Participants were administered the PEAK-DT-PA and PEAK-G-PA. Parents and therapist…
Descriptors: Training, Learning Modules, Student Evaluation, Parents
Starmer, Heather M.; Arrese, Loni; Langmore, Susan; Ma, Yifei; Murray, Joseph; Patterson, Joanne; Pisegna, Jessica; Roe, Justin; Tabor-Gray, Lauren; Hutcheson, Katherine – Journal of Speech, Language, and Hearing Research, 2021
Purpose: While flexible endoscopic evaluation of swallowing (FEES) is a common clinical procedure used in the head and neck cancer (HNC) population, extant outcome measures for FEES such as bolus-level penetration-aspiration and residue scores are not well suited as global patient-level endpoint measures of dysphagia severity in cooperative group…
Descriptors: Medical Evaluation, Physical Disabilities, Safety, Efficiency

