Han, Jiantao; Long, Haiying; Pang, Weiguo – Creativity Research Journal, 2017
This study reported 2 experiments that studied the effect of perspective taking on assessment of creative products by using human raters. Forty responses of 2 alternative uses tasks (AUTs) and 15 alien stories generated by 6th-grade students were used as assessment materials. Undergraduate students as the novice raters assessed the products under…
Descriptors: Perspective Taking, Creativity, Undergraduate Students, Psychology
Ballard, Laura – ProQuest LLC, 2017
Rater scoring has an impact on writing test reliability and validity. Thus, there has been a continued call for researchers to investigate issues related to rating (Crusan, 2015). Investigating the scoring process and understanding how raters arrive at particular scores are critical "because the score is ultimately what will be used in making…
Descriptors: Evaluators, Schemata (Cognition), Eye Movements, Scoring Rubrics
Tucker, Laura; Scherr, Rachel E.; Zickler, Todd; Mazur, Eric – Physical Review Physics Education Research, 2016
Large-scale audiovisual data that measure group learning are time consuming to collect and analyze. As an initial step towards scaling qualitative classroom observation, we qualitatively coded classroom video using an established coding scheme with and without its audio cues. We find that interrater reliability is as high when using visual data…
Descriptors: Observation, Coding, Video Technology, Visual Stimuli
Kokkinaki, Theano – Early Child Development and Care, 2019
We systematically compared the structure, focus, thematic sequences, complexity, and syntactic properties of maternal and paternal infant-directed speech in engagements of infants with their mothers and fathers. Eleven mother-infant and 11 father-infant dyads were video-recorded during their natural interactions at home from…
Descriptors: Infants, Mothers, Fathers, Parent Child Relationship
Al-Harthi, Aisha Salim Ali; Campbell, Chris; Karimi, Arafeh – Computers in the Schools, 2018
This study aimed to develop, validate, and trial a rubric for evaluating the cloud-based learning designs (CBLD) that were developed by teachers using virtual learning environments. The rubric was developed using the technological pedagogical content knowledge (TPACK) framework, with rubric development including content and expert validation of…
Descriptors: Computer Assisted Instruction, Scoring Rubrics, Interrater Reliability, Content Validity
Tymms, Peter; Higgins, Steve – Studies in Higher Education, 2018
The United Kingdom's (UK's) Research Excellence Framework of 2014 was an expensive high stakes evaluation which had a range of impacts on higher education institutions across the country. One component was an assessment of the quality of research outputs where a major feature was a series of panels organised to read and rate the outputs of their…
Descriptors: Research Reports, Educational Research, Journal Articles, Teacher Researchers
Schaefer, John M.; Ottley, Jennifer R. – Journal of Special Education Technology, 2018
Observation and performance feedback to support traditional training methods are central tools for preservice practitioner preparation and in-service practitioner professional development. Research highlights how some specific characteristics of feedback (e.g., the latency between behavior and feedback) can affect its effectiveness. One method of…
Descriptors: Evidence Based Practice, Feedback (Response), Faculty Development, Educational Technology
Jensen, Bryant; Grajeda, Sara; Haertel, Edward – Educational Assessment, 2018
We trace the development and analyze the generalizability of the Classroom Assessment of Sociocultural Interactions (CASI), an observation system designed to measure cultural dimensions of classroom interactions. We establish CASI measurement properties by analyzing panoramic videos of 4th and 5th grade classrooms from the Measures of Effective…
Descriptors: Classroom Observation Techniques, Grade 4, Grade 5, Error of Measurement
Thompson, Andrew R.; O'Loughlin, Valerie D. – Anatomical Sciences Education, 2015
Bloom's taxonomy is a resource commonly used to assess the cognitive level associated with course assignments and examination questions. Although widely utilized in educational research, Bloom's taxonomy has received limited attention as an analytical tool in the anatomical sciences. Building on previous research, the Blooming Anatomy Tool (BAT)…
Descriptors: Anatomy, Classification, Scoring Rubrics, Multiple Choice Tests
Morrish, Taryn; Nesbitt, Amy; le Roux, Mia; Zsilavecz, Ursula; van der Linde, Jeannie – Communication Disorders Quarterly, 2017
Research involving stuttering in multilingual individuals is limited. Speech-language therapists face the challenge of treating a diverse client base, which includes multilingual individuals. The aim of this study was to examine the stuttering moments across English, Afrikaans, and German in a multilingual speaker. A single multilingual adult with…
Descriptors: Foreign Countries, Stuttering, Multilingualism, Case Studies
Cobb, Jeanne B. – Journal of Research in Childhood Education, 2017
This article describes a descriptive study utilizing a picture protocol technique that integrated the use of photographs of good readers and children's representational drawings with informal conversations about their habits and behaviors before, during, and after reading. The research participants included 228 children in kindergarten through 5th…
Descriptors: Reading Strategies, Metacognition, Elementary School Students, Photography
Ziegler, Wolfram; Staiger, Anja; Schölderle, Theresa; Vogel, Mathias – Journal of Speech, Language, and Hearing Research, 2017
Purpose: Standardized clinical assessment of dysarthria is essential for management and research. We present a new, fully standardized dysarthria assessment, the Bogenhausen Dysarthria Scales (BoDyS). The measurement model of the BoDyS is based on auditory evaluations of connected speech using 9 scales (traits) assessed by 4 elicitation methods.…
Descriptors: Auditory Evaluation, Test Reliability, Test Validity, Rating Scales
Bloch, Steven; Tuomainen, Jyrki – International Journal of Language & Communication Disorders, 2017
Background: The Dysarthria-in-Interaction Profile's potential contribution to the clinical assessment of dysarthria-in-conversation has been outlined in the literature, but its consistency of use across different users has yet to be reported. Aims: To establish the level of consistency across raters on four different interaction categories. That…
Descriptors: Articulation Impairments, Neurological Impairments, Augmentative and Alternative Communication, Profiles
Lin, Chih-Kai – Language Testing, 2017
Sparse-rated data are common in operational performance-based language tests, as an inevitable result of assigning examinee responses to a fraction of available raters. The current study investigates the precision of two generalizability-theory methods (i.e., the rating method and the subdividing method) specifically designed to accommodate the…
Descriptors: Data Analysis, Language Tests, Generalizability Theory, Accuracy
Bajwa, Nadia M.; Yudkowsky, Rachel; Belli, Dominique; Vu, Nu Viet; Park, Yoon Soo – Advances in Health Sciences Education, 2017
The purpose of this study was to provide validity and feasibility evidence in measuring professionalism using the Professionalism Mini-Evaluation Exercise (P-MEX) scores as part of a residency admissions process. In 2012 and 2013, three standardized-patient-based P-MEX encounters were administered to applicants invited for an interview at the…
Descriptors: Graduate Medical Education, College Admission, College Entrance Examinations, Validity

