Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Hunn, Lorie L. – ProQuest LLC, 2009
This study explored and compared the ways in which school-based cooperating teachers and college supervisors evaluate student teachers. The scores allocated to student teachers by school-based cooperating teachers and college supervisors in the final field experience evaluations of student teachers were analyzed. A mixed methods research design…
Descriptors: Cooperating Teachers, Leadership, Research Design, Student Teachers
Chiat, Shula; Roy, Penny – Journal of Speech, Language, and Hearing Research, 2007
Purpose: To determine the psychometric properties of the Preschool Repetition (PSRep) Test (Roy & Chiat, 2004), to establish the range of performance in typically developing children and variables affecting this performance, and to compare the performance of clinically referred children. Method: The PSRep Test comprises 18 words and 18…
Descriptors: Phonology, Psychometrics, Interrater Reliability, Followup Studies
Anseel, Frederik; Lievens, Filip – Journal of Career Development, 2007
This study examines how feedback interest after career assessment can be influenced by changing individuals' beliefs about the importance and modifiability of the various performance dimensions. In an experiment, 82 master students completed a computerized assessment tool developed for assessing managerial potential. Results showed that…
Descriptors: Feedback (Response), Interrater Reliability, Counselors, Career Counseling
McCandless, Stephen; O'Laughlin, Liz – Journal of Attention Disorders, 2007
Objective: Current theories hypothesize that deficits in executive functioning (EF) are responsible for the symptoms of ADHD and that specific patterns of EF deficits may be associated with different subtypes of ADHD. The present study evaluates the validity and clinical usefulness of the Behavior Rating Inventory of Executive Function, a behavior…
Descriptors: Test Validity, Interrater Reliability, Attention Deficit Disorders, Rating Scales
Howard, Donna Elise; Griffin, Melinda; Boekeloo, Bradley; Lake, Kristin; Bellows, Denise – Journal of American College Health, 2007
Objective: In this qualitative study, the authors examined how students attempt to minimize harm to themselves and others when drinking. Participants: The authors recruited freshmen at a large, mid-Atlantic US public university during the fall semester of 2005 to participate in 8 focus groups. Methods: The moderator's guide was developed through…
Descriptors: College Freshmen, Focus Groups, Interrater Reliability, Coping
Burt, Tammy L.; Porretta, David L.; Klein, Richard E. – Education and Training in Developmental Disabilities, 2007
This study investigated the use of adapted bicycles on the acquisition, maintenance, and generalization of conventional cycling by seven children with mild mental retardation. Feedback was used in addition to the adapted bicycles and consisted of pedal rate, head position, and steering participation. A multiple probe design was used. Participants…
Descriptors: Mild Mental Retardation, Maintenance, Generalization, Elementary School Students
Sorcinelli, Andrea; Shaw, Lynn; Freeman, Andrew; Cooper, Kim – Canadian Journal on Aging, 2007
Purpose: The purpose of this study was to evaluate the utility and reliability of a home hazard checklist published in Health Canada, "The Safe Living Guide: A Guide to Home Safety for Seniors" (2003). Methods: 76 community-dwelling seniors evaluated the guide, and inter-rater reliability was determined through comparison of ratings of…
Descriptors: Foreign Countries, Check Lists, Caregivers, Independent Living
Newman, Jody L.; Fuqua, Dale R. – Counselor Education and Supervision, 1986 (Peer reviewed)
Examined the effects of order of stimulus presentation on observer ratings of counseling performance. Results revealed a statistically significant interaction between quality of performance and the order in which the performances were rated. (Author/ABB)
Descriptors: Counselor Evaluation, Counselor Performance, Interrater Reliability, Observation
Ansorge, Charles J.; Scheer, John K. – Research Quarterly for Exercise and Sport, 1988 (Peer reviewed)
Analysis of gymnastics judges' scores of their own and other countries' gymnasts' performance during the 1984 Olympic Games indicated that the judges were biased in favor of their own country's gymnasts. (Author/CB)
Descriptors: Bias, Competition, Gymnastics, International Relations
Kane, Robert L.; And Others – Journal of Consulting and Clinical Psychology, 1987 (Peer reviewed)
Three experienced neuropsychologists rated brain-damaged and control subjects for brain damage using the Halstead-Reitan Battery and the Luria-Nebraska Neuropsychological Battery. Using either battery, raters were accurate in judging the presence of brain damage. There was a high degree of consistency between raters and test batteries when both…
Descriptors: Interrater Reliability, Neurological Impairments, Psychological Testing, Psychometrics
Cicchetti, Domenic V.; And Others – Educational and Psychological Measurement, 1984 (Peer reviewed)
This program computes multiple judge reliability levels under the following conditions: (1) different sets of judges perform the ratings; (2) the number of judges is a constant; and (3) the scale of measurement is nominal. (Author)
Descriptors: Computer Software, Interrater Reliability, Judgment Analysis Technique, Test Reliability
Vance, B.; And Others – Psychology in the Schools, 1983 (Peer reviewed)
Investigated the interscorer reliability between a novice and a professional psychologist for the Minnesota Percepto-Diagnostic Test-Revised (MPDT-R), using a sample of 30 individuals. Results indicated that for three of the four MPDT-R scores there was a significant positive correlation between expert and novice scoring criteria. (JAC)
Descriptors: Experimenter Characteristics, Interrater Reliability, Psychological Evaluation, Psychologists
Randolph, Justus J. – Online Submission, 2005
Fleiss' popular multirater kappa is known to be influenced by prevalence and bias, which can lead to the paradox of high agreement but low kappa. It also assumes that raters are restricted in how they can distribute cases across categories, which is not a typical feature of many agreement studies. In this article, a free-marginal, multirater…
Descriptors: Multivariate Analysis, Statistical Distributions, Statistical Bias, Interrater Reliability
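The contrast this abstract draws can be illustrated with a small sketch. It is not Randolph's own code; it assumes the commonly stated definitions: both statistics share the same observed agreement, Fleiss' kappa takes chance agreement from the pooled category proportions (fixed marginals), and the free-marginal variant takes chance agreement as 1/k for k categories. The function name `agreement_kappas` and the example ratings are hypothetical.

```python
from typing import Dict, List


def agreement_kappas(counts: List[List[int]]) -> Dict[str, float]:
    """Compute Fleiss' kappa and a free-marginal multirater kappa.

    counts[i][j] is the number of raters who assigned case i to category j.
    Assumption: every case is rated by the same number of raters (>= 2).
    """
    n_cases = len(counts)
    n_cats = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    # Observed agreement: per case, the share of rater pairs that agree,
    # averaged over all cases.
    pairs = n_raters * (n_raters - 1)
    p_obs = sum(sum(c * (c - 1) for c in row) / pairs for row in counts) / n_cases

    # Fixed-marginal chance agreement (Fleiss): sum of squared pooled
    # category proportions.
    total = n_cases * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_chance_fixed = sum(p * p for p in p_cat)

    # Free-marginal chance agreement: every category equally likely.
    p_chance_free = 1.0 / n_cats

    if p_chance_fixed < 1.0:
        fleiss = (p_obs - p_chance_fixed) / (1.0 - p_chance_fixed)
    else:
        fleiss = float("nan")  # all ratings in one category: undefined
    free = (p_obs - p_chance_free) / (1.0 - p_chance_free)
    return {"observed": p_obs, "fleiss_kappa": fleiss, "free_marginal_kappa": free}


# Hypothetical data: 10 cases, 3 raters, 2 categories, one category very rare.
ratings = [[3, 0]] * 9 + [[2, 1]]
result = agreement_kappas(ratings)
print(result)
```

With these made-up ratings, observed agreement is about 0.93, yet Fleiss' kappa comes out slightly negative because one category dominates, while the free-marginal value is about 0.87. That is the "high agreement but low kappa" pattern the abstract describes.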
Bartfay, Emma – International Journal of Testing, 2003 (Peer reviewed)
Used Monte Carlo simulation to compare the properties of a goodness-of-fit (GOF) procedure and a test statistic developed by E. Bartfay and A. Donner (2001) to the likelihood ratio test in assessing the existence of extra variation. Results show the GOF procedure possesses a satisfactory Type I error rate and power. (SLD)
Descriptors: Goodness of Fit, Interrater Reliability, Monte Carlo Methods, Simulation
VanLeeuwen, Dawn M. – Journal of Agricultural Education, 1997 (Peer reviewed)
Generalizability Theory can be used to assess reliability in the presence of multiple sources and different types of error. It provides a flexible alternative to Classical Theory and can handle estimation of interrater reliability with any number of raters. (SK)
Descriptors: Error of Measurement, Generalizability Theory, Interrater Reliability, Measurement Techniques
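As a rough illustration of the idea in this abstract (not the article's own procedure), the sketch below assumes the simplest generalizability design: every person is scored by every rater (a fully crossed one-facet design). Variance components are estimated from two-way ANOVA mean squares, and a relative generalizability coefficient is then projected for any number of raters. The function names and the score matrix are hypothetical.

```python
from typing import List, Tuple


def one_facet_g_study(scores: List[List[float]]) -> Tuple[float, float, float]:
    """Estimate variance components for a crossed persons x raters design.

    scores[p][r] is the rating of person p by rater r (no missing cells).
    Returns (var_person, var_rater, var_residual); negative estimates are
    truncated to zero, a common convention assumed here.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]

    # Sums of squares for a two-way layout with one observation per cell.
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_r

    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))

    # Expected-mean-square solutions for the three components.
    var_res = max(ms_res, 0.0)
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    return var_p, var_r, var_res


def g_coefficient(var_person: float, var_residual: float, n_raters: int) -> float:
    """Relative generalizability coefficient for the mean of n_raters ratings."""
    return var_person / (var_person + var_residual / n_raters)


# Hypothetical scores: 4 persons rated by 3 raters.
example = [[4, 5, 4], [6, 7, 7], [8, 8, 9], [3, 4, 3]]
var_p, var_r, var_res = one_facet_g_study(example)
for k in (1, 2, 3, 5):
    print(k, round(g_coefficient(var_p, var_res, k), 3))
```

The loop shows the sense in which the approach handles "any number of raters": once the components are estimated, the coefficient can be projected for rater panels of a different size than the one actually observed.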

