Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed: Fagot, Robert F. – Psychometrika, 1994
A previous paper proposed a generalized family of coefficients of relational agreement for multiple judges. It focused on the concept of empirically meaningful relationships. This paper presents an ordinal coefficient of relational agreement as a special case of the generalized family. The proposed ordinal coefficient encompasses other ordinal…
Descriptors: Correlation, Effect Size, Equations (Mathematics), Evaluators
Peer reviewed: Newstead, Stephen E.; Dennis, Ian – Assessment and Evaluation in Higher Education, 1990
Three studies investigating the existence of sex bias in the grading of undergraduate students, by examining interrater reliability for blind and non-blind grading, are reported. Negative evidence found in the results and the confusing picture presented by previous research indicate little firm evidence of sex bias in grading. (Author/MSE)
Descriptors: Evaluation Methods, Grading, Higher Education, Interrater Reliability
Ottenbacher, Kenneth J.; Cusick, Anne – Journal of the Association for Persons with Severe Handicaps (JASH), 1991
The study, with 79 rehabilitation therapists evaluating 21 single-subject graphs, found that the low interrater agreement often associated with visual analysis of single-subject data may be improved by simple supplements (such as trend lines) to visually inspected charts. (Author/DB)
Descriptors: Case Studies, Data Analysis, Disabilities, Evaluation Methods
Peer reviewed: Ward, Sandra B.; And Others – Journal of School Psychology, 1991
Investigated referral question bias on school psychologists' classification decisions across different types of cases. Findings from 175 school psychologists who classified 5 case studies on basis of scores from intelligence, achievement, and behavioral measures revealed lack of congruence among respondents' classification decisions that was more…
Descriptors: Classification, Congruence (Psychology), Educational Diagnosis, Elementary Secondary Education
Peer reviewed: Baird, Jo-Anne – Educational Research, 1998
Advanced-level English and chemistry examinations in Britain were graded in several conditions: with or without examinees' names; with male or female names; and with "male" or "female" handwriting. No consistent evidence of gender bias was found in the marking. (SK)
Descriptors: Ethnic Bias, Examiners, Foreign Countries, Grading
Peer reviewed: VanSciver, James H. – ERS Spectrum, 1998
The Delaware Performance Appraisal Process includes indicators that contain subjective terminology and are open to various interpretations, leading to concerns about interrater reliability and professional staff morale. The Lake Forest School District developed a two-pronged approach that retains the three-observation method while eliminating the…
Descriptors: Elementary Secondary Education, Interrater Reliability, Measurement Techniques, Objectivity
Peer reviewed: Saunders, Mark N. K.; Davis, Susan M. – Quality Assurance in Education, 1998
Lecturers at a British university participated in two workshops to examine the consistency of assessments of undergraduates' work. Use of both analytical and global quality measures, when clearly understood by the raters, improved assessment practices. Ongoing discussion of evaluation criteria was recommended. (SK)
Descriptors: Business Education, Evaluation Criteria, Foreign Countries, Grading
Peer reviewed: Hurtz, Gregory M.; Hertz, Norman R. – Educational and Psychological Measurement, 1999
Evaluated Angoff ratings from eight different occupational licensing examinations through generalizability theory to estimate the optimal number of raters. Results indicate that approximately 10 to 15 raters is an optimal target range. (SLD)
Descriptors: Cutting Scores, Evaluators, Generalizability Theory, Interrater Reliability
Chu, Brian C.; Kendall, Philip C. – Journal of Consulting and Clinical Psychology, 2004
Ratings of child involvement in manual-based cognitive-behavioral treatment for anxiety were associated with the absence of primary anxiety diagnosis and reductions in impairment ratings at posttreatment for 59 children with anxiety (ages 8-14 years). Good-to-excellent interrater reliability was established for the independent ratings of 237…
Descriptors: Psychometrics, Psychotherapy, Anxiety, Outcomes of Treatment
Schuster, Christof – Educational and Psychological Measurement, 2004
This article presents a formula for weighted kappa in terms of rater means, rater variances, and the rater covariance. The formula is particularly helpful in emphasizing that weighted kappa is an absolute agreement measure, in the sense that it is sensitive to differences in the raters' marginal distributions. Specifically, rater mean differences will decrease…
Descriptors: Computation, Rating Scales, Interrater Reliability, Statistical Analysis
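The abstract above is truncated and does not give the formula itself. As an illustration only, the sketch below uses the standard identity for the quadratic-weight case: with squared-difference disagreement weights and population (divide-by-n) moments, weighted kappa equals 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The choice of quadratic weights, the numeric category scores, and both function names are assumptions made here, not details taken from the article.

```python
import numpy as np


def weighted_kappa_from_table(x, y, categories):
    """Quadratic-weighted kappa computed the usual way, from the contingency table.

    Hypothetical helper: `categories` is the full list of numeric category scores.
    """
    cats = list(categories)
    index = {c: i for i, c in enumerate(cats)}
    observed = np.zeros((len(cats), len(cats)))
    for a, b in zip(x, y):
        observed[index[a], index[b]] += 1
    observed /= observed.sum()  # joint proportions
    # Proportions expected if the two raters were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    scores = np.asarray(cats, dtype=float)
    weights = (scores[:, None] - scores[None, :]) ** 2  # quadratic disagreement weights
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


def weighted_kappa_from_moments(x, y):
    """Same quantity from rater means, variances, and covariance (ddof=0 throughout)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return 2.0 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)


# Made-up ratings on a 1-6 scale, purely for illustration.
rater_a = [1, 2, 3, 4, 5, 3, 2, 4]
rater_b = [1, 3, 3, 5, 5, 2, 2, 4]
rater_b_shifted = [r + 1 for r in rater_b]  # same ordering, higher mean

kappa_table = weighted_kappa_from_table(rater_a, rater_b, range(1, 7))
kappa_moments = weighted_kappa_from_moments(rater_a, rater_b)
kappa_shifted = weighted_kappa_from_moments(rater_a, rater_b_shifted)
```

In this form the squared mean difference appears only in the denominator, so shifting one rater's scores by a constant lowers the coefficient even though the covariance and variances are unchanged. That is the sense in which the measure reflects absolute agreement rather than mere association.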
King, Keith; Laake, Rebecca A.; Bernard, Amy – American Journal of Health Education, 2006
This study examined the sexual messages depicted in music videos aired on MTV, MTV2, BET, and GAC from August 2, 2004 to August 15, 2004. One-hour segments of music videos were taped daily for two weeks. Depictions of sexual attire and sexual behavior were analyzed via a four-page coding sheet (interrater-reliability = 0.93). Results indicated…
Descriptors: Music, Sexuality, Television, Clothing
Bean, Erik Paul – Online Submission, 2008
Since the 1920s, textbook critics have maintained that textbooks should offer a homogenous editorial approach, including an acknowledgment of a mix of author opinion and scholarly research. Several researchers indicated that some textbooks are not homogenous. The purpose of this quantitative content analysis study was to examine whether…
Descriptors: Textbooks, Online Courses, Content Analysis, Teaching Methods
Cameron Ponitz, C. E.; McClelland, M. M.; Jewkes, A. M.; Connor, C. M.; Farris, C. L.; Morrison, F. J. – Early Childhood Research Quarterly, 2008
Behavioral aspects of self-regulation, including controlling and directing actions, paying attention, and remembering instructions, are critical for successful functioning in preschool and elementary school. In recent years, several direct assessments of these skills have appeared, but few studies provide complete psychometric data and many are…
Descriptors: Performance Based Assessment, Construct Validity, Interrater Reliability, Preschool Children
Lee, Alice; Brown, Susanna; Gibbon, Fiona E. – International Journal of Language & Communication Disorders, 2008
Background: Many speech and language therapists work in a multilingual environment, making cross-linguistic studies of speech disorders clinically and theoretically important. Aims: To investigate the effect of listeners' linguistic background on their perceptual ratings of hypernasality and the reliability of the ratings. Methods &…
Descriptors: Sentences, Linguistics, Language Impairments, Speech Evaluation
Murray, Rosemary; Grande, Marya; DiCamillo, Lorrei; Henry, Julie; Henry, David – Action in Teacher Education, 2008
The standards of the National Council for Accreditation of Teacher Education require that teacher education programs document teacher candidates' effectiveness across different domains. Most challenging for our program is the directive to document the impact that candidates have on preK-12 student learning. To gather information on the effects…
Descriptors: Student Teaching, Teacher Education, Teacher Education Programs, Program Effectiveness

