Publication Date
| Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Location | Records |
| --- | --- |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does Not Meet Standards | 3 |
Ross, Donald C. – Educational and Psychological Measurement, 1992 (peer reviewed)
Large-sample chi-square tests of the significance of the difference between two correlated kappas, weighted or unweighted, are derived. Cases are presented with one judge in common between the two kappas and with no judge in common. An illustrative calculation is included. (Author/SLD)
Descriptors: Chi Square, Correlation, Equations (Mathematics), Evaluators
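As general background for this entry (not Ross's significance test itself), a minimal sketch of unweighted Cohen's kappa for two raters assigning nominal categories; the function name and data here are illustrative only:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters over the same items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed agreement: proportion of items both raters label identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: assumes the raters' category marginals are independent.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    # Undefined when p_e == 1 (both raters always use a single category).
    return (p_o - p_e) / (1 - p_e)
```

Ross's tests compare two such kappas that are correlated because they share items (and possibly a judge); the statistic above is the quantity being compared, not the test.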
Alliger, George M.; Williams, Kevin J. – Educational and Psychological Measurement, 1992 (peer reviewed)
The internal consistency of a scale and various indices of rating scale response styles (such as halo, leniency, and positive or negative response bias) are related to mean scale item intercorrelation. The consequent relationship between internal consistency and rating scale response styles is discussed. (Author/SLD)
Descriptors: Correlation, Evaluators, Interrater Reliability, Rating Scales
Cordes, Anne K.; And Others – Journal of Speech and Hearing Research, 1992 (peer reviewed)
Three groups of judges (n=18) differing in stuttering judgment experience identified stuttering events in repeated speech samples, to investigate a measurement methodology based on time-interval analyses. Results showed interjudge agreement was affected by the particular speech sample, the judges' previous experience, and the length of the…
Descriptors: Evaluation Methods, Experience, Interrater Reliability, Measurement Techniques
Kenny, David A. – Psychological Review, 1991 (peer reviewed)
Consensus refers to the extent of two judges' agreement in rating a common target. A general model of interpersonal perception based on the weighted average model of N. H. Anderson (1981) is developed to show that increased acquaintance does not always lead to changes in consensus. (SLD)
Descriptors: Interpersonal Relationship, Interrater Reliability, Judges, Models
Ottenbacher, Kenneth J. – American Journal on Mental Retardation, 1993
Because of the importance of single-subject studies in mental retardation research, a meta-analysis of the literature examining interrater agreement for visual analysis of single-subject data was conducted. Analysis revealed an overall interrater agreement value of 0.58 for 14 studies representing 789 raters. Effects of data type, design type,…
Descriptors: Data Analysis, Interrater Reliability, Mental Retardation, Meta Analysis
Ingham, Roger J.; And Others – Journal of Speech and Hearing Research, 1993 (peer reviewed)
This replication study of time-interval judgments of stuttering found higher interjudge agreement than previously reported for event-based analyses of stuttering judgments or time-interval analyses of event judgments. Judges with high intrajudge agreement levels also showed higher interjudge agreement levels than did judges with low intrajudge…
Descriptors: Evaluation Methods, Interrater Reliability, Measurement Techniques, Research Methodology
Dyer, Jack L.; And Others – Journal of Education for Business, 1994 (peer reviewed)
Holistic scoring enables the evaluation of writing skills based on general impressions of content and style. An experiment in an accounting class shows how it can be applied successfully with a high degree of reliability. (SK)
Descriptors: Accounting, Higher Education, Holistic Evaluation, Interrater Reliability
Polatajko, Helene; And Others – Canadian Journal of Occupational Therapy, 1993 (peer reviewed)
Two occupational therapists rated 13 students after 1-week placements, using the Performance Evaluation of Occupational Therapy Students (PEOTS). The instrument had good interrater reliability but test-retest reliability was difficult to evaluate. Preliminary findings support the use of PEOTS as an evaluation tool. (JOW)
Descriptors: Clinical Experience, Interrater Reliability, Occupational Therapy, Student Evaluation
The Assessment Center: An Examination of the Effects of Assessor Characteristics on Assessor Scores.
Lowry, Phillip E. – Public Personnel Management, 1993 (peer reviewed)
Age and rank of assessor were the only characteristics significantly affecting scores given in three police and six fire service assessment centers, but the magnitude of the effect was quite small. Results were attributed to assessor selection processes, the way centers were conducted, and type of assessor training. (SK)
Descriptors: Assessment Centers (Personnel), Evaluators, Fire Fighters, Interrater Reliability
Allen, Jeff M.; Schumacker, Randall E. – Journal of Outcome Measurement, 1998 (peer reviewed)
Individual members (n=308) rated their teams (31 teams) on 12 criteria of team performance. Using the many-facet Rasch model, differences in team ratings across the multiple criteria can be used to better assess the meaning of team performance. (Author/MAK)
Descriptors: Evaluation Methods, Group Dynamics, Institutional Research, Interrater Reliability
Teti, Douglas M.; McGourty, Sharon – Child Development, 1996 (peer reviewed)
Examined associations between mothers' and trained observers' Attachment Q-Set (AQS) sorts for preschoolers and assessed mother-observer concordance in relation to observers' confidence about how representative the behavior they witnessed was of the domain of AQS items. Found that mothers' and observers' sorts were significantly intercorrelated;…
Descriptors: Attachment Behavior, Child Behavior, Correlation, Experimenter Characteristics
McCarthy, Alma M.; Garavan, Thomas N. – Journal of European Industrial Training, 2001 (peer reviewed)
Explores the nature of 360 degree feedback, investigates factors that have influenced its emergence, and contrasts it with more traditional performance management processes. Identifies benefits and problems, considers issues surrounding sources of feedback, and discusses issues related to the use of multirater feedback. (Contains 99 references and…
Descriptors: Career Development, Evaluators, Interrater Reliability, Peer Evaluation
Baume, David; Yorke, Mantz – Studies in Higher Education, 2002 (peer reviewed)
Analyzed the assessments of 53 portfolios used to evaluate participants in a development course for higher education teachers at the United Kingdom's Open University. Findings included high reliability in assessment at the level of course outcomes, and that cumulation of component assessments is very likely to reduce the reliability of overall…
Descriptors: Foreign Countries, Higher Education, Interrater Reliability, Portfolio Assessment
Gaudet, Laura; Pulos, Steve; Crethar, Hugh; Burger, Susan – Education and Training in Mental Retardation and Developmental Disabilities, 2002 (peer reviewed)
In this study, self-reports of 34 individuals with developmental disabilities (DD) were compared with proxy ratings from family and providers. Correlations between the ratings of individuals with DD and the proxy raters were low, as were the correlations between family members and providers. In all scales except "cognition," the individual with DD…
Descriptors: Adults, Developmental Disabilities, Evaluation Methods, Interrater Reliability
Gawronski, Bertram; Bodenhausen, Galen V. – Psychological Bulletin, 2006
Replies to commentaries by D. Albarracin, W. Hart, and K. C. McCulloch (see record 2006-10465-004), A. W. Kruglanski and M. Dechesne (see record 2006-10465-005), and R. E. Petty and P. Brinol (see record 2006-10465-006) on B. Gawronski and G. V. Bodenhausen's (2006; see record 2006-10465-003) recently proposed associative-propositional evaluation…
Descriptors: Criticism, Evaluation, Student Attitudes, Interrater Reliability