Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Kvalseth, Tarald O. – Educational and Psychological Measurement, 1991 (peer reviewed)
An asymmetric version of J. Cohen's kappa statistic is presented as an appropriate measure for the agreement between two observers classifying items into nominal categories, when one observer represents the "standard." A numerical example with three categories is provided. (SLD)
Descriptors: Classification, Equations (Mathematics), Interrater Reliability, Mathematical Models
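For orientation, the standard (symmetric) Cohen's kappa that this record builds on can be sketched as follows. This is a minimal illustration of the usual chance-corrected agreement formula, kappa = (p_o - p_e) / (1 - p_e), and not the asymmetric version the article presents. The function name `cohen_kappa` and the three-category sample data are hypothetical and are not taken from the article.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Standard symmetric Cohen's kappa for two raters and nominal categories.

    Hypothetical helper for illustration only; it does not implement the
    asymmetric variant described in the record above.
    """
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")

    n = len(ratings_a)

    # Observed agreement: share of items given the same category by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1.0 - expected)


# Made-up example with three nominal categories.
rater_1 = ["x", "x", "y", "y", "z", "z"]
rater_2 = ["x", "x", "y", "z", "z", "z"]
kappa = cohen_kappa(rater_1, rater_2)
```

In this made-up example the raters agree on five of six items (p_o = 5/6) and the chance agreement is p_e = 1/3, so kappa works out to 0.75.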
Hubbard, Carol P. – Journal of Communication Disorders, 1998 (peer reviewed)
This study examined interjudge agreement levels for five adult listeners assessing either overt stuttering or disfluency types in the spontaneous speech of eight young children. Results showed that the interjudge reliability for judgments based on a disfluency taxonomy was not significantly different from that based on stuttering. The importance…
Descriptors: Interrater Reliability, Phonology, Speech Evaluation, Speech Impairments
Frederiksen, John R.; Sipusic, Mike; Sherin, Miriam; Wolfe, Edward W. – Educational Assessment, 1998 (peer reviewed)
Developed a video portfolio technique of teacher assessment and evaluated the technique through studies of six teachers and their raters. Results show that teachers are consistent in observing teaching functions and using their observations to evaluate teaching. (SLD)
Descriptors: Evaluation Methods, Interrater Reliability, Portfolio Assessment, Teacher Evaluation
Berr, Seth A.; Church, Allan H.; Waclawski, Janine – Human Resource Development Quarterly, 2000 (peer reviewed)
Behavior measures and the Myers Briggs Type Indicator were completed by 343 senior managers; 3,158 of their peers, supervisees, and supervisors rated managers' behavior. A modest correlation appeared between personality type and manager behavior. Differences related to raters' perceptions were found. (SK)
Descriptors: Administrator Behavior, Feedback, Interprofessional Relationship, Interrater Reliability
Klin, Ami; Lang, Jason; Cicchetti, Domenic V.; Volkmar, Fred R. – Journal of Autism and Developmental Disorders, 2000 (peer reviewed)
This study examined the inter-rater reliability of clinician-assigned diagnosis of autism using or not using the criteria specified in the Diagnostic and Statistical Manual IV (DSM-IV). For experienced raters there was little difference in reliability in the two conditions. However, a clinically significant improvement in diagnostic reliability…
Descriptors: Autism, Clinical Diagnosis, Clinical Experience, Developmental Disabilities
Tate, Richard L. – Journal of Educational Measurement, 1999 (peer reviewed)
Suggests that a modification of traditional linking is necessary when tests consist of constructed response items judged by raters and a possibility of year-to-year variation in rating discrimination and severity exists. Illustrates this situation with an artificial example. (SLD)
Descriptors: Equated Scores, Interrater Reliability, Item Response Theory, Multiple Choice Tests
Brutus, Stephane; Fleenor, John W.; London, Manuel – Journal of Management Development, 1998 (peer reviewed)
Self, subordinate, peer, and supervisor ratings of 1,080 managers in education, military, government, manufacturing, finance, and health were analyzed for leniency, interrater agreement, and effectiveness. In the private sector, more poor performing managers tended to overestimate their performance. Interrater agreement was lowest in government…
Descriptors: Comparative Analysis, Feedback, Interrater Reliability, Job Performance
Van Noord, Robert G.; Prevatt, Frances F. – Journal of School Psychology, 2002 (peer reviewed)
Evaluates the effects of rater reliability of common IQ and achievement tests on subsequent learning disorder eligibility determinations, particularly with respect to difficulty level of individual subtests and expertise of the scorer. The study corroborates previous findings of strong interrater reliability on most subtests of common IQ and…
Descriptors: Achievement Tests, Disability Identification, Intelligence Tests, Interrater Reliability
Cordes, Anne K. – Journal of Speech, Language, and Hearing Research, 2000 (peer reviewed)
In this study, 30 judges identified disfluency types they perceived in audiovisually recorded speech stimuli, first individually and then with a partner. Although intrapair and interpair agreement was higher in the partner than the individual condition, agreement for occurrences still averaged below 50 percent. Findings suggest caution in use of…
Descriptors: Adults, Evaluation Methods, Interrater Reliability, Speech Acts
Falk, Ruma; Lann, Avital – Teaching Statistics: An International Journal for Teachers, 2006
A coefficient of unfairness in the allocation of goods to people can be extended to measuring consensus among judges. The notion of relative variability underlies the formation of these measures.
Descriptors: Judges, Measures (Individuals), Interrater Reliability, Measurement Techniques
Moon, Tonya R.; Brighton, Catherine M.; Callahan, Carolyn M.; Robinson, Ann – Journal of Secondary Gifted Education, 2005
This article discusses the rationale for, and explicates the process used in, developing differentiated authentic assessments for middle school classrooms (many of which contain gifted students) that are aligned with state academic standards. The assessments were developed based on learner-centered psychological principles and revised based on a…
Descriptors: Instruction, Classrooms, Academic Standards, Interrater Reliability
A Measure of Agreement for Interval or Nominal Multivariate Observations by Different Sets of Judges
Janson, Harald; Olsson, Ulf – Educational and Psychological Measurement, 2004
This article addresses the problem of accounting for overall multivariate chance-corrected interobserver agreement when targets have been rated by different sets of judges (not necessarily equal in number). The proposed approach builds on Janson and Olsson's multivariate generalization of Cohen's kappa but incorporates weighting for number of judges…
Descriptors: Interrater Reliability, Multivariate Analysis, Evaluation Methods, Measurement Techniques
Taylor, Steven; McKay, Dean; Abramowitz, Jonathan S. – Psychological Review, 2005
This paper comments on the response offered by Szechtman and Woody to Taylor et al.'s initial comments on Szechtman and Woody's original article. Taylor et al. highlight one problem with their model that Woody and Szechtman seem to think is unimportant: the treatment relevance of their model. The analogy of aspirin and colds was used, suggesting…
Descriptors: Motivation, Item Analysis, Reader Response, Criticism
Griffith, Annette K.; Trout, Alexandra L.; Hagaman, Jessica L.; Harper, John – Behavioral Disorders, 2008
This review examines interventions intended to improve the literacy functioning of adolescent students with emotional and/or behavior disorders. Seventeen studies met inclusion criteria and included a variety of interventions designed to affect a variety of literacy areas, including spelling, writing, and reading fluency. Findings from these…
Descriptors: Intervention, Reading Fluency, Behavior Disorders, Emotional Disturbances
Yick, Alice G.; Oomen-Early, Jody – Journal of Interpersonal Violence, 2008
Until recently, research studies have implied that domestic violence does not affect Asian American and immigrant communities, or even Asians abroad, because ethnicity or culture has not been addressed. In this content analysis, the authors examined trends in publications in leading scholarly journals on violence relating to Asian women and…
Descriptors: Family Violence, Asian Culture, Interrater Reliability, Family Structure

