Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Taherbhai, Husein; Young, Michael James – 2000
This empirical study used data from the Reading: Basic Understanding section of the New Standards English Language Arts Examination. Data were collected for 3,200 high school students randomly selected from those who took the examination. The resulting sample had 16 raters who scored 200 students each, with each student rated by only 1 rater. The…
Descriptors: Evaluators, High School Students, High Schools, Interrater Reliability
Wang, Ning; Wiser, Randall F.; Newman, Larry S. – 2001
This paper provides both logical and empirical evidence to justify the use of an item mapping method for establishing passing scores for multiple-choice licensure and certification examinations. After describing the item-mapping standard setting process, the paper discusses the theoretical basis and rationale for this newly developed method and…
Descriptors: Certification, Cutting Scores, Interrater Reliability, Item Response Theory
Peer reviewed: Fleishman, Rachel; And Others – Evaluation Review, 1996
An interjudge reliability test was conducted to evaluate questionnaires used in the surveillance of residential care institutions in Israel. Results from 32 institutions, each evaluated by two surveyor teams (one social worker and one nurse per team), and the variance in reliability were used to improve the questionnaires and their administration. (SLD)
Descriptors: Evaluators, Foreign Countries, Institutional Characteristics, Interrater Reliability
Peer reviewed: Lombard, Matthew; Snyder-Duch, Jennifer; Bracken, Cheryl Campanella – Human Communication Research, 2002
Reviews the importance of intercoder agreement for content analysis in mass communication research. Describes several indices for calculating this type of reliability (varying in appropriateness, complexity, and apparent prevalence of use). Presents a content analysis of content analyses reported in communication journals to establish how…
Descriptors: Communication Research, Content Analysis, Higher Education, Interrater Reliability
Peer reviewed: Kolevzon, Michael S.; And Others – Journal of Marital and Family Therapy, 1988
Employed triangulation strategy for assessing family interaction, involving family members, therapist, and coders independently viewing videotapes. Found weak agreement between paired assessments within family triad, and within therapist-coder dyad. Findings suggest that methodological and/or scaling strategies designed to maximize agreement may…
Descriptors: Counselor Attitudes, Evaluation Criteria, Evaluation Methods, Evaluation Problems
Peer reviewed: Sagi, Abraham; And Others – Developmental Psychology, 1994
Interviewed Israeli students to assess the Adult Attachment Interview's test-retest reliability and effects of the interviewers on the interview itself. Information about subjects' memory and intellectual abilities was obtained from external sources. Found high interrater and test-retest reliability, irrespective of interviewers.…
Descriptors: Foreign Countries, Intelligence, Interrater Reliability, Memory
Peer reviewed: Colliver, Jerry R.; And Others – Journal of Academic Medicine, 1991
Case means and case failures in performance-based medical student evaluations were examined to evaluate the consistency of ratings made by two or more standardized patients (SPs) simulating the same case. Results demonstrate a need for caution in interpreting scores obtained from a case checklist completed by multiple SPs. (Author/MSE)
Descriptors: Evaluation Methods, Higher Education, Interrater Reliability, Medical Education
Peer reviewed: Korner, Anneliese F.; And Others – Child Development, 1991
The Neurobehavioral Assessment of the Preterm Infant instrument was developed by means of pilot, exploratory, and validation studies. The validation study tested the generalizability of results for different cohorts, test versions, hospitals, and examiners. Seven stable functions were identified: motor development; scarf sign; popliteal angle;…
Descriptors: Behavior Development, Cluster Analysis, Cohort Analysis, Interrater Reliability
Peer reviewed: Bradley, Clare – Assessment and Evaluation in Higher Education, 1993
Analysis of a study of sex bias in undergraduate student project evaluations revealed evidence of bias that was overlooked by the researchers. Research methodology and interpretation are discussed further. (MSE)
Descriptors: College Students, Higher Education, Interrater Reliability, Research Methodology
Peer reviewed: Yeaton, William H.; Wortman, Paul M. – Evaluation Review, 1993
The current practice of reporting a single mean intercoder agreement in meta-analysis leads to systematic bias and overestimates reliability. An alternative is recommended in which average intercoder agreement statistics are calculated within clusters of coded variables. Two studies of intercoder agreement illustrate the model. (SLD)
Descriptors: Coding, Decision Making, Estimation (Mathematics), Interrater Reliability
Peer reviewed: Cousins, J. Bradley; And Others – Alberta Journal of Educational Research, 1993
Two experiments studied teachers' proficiency in assessing students' higher order thinking skills. After training alone or after training plus implementation of an instructional unit on correlational thinking, teacher ratings of student samples did not correspond highly with an expert's assessment although they showed sensitivity to student age…
Descriptors: Elementary Secondary Education, Evaluation Problems, Evaluators, Interrater Reliability
Arthur, Michael – American Journal on Mental Retardation, 2000
In this response to critiques (Mudford, Hogg and Roberts 1997, 1999) of the use of behavior states in research involving individuals with mental retardation, it is argued that the work on behavioral state analysis by Robert D. Guess has contributed to the field at the practical, empirical, and theoretical levels. (Contains references.) (CR)
Descriptors: Adults, Behavior Patterns, Children, Evaluation Methods
Peer reviewed: Weigle, Sara Cushing – Assessing Writing, 1999
Investigates how experienced and inexperienced raters score essays written by English as a Second Language (ESL) students on two different prompts. Shows that the inexperienced raters were more severe than the experienced raters on one prompt but not on the other prompt, and that differences between the two groups of raters were eliminated…
Descriptors: Elementary Secondary Education, English (Second Language), Evaluation Research, Evaluators
Peer reviewed: Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
Lecavalier, L.; Havercamp, S. M. – Journal of Intellectual Disability Research, 2004
Sensitivity theory proposes that there are wide individual differences in what motivates people with intellectual disability. The Reiss Profile MRDD is a rating scale that measures 15 fundamental motives. This study examined the internal consistency and interrater reliability of the 15 subscales as well as the validity of motivational profiles.…
Descriptors: Profiles, Caregivers, Validity, Rating Scales

