Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
[Peer reviewed] Longford, Nicholas T. – Journal of Educational and Behavioral Statistics, 1996
Data from two standard-setting exercises were analyzed using the logistic regression model that assumes no variation in severity of raters, and results were compared with those obtained by logistic regression that allowed for severity variation. Results illustrate the importance of taking between-rater differences into account. (SLD)
Descriptors: Cutting Scores, Decision Making, Evaluators, Individual Differences
[Peer reviewed] Conroy, Maureen A.; And Others – Education and Training in Mental Retardation and Developmental Disabilities, 1996
This study assessed the intra-rater and inter-rater reliability of the Motivation Assessment Scale as used with 20 adults with mental retardation, expanding the results of previous research by evaluating across additional time and administrations. Results from 19 raters indicated variable moderate-to-low intra-rater and inter-rater reliability.…
Descriptors: Adults, Behavior Problems, Interrater Reliability, Measures (Individuals)
[Peer reviewed] Moffitt, Terrie E.; And Others – Psychological Assessment, 1997
Whether partners provide congruent reports about abuse in their relationship was studied with 360 couples. Findings suggest that reports of abuse can be aggregated to form internally consistent scales that show strong interpartner agreement, and that either abuser or victim reports are suitable for research use. (SLD)
Descriptors: Battered Women, Emotional Abuse, Family Violence, Interrater Reliability
[Peer reviewed] Hux, Karen; And Others – Journal of Communication Disorders, 1997
A study evaluated and compared four methods of assessing reliability on one discourse analysis procedure--a modified version of Damico's Clinical Discourse Analysis. The methods were Pearson product-moment correlations; interobserver agreement; Cohen's kappa; and generalizability coefficients. The strengths and weaknesses of the methods are…
Descriptors: Communication Disorders, Discourse Analysis, Evaluation Methods, Evaluation Problems
[Peer reviewed] Schoonen, Rob; And Others – Language Testing, 1997
Reports on three studies conducted in the Netherlands about the reading reliability of lay and expert readers in rating content and language usage of students' writing performances in three kinds of writing assignments. Findings reveal that expert readers are more reliable in rating usage, whereas both lay and expert readers are reliable raters of…
Descriptors: Foreign Countries, Interrater Reliability, Language Usage, Models
[Peer reviewed] Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and 5 rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
[Peer reviewed] Hill, Clara E.; And Others – Journal of Counseling Psychology, 1988
Outlines a method for studying rater bias in counseling and psychotherapy. The method was used to study three potential sources of rater bias: characteristics of the rater, the client, and the therapist. Ratings on the Collaborative Study Psychotherapy Rating Scale were examined for 826 sessions of psychotherapy in Treatment of Depression Collaborative Research…
Descriptors: Bias, Client Characteristics (Human Services), Congruence (Psychology), Counselor Characteristics
[Peer reviewed] Akande, Adebowale – Early Child Development and Care, 1994
Tested 21 low-functioning children with mental retardation to determine the validity of the Motivation Assessment Scale (MAS). Found that the interrater measures within the MAS were essentially uncorrelated and of independent dimensions, and that the MAS is not suitable for use with African primary school children. (HTH)
Descriptors: Elementary School Students, Foreign Countries, Interrater Reliability, Mental Retardation
[Peer reviewed] Onslow, Mark; And Others – Journal of Speech and Hearing Research, 1992
Utterances from stuttering and normally speaking children, aged two through four years, were analyzed by clinicians specializing in stuttering, general clinicians, and university students (total n=25). Results indicated that the validity of the data language used by researchers to describe stuttered and normal speech in early childhood may be…
Descriptors: Child Language, Classification, Clinical Diagnosis, Evaluation
Adams, Joyce A.; Wells, Robert – Child Abuse and Neglect: The International Journal, 1993
Preselected colposcopic photographs of the anogenital area of 16 patients were shown to 170 medical examiners, who rated their level of suggestion or indication of penetrating injury. Agreement between the participants and experts was higher on the abnormal cases than on the normal cases, and higher on genital findings than on anal findings.…
Descriptors: Child Abuse, Interrater Reliability, Medical Evaluation, Pediatrics
[Peer reviewed] Sigafoos, Jeff; And Others – Research in Developmental Disabilities, 1994
Eighteen adolescents and adults with severe/profound intellectual disability were rated by two staff members using the Motivation Assessment Scale to identify variables maintaining their aggressive behaviors. Analysis of interrater reliability indicated that for some individuals the scale may not represent a feasible alternative to more formal…
Descriptors: Adolescents, Adults, Aggression, Behavior Problems
[Peer reviewed] Olson, Julie B.; Hulin, Charles – Journal of Vocational Behavior, 1992
Ninety-five subjects observing videotaped secretaries performing well or poorly were told either to evaluate or monitor job performance. Ratings were influenced by the rater's objective. Regardless of objective, results showed a high level of covariance among the independent traits of the secretaries. (SK)
Descriptors: Cognitive Processes, Error of Measurement, Interrater Reliability, Job Performance
[Peer reviewed] Park, Hyun-Sook; And Others – Journal of Experimental Education, 1990
The reliability of visual inspection in single-case research was investigated by determining agreement among 5 judges visually inspecting 44 graphs depicting behavior from baseline to intervention. Agreement between visual inspection and statistical procedures was determined. Implications for single-case research are discussed. (SLD)
Descriptors: Behavior Patterns, Evaluation Methods, Evaluators, Graphs
[Peer reviewed] Muris, Peter; Steerneman, Pim; Meesters, Cor; Merckelbach, Harald; Horselenberg, Robert; van den Hogen, Tanja; van Dongen, Lieke – Journal of Autism and Developmental Disorders, 1999
Four studies investigated reliability and validity of the Theory of Mind (TOM) test, an instrument for assessing theory-of-mind ability in typical children and children with pervasive developmental disorders. The TOM test was found to be a reliable and valid instrument for measuring various aspects of theory of mind. (Author/CR)
Descriptors: Children, Interpersonal Competence, Interrater Reliability, Pervasive Developmental Disorders
[Peer reviewed] Karpati, Andrea; Zempleni, Andra; Verhelst, Norman V.; Veldhuijzen, Niels H.; Schonau, Diederik W. – Studies in Educational Evaluation, 1998
How a jury of art evaluators can increase the reliability of its judgments was studied with portfolios from 58 art students in Hungary evaluated by 15 art teacher jurors. Results cast doubt on the reliability of juror assessments. Merits of vertical and horizontal scoring approaches are discussed. (SLD)
Descriptors: Art Education, Art Products, Art Teachers, Foreign Countries


