Peer reviewed: Hughes, Garry L.; Prien, Erich P. – Personnel Psychology, 1989
The relationship between job skills and specific tasks is difficult to analyze, since any one task may require multiple skills. In this study, a sequence of statistical evaluations was conducted to examine the reliability of a subject matter expert panel's association of tasks and job skills and the factor structure of the task by job skill…
Descriptors: Evaluation Methods, Factor Analysis, Factor Structure, Interrater Reliability
Peer reviewed: Lewis, Kerry E. – American Journal of Speech-Language Pathology, 1995
An examination of the extent to which scores on the Stuttering Severity Instrument (SSI) for Children and Adults, Third Edition, accurately reflect 10 judges' observations of stuttering behaviors found that SSI scores obscured the wide range of judges' raw counts and did not accurately reflect the observational data from which they were derived.…
Descriptors: Adults, Children, Evaluation Methods, Interrater Reliability
Peer reviewed: Au, M. L.; Pumfrey, P. D. – British Journal of Special Education, 1993
This study, involving 10 children (ages 7-11) attending an English school for children with moderate learning difficulties and their parents, revealed that parents' and teachers' estimates of attainments did not deviate significantly from the scores actually obtained by the children. There was considerable variation between teachers and parents…
Descriptors: Academic Achievement, Elementary Education, Expectation, Foreign Countries
Peer reviewed: Hambleton, Ronald K.; Plake, Barbara S. – Applied Measurement in Education, 1995
Several extensions to the Angoff method of standard setting are described that can accommodate characteristics of performance-based assessment. A study involving 12 panelists supported the effectiveness of the new approach but suggested that panelists preferred an approach that was at least partially conjunctive. (SLD)
Descriptors: Educational Assessment, Evaluation Methods, Evaluators, Interrater Reliability
Peer reviewed: Gunn, Pat; Cuskelly, Monica – International Journal of Disability, Development and Education, 1991
Behavioral ratings by mothers and teachers of 94 children with Down's Syndrome (between 8 and 14 years of age) indicated general support for the amiable personality stereotype, but ratings of low persistence were associated with maternal impressions of difficulty. There was little agreement between mothers and teachers regarding individual child…
Descriptors: Adolescents, Behavior Problems, Children, Downs Syndrome
Peer reviewed: Simpson, Robert G. – Behavioral Disorders, 1991
The behavior of each of 120 students in grades 9-12 was rated by 2 of the student's teachers using the Revised Behavior Problem Checklist. Results indicated a generally low to moderate degree of relationship among teacher ratings. It is recommended that clinicians collect behavioral ratings from many raters before reaching diagnostic conclusions.…
Descriptors: Behavior Problems, Check Lists, Clinical Diagnosis, Interrater Reliability
Realmuto, George M.; Wescoe, Sibyl – Child Abuse and Neglect: The International Journal, 1992
This study investigated whether 13 young children presented with anatomically correct dolls would exhibit behaviors that professionals (n=14) could agree on to determine the child's abuse status. The study concluded that experienced professionals agree poorly with each other about a child's abuse status and that sexually anatomically correct dolls…
Descriptors: Behavior Patterns, Child Abuse, Clinical Diagnosis, Evaluation Methods
Peer reviewed: Lehmann, Rainer H. – Studies in Educational Evaluation, 1990
Using results for 1,487 West German eleventh graders from the International Study of Achievement in Written Composition, problems related to interrater and intrarater reliability and generalizability of composition ratings were assessed. Results indicate that numerous writing assignments are required to assure reliability. (TJH)
Descriptors: Evaluators, Foreign Countries, Generalizability Theory, Grade 11
Peer reviewed: Janes, Joseph W.; McKinney, Renee – Library Quarterly, 1992
This study examined judgments of document relevance made by library science graduate students who were not the originators of the queries for which the documents were retrieved. Although the secondary judgments compared well with those of the original users, it was found that secondary judges used document record fields differently and had a…
Descriptors: Comparative Analysis, Higher Education, Interrater Reliability, Online Searching
Peer reviewed: McWilliam, R. A.; Ware, William B. – Journal of Early Intervention, 1994
Forty-seven young children, 15 with disabilities, were observed 4 times for types and levels of engagement. Results indicated that engagement is difficult to measure through molecular data collection techniques because of error in dependability measures. The number of observed sessions could be increased to achieve generalizability, but increases…
Descriptors: Attention, Classroom Observation Techniques, Data Collection, Disabilities
Peer reviewed: Wapnick, Joel; And Others – Journal of Research in Music Education, 1993
Reports on a study of the use of musical scores and rating scales by 80 pianists who listened to 21 trials of solo piano music. Found that the use of musical scores and rating scales did not improve interrater reliability. Discovered that the subjects were less consistent when evaluating slow musical pieces than faster pieces. (CFR)
Descriptors: Evaluation Methods, Evaluation Research, Evaluation Utilization, Higher Education
Peer reviewed: Szatmari, P.; And Others – Journal of Autism and Developmental Disorders, 1994
Parents and teachers of 83 young children diagnosed with a pervasive developmental disorder rated each child on the Autism Behavior Checklist (ABC) and the Vineland Adaptive Behavior Scales (VABS). Although there was good agreement between informants on the VABS, there was virtually no agreement on the ABC, with parents reporting more autistic and…
Descriptors: Behavior Rating Scales, Check Lists, Developmental Disabilities, Interrater Reliability
Peer reviewed: Figueredo, Aurelio Jose; And Others – Multivariate Behavioral Research, 1992
A quantitative ethogram was developed for the zebra finch, using one-zero focal animal sampling on an ethologically comprehensive checklist of 52 behavioral items, and it was assessed for interobserver reliability and construct validity. Applying the quantitative methods of psychometrics allows verification of ethological theory and testing of…
Descriptors: Animal Behavior, Check Lists, Comparative Analysis, Construct Validity
Peer reviewed: Goodwin, Laura D.; Goodwin, William L. – Journal of Early Intervention, 1991
Four approaches to estimating interrater reliability in early childhood special education research are illustrated and compared: correlation, comparison of means, percentage of agreement, and generalizability theory techniques. Generalizability theory techniques are proposed as a method for estimating the amount of variance attributable to…
Descriptors: Analysis of Variance, Disabilities, Early Childhood Education, Educational Research
Peer reviewed: Huot, Brian – College Composition and Communication, 1990
Describes holistic scoring as one of the biggest breakthroughs in writing assessment. Suggests that the technique's high interrater reliability coefficients partly explain holistic scoring's popularity. Argues that validity has been largely neglected. Concludes that more must be learned about the uses and effects of holistic scoring. (SG)
Descriptors: Educational Testing, Higher Education, Holistic Approach, Holistic Evaluation


