Publication Date
| Date Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Location | Records |
| --- | --- |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Rating | Records |
| --- | --- |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Watkins, Marley W.; Canivez, Gary L. – Diagnostique, 1997
A study of 71 students (ages 7-17) with disabilities investigated the interrater agreement of the Adjustment Scales for Children and Adolescents (ASCA), a behavior rating scale used in school settings. Participants were rated by 29 educational professionals in 24 classrooms. Results indicated that the ASCA produced acceptable levels of interrater agreement.…
Descriptors: Behavior Rating Scales, Disabilities, Elementary Secondary Education, Evaluation Methods
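The abstract does not say how ASCA interrater agreement was quantified; as a purely illustrative sketch, chance-corrected agreement between two raters assigning categorical behavior ratings is often summarized with Cohen's kappa (hypothetical data below, not drawn from the study):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if raters assigned categories independently at these base rates
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(rater_a) | set(rater_b))
    return (observed - expected) / (1 - expected)

# Hypothetical classifications of 10 students by two raters
a = ["typical", "at-risk", "typical", "clinical", "typical", "at-risk", "typical", "typical", "clinical", "at-risk"]
b = ["typical", "at-risk", "at-risk", "clinical", "typical", "at-risk", "typical", "typical", "clinical", "typical"]
print(round(cohens_kappa(a, b), 2))  # 0.68 on this toy sample
```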
Peer reviewed – Engelhard, George, Jr. – Journal of Outcome Measurement, 1997
Presents procedures based on Rasch measurement theory for construction of an assessment network, which is defined as a system of rater and task banks. Three general classes of data collection designs are presented to calibrate an assessment network and provide the opportunity for objective and fair measurements. (SLD)
Descriptors: Data Collection, Educational Assessment, Interrater Reliability, Item Banks
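The abstract does not reproduce the model, but assessment networks built from rater and task banks are typically calibrated with a many-facet Rasch model; one common form, shown only as an illustration and not necessarily the article's exact parameterization, is:

```latex
% Many-facet Rasch (rating scale) model: log-odds that examinee n receives
% category k rather than k-1 on task i from rater j
\log \frac{P_{nijk}}{P_{nij(k-1)}} = \theta_n - \delta_i - \lambda_j - \tau_k
% \theta_n: examinee ability      \delta_i: task difficulty
% \lambda_j: rater severity       \tau_k: threshold of rating category k
```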
Peer reviewed – Runco, Mark A. – Child Study Journal, 1989
Examines artwork of elementary school students to determine the interrater and inter-item reliabilities of ratings given by professional artists. The generality of creative performance for the artworks, and the topic of age trends in artistic creativity, are also considered. (BB)
Descriptors: Artists, Childrens Art, Creativity, Creativity Tests
Peer reviewed – Roy, C. W.; And Others – International Journal of Rehabilitation Research, 1988
Twenty rehabilitation patients were assessed on their activities of daily living using the Barthel Index, and were also observed by two occupational therapists in a simulated home unit. Results indicated good inter-observer reliability, and good agreement between asking the patient and professional observation of the patient. (JDD)
Descriptors: Adults, Daily Living Skills, Disabilities, Evaluation Methods
Peer reviewed – van den Bergh, Huub; Eiting, Mindert H. – Journal of Educational Measurement, 1989
A method of assessing rater reliability via a design of overlapping rater teams is presented. Covariances or correlations of ratings can be analyzed with LISREL models. Models in which the rater reliabilities are congeneric, tau-equivalent, or parallel can be tested. Two examples based on essay ratings are presented. (TJH)
Descriptors: Analysis of Covariance, Computer Simulation, Correlation, Elementary Secondary Education
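As a minimal sketch of the data layout such an overlapping-teams design produces, the first step is a pairwise-complete correlation (or covariance) matrix over the rater-by-essay matrix; the ratings below are hypothetical, and the LISREL modeling itself is not shown:

```python
import numpy as np
import pandas as pd

# Toy rater-by-essay matrix: each rater scores only a subset of essays,
# so the design has planned missingness; rater3 scores essays from both
# teams, giving every rater pair at least two jointly scored essays.
ratings = pd.DataFrame({
    "rater1": [4, 5, 3, 4, np.nan, np.nan],
    "rater2": [4, 4, 3, 5, np.nan, np.nan],
    "rater3": [np.nan, 5, 3, 4, 2, 3],
    "rater4": [np.nan, np.nan, 3, 4, 2, 3],
})

# Pairwise-complete correlations among raters (NaNs dropped per pair);
# a covariance matrix via ratings.cov() could be analyzed the same way.
print(ratings.corr(min_periods=2).round(2))
```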
Peer reviewed – Powers, Donald E.; And Others – Journal of Educational Measurement, 1994
The effects on essay scores of intermingling handwritten and word-processed student essays were studied with 32 students, each of whom produced both a handwritten and a word-processed essay. Essays were converted to the other format and rescored. Results reveal higher average scores for handwritten essays. Implications for scoring are considered. (SLD)
Descriptors: College Students, Computer Uses in Education, Essays, Handwriting
Peer reviewed – Figueredo, Aurelio Jose; And Others – Multivariate Behavioral Research, 1995
Two longitudinal studies involving 29 raters examined the construct validity, temporal stability, and interrater reliability of the latent common factors underlying subjective assessments by human raters of personality traits in the stumptail macaque and the zebra finch; the studies illustrate the use of generalizability analysis to test prespecified…
Descriptors: Animal Behavior, Construct Validity, Evaluation Methods, Generalizability Theory
Peer reviewed – Zegers, Frits E. – Applied Psychological Measurement, 1991
The degree of agreement between two raters rating several objects for a single characteristic can be expressed through an association coefficient, such as the Pearson product-moment correlation. How to select an appropriate association coefficient, and the desirable properties and uses of a class of such coefficients--the Euclidean…
Descriptors: Classification, Correlation, Data Interpretation, Equations (Mathematics)
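For a concrete contrast, the sketch below compares the Pearson product-moment correlation, which ignores additive and multiplicative differences between raters, with an identity-type agreement coefficient; the formula 2·Σxy / (Σx² + Σy²) is assumed here as one member of the Euclidean family and is not quoted from the article:

```python
import numpy as np

def pearson_r(x, y):
    """Product-moment correlation: unaffected by additive or multiplicative rater differences."""
    return np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1]

def identity_coefficient(x, y):
    """Assumed definition 2*sum(xy) / (sum(x^2) + sum(y^2)); drops below 1
    when raters differ in level or spread even if they order objects identically."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 2 * np.sum(x * y) / (np.sum(x ** 2) + np.sum(y ** 2))

rater_a = [2, 3, 4, 5, 6]
rater_b = [4, 5, 6, 7, 8]  # same ordering, but systematically two points higher
print(pearson_r(rater_a, rater_b))                        # 1.0
print(round(identity_coefficient(rater_a, rater_b), 3))   # about 0.929
```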
Foley, Regina M.; Epstein, Michael H. – Diagnostique, 1991
Sixty-five pairs of teachers and parents of elementary and secondary school learning-disabled students completed the Homework Problem Checklist (HPC). The HPC demonstrated a moderate level of interrater reliability. Acceptable levels of internal consistency were reported for both teacher and parent ratings. (JDD)
Descriptors: Check Lists, Elementary Secondary Education, Homework, Interrater Reliability
Peer reviewed – Cook, William L.; Goldstein, Michael J. – Child Development, 1993
Tested the assumption that familial self-reports are biased by social desirability and other factors, through the use of a latent variables modeling approach that evaluated rater reliability and bias in mother, father, and child ratings of parent-child negativity. Results based on 78 families demonstrated that family member ratings contained a…
Descriptors: Children, Family Relationship, Interrater Reliability, Parent Child Relationship
Peer reviewed – Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
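Page's computerized grading regressed human essay scores on computable text features. The sketch below only illustrates that general idea with hypothetical features and data; it is not Page's actual variable set or model:

```python
import numpy as np

# Hypothetical mini-corpus: human scores plus two simple text features
# (essay length in words, mean word length).
human_scores = np.array([2.0, 3.0, 3.5, 4.0, 5.0, 4.5])
features = np.array([
    [120, 4.1],
    [180, 4.3],
    [200, 4.2],
    [260, 4.6],
    [340, 4.9],
    [300, 4.7],
])

# Ordinary least squares: score ~ intercept + length + mean word length
X = np.column_stack([np.ones(len(features)), features])
coef, *_ = np.linalg.lstsq(X, human_scores, rcond=None)
machine_scores = X @ coef

# Agreement between machine and human scores on this toy sample
print(round(float(np.corrcoef(machine_scores, human_scores)[0, 1]), 3))
```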
Peer reviewed – Fry, Stuart A. – Assessment and Evaluation in Higher Education, 1990
A peer evaluation experiment in a British polytechnic institute found peer grading correlated positively with teacher grading. A survey of participants (n=70) found that five advantages of peer marking had been achieved and that students believed their work had been marked fairly and the marks should count toward final grades. (Author/MSE)
Descriptors: College Students, Evaluation Methods, Foreign Countries, Grading
Peer reviewed – Marcoulides, George; Simkin, Mark G. – Journal of Education for Business, 1991
A preprinted evaluation form and generalizability theory were used to judge the reliability of student grading of their peers' papers. Findings suggest that students can be consistent and fair in their assessments. Student practice in peer evaluation will help develop the management skill of employee evaluation. (JOW)
Descriptors: Business Administration Education, Generalizability Theory, Grading, Higher Education
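As a hedged sketch of the kind of generalizability analysis involved, the code below estimates variance components and a generalizability coefficient for a one-facet, fully crossed papers-by-raters design with toy data; it is not the authors' exact design or analysis:

```python
import numpy as np

# Toy fully crossed design: rows are papers, columns are peer raters.
scores = np.array([
    [78, 75, 80],
    [85, 88, 84],
    [62, 65, 60],
    [90, 92, 95],
    [70, 68, 72],
], dtype=float)
n_p, n_r = scores.shape

grand = scores.mean()
paper_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# ANOVA mean squares for a two-way crossed design, one observation per cell
ms_p = n_r * np.sum((paper_means - grand) ** 2) / (n_p - 1)
resid = scores - paper_means[:, None] - rater_means[None, :] + grand
ms_pr = np.sum(resid ** 2) / ((n_p - 1) * (n_r - 1))

# Variance components and the G coefficient for an average over k raters
var_p = (ms_p - ms_pr) / n_r   # true paper-to-paper variance
var_pr = ms_pr                 # rater-by-paper interaction plus error
k = n_r
g_coef = var_p / (var_p + var_pr / k)
print(round(g_coef, 3))
```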
Peer reviewed – Rieber, Lloyd – College Teaching, 1993
The use of paraprofessional editors to evaluate student writing, particularly in large college classes, allows teachers to give students more writing practice, provides more individual assistance for students, and helps teachers gain insight into student needs. Adoption of uniform criteria for evaluation also provides consistency and objectivity.…
Descriptors: Classroom Techniques, College Instruction, Editors, Evaluation Criteria
Peer reviewed – Littlefield, John H.; And Others – Academic Medicine, 1991
Interrater reliability in numerical ratings of clerkship performance (n=1,482 students) in five surgery programs was studied. Raters were classified as accurate or moderately or significantly stringent or lenient. Results indicate that increasing the proportion of accurate raters would substantially improve the precision of class rankings. (MSE)
Descriptors: Academic Achievement, Clinical Experience, Evaluation Criteria, Higher Education
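One simple way to screen raters for leniency or stringency, shown purely as an illustration and not as the classification procedure used in the study, is to average each rater's deviation from each student's mean rating across raters:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format clerkship ratings: one row per rater-student pair.
ratings = pd.DataFrame({
    "student": ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4"],
    "rater":   ["r1", "r2", "r1", "r3", "r2", "r3", "r1", "r2"],
    "score":   [7, 8, 5, 7, 6, 8, 8, 9],
})

# A rater's leniency = how far, on average, they rate above each student's mean.
ratings["resid"] = ratings["score"] - ratings.groupby("student")["score"].transform("mean")
leniency = ratings.groupby("rater")["resid"].mean()

# Assumed cutoffs of +/-0.5 points, purely for illustration
labels = pd.cut(leniency, bins=[-np.inf, -0.5, 0.5, np.inf],
                labels=["stringent", "accurate", "lenient"])
print(pd.concat({"leniency": leniency, "label": labels}, axis=1))
```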


