Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed
Hutton, Jerry B.; And Others – Psychology in the Schools, 1987
Special education, basic, and honors ninth-grade students (n=60) rated the severity of stress for each of the life events on the Source of Stress Inventory (Chandler, 1981). There was a significant positive relationship between the Chandler rankings (teachers and mental health workers) and the student rankings. (Author/NB)
Descriptors: Grade 9, Interrater Reliability, Mental Health, Secondary Education
Peer reviewed
Nevo, Baruch – Journal of Educational Measurement, 1985
A literature review and a proposed means of measuring face validity, a test's appearance of being valid, are presented. Empirical evidence from examinees' perceptions of a college entrance examination supports the reliability of measuring face validity. (GDC)
Descriptors: College Entrance Examinations, Evaluation Methods, Evaluators, Foreign Countries
Peer reviewed
Silverman, William H.; And Others – Personnel Psychology, 1986
Examined how assessment center methods affect both the way assessors organize and process assessment center information and the ratings they make. Results suggested that the methods used to evaluate assessment center candidates affected both how assessors organized the information and the ratings obtained. (Author/ABB)
Descriptors: Assessment Centers (Personnel), Cognitive Processes, Evaluation Methods, Evaluators
Peer reviewed
Peden, Blaine F. – Teaching of Psychology, 1986
Describes an exercise that teaches students about methodological issues involved in making reliable observations of behavior. Students learned Ekman's (1972) Facial Affect Scoring Technique from a microcomputer program and then applied it in real life. This exercise generated much discussion about research methods, built observational skills,…
Descriptors: Behavior Rating Scales, Computer Assisted Instruction, Higher Education, Instructional Improvement
Peer reviewed
Turner, Samuel M.; And Others – Journal of Consulting and Clinical Psychology, 1984
Examined the rating behavior of Black and White judges within the context of a social skills training program for patients (N=12) diagnosed as schizophrenic. Results indicated that Black and White judges may rate various social behaviors differently. (LLL)
Descriptors: Experimenter Characteristics, Interpersonal Competence, Interrater Reliability, Patients
Peer reviewed
Barnes-Farrell, Janet L.; Weiss, Howard M. – Personnel Psychology, 1984
Tested the extremity of the scale values associated with standards chosen in the development of mixed standard scales with college students (N=248). Results indicated that standard extremity affects the level of ratings assigned, the number of logically inconsistent response patterns, and the relative position of respondents in performance…
Descriptors: College Students, Higher Education, Interrater Reliability, Job Performance
Peer reviewed
De Santi, Roger J.; Sullivan, Vicki Gallo – Journal of Research and Development in Education, 1985
Cloze-based evaluations of reading comprehension leave room for greater subjectivity in rating reader responses. A study was designed to ascertain the nature of potential subjectivity within a single rater's ratings of cloze-based assessments of reading comprehension. (DF)
Descriptors: Cloze Procedure, Elementary Secondary Education, Error of Measurement, Interrater Reliability
Using Multiple Raters on Performance Based Driving Tests with High School Driver Education Students.
Haueisen, Heidi L. – 2001
An assessment tool was designed and implemented to increase consistency among multiple raters assessing students in driver education. The targeted population was students in grades 9 through 12 enrolled in driver education at a high school in an affluent suburb near a large city. The problem of a lack of a consistent…
Descriptors: Driver Education, High School Students, High Schools, Interrater Reliability
McQueen, Joy; Congdon, Peter J. – 1997
A study was conducted to investigate the stability of rater severity over an extended rating period. Multifaceted Rasch analysis was applied to ratings of writing performances of 8,285 primary school (elementary) students. Each performance was rated on two performance dimensions by two trained raters over a period of 7 rating days. Performances…
Descriptors: Educational Assessment, Elementary Education, Elementary School Students, Foreign Countries
Manalo, Jonathan R.; Wolfe, Edward W. – 2000
Recently, the Test of English as a Foreign Language (TOEFL) was changed to include a writing section that lets examinees choose between computer and handwritten formats to compose their responses. Unfortunately, this may introduce several potential sources of error that might reduce the reliability and validity of the scores. The seriousness…
Descriptors: Computer Assisted Testing, Essay Tests, Evaluators, Handwriting
Shaw, Emily J.; Milewski, Glenn B. – College Entrance Examination Board, 2004
In order for individualized review in college admissions to be fair, issues of consistency and reliability must be considered. There are a number of ways to assess interrater reliability, including calculating the composite reliability of readers, computing the proportion of times that readers make consistent ratings, and evaluating reader…
Descriptors: College Applicants, College Admission, Interrater Reliability, Reliability
Peer reviewed
Godbout, Paul; Schutz, Robert W. – Research Quarterly for Exercise and Sport, 1983
The appropriateness of several generalizability coefficients based on observational ratings of a number of subjects, each rated by the same observers over a number of motor performance trials, was examined. The extent to which observational designs can impair the computation of some coefficients and/or alter their value was also investigated.…
Descriptors: Evaluation Methods, Generalization, Interrater Reliability, Measurement Techniques
Peer reviewed
Muris, Peter; Steerneman, Pim; Ratering, Elise – Journal of Autism and Developmental Disorders, 1997
A study of 10 children (ages 3-6) with pervasive developmental disorders investigated the interrater reliability of the Psychoeducational Profile (PEP). Results show good interrater reliability for the developmental items, indicating that the PEP can be used to evaluate progress in development of children with pervasive developmental disorders.…
Descriptors: Child Development, Children, Evaluation Methods, Foreign Countries
Berger, Peter N. – Teaching and Learning Literature with Children and Young Adults, 1997
Discusses problems with scoring reliability of the Vermont Education Department's writing portfolio test, particularly the difficulties teachers face in agreeing upon scoring criteria. (PA)
Descriptors: Elementary Secondary Education, Interrater Reliability, Portfolio Assessment, Portfolios (Background Materials)
Peer reviewed
Kavale, Kenneth A.; Forness, Steven R. – Journal of Learning Disabilities, 1996
This meta-analysis examined 152 studies on social skill deficits among students with learning disabilities (LD). Quantitative synthesis showed that about 75% of students with LD manifest social skill deficits that distinguish them from comparison samples. Approximately the same level of group differentiation was found across different raters…
Descriptors: Elementary Secondary Education, Incidence, Interpersonal Competence, Interrater Reliability


