Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Halpin, Gerald; And Others – Educational and Psychological Measurement, 1983 (peer reviewed)
Although such standards are arbitrary, when several judgmental standard-setting procedures are used concurrently by different groups, stability across raters can be achieved and decisions can be made in a relatively judicious manner. Greater stability across methods (Ebel, Nedelsky, Angoff) may be achieved by slightly modifying the Ebel approach. (Author/PN)
Descriptors: Admission Criteria, College Entrance Examinations, Cutting Scores, Higher Education
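The three procedures named in the Halpin abstract can be sketched in their common textbook form. This is a minimal illustration under stated assumptions, not the article's procedure: the one-row-per-judge data layout, the function names, and the judge-spread stability index are hypothetical, and the article's modified Ebel approach is not reproduced.

```python
from statistics import mean, pstdev

# Hypothetical data layout: one row per judge, one entry per item.

def angoff_cut(item_probs: list[list[float]]) -> float:
    # Angoff: each judge estimates, for every item, the probability that a
    # minimally competent examinee answers it correctly. A judge's cut score
    # is the sum over items; the panel cut score is the mean over judges.
    return mean(sum(row) for row in item_probs)

def nedelsky_cut(options_left: list[list[int]]) -> float:
    # Nedelsky: for each multiple-choice item, a judge counts the options a
    # minimally competent examinee could NOT eliminate; the expected item
    # score is 1 / that count. Sum over items, then average over judges.
    return mean(sum(1 / k for k in row) for row in options_left)

def ebel_cut(items_per_cell: dict[str, int], expected_p: dict[str, float]) -> float:
    # Ebel: items are classified in a relevance-by-difficulty grid, and the
    # panel assigns each cell an expected proportion correct. The cut score
    # is the sum of (number of items in cell * expected proportion).
    return sum(n * expected_p[cell] for cell, n in items_per_cell.items())

def judge_spread(per_judge_cuts: list[float]) -> float:
    # Hypothetical stability index: population SD of individual judges' cuts.
    return pstdev(per_judge_cuts)

if __name__ == "__main__":
    angoff = [[0.6, 0.7, 0.8, 0.5], [0.5, 0.6, 0.9, 0.6], [0.7, 0.8, 0.8, 0.5]]
    print(round(angoff_cut(angoff), 3))
    print(round(judge_spread([sum(r) for r in angoff]), 3))
    print(round(nedelsky_cut([[2, 4, 3], [2, 2, 4]]), 3))
    cells = {"essential/easy": 10, "essential/hard": 5, "important/medium": 8}
    p = {"essential/easy": 0.9, "essential/hard": 0.5, "important/medium": 0.7}
    print(round(ebel_cut(cells, p), 2))
```

Comparing the spread of per-judge cut scores within each method is one simple way to look at "stability across raters"; comparing the three panel cut scores looks at stability across methods.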
Johnson, David W.; And Others – Review of Educational Research, 1983 (peer reviewed)
A theoretical model is presented, along with a review of supporting literature, to establish the conditions under which desegregation and mainstreaming will result in constructive or destructive outcomes. Meta-analysis procedures are applied to all the available research relevant to the model and point toward practical intergroup procedures based on the…
Descriptors: Desegregation Effects, Disabilities, Elementary Secondary Education, Ethnic Relations
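The Johnson abstract mentions meta-analysis procedures without specifying them. As a generic illustration only, and not necessarily the technique used in the review, a fixed-effect, inverse-variance pooling of standardized effect sizes can be sketched as follows; the function name and the input values are hypothetical.

```python
import math

def fixed_effect_pool(effects: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Return (pooled effect, lower 95% bound, upper 95% bound).

    Each study is weighted by the inverse of its sampling variance, so more
    precise studies count more toward the pooled estimate.
    """
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]
    total_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)  # standard error of the pooled effect
    # Normal-approximation 95% confidence interval.
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

if __name__ == "__main__":
    # Hypothetical standardized mean differences (d) and their variances.
    print(fixed_effect_pool([0.40, 0.60, 0.25], [0.04, 0.02, 0.05]))
```

When studies are heterogeneous, a random-effects model, which adds a between-study variance component to each weight, is the usual alternative.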
Orsmond, Paul; Merry, Stephen; Reiling, Kevin – Assessment & Evaluation in Higher Education, 1997 (peer reviewed)
Reports on a study of a student self-assessment method in college biology, comparing students' self-evaluation, students' peer evaluation, and the teacher's evaluation criteria. Results illustrate potential problems in making assumptions about student ability to self-evaluate but also support previous findings about the instructional usefulness of…
Descriptors: Biology, College Faculty, College Instruction, College Students
Orsmond, Paul; And Others – Assessment & Evaluation in Higher Education, 1996 (peer reviewed)
A study comparing peer and teacher evaluations of the performance of British university biology students (n=39) found such comparisons misleading as a guide to the validity of peer assessment. When individual criteria were analyzed, agreement between peers and teacher ranged from 31% to 62%, with specific areas of the criteria prone to over- and undervaluation.…
Descriptors: Bias, Biology, College Students, Comparative Analysis
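The per-criterion agreement figures in the 1996 Orsmond abstract, together with its note on over- and undervaluation, correspond to a simple tabulation. A minimal sketch, assuming marks are paired by student within each criterion; the function name, criterion names, and marks are hypothetical.

```python
def agreement_by_criterion(
    peer: dict[str, list[int]], teacher: dict[str, list[int]]
) -> dict[str, tuple[float, float, float]]:
    """For each criterion, return the percentages of students whose peer mark
    exactly matches, exceeds, or falls below the teacher's mark."""
    results = {}
    for criterion, teacher_marks in teacher.items():
        pairs = list(zip(peer[criterion], teacher_marks))  # paired by student
        n = len(pairs)
        same = sum(p == t for p, t in pairs)
        over = sum(p > t for p, t in pairs)   # peers overvalue
        under = sum(p < t for p, t in pairs)  # peers undervalue
        results[criterion] = (100 * same / n, 100 * over / n, 100 * under / n)
    return results

if __name__ == "__main__":
    # Hypothetical marks for four students on two criteria.
    peer = {"clarity": [3, 4, 2, 5], "accuracy": [4, 4, 3, 3]}
    teacher = {"clarity": [3, 3, 2, 4], "accuracy": [4, 4, 3, 3]}
    for crit, (agree, over, under) in agreement_by_criterion(peer, teacher).items():
        print(f"{crit}: {agree:.0f}% agree, {over:.0f}% over, {under:.0f}% under")
```

Splitting disagreements into "over" and "under" is what lets a tabulation like this show a systematic direction of bias rather than only a single agreement rate.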
Grant, Leslie – Language Testing, 1997 (peer reviewed)
Describes current procedures used for testing bilingual teachers in the United States and focuses on one means of assessment used in Arizona. Examinee questionnaire responses, teacher questionnaire responses and test section analysis all contributed evidence for validity. (33 references) (Author/CK)
Descriptors: Bilingualism, Criterion Referenced Tests, Interrater Reliability, Language Teachers
Tomada, Giovanna; Schneider, Barry H. – Developmental Psychology, 1997 (peer reviewed)
Replicated and extended American research on overt and relational aggression with Italian children. Found that peer and teacher nominations for aggression and prosocial behavior were highly stable, although concordance between the two sources was very poor. Peer nominations for overt and relational aggression were linked to peer rejection. Boys' scores were…
Descriptors: Aggression, Bullying, Child Behavior, Children
Pugh, Malcolm; Lock, Roger – Research in Science and Technological Education, 1989 (peer reviewed)
The development of a framework for analyzing pupil talk is described, and the reliability of scoring transcribed conversations using the framework is discussed. Definitions and examples of the terms used in the framework are appended. (Author/YP)
Descriptors: Biology, Foreign Countries, Group Discussion, Interrater Reliability
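The Pugh and Lock abstract does not say which reliability index was used for scoring transcripts. Cohen's kappa is one common chance-corrected index when two coders assign each unit of talk to a category, so it is sketched here as an assumption, not as the authors' method; the category labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Chance-corrected agreement between two coders over the same units."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of units")
    n = len(coder_a)
    # Observed proportion of units given the same category by both coders.
    p_obs = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_exp = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)

if __name__ == "__main__":
    # Hypothetical codes for six utterances from one transcript.
    a = ["claim", "question", "claim", "claim", "explain", "question"]
    b = ["claim", "question", "explain", "claim", "explain", "claim"]
    print(round(cohens_kappa(a, b), 3))  # 0.478
```

Here the coders agree on 4 of 6 utterances (67%), but kappa is lower because part of that agreement is expected by chance from how often each coder uses each category.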
Reid, William J.; And Others – Journal of Social Work Education, 1996 (peer reviewed)
In a study with 13 social work and counseling interns, field supervisors' ratings of students' field performance were compared to an independent judge's content analysis of performance. Results revealed significant correlations between the evaluations, providing evidence of validity of the supervisors' assessments. Validity may have been enhanced…
Descriptors: Evaluation Methods, Field Experience Programs, Higher Education, Interrater Reliability
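The Reid et al. abstract reports significant correlations between supervisors' ratings and an independent judge's content analysis. Assuming a Pearson correlation (the abstract does not name the coefficient), the computation and the usual t statistic for testing r against zero can be sketched as below; the function names and ratings are hypothetical.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between paired ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r: float, n: int) -> float:
    """t statistic with n - 2 degrees of freedom for H0: rho = 0."""
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

if __name__ == "__main__":
    # Hypothetical ratings of five interns by a supervisor and a judge.
    supervisor = [3, 4, 2, 5, 4]
    judge = [2, 4, 2, 5, 3]
    r = pearson_r(supervisor, judge)
    print(round(r, 3), round(t_for_r(r, len(judge)), 2))
```

The resulting t would be compared with a t distribution on n - 2 degrees of freedom; with only 13 interns, as in the study, a fairly large r is needed to reach significance.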
Levine, Phyllis; Edgar, Eugene – Exceptional Children, 1994 (peer reviewed)
High school graduates in regular (n=280) and special education (n=223) and their parents were interviewed. Parent-student agreement percentages were high for the variables of attending postsecondary school, employment status, type of residence, marital status, and number of children. Low agreement rates were obtained for salary level, hours…
Descriptors: Disabilities, Employment, Followup Studies, Graduate Surveys
Sigafoos, Jeff; Pennell, Donna – Education and Training in Mental Retardation and Developmental Disabilities, 1995 (peer reviewed)
A comparison, using paired t-tests, of parent and teacher ratings for 16 preschool children on the Receptive-Expressive Emergent Language Scale found no significant difference between parent and teacher ratings of expressive language but a significant difference on the receptive language subscale. However, interrater reliability was relatively low…
Descriptors: Developmental Disabilities, Expressive Language, Interrater Reliability, Language Skills
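The paired t-test named in the Sigafoos and Pennell abstract works on the per-child difference between the two raters. A minimal sketch of the statistic, with hypothetical scores and a hypothetical function name; the p-value lookup against a t distribution is left out to stay within the standard library.

```python
import math
from statistics import mean, stdev

def paired_t(rater_1: list[float], rater_2: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for paired ratings of the same children."""
    diffs = [a - b for a, b in zip(rater_1, rater_2)]
    n = len(diffs)
    sd = stdev(diffs)  # sample SD of the differences (n - 1 denominator)
    if sd == 0:
        raise ValueError("t is undefined when all differences are equal")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1

if __name__ == "__main__":
    # Hypothetical subscale scores for five children.
    parent = [10, 12, 9, 14, 11]
    teacher = [8, 11, 9, 12, 10]
    t, df = paired_t(parent, teacher)
    print(round(t, 2), df)
```

A paired t-test only compares mean ratings. Two raters can have nearly identical means while ranking the children quite differently, which is consistent with the abstract reporting low interrater reliability alongside a non-significant mean difference on expressive language.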
Thompson, Irene – Foreign Language Annals, 1995 (peer reviewed)
Considers the interrater reliability of certified testers in five European languages, the relationship between interviewer-assigned ratings and second ratings based on audio replay, interrater reliability as a function of proficiency level, effect of different languages on interrater agreement, and interrater disagreements with regard to…
Descriptors: Audiotape Recordings, English (Second Language), Evaluators, French
Reid, Robert; Maag, John W. – Journal of School Psychology, 1994 (peer reviewed)
Article describes behavior rating scales and the difficulties in using cutoff scores to identify students as having Attention-Deficit Hyperactivity Disorder. Also described is how problems with interobserver agreement hamper the validity of rating scales and the conclusions that can subsequently be drawn about students' behavior. (RJM)
Descriptors: Attention Deficit Disorders, Attention Span, Behavior Rating Scales, Children
Smith, Richard Merrill – Academic Medicine, 1993 (peer reviewed)
A University of Hawaii study compared objective and subjective assessments of the three-step triple jump examination, which tests medical students' clinical problem-solving processes. Subjects were 58 first-year students. Results showed that the subjective assessments were more consistent across problems of varying difficulty level than were objective…
Descriptors: Case Studies, Difficulty Level, Higher Education, Interrater Reliability
Kinch, Carol; Lewis-Palmer, Teri; Hagan-Burke, Shanna; Sugai, George – Education and Treatment of Children, 2001 (peer reviewed)
A study examined the usefulness of information obtained from 16 teachers and from eight students who displayed substantially more problem behaviors in one classroom (high-risk) than in another. Students were able to provide reliable information in the functional assessment interview. Moderate to high agreement was obtained between students and teachers in…
Descriptors: Antisocial Behavior, Behavior Problems, Data Collection, Functional Behavioral Assessment
Feldt, Leonard S.; Kim, Seonghoon – Educational and Psychological Measurement, 2006
Researchers sometimes need a statistical test of the hypothesis that two values of Cronbach's alpha reliability coefficient are equal. The situation may involve scores from two different measures administered to independent random samples or from the same measure administered to random samples from two different populations. Feldt derived a test…
Descriptors: Individual Testing, Test Items, Sample Size, Scores
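The Feldt and Kim abstract concerns testing whether two alpha coefficients are equal. The sketch below computes only the ingredients: Cronbach's alpha for each sample, alpha = k/(k - 1) * (1 - sum of item variances / total-score variance), and the ratio W = (1 - alpha_1)/(1 - alpha_2), which is commonly given as the basis of Feldt's test for independent samples. The reference distribution and degrees of freedom are deliberately omitted and should be taken from Feldt's work rather than from this sketch; function names and data are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores: list[list[float]]) -> float:
    """scores: one row per examinee, each row holding k item scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    # Population variances throughout; the n vs n - 1 choice cancels in the ratio.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def feldt_w(alpha_1: float, alpha_2: float) -> float:
    """Ratio (1 - alpha_1) / (1 - alpha_2) for two independent samples."""
    return (1 - alpha_1) / (1 - alpha_2)

if __name__ == "__main__":
    # Hypothetical item scores for two independent samples on a 3-item measure.
    sample_1 = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]
    sample_2 = [[1, 3, 2], [4, 2, 5], [2, 2, 1], [3, 4, 3]]
    a1, a2 = cronbach_alpha(sample_1), cronbach_alpha(sample_2)
    print(round(a1, 3), round(a2, 3), round(feldt_w(a1, a2), 3))
```

A W close to 1 means the two samples' "unreliability" components, 1 - alpha, are similar; how far from 1 W must be to reject equality depends on the sample sizes, which is the question the article's test addresses.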

