Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Nicolai, Michael T. – 1987
To determine whether the forensics community's idea of quality differs from that of the general population, tournament rankings by forensics judges were compared with those of a lay audience. Undergraduate students enrolled in a variety of speech-related courses were asked to attend rounds of competition at a midwest collegiate…
Descriptors: Communication Research, Comparative Analysis, Debate, Evaluation Criteria
Taylor, Marcia B.; Porterfield, William D. – 1984
This paper describes the Measure of Epistemological Reflection (MER), an instrument to assess cognitive developmental level according to the Perry scheme of intellectual and ethical development. It contains sets of questions for each of the six cognitive domains: decision making, learner role, instructor role in the learning process, peer role in…
Descriptors: Cognitive Development, Cognitive Tests, Epistemology, Higher Education
Wolcott, Willa; And Others – 1988
The occurrence of discrepant, or non-contiguous, scores during holistic scoring of writing proficiency is usually blamed on variables of the test reader or external factors. It is hypothesized that characteristics of the essays themselves sometimes may generate discrepancies in essay scoring. Data from 152 essays of college sophomores in teacher…
Descriptors: College Students, Essay Tests, Higher Education, Holistic Evaluation
Haggard, Cynthia S.; Lang, Duaine C. – 1988
The question of whether weighting (rank-ordering) of student teacher evaluation criteria, as determined by student and supervising teachers, university supervisors, and employing officials, results in better differentiation between and among student teachers is explored in this paper. Participants in the study were asked to rank order 15…
Descriptors: Equated Scores, Evaluation Criteria, Higher Education, Interrater Reliability
Kolm, Paul; Verhulst, Steven J. – Evaluation and the Health Professions, 1987 (peer reviewed)
This study tested the hypothesis that first-year medical residents evaluate their own performance in a more differentiated manner than their supervisors do. An evaluation form was developed to obtain ratings of performance in 13 specific areas. Results showed that residents were more discriminating than supervisors in evaluating residents' performance. (JAZ)
Descriptors: Analysis of Variance, Comparative Testing, Graduate Medical Students, Higher Education
Suter, W. Newton; Roberts, William L. – Contemporary Educational Psychology, 1987 (peer reviewed)
This study examined raters' beliefs about item (attribute) relatedness as a source of halo. College students' prior beliefs about the co-occurrence of teaching attributes were correlated with the actual correlations among teaching attributes of fictional college professors. Results showed some support for a beliefs-of-relatedness source of halo. (LMO)
Descriptors: College Students, Correlation, Error of Measurement, Higher Education
Bohn, Christine A.; Bohn, Emil – Communication Education, 1985 (peer reviewed)
Demonstrates the inadequacy of students' ratings of classroom speeches: student raters were both unreliable and sources of considerable error. Emphasizes the importance of training to reduce the error rate. (PD)
Descriptors: College Students, Error of Measurement, Evaluation Methods, Higher Education
Sandene, Brent; Horkay, Nancy; Bennett, Randy Elliot; Allen, Nancy; Braswell, James; Kaplan, Bruce; Oranje, Andreas – National Center for Education Statistics, 2005
This publication presents the reports from two studies, Math Online (MOL) and Writing Online (WOL), part of the National Assessment of Educational Progress (NAEP) Technology-Based Assessment (TBA) project. Funded by the National Center for Education Statistics (NCES), the Technology-Based Assessment project is intended to explore the use of new…
Descriptors: Grade 8, Statistical Analysis, Scoring, Familiarity
Clare, Lindsay; Valdes, Rosa; Pascal, Jenny; Steinberg, Joan Rector – 2001
This report describes ongoing research investigating the use of teachers' assignments as an indicator of classroom practice. The purpose of this work is to develop a measure of students' learning environments that potentially could be used to help monitor the influence of school reform efforts on the quality of the classroom learning environment…
Descriptors: Assignments, Educational Change, Educational Practices, Educational Quality
Moore, Alan D.; Young, Suzanne – 1997
As schools move toward performance assessment, there is increasing discussion of using these assessments for accountability purposes. When used for making decisions, performance assessments must meet high standards of validity and reliability. One major source of unreliability in performance assessments is interrater disagreement. In this paper,…
Descriptors: Accountability, Correlation, Elementary Secondary Education, Generalizability Theory
Kronowitz, Ellen; Finney, Victoria – California Journal of Teacher Education, 1983
Elementary school pupils judged their student teachers' performances in the areas of planning, instructional skill, evaluation, and behavior, and in classroom organization and control. Their evaluations were compared with adult observers' ratings. Results indicate that elementary school students can assess performance and discriminate among…
Descriptors: Elementary Education, Elementary School Students, Evaluation Criteria, Evaluation Methods
Behuniak, Peter, Jr.; And Others – Educational and Psychological Measurement, 1982 (peer reviewed)
This study examined how local content specialists performed when applying the Angoff and Nedelsky standard setting procedures to objective-referenced instruments in reading and mathematics. Results revealed several differences between the standard setting procedures in terms of both level and consistency of the cut scores generated. (Author/BW)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Interrater Reliability
Mudford, Oliver C.; Hogg, James; Roberts, Jessica – American Journal on Mental Retardation, 1997
Continuous observational recording over 57 hours evaluated behavior states of three adults with profound and multiple disabilities. Two independent observers also recorded for 22 hours. Although overall percentage agreement was satisfactory (above 80%), agreement on occurrence was unsatisfactory (mean of 65%). Agreement data were superimposed on…
Descriptors: Data Analysis, Data Collection, Evaluation Methods, Interrater Reliability
Rojahn, Johannes; Tasse, Marc J.; Sturmey, Peter – American Journal on Mental Retardation, 1997
Development of the Stereotyped Behavior Scale for adolescents and adults with mental retardation is described. Use with 600 individuals resulted in refinement and a 26-item scale with an internal consistency alpha of 0.88, test-retest reliability of p=0.90, and interrater reliability of p=0.76. (DB)
Descriptors: Adolescents, Adults, Behavior Patterns, Behavior Rating Scales
Falchikov, Nancy; Magin, Douglas – Assessment & Evaluation in Higher Education, 1997 (peer reviewed)
Discusses concerns and research about college student peer evaluation, particularly with regard to gender bias. Reports a study using blind marking, and examines results in relation to task and other contextual variables. Concludes that the blind marking technique contributes to the reliability of peer assessment, and outlines additional…
Descriptors: College Instruction, College Students, Group Dynamics, Higher Education


