Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Francis, Alexandria S.; Holmes, Susan E. – 1983
Discrepancies among the standards produced by different criterion-referenced standard-setting techniques may be the result of a failure to adequately define the minimally competent candidate. Current research in this area is reviewed in terms of three categories: studies in which no formal assistance in conceptualization is given to judges,…
Descriptors: Certification, Criterion Referenced Tests, Cutting Scores, Interrater Reliability
Daiker, Donald A.; Grogan, Nedra – 1985
The role of sample papers (i.e., anchor papers, prototypes, range-finders) in holistic evaluation of writing is discussed. When, where, and how many sample papers are to be selected, and who should perform the selection are covered. The process of sample selection should proceed as follows: (1) a general reading of papers by committee members to…
Descriptors: Advanced Placement, Essay Tests, Evaluators, Higher Education
Szapocznik, Jose; And Others – 1987
Research showing psychodynamic child therapy to be less effective than other forms of child treatment has used outcome measures focusing on symptomatic and behavioral change rather than on psychodynamic processes. A child therapy assessment procedure that measures the psychological functioning of the child in a psychodynamically meaningful way is…
Descriptors: Child Development, Children, Counseling Effectiveness, Evaluation Methods
Breland, Hunter M.; Jones, Robert J. – 1988
The reliability, validity, and score discrepancies of 94 expository essays scored in conference versus remote settings were studied. Focus was on comparing holistic ratings obtained in both settings. Essays written by college freshmen on two different topics were scored by readers working in a conference setting and by different readers working in…
Descriptors: College Freshmen, Comparative Analysis, Conferences, Essay Tests
Reid, Jerry B. – 1985
This report investigates an area of uncertainty in using the Angoff method for setting standards, namely whether a judge's conceptualizations of borderline group performance are realistic. Ratings are usually made with reference to the performance of this hypothetical group; the Angoff method's success therefore depends on this point.…
Descriptors: Certification, Cutting Scores, Difficulty Level, Interrater Reliability
[Peer reviewed] Marsh, Herbert W. – International Journal of Educational Research, 1987
The reliability, long-term stability, and generalizability of student ratings of teacher effectiveness are discussed. The Students' Evaluation of Educational Quality (SEEQ) instrument is examined from these perspectives. The multidimensionality of student response to such evaluation instruments must be recognized. (SLD)
Descriptors: College Students, Generalizability Theory, Interrater Reliability, Postsecondary Education
[Peer reviewed] Shapiro, Edward S.; Goldberg, Ronald – School Psychology Review, 1986
The effects of independent, interdependent, and dependent group contingencies in increasing spelling performance of two classes of sixth grade students were compared using an alternating treatments design. Results suggest that all three contingencies substantially improved spelling performance on daily tests. (Author/LMO)
Descriptors: Achievement Gains, Aptitude Treatment Interaction, Grade 6, Group Behavior
[Peer reviewed] Sharpley, Christopher F. – Journal of Counseling & Development, 1986
Although single-subject research techniques are valuable to counselors, analyzing change by graphs alone is open to major sources of error. Discusses these sources and explores the issues of serial dependency, unreliability of graphs, interjudge disagreement on graphed data, and the use of time-series statistics with relevance to the counselor in…
Descriptors: Counseling Effectiveness, Counseling Techniques, Data Interpretation, Graphs
[Peer reviewed] Hogan, Andrew – Evaluation and the Health Professions, 1986
This study derives the economic costs of misclassification in nursing home patient classification systems. These costs are then used as weights to estimate the reliability of a functional assessment instrument. Results suggest that reliability must be redefined and remeasured with each substantively new application of an assessment instrument.…
Descriptors: Classification, Correlation, Cost Effectiveness, Diagnostic Tests
[Peer reviewed] Benton, Stephen L.; Kiewra, Kenneth A. – Journal of Educational Measurement, 1986
This paper assessed the relationships among holistic writing ability, the Test of Standard Written English, and four tests of organizational ability. Findings showed a significant correlation between writing ability and the tests. It was concluded that tests assessing organizational strategies ought to be included in assessments of writing…
Descriptors: Correlation, Essay Tests, Higher Education, Holistic Evaluation
[Peer reviewed] Strong, Michael; Rudser, Steven Fritsch – Sign Language Studies, 1986
When hearing raters subjectively evaluated the signed and spoken output of 25 sign language interpreters, rater agreement ranged from 0.52 to 0.86; the correlation between subjective and objective evaluation ranged from 0.59 to 0.79. Raters were unsuccessful in identifying which interpreters had deaf parents. (CB)
Descriptors: Correlation, Deaf Interpreting, Deafness, Evaluation Methods
[Peer reviewed] Rachal, John R. – Community/Junior College Quarterly of Research and Practice, 1984
Describes a study comparing the grading predilections of English instructors from two- and four-year colleges. Study findings, based on grades given to a set of five themes by instructors from both settings, showed community college instructors to be a letter grade more lenient than university instructors. (DMM)
Descriptors: Academic Standards, College Faculty, College Freshmen, Colleges
[Peer reviewed] Sweedler-Brown, Carol O. – English Journal, 1985
Reports findings of a study conducted to determine whether the amount of training and experience readers have had in using a particular grading scale correlates with their judgments about the quality of an essay, and whether the amount of training and experience affects the consistencies of their judgments. (EL)
Descriptors: Evaluation Criteria, Interrater Reliability, Methods Research, Secondary Education
[Peer reviewed] Blanck, Peter David; Rosenthal, Robert – Journal of Educational Psychology, 1984
In recorded interviews, 10 camp counselors described children they believed to have high social or athletic ability. Judges rated counselors' voice tone as warmer and less hostile when describing children for whom they had high expectations. Less competent counselors were more prone to biasing effects. (Author/BS)
Descriptors: Athletics, Attitude Measures, Camping, Children
[Peer reviewed] Hoge, Robert D. – Review of Educational Research, 1985
An evaluation of the validity of direct observation measures of pupil classroom behaviors is presented. Three types of measures are discussed: molar, molecular, and molecular-composite. Consistent support for the validity of molar and molecular-composite types of measures is revealed. (Author/LMO)
Descriptors: Behavior Rating Scales, Classroom Observation Techniques, Data Analysis, Elementary Secondary Education


