Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Leitner, David; Trevisan, Mike – 1993
This paper presents findings of a case study that documented the implementation of a portfolio assessment system in response to mandated program improvement and assessed its impact on teacher and student behaviors. The sample included elementary and middle school teachers and students from three Chapter 1 schools in a rural California school…
Descriptors: Educational Assessment, Educational Improvement, Elementary Education, Evaluation Criteria
Kenyon, Dorry; Stansfield, Charles W. – 1993
This paper examines whether individuals who train themselves to score a performance assessment will rate acceptably when compared to known standards. Research on the efficacy of rater self-training materials developed by the Center for Applied Linguistics for the Texas Oral Proficiency Test (TOPT) is examined. Rater self-training materials are described…
Descriptors: Bilingual Education, Comparative Analysis, Evaluators, Individual Characteristics
De Champlain, Andre F.; Margolis, Melissa J.; Ross, Linette P.; Macmillan, Mary K.; Klass, Daniel J. – 1998
The purpose of the present investigation was to address several critical issues relating to setting a performance standard on a nationally administered standardized patient examination (SPX). The specific goals of the study were to: (1) compare pass/fail rates from this exercise to those of past studies undertaken with the same examination; (2)…
Descriptors: Clinical Experience, Higher Education, Interrater Reliability, Medical Education
1998
This document contains three papers from a symposium on management assessment. In "The Air Force ROTC (Reserve Officer Training Corps) Selection System as a Predictor of Leadership" (Orlando V. Griego, George A. Morgan, Gary D. Geroy), 102 ROTC cadets rated their own leadership characteristics and were rated by subordinates; leaders and…
Descriptors: Administrator Evaluation, Adult Education, Employee Attitudes, Evaluation Methods
Yates, Beverly J. – 1991
The predictive validity of the National Association of Secondary School Principals (NASSP) assessment center evaluation process for principals is compared with the perceived effectiveness of a selected population of principals. The NASSP assessment center approach includes a case study, a personal interview, two exercises, and a scholastic…
Descriptors: Administrator Evaluation, Assessment Centers (Personnel), Case Studies, Comparative Analysis
Seal, Brenda C. – 1991
In order to better evaluate bilingualism in deaf children, this study examined whether observers (N=37) from different backgrounds would agree on deaf children's use of either American Sign Language (ASL) or English signing. Observers represented a range of background experience in a variety of schools and programs; 6 were deaf; 31 were hearing;…
Descriptors: American Sign Language, Bilingual Students, Bilingualism, Deafness
Aydin, Selami – Turkish Online Journal of Educational Technology - TOJET, 2006
This research investigated the effect of computers on the test and inter-rater reliability of the writing test scores of ESL learners. Writing samples from 20 students in a pen-and-paper group and 20 students in a computer group were scored by two scorers using an analytic scoring method, and the scores were then analyzed using Cronbach's alpha. The results showed that the…
Descriptors: Foreign Countries, College Students, Computer Assisted Testing, English (Second Language)
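The record above mentions analyzing two scorers' ratings with Cronbach's alpha. As a rough illustration of that statistic only (not the study's actual procedure or data), the sketch below treats each scorer as an "item" and computes alpha from the scorers' variances and the variance of the summed scores. The function name `cronbach_alpha` and the scores in `rater_a` and `rater_b` are hypothetical and chosen purely for demonstration.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of per-rater score lists.

    `ratings[i][j]` is the score rater i gave to writing sample j.
    Each rater is treated as one "item" in the alpha formula.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same two or more samples")

    # Sum of the sample variances of each rater's scores.
    sum_item_var = sum(variance(r) for r in ratings)
    # Sample variance of the per-sample total across raters.
    totals = [sum(scores) for scores in zip(*ratings)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical scores from two raters on five writing samples.
rater_a = [12, 15, 9, 18, 14]
rater_b = [13, 14, 10, 17, 15]
alpha = cronbach_alpha([rater_a, rater_b])
```

Values of alpha close to 1 indicate that the scorers rank the samples consistently. With only two scorers, a correlation or an agreement index such as Cohen's kappa is a common alternative.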
Lunz, Mary E.; Stahl, John A. – 1990
Three examinations administered to medical students were analyzed to determine differences among severities of judges' assessments and among grading periods. The examinations included essay, clinical, and oral forms of the tests. Twelve judges graded the three essays for 32 examinees during a 4-day grading session, which was divided into eight…
Descriptors: Clinical Diagnosis, Comparative Testing, Difficulty Level, Essay Tests
Thomson, W. Scott – 1989
Three contextual factors, the gender of the principal, the choice of subject matter used for the demonstration of competence, and number of years of teaching experience, have been shown to have an effect on the outcome of teacher evaluation. The annual evaluations of 521 elementary personnel using the Florida state-mandated single assessment were…
Descriptors: Administrator Characteristics, Elementary Education, Evaluation Criteria, Interrater Reliability
Sireci, Stephen G.; And Others – 1990
Although some researchers have argued against use of the term "content validity," the ability of a test item to adequately represent the domain of knowledge tested continues to be an issue of paramount importance in test construction. The present paper reviews previous analyses of test content and proposes a new empirical method for…
Descriptors: Cluster Analysis, Content Analysis, Content Validity, Evaluators
Friedman, Charles B.; Ho, Kevin T. – 1990
Eleven judges representing 11 different geographic regions in the United States participated in a standard-setting session designed to determine the possibility of obtaining interjudge consensus and intrajudge consistency simultaneously. Each judge had experience in the field for which standards were being set. The judges rated 65 multiple-choice…
Descriptors: Evaluators, Feedback, Interrater Reliability, Licensing Examinations (Professions)
Littlefield, John H.; Troendle, G. Roger – 1986
This study compares intra- and inter-rater agreement and reliability when using three different rating form formats to assess the same stimuli. One format requests assessment by marking detailed criteria without an overall judgement; the second format requests only an overall judgement without the use of detailed criteria; and the third format…
Descriptors: Cognitive Processes, Dental Evaluation, Dental Schools, Evaluation Criteria
Cason, Gerald J.; Cason, Carolyn L. – 1989
The use of three remedies for errors in the measurement of ability that arise from differences in rater stringency is discussed. Models contrasted are: (1) Conventional; (2) Handicap; and (3) deterministic Rater Response Theory (RRT). General model requirements, power, bias of measures, computing cost, and complexity are contrasted. Contrasts are…
Descriptors: Ability, Achievement Rating, Error of Measurement, Evaluation Methods
Bunch, Michael B.; Littlefair, Wendy – 1988
A total of 2,000 essays written by 1,000 students was submitted to generalizability analyses for domain-referenced tests. Each student had written one essay on each of two prompts representing two models of discourse. Each essay was read by six readers and judged on a scale from 1 to 4. No reader read essays from both prompts. Reader agreement…
Descriptors: Cutting Scores, Essay Tests, Generalizability Theory, Interrater Reliability
Cason, Gerald J.; Cason, Carolyn L. – 1987
This study describes a computer-based performance rating information processing system, performance rating theory, and programs for the application of the theory to obtain ratings free from the effects of reviewer stringency in reviewing abstracts of conference papers. Originally, the Performance Rating (PR) System was used to evaluate the…
Descriptors: Abstracts, Computer Oriented Programs, Conference Papers, Data Processing

