Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed: Charters, W. W., Jr.; Pitner, Nancy J. – Educational and Psychological Measurement, 1986
This paper reports on the application of Yukl's Management Behavior Survey in 47 elementary schools. Three problems with the instrument are discussed: (1) lack of response; (2) interrater disagreement; and (3) ceiling effects. The dimensionality of the measure is evaluated through factor analysis. (Author/LMO)
Descriptors: Administrators, Behavior Rating Scales, Elementary Education, Factor Analysis
Peer reviewed: Maatsch, Jack L. – Evaluation and Program Planning, 1987
This paper presents and interprets examiner and participant performance data obtained from an experimental field test of the test item and case simulation libraries of the American Board of Emergency Medicine Specialty Certificate Examination. Subjects were 94 medical students, residents, and emergency physicians. (Author/LMO)
Descriptors: Clinical Diagnosis, Competence, Factor Analysis, Graduate Medical Education
Peer reviewed: Deutsch, Stuart Jay; Malmborg, Charles J. – Evaluation and Program Planning, 1986
A questionnaire is designed to allow assessment of a simple additive value function for testing respondent preferences for different types of information used in evaluating police services. Responses are analyzed to determine what types of information different stakeholder groups consider useful. (Author/LMO)
Descriptors: Adults, Analysis of Variance, Data Collection, Evaluation Methods
Peer reviewed: Irvine, Jacqueline Jordan – Journal of Educational Psychology, 1986
Students' initiating behaviors, teachers' verbal feedback, and students' available response opportunities were studied in 63 classrooms in relation to student race, sex, and grade level, using a modified Brophy-Good Observation System. Results indicated that male students initiate more positive and negative interactions with teachers than do…
Descriptors: Analysis of Variance, Elementary Education, Feedback, Interrater Reliability
Peer reviewed: Gresham, Frank M. – School Psychology Review, 1984
The evidence for the psychometric adequacy of behavioral interviews, in terms of traditional psychometric theory and generalizability theory, is reviewed. The review concludes that behavioral interviews have some evidence for interrater reliability, content validity, and criterion-related validity. Additional research in several…
Descriptors: Behavior Patterns, Behavior Problems, Functional Behavioral Assessment, Generalizability Theory
Aydin, Selami – Online Submission, 2006
This research investigated the effect of computers on the test reliability and inter-rater reliability of ESL learners' writing test scores. Writing samples from 20 pen-and-paper and 20 computer-group students were scored by two scorers using an analytic scoring method, and the scores were then analyzed with the Alpha (Cronbach) model. The results showed that the…
Descriptors: Writing Tests, Interrater Reliability, Test Reliability, English (Second Language)
Crehan, Kevin D. – 1997
Writing fits well within the realm of outcomes suitable for observation by performance assessments. Studies of the reliability of performance assessments have suggested that interrater reliability can be consistently high. Scoring consistency, however, is only one aspect of quality in decisions based on assessment results. Another is…
Descriptors: Evaluation Methods, Feedback, Generalizability Theory, Interrater Reliability
Lee, Yong-Won – 2001
An essay test is now an integral part of the computer based Test of English as a Foreign Language (TOEFL-CBT). This paper provides a brief overview of the current TOEFL-CBT essay test, describes the operational procedures for essay scoring, including the Online Scoring Network (OSN) of the Educational Testing Service (ETS), and discusses major…
Descriptors: Computer Assisted Testing, English (Second Language), Essay Tests, Interrater Reliability
Ridge, Kirk – 2001
This study investigated whether raters in two different training groups would demonstrate halo error when each rater scored all five responses to five different mathematics performance-based items from each student. One group of 20 raters was trained by an experienced scoring director with item-specific scoring rubrics and the opportunity to…
Descriptors: Evaluators, Feedback, Interrater Reliability, Junior High School Students
Sireci, Stephen G.; Rizavi, Saba – 2000
Although computer-based testing is becoming popular, many of these tests are limited to the use of selected-response item formats due to the difficulty in mechanically scoring constructed-response items. This limitation is unfortunate because many constructs, such as writing proficiency, can be measured more directly using items that require…
Descriptors: College Students, Comparative Analysis, Computer Uses in Education, Essay Tests
Shermis, Mark D.; Koch, Chantal Mees; Page, Ellis B.; Keith, Timothy Z.; Harrington, Susanmarie – 1999
This study used Project Essay Grade (PEG) to evaluate essays both holistically and with the rating of traits (content, organization, style, mechanics, and creativity) for Web-based student essays that serve as placement tests at a large Midwestern university. In addition, the use of a TopicScore, or measure of topic content for each assignment,…
Descriptors: Automation, College Students, Construct Validity, Essays
Peer reviewed: Stone, C. Addison – Journal of Learning Disabilities, 1997
High school students (N=26) with learning disabilities, their parents, and their special education teachers rated the students' skills in 21 specific areas such as general ability, oral language, reading, written language, and social skills. Parents' ratings were consistent with teachers' in 16 areas and lower in 5 areas. Students' ratings were…
Descriptors: High School Students, High Schools, Interrater Reliability, Learning Disabilities
Peer reviewed: Halleck, Gene B. – Foreign Language Annals, 1996
This study investigated the interrater reliability of proficiency-level judgments of graduate student trainee raters on oral proficiency interviews (OPIs). Trainees' ratings were compared with the judgments of a certified American Council on the Teaching of Foreign Languages (ACTFL) tester for 150 interviews. (Author/JL)
Descriptors: Comparative Analysis, Graduate Students, Higher Education, Interrater Reliability
Peer reviewed: Suen, Hoi K.; And Others – Journal of Early Intervention, 1995
This paper suggests that in addressing the issue of parent-professional congruence in child assessment, researchers should avoid focusing on the conventional aspects of interrater reliability and rater interchangeability, but rather should focus on the reliability of the pooled assessment information from parents and professionals. A…
Descriptors: Disabilities, Early Childhood Education, Early Intervention, Evaluation Methods
Peer reviewed: Canivez, Gary L.; Watkins, Marley W. – Assessment for Effective Intervention, 2002
Teaching professionals (n=29) who shared the same classroom for a minimum of one hour per day provided independent ratings of the same child (ages 7-17) on the Adjustment Scales for Children and Adolescents (ASCA). Results indicated that statistically significant interrater agreement was achieved across all 22 syndromic profile classification…
Descriptors: Behavior Rating Scales, Disabilities, Elementary Secondary Education, Emotional Adjustment


