Publication Date
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Peer reviewed: Rippey, Robert M.; Krutchkoff, David J. – Evaluation and the Health Professions, 1984
The method of paired comparisons was used to rank 42 dental students on their performance in emergency and screening clinic rotations. Results suggest this methodology may provide more internally consistent student assessments on more subtle aspects of clinical performance than those assessed by multiple-choice tests or written performance…
Descriptors: Clinical Teaching (Health Professions), College Faculty, Computer Software, Dental Students
Peer reviewed: Birchler, Gary R.; And Others – American Journal of Family Therapy, 1984
Examined factors that influenced the concordant perceptions of 28 distressed and 28 nondistressed husbands and wives and trained coders who observed samples of their own and another couple's problem solving. Correlational analyses suggested greater insider-outsider perceptual agreement for distressed than nondistressed couples and for negative…
Descriptors: Behavior Patterns, Conflict Resolution, Congruence (Psychology), Interaction Process Analysis
Peer reviewed: Epstein, Michael H.; Nieminen, Gayla S. – School Psychology Review, 1983
Teachers and classroom aides of learning disabled pupils were asked to complete the Conners Abbreviated Teacher Rating Scale (CATRS) on two separate occasions, one month apart. Inter-rater reliability for teachers (.866) and for aides (.602), and reliability across time for teachers (.866) and aides (.603) achieved acceptable levels. (Author/BW)
Descriptors: Elementary Education, Elementary School Teachers, Hyperactivity, Interrater Reliability
Weigle, Sara Cushing – 1994
This paper describes a study on rater training that involved the analysis of ratings given to English-as-a-Second-Language (ESL) compositions by 8 inexperienced and 8 experienced raters both before and after rater training, using FACETS (Linacre, 1990, 1993), which provides measures of rater severity and consistency. The testing text was a…
Descriptors: English (Second Language), Essay Tests, Evaluation Criteria, Evaluators
Peer reviewed: Zedeck, Sheldon; And Others – Personnel Psychology, 1983
Studied interviewer reliability, validity, and strategy for information integration. Candidates (N=412) for selection to a military division were interviewed and assessed. Results indicated that interviewers functioned in a similar fashion. Analyses of individual interviewers indicated higher reliability and individual differences among…
Descriptors: Cognitive Processes, Employment Interviews, Evaluation Criteria, Evaluation Methods
Peer reviewed: Lunz, Mary E.; Schumacker, Randall E. – Journal of Outcome Measurement, 1997
Results and interpretations of the data from a performance examination were compared for four methods of analysis for 74 medical specialty certification candidates: (1) traditional summary statistics; (2) inter-judge correlations; (3) generalizability theory; and (4) the multifaceted Rasch model. Advantages of the Rasch model are outlined. (SLD)
Descriptors: Comparative Analysis, Data Analysis, Generalizability Theory, Interrater Reliability
Peer reviewed: Supovitz, Jonathan A.; MacGowan, Andrew, III; Slattery, Jean – Educational Assessment, 1997
Reports on the interrater reliability of a language arts portfolio assessment in the primary grades of the Rochester (New York) school system. Results from approximately 400 primary grade portfolios rated by 2 raters show that teachers can assess their own students' work reliably. (SLD)
Descriptors: Evaluation Methods, Evaluators, Interrater Reliability, Portfolio Assessment
Hill, Roger B. – Journal of Technology Education, 1997
The Observation Procedure for Technology Education Mental Processes, a computerized assessment tool, was based on duration and frequency of mental processes needed for problem solving. Videotapes of students completing problem-solving activities were used to identify the processes. Interrater reliability tests validated the program. (SK)
Descriptors: Cognitive Processes, Computer Software Development, Interrater Reliability, Measures (Individuals)
Peer reviewed: Goffman, Lisa; And Others – Journal of Child Language, 1996
The influence of information level on the production accuracy of 20 children was examined. Data were children's productions of nouns in sets of utterances referring to triplets of pictures representing noun-verb-noun utterances. (Author/JL)
Descriptors: Acoustic Phonetics, Child Language, Cognitive Processes, Grammar
Peer reviewed: Milligan, Frank – Nurse Education Today, 1996
Grading profiles for formative and summative assessment in a British nursing school were designed with criterion referencing to improve validity and interrater and intercourse reliability. Assessment was conceptualized as an ethical activity that clarifies expectations through specification of criteria. (SK)
Descriptors: Criterion Referenced Tests, Evaluation Criteria, Foreign Countries, Formative Evaluation
Peer reviewed: Miller, Ronald – South African Journal of Higher Education, 1996
In a study of criteria for and reliability of grading of college essays in introductory psychology, 16 essays were marked by 12 faculty and 20 graduate students. Analysis found that two content attributes (facts, examples) accounted for 82% of variance in grading by faculty, while five stylistic measures accounted for the remainder. Both faculty…
Descriptors: College Instruction, Essays, Evaluation Criteria, Grading
Peer reviewed: Vrancic, Daniela; Nanclares, Valeria; Soares, Delfina; Kulesz, Analia; Mordzinski, Claudia; Plebst, Christian; Starkstein, Sergio – Journal of Autism and Developmental Disorders, 2002
A study involving 30 Argentineans with autism evaluated the validity of the Autism Diagnostic Inventory-Telephone Screening in Spanish (ADI-TSS). The final version of the ADI-TSS could be assessed in 20 to 40 minutes and demonstrated a high validity, high interrater reliability, and high internal consistency. (Contains references.) (Author/CR)
Descriptors: Adults, Autism, Disability Identification, Foreign Countries
Peer reviewed: Anderson, Stephen A. – Michigan Reading Journal, 2002
Considers the development of an inter-rater reliability correlation comparing the judgments, or scores, of each judge to see if their observations are similar. Presents a case study of the Northville Public Schools' data for the 2000 MEAP (Michigan Educational Assessment Program) Writing Test. Concludes that in this case study the state fails both…
Descriptors: Case Studies, Elementary Education, Evaluation Research, Interrater Reliability
Peer reviewed: Brown, Annie – Language Testing, 2003
Examines the question of variation among interviewers of oral language proficiency interviews in the ways that they elicit demonstrations of communicative ability and the impact of this variation on candidate performance and raters' perceptions of candidate ability. A discourse analysis of two interviews involving the same candidate with two…
Descriptors: Discourse Analysis, Interrater Reliability, Interviews, Language Proficiency
Peer reviewed: Lunz, Mary E.; Stahl, John A. – Evaluation and the Health Professions, 1990
Examinations were analyzed using the Rasch model to determine differences in judge severity and grading period stringency for (1) an essay examination (subjects were 12 judges and 32 examinees); (2) a clinical examination (subjects were 18 judges and 217 examinees); and (3) an oral examination (subjects were 46 judges and 270 examinees). (SLD)
Descriptors: Certification, Essay Tests, Evaluators, Examiners


