Publication Date
| Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 56 |
| Since 2022 (last 5 years) | 282 |
| Since 2017 (last 10 years) | 778 |
| Since 2007 (last 20 years) | 2040 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Interrater Reliability | 3122 |
| Foreign Countries | 654 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Audience | Count |
| --- | --- |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Location | Count |
| --- | --- |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 24 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Rating | Count |
| --- | --- |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Taube, Kurt T.; Newman, Larry S. – 1996
A method of estimating Rasch-model difficulty calibrations from judges' ratings of item difficulty is described. The ability of judges to estimate item difficulty was assessed by correlating estimated and empirical calibrations on each of four examinations offered by the American Association of State Social Work Boards. Thirteen members of the…
Descriptors: Correlation, Cutting Scores, Difficulty Level, Estimation (Mathematics)
Beasley, T. Mark; Leitner, Dennis W. – 1993
The L statistic of E. B. Page (1963) tests the agreement of a single group of judges with an a priori ordering of alternative treatments. This paper extends the two-group test of D. W. Leitner and C. M. Dayton (1976), itself an extension of the L test, to analyze differences in consensus between two unequally sized groups of judges. Exact critical values…
Descriptors: Comparative Analysis, Equations (Mathematics), Estimation (Mathematics), Evaluators
Micceri, Theodore; And Others – 1987
Several issues relating to agreement estimates for different types of data from performance evaluations are considered. New indices of agreement are presented for ordinal level items and for summative scores produced by nominal or ordinal level items. Two sets of empirical data illustrate the performance of the two formulas derived to estimate…
Descriptors: Correlation, Data Analysis, Educational Research, Estimation (Mathematics)
Weare, Jane; And Others – 1987
This annotated bibliography was developed upon noting a deficiency of information in the literature regarding the training of raters for establishing agreement. The ERIC descriptor, "Interrater Reliability", was used to locate journal articles. Some of the 33 resulting articles focus on mathematical concepts and present formulas for computing…
Descriptors: Annotated Bibliographies, Cloze Procedure, Correlation, Essay Tests
Littlefield, Robert S. – 1986
Comparing the manner in which contestants' scores were tabulated at both the 1985 American Forensic Association National Individual Events Tournament (AFA-NIET) and National Forensic Association Individual Events Nationals (NFA-IEN), a study (1) examined whether a correlation exists between contestants placing in the quarterfinals with five…
Descriptors: Debate, Eligibility, Interrater Reliability, Judges
Kieren, Dianne K.; Munro, Brenda – 1985
Decision making about an observational recording system for family interaction research is crucial. Alternative coding-recording methods and combinations thereof are discussed, including: (1) paper-and-pencil on-site method; (2) video-tapes; (3) paper-and-pencil and mechanical coding devices; (4) transcripts; and (5) transcripts combined with…
Descriptors: Comparative Analysis, Decision Making, Family Life, Family Problems
Peer reviewed: Magnan, Sally Sieloff – Canadian Modern Language Review, 1987
Differences between the academic (American Council on the Teaching of Foreign Languages) and government (Foreign Service Institute) versions of the oral proficiency interview test are examined, and data from two studies of interrater reliability are presented and discussed. (MSE)
Descriptors: Evaluation Methods, Interrater Reliability, Language Proficiency, Language Tests
Peer reviewed: Scott, M. M.; Hatfield, James G. – Journal of Educational Measurement, 1985
Differences in agreement between observers and analysts of naturalistic narrative data cause problems in observation research. This paper discusses the advantages and disadvantages of several possible solutions. (Author/GDC)
Descriptors: Behavioral Science Research, Data Analysis, Data Collection, Interrater Reliability
Peer reviewed: Green, Kathy – Educational and Psychological Measurement, 1985
Five sets of paired comparison judgments were made concerning test item difficulty, in order to identify the most probable source of intrasensitivity in the data. The paired comparisons method was useful in providing information about sensitivity to stimulus differences, but less useful for assessing dimensionality of judgment criteria.…
Descriptors: Adults, Difficulty Level, Evaluative Thinking, Higher Education
Peer reviewed: Tarico, Valerie S.; And Others – Journal of Counseling Psychology, 1986
Compared three methods of rating thoughts: self-rating by subjects, rating by experts with thoughts presented randomly, and rating by experts with thoughts presented in context, among 107 students who listed their thoughts prior to giving a speech. Results indicated all three methods were equal in predicting speech anxiety and performance.…
Descriptors: Anxiety, Cognitive Measurement, Cognitive Processes, Comparative Analysis
Peer reviewed: Zeren, Andrea S.; Makosky, Vivian Parker – Teaching of Psychology, 1986
Presents an in-class activity which uses videotaped television shows to teach time sampling, event sampling, and trait rating techniques. Students responded favorably to this activity, and many reported that it increased their understanding of the different observation techniques. (Author/JDH)
Descriptors: Behavior Rating Scales, Higher Education, Instructional Improvement, Interrater Reliability
Peer reviewed: Montgomery, Barbara M. – Small Group Behavior, 1986
Investigates the relative and interactive effects of rater-, ratee-, relationship-, situational-, and group-level contingencies on peer assessments of open communication. Results suggest that, given certain procedural conditions, peer assessments are highly reliable and valid. Rater bias accounted for a relatively small amount of rating…
Descriptors: College Students, Group Dynamics, Higher Education, Interaction Process Analysis
Peer reviewed: Fuqua, Dale R.; And Others – Journal of Counseling Psychology, 1984
Compares peer ratings, supervisor ratings, and self-ratings of counseling performance. Earlier studies of the relationship among performance ratings from different sources have indicated some comparability across rating sources, particularly late in the training process. These results indicated considerable variability across sources of ratings…
Descriptors: Counselor Evaluation, Counselor Performance, Counselor Training, Graduate Students
Peer reviewed: Singletary, Michael W. – Journalism Quarterly, 1985
Reports that coders were able to judge adequately the difference between immediate reward and delayed reward in news stories but not the difference between subcategories. (FL)
Descriptors: Content Analysis, Interrater Reliability, Journalism, Mass Media
Peer reviewed: O'Hara, Michael W.; Rehm, Lynn P. – Journal of Consulting and Clinical Psychology, 1983
Used the intraclass correlation coefficient to estimate the interrater reliability of judgments by clinician and novice raters of depressed females (N=20) who took the Hamilton Rating Scale for Depression (HRSD). Both expert and student raters made reliable ratings on the HRSD. Criterion validity for student raters was also satisfactory.…
Descriptors: College Students, Comparative Testing, Cost Effectiveness, Counselor Role