Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
[Peer reviewed] Hosie, Peter – Australian Journal of Education, 1986
Interviews can provide valuable information for social researchers, but problems that may affect the quality of the information gathered should be addressed. These include subject-researcher reactivity, role relations, truth telling, reporting of the information collected, and researcher characteristics. A profile of effective interviewer…
Descriptors: Interrater Reliability, Interviews, Questioning Techniques, Research Methodology
[Peer reviewed] Hughes, Garry L.; Prien, Erich P. – Personnel Psychology, 1986
Investigated the psychometric properties of three methods of scoring a Mixed Standard Scale performance evaluation: a patterned procedure, a simple nonpatterned scoring procedure, and a procedure assigning differential weights to statements on the basis of scale values provided by subject matter experts. Found no differences in the score distribution…
Descriptors: Evaluation Methods, Interrater Reliability, Scoring, Scoring Formulas
[Peer reviewed] Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
A FORTRAN subroutine is presented to calculate a generalized measure of agreement between multiple raters and a set of correct responses at any level of measurement and among multiple responses, along with the associated probability value, under the null hypothesis. (Author)
Descriptors: Computer Software, Interrater Reliability, Measurement Techniques, Probability
[Peer reviewed] Doble, Susan E. – Occupational Therapy Journal of Research, 1991
Process Skills Assessment is an observational assessment designed to evaluate process skills as demonstrated during the performance of a self-selected task. Retest reliability results indicated that the assessment can be used to measure changes in clients' process skills after occupational therapy if the same task is used upon readministration.…
Descriptors: Interrater Reliability, Occupational Therapy, Skill Analysis, Test Reliability
[Peer reviewed] Sinacore, James M.; Connell, Karen J.; Olthoff, Allan J.; Friedman, Michael H.; Gecht, Maureen R. – Evaluation and the Health Professions, 1999
Presents a method for measuring interrater agreement on checklists. The technique computes a single agreement score from the concordance of raters' check mark configurations and derives an overall coefficient of agreement called "phi." A medical education study illustrates the phi methodology. (SLD)
Descriptors: Check Lists, Interrater Reliability, Measurement Techniques, Medical Education
[Peer reviewed] Meyer, Gregory J. – Psychological Assessment, 1997
In reply to criticism of the Rorschach Comprehensive System (CS) by J. Wood, M. Nezworski, and W. Stejskal (1996), this article presents a meta-analysis of published data indicating that the CS has excellent chance-corrected interrater reliability. It is noted that the erroneous assumptions of Wood et al. make their assertions about validity…
Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity
[Peer reviewed] Wood, James M.; Nezworski, M. Teresa; Stejskal, William J. – Psychological Assessment, 1997
G. Meyer (1997) attempts to refute the present authors' criticisms of the interrater reliability of the Rorschach Comprehensive System (CS) but misrepresents their position and offers a flawed meta-analysis in support of his own. Rorschach proponents need to undertake high-quality replicated studies of CS reliability and validity. (SLD)
Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity
[Peer reviewed] Meyer, Gregory J. – Psychological Assessment, 1997
Replies to Wood et al. and documents limitations of their conclusions about the Rorschach Comprehensive System (CS), supporting Meyer's own meta-analysis, which finds adequate interrater reliability for the CS. (SLD)
Descriptors: Interrater Reliability, Meta Analysis, Test Use, Test Validity
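The Meyer and Wood et al. exchange turns on "chance-corrected" interrater reliability. One common chance-corrected index for two raters assigning nominal codes is Cohen's kappa. The sketch below is illustrative only: the abstracts do not say which statistics were pooled in either meta-analysis, and the function name and sample codes are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of the raters' marginal proportions, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    codes = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in codes) / (n * n)
    if p_chance == 1:
        # Both raters used the same single code for every item; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes from two raters scoring four protocols.
print(cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "no", "no"]))  # 0.5
```

In the example, raw agreement is 0.75, but with these marginals agreement expected by chance is 0.5. Kappa therefore credits only half of the agreement that exceeds chance.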
Campbell, Justin S.; Pulos, Steven; Hogan, Mike; Murry, Francie – Educational and Psychological Measurement, 2005
This study examines the average reliability of Hare Psychopathy Checklists (PCLs) adapted for use in samples of youthful offenders (aged 12 to 21 years). Two forms of reliability are examined: 18 alpha estimates of internal consistency and 18 intraclass correlation (two or more raters) estimates of interrater reliability. The results, an average…
Descriptors: Interrater Reliability, Reliability, Psychopathology, Adolescents
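The Campbell et al. abstract pools intraclass correlation (ICC) estimates of interrater reliability but does not say which ICC form the source studies used. As an assumption, the sketch below computes ICC(2,1) in Shrout and Fleiss's notation: two-way random effects, absolute agreement, single rater. It uses complete data only, and `icc_2_1` is a hypothetical helper name.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is a list of rows, one per subject, each holding k raters' scores.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least 2 subjects")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 raters and no missing scores")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Sums of squares for subjects (rows), raters (columns), and in total.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical data: rater 2 scores every subject exactly 1 point higher.
print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

The absolute-agreement form is chosen here because it penalizes a systematic offset between raters. A consistency form such as ICC(3,1) would rate the example data as perfectly reliable.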
Orwig, Denise; Brandt, Nicole; Gruber-Baldini, Ann L. – Gerontologist, 2006
Purpose: The purpose of this study was to describe the Medication Management Instrument for Deficiencies in the Elderly (MedMaIDE) and to provide results of reliability and validity testing. Design and Methods: Participants were 50 older adults, aged 65 and older, who lived in the community, took at least one prescription medication, and were then…
Descriptors: Older Adults, Validity, Interrater Reliability, Correlation
Hoben, Kirsty; Varley, Rosemary; Cox, Richard – International Journal of Language & Communication Disorders, 2007
Background: Difficulties experienced by novices in clinical reasoning have been well documented in many professions, especially medicine (Boshuizen and Schmidt 1992, 2000; Elstein, Shulman and Sprafka 1978; Patel and Groen 1986; Rikers, Loyens and Schmidt 2004). These studies have shown that novice clinicians have difficulties with both knowledge…
Descriptors: Test Results, Speech Therapy, Patients, Medical Education
Einfeld, S.; Tonge, B.; Chapman, L.; Mohr, C.; Taffe, J.; Horstead, S. – Journal of Applied Research in Intellectual Disabilities, 2007
Background: There is a history of over-prescription of antipsychotics to individuals with intellectual disability (ID), while antidepressants may be under-prescribed. However, appropriate treatment is best supported when the diagnosis of psychosis or depression is valid and carries good predictive validity. The present authors report a study…
Descriptors: Mental Retardation, Psychosis, Predictive Validity, Interrater Reliability
Lang, W. Steve; Wilkerson, Judy R. – Online Submission, 2008
The National Council for Accreditation of Teacher Education (NCATE, 2002) requires teacher education units to develop assessment systems and evaluate both the success of candidates and unit operations. Because of a stated, but misguided, fear of statistics, NCATE fails to use accepted terminology to assure the quality of institutional evaluative…
Descriptors: State Standards, Validity, Resource Materials, Reliability
Russ, Rosemary S.; Scherr, Rachel E.; Hammer, David; Mikeska, Jamie – Science Education, 2008
Science education reform has long focused on assessing student inquiry, and there has been progress in developing tools specifically with respect to experimentation and argumentation. We suggest the need for attention to another aspect of inquiry, namely "mechanistic reasoning." Scientific inquiry focuses largely on understanding causal…
Descriptors: Discourse Analysis, Educational Change, Science Education, Inquiry
Whithaus, Carl; Harrison, Scott B.; Midyette, Jeb – Assessing Writing, 2008
This article examines the influence of keyboarding versus handwriting in a high-stakes writing assessment. Conclusions are based on data collected from a pilot project to move Old Dominion University's Exit Exam of Writing Proficiency from a handwritten format into a dual-option format (i.e., the students may choose to handwrite or keyboard the…
Descriptors: Writing Evaluation, Handwriting, Pilot Projects, Writing Tests

