Publication Date
| Date Range | Records |
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Descriptor | Records |
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Audience | Records |
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Location | Records |
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Rating | Records |
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does Not Meet Standards | 3 |
Peer reviewed: Serlin, Ronald C.; Marascuilo, Leonard A. – Journal of Educational Statistics, 1983
Two alternatives to the problems of conducting planned and post hoc comparisons in tests of concordance and discordance for G groups of judges are examined. The two models are illustrated using existing data. (Author/JKS)
Descriptors: Attitude Measures, Comparative Analysis, Interrater Reliability, Mathematical Models
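For readers unfamiliar with concordance measures, Kendall's coefficient of concordance W is the usual starting point for tests of agreement among G judges; the sketch below is an illustration of the basic statistic only (no tie correction), not the planned/post hoc comparison procedures the paper develops, and the function name and example data are invented.

```python
import numpy as np

def kendalls_w(ratings):
    """Kendall's W for an (m judges x n items) matrix of ratings (no tie correction)."""
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    # Convert each judge's ratings to within-judge ranks 1..n
    ranks = np.apply_along_axis(lambda r: r.argsort().argsort() + 1, 1, ratings)
    rank_sums = ranks.sum(axis=0)
    # Sum of squared deviations of item rank sums from their mean
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Three judges whose rank orders agree perfectly give W = 1.0
print(kendalls_w([[1, 2, 3, 4], [10, 20, 30, 40], [0.1, 0.2, 0.3, 0.4]]))  # → 1.0
```

W ranges from 0 (no agreement) to 1 (identical rank orders across all judges).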
Peer reviewed: Harvey, Robert J.; Hayes, Theodore L. – Personnel Psychology, 1986
Showed that reliabilities in the .50 range can be obtained when raters rule out only 15-20% of the items on the Position Analysis Questionnaire as "Does Not Apply" and respond randomly to the remainder. (Author/ABB)
Descriptors: Interrater Reliability, Job Analysis, Monte Carlo Methods, Occupational Information
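The effect the authors report can be reproduced in spirit with a small Monte Carlo sketch. All parameters here are assumptions for illustration (a 187-item instrument, an 18% shared "Does Not Apply" rate, random 1-5 ratings elsewhere), not the authors' exact design:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_pairs = 187, 500  # assumed: PAQ-sized instrument, 500 simulated rater pairs

r_values = []
for _ in range(n_pairs):
    # Both raters rule out the same ~18% of items as "Does Not Apply" (scored 0)
    dna = rng.random(n_items) < 0.18
    # ...and respond randomly (1-5) to everything else, independently of each other
    a = np.where(dna, 0, rng.integers(1, 6, n_items))
    b = np.where(dna, 0, rng.integers(1, 6, n_items))
    r_values.append(float(np.corrcoef(a, b)[0, 1]))

print(round(float(np.mean(r_values)), 2))
```

Under these assumptions the average interrater correlation lands in roughly the .4-.5 range, echoing the paper's point: agreement on which items do not apply can by itself manufacture apparently respectable reliability.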
Peer reviewed: Conger, Rand D.; And Others – Journal of Marriage and the Family, 1986
Examined the comparability of three techniques that are used to assess the dependability of family observational measures: analyses of observer agreement, reliability, and generalizability. Results indicated no single evaluative technique will always be most conservative in estimating the quality of observations. Suggests that multiple assessments…
Descriptors: Family Involvement, Generalization, Interrater Reliability, Measurement Techniques
Peer reviewed: O'Sullivan, Sean; And Others – Journal of Marital and Family Therapy, 1984
Explores the reliability of the categories used to describe family structure in structural family therapy. Five clinicians independently rated three initial conjoint family interviews. Results are discussed in terms of their demonstration of the utility of the structural nomenclature, some conceptual problems in the structural nomenclature, and…
Descriptors: Cocounseling, Family Counseling, Family Problems, Family Structure
Peer reviewed: Morris, Woodrow W.; Boutelle, Sandra – Gerontologist, 1985
Examines the feasibility of making multidimensional functional assessments among 22 older persons by using a questionnaire. Analysis of ratings and objective scores suggests that among relatively independent, well elderly individuals, self-administered assessment should be the mode of choice. Clinical and survey research applications are…
Descriptors: Interrater Reliability, Older Adults, Research Methodology, Scoring
Peer reviewed: Goodwin, Laura D.; Goodwin, William L. – Evaluation and the Health Professions, 1984
The views of prominent qualitative methodologists on the appropriateness of validity and reliability estimation for the measurement strategies employed in qualitative evaluations are summarized. A case is made for the relevance of validity and reliability estimation. Definitions of validity and reliability for qualitative measurement are presented…
Descriptors: Evaluation Methods, Experimenter Characteristics, Interrater Reliability, Reliability
Peer reviewed: Cornelius, Edwin T.; And Others – Personnel Psychology, 1984
Questions the observed correlation between job experts and naive raters using the Position Analysis Questionnaire (PAQ) and replicates the Smith and Hakel (1979) study with college students (N=39). Concludes that PAQ ratings from job experts and college students are not equivalent and therefore not interchangeable. (LLL)
Descriptors: College Students, Higher Education, Interrater Reliability, Job Analysis
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
De Champlain, Andre F.; Gessaroli, Marc E.; Floreck, Lisa M. – 2000
The purpose of this study was to estimate the extent to which recording variability among standardized patients (SPs) has an impact on classification consistency with data sets simulated to reflect performances on a large-scale clinical skills examination. SPs are laypersons trained to portray patients in clinical encounters (cases) and to record…
Descriptors: Classification, Interrater Reliability, Licensing Examinations (Professions), Medical Education
Peer reviewed: Sandburg, Jorgen – Higher Education Research and Development, 1997
Argues that interrater reliability, as traditionally used in phenomenographic research, is inadequate for establishing the reliability of research results: it does not take into account the researcher's procedures for achieving fidelity to the individuals' conceptions investigated, and use of interrater reliability based on objectivist epistemology…
Descriptors: Educational Research, Epistemology, Interrater Reliability, Qualitative Research
Peer reviewed: Lewis, Chad T.; Stevens, Cynthia Kay – Public Personnel Management, 1990
A total of 204 business students organized in committees evaluated jobs for accountability, knowledge and skills, and mental demands. The same position was rated more highly when held by a male rather than a female, regardless of whether the committee was predominantly male or female. The importance of anonymity of job holders when conducting job…
Descriptors: College Students, Interrater Reliability, Job Analysis, Sex Bias
Peer reviewed: Umesh, U. N.; And Others – Educational and Psychological Measurement, 1989
An approach is provided for calculating maximum values of the Kappa statistic of J. Cohen (1960) as a function of observed agreement proportions between evaluators. Separate calculations are required for different matrix sizes and observed agreement levels. (SLD)
Descriptors: Equations (Mathematics), Evaluators, Heuristics, Interrater Reliability
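The idea of a marginal-constrained maximum for kappa can be sketched as follows: given the two raters' marginal proportions, the best possible diagonal of the agreement matrix is the category-wise minimum of the two marginals. This is a minimal illustration, not the authors' published calculations, and the confusion matrix is made up.

```python
import numpy as np

def kappa_and_max(confusion):
    """Cohen's kappa plus the maximum kappa attainable given the observed marginals."""
    m = np.asarray(confusion, dtype=float)
    n = m.sum()
    po = np.trace(m) / n                        # observed agreement
    row, col = m.sum(axis=1) / n, m.sum(axis=0) / n
    pe = float(row @ col)                       # chance agreement from marginals
    p_max = float(np.minimum(row, col).sum())   # best diagonal the marginals allow
    return (po - pe) / (1 - pe), (p_max - pe) / (1 - pe)

# Hypothetical 2x2 agreement matrix for two raters
k, k_max = kappa_and_max([[20, 5], [10, 15]])  # k = 0.4, k_max = 0.8
```

Reporting k alongside k_max shows how much of the shortfall from perfect agreement is even achievable once the raters' differing base rates are taken into account.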
Peer reviewed: Cordes, Anne K.; Ingham, Roger J. – Journal of Speech and Hearing Research, 1994
This paper reviews the prominent concepts of the stuttering event and concerns about the reliability of stuttering event measurements, specifically interjudge agreement. Recent attempts to resolve the stuttering measurement problem are reviewed, and the implications of developing an improved measurement system are discussed. (Author/JDD)
Descriptors: Data Collection, Interrater Reliability, Measurement Techniques, Observation
Peer reviewed: Marcoulides, George A.; Simkin, Mark G. – Journal of Education for Business, 1995
Each paper written by 60 sophomores in computer classes received 3 peer evaluations using a structured evaluation process. Overall, students were able to grade efficiently and consistently in terms of overall score and selected criteria (subject matter, content, and mechanics). (SK)
Descriptors: Higher Education, Interrater Reliability, Peer Evaluation, Undergraduate Students
Peer reviewed: Driessen, Marie-Jose; And Others – Occupational Therapy Journal of Research, 1995
Two occupational therapists in an interrater test and 9 in an intrarater test used a form based on the International Classification of Impairments, Disabilities, and Handicaps to evaluate 50 patients in a psychiatric hospital and 50 in a rehabilitation center. Based on percentage of agreement and Cohen's kappa, the reliability of the diagnoses was…
Descriptors: Clinical Diagnosis, Disabilities, Interrater Reliability, Occupational Therapy
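The two statistics this study relies on, percentage of agreement and Cohen's kappa, can be computed from two raters' category labels as in this minimal sketch (the rating data are invented, not the study's):

```python
from collections import Counter

def agreement_stats(r1, r2):
    """Percent agreement and Cohen's kappa for two raters' category labels."""
    n = len(r1)
    po = sum(a == b for a, b in zip(r1, r2)) / n           # raw agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: product of each rater's marginal proportion per category
    pe = sum(c1[c] * c2[c] for c in set(r1) | set(r2)) / n ** 2
    return po, (po - pe) / (1 - pe)

# Two raters classifying five hypothetical cases into categories A/B
po, kappa = agreement_stats(list("AABBA"), list("AABAA"))  # po = 0.8, kappa ≈ 0.55
```

The gap between the two numbers is the point of reporting both: percent agreement counts chance hits, while kappa discounts them.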


