Li, Mao-Neng Fred; Lautenschlager, Gary J. – Educational and Psychological Measurement, 1999 (peer reviewed)
Describes a Statistical Analysis System (SAS) MACRO for computing various indices of interrater agreement, including a new generalizability coefficient, for categorical data in a single-facet, crossed design. (Author/SLD)
Descriptors: Classification, Generalizability Theory, Interrater Reliability, Qualitative Research
Lindell, Michael K.; Brandt, Christina J.; Whitney, David J. – Applied Psychological Measurement, 1999 (peer reviewed)
Proposes a revised index of interrater agreement for multi-item ratings of a single target. This index is an inverse linear function of the ratio of the average obtained variance to the variance of the uniformly distributed random error. Discusses the importance of sample size for the index. (SLD)
Descriptors: Error of Measurement, Interrater Reliability, Sample Size
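The index described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the "inverse linear function" takes the form 1 minus the ratio, that item variances are sample variances (n - 1 denominator), and that the uniform-error variance for an A-point scale is (A^2 - 1)/12. The function name `r_wg_star` and the example ratings are hypothetical.

```python
from statistics import mean, variance


def r_wg_star(ratings, n_options):
    """Sketch of a multi-item agreement index for a single target.

    ratings   -- one row per rater, each row holding that rater's scores
                 on the same J items (assumed layout)
    n_options -- number of response options A on the rating scale
    """
    n_items = len(ratings[0])
    # Variance across raters, computed separately for each item
    # (sample variance with n - 1 denominator: an assumption).
    item_vars = [variance([row[j] for row in ratings]) for j in range(n_items)]
    # Variance of a discrete uniform distribution over A options.
    sigma2_eu = (n_options ** 2 - 1) / 12
    # 1 minus the ratio of average obtained variance to uniform-error variance.
    return 1 - mean(item_vars) / sigma2_eu


# Three hypothetical raters, two items, 5-point scale.
perfect = r_wg_star([[4, 5], [4, 5], [4, 5]], n_options=5)
spread = r_wg_star([[1, 1], [3, 3], [5, 5]], n_options=5)
print(perfect, spread)
```

Under these assumptions, identical ratings give an index of 1, and ratings more dispersed than uniform random error give a negative value.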
Schuster, Christof; Smith, David A. – Psychometrika, 2005
The rater agreement literature is complicated by the fact that it must accommodate at least two different properties of rating data: the number of raters (two versus more than two) and the rating scale level (nominal versus metric). While kappa statistics are most widely used for nominal scales, intraclass correlation coefficients have been…
Descriptors: Psychometrics, Statistics, Rating Scales, Correlation
Baird, Jo-Anne; Greatorex, Jackie; Bell, John F. – Assessment in Education: Principles, Policy and Practice, 2004
Marking reliability is purported to be produced by having an effective community of practice. No experimental research has been identified which attempts to verify empirically the aspects of a community of practice that have been observed to produce marking reliability. This research outlines what that community of practice might entail and…
Descriptors: Foreign Countries, Grades (Scholastic), Grading, Interrater Reliability
Munson, Benjamin; Brinkman, Kayla N. – American Journal of Speech-Language Pathology, 2004
Two experiments examined whether listening to multiple presentations of recorded speech stimuli influences the reliability and accuracy of judgments of children's speech production accuracy. In Experiment 1, 10 listeners phonetically transcribed words produced by children with phonological impairments after a single presentation and after the word…
Descriptors: Speech, Children, Phonetics, Speech Impairments
Roberts, Felicia; Robinson, Jeffrey D. – Human Communication Research, 2004
This investigation assesses interobserver agreement on conversation analytic (CA) transcription. Four professional CA transcribers spent a maximum of 3 hours transcribing 2.5 minutes of a previously unknown, naturally occurring, mundane telephone call. Researchers unitized transcripts into words, sounds, silences, inbreaths, outbreaths, and laugh…
Descriptors: Interrater Reliability, Discourse Analysis, Semantics, Pragmatics
Fleming, Judith A.; Taylor, Janeen McCracken; Carran, Deborah – Assessment for Effective Intervention, 2004
This article offers an alternative methodology for practitioners and researchers to use in establishing interrater reliability for testing purposes. The majority of studies on interrater reliability use a traditional methodology whereby two raters are compared using a Pearson product-moment correlation. This traditional method of estimating…
Descriptors: Interrater Reliability, Methods, Correlation, Evaluation Methods
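For readers unfamiliar with the traditional approach this abstract mentions, here is a minimal sketch of a Pearson product-moment correlation between two raters. The scores and the names `rater_a` and `rater_b` are hypothetical, and the example is not drawn from the article. It shows one well-known property of the statistic: a constant offset between raters leaves the correlation unchanged. This is not necessarily the argument the article makes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from each rater's mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations for each rater.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (dev_x * dev_y)


# Hypothetical scores: rater_b scores every case exactly one point higher.
rater_a = [2, 4, 5, 7, 9]
rater_b = [3, 5, 6, 8, 10]

r = pearson_r(rater_a, rater_b)
exact_matches = sum(a == b for a, b in zip(rater_a, rater_b))
print(round(r, 6), exact_matches)
```

Here the correlation is essentially 1 even though the two raters never assign the same score to any case.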
Schuster, Christof; Smith, David A. – Educational and Psychological Measurement, 2006
Because nominal-scale judgments cannot directly be aggregated into meaningful composites, the addition of a second rater is usually motivated by a desire to estimate the quality of a single rater's classifications rather than to improve reliability. When raters agree, the aggregation problem does not arise. Nevertheless, a proportion of this…
Descriptors: Models, Interrater Reliability, Measures (Individuals), Evaluation Criteria
Neuman, S.B.; Koh, S.; Dwyer, J. – Early Childhood Research Quarterly, 2008
The purpose of this study was to develop a valid and reliable tool for measuring the quality of the language and literacy environment in home-based settings. Based on a convergence of research on the ecological and psychological factors associated with early literacy development, the Child/Home Environmental Language and Literacy Observation…
Descriptors: Observation, Interrater Reliability, Urban Areas, Psychometrics
Anderson, William L.; Mitchell, Steven M.; Osgood, Marcy P. – CBE - Life Sciences Education, 2008
For the past 3 years, faculty at the University of New Mexico, Department of Biochemistry and Molecular Biology, have been using interactive online Problem-Based Learning (PBL) case discussions in our large-enrollment classes. We have developed an illustrative tracking method to monitor student use of problem-solving strategies to provide targeted…
Descriptors: Interrater Reliability, Problem Based Learning, Problem Solving, Molecular Biology
Boulet, John R.; van Zanten, Marta; de Champlain, Andre; Hawkins, Richard E.; Peitzman, Steven J. – Advances in Health Sciences Education, 2008
While checklists are often used to score standardized patient based clinical assessments, little research has focused on issues related to their development or the level of agreement with respect to the importance of specific items. Five physicians independently reviewed checklists from 11 simulation scenarios that were part of the former…
Descriptors: Check Lists, Foreign Medical Graduates, Patients, Clinical Experience
Gray, K. M.; Tonge, B. J.; Sweeney, D. J.; Einfeld, S. L. – Journal of Autism and Developmental Disorders, 2008
The ability to identify children who require specialist assessment for the possibility of autism at as early an age as possible has become a growing area of research. A number of measures have been developed as potential screening tools for autism. The reliability and validity of one of these measures for screening for autism in young children…
Descriptors: Check Lists, Autism, Interrater Reliability, Young Children
Brown, Gavin T. L.; Lake, Robert; Matters, Gabrielle – Australian Journal of Educational & Developmental Psychology, 2008
Background: Two major conceptions of learning exist: reproducing new material and transforming material to make meaning. Teachers' understandings of what learning is probably influence their teaching practices and student academic performance. Aims: To validate a short scale derived from Tait, Entwistle, & McCune's (1998) ASSIST inventory and…
Descriptors: Rating Scales, Factor Analysis, Foreign Countries, Psychometrics
Millar, Dorothy Squatrito – Education and Training in Developmental Disabilities, 2009
IEP transition-related content was compared between young adults with developmental disabilities who had legal guardians and those who did not. Students with guardians were more likely to earn a certificate of completion and to want to remain living with their families, whereas students without guardians were more likely…
Descriptors: Developmental Disabilities, Young Adults, Individualized Education Programs, Self Determination
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or for fitting poorly with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability

