Tenneij, Nienke H.; Koot, Hans M. – Journal of Applied Research in Intellectual Disabilities, 2007
Background: Achenbach & Rescorla (2003) recently developed the Adult Behavior Checklist (ABCL) to assess psychopathology in the general population. The ABCL should be completed by a proxy informant. The use of proxy informants, instead of self-reporting, makes the ABCL potentially suitable for the assessment of psychopathology in adults with…
Descriptors: Check Lists, Mental Retardation, Validity, Interrater Reliability
Lissitz, Robert W.; Samuelsen, Karen – Educational Researcher, 2007
This article presents the authors' response to the discussants of their article "A Suggested Change in Terminology and Emphasis Regarding Validity and Education" (this issue)--Susan E. Embretson, Joanna S. Gorin, Robert J. Mislevy, Pamela A. Moss, and Stephen G. Sireci. Their response is limited to a brief summarization of their position…
Descriptors: Validity, Reader Response, Construct Validity, Content Validity
Nock, Matthew K.; Holmberg, Elizabeth B.; Photos, Valerie I.; Michel, Bethany D. – Psychological Assessment, 2007
The authors developed the Self-Injurious Thoughts and Behaviors Interview (SITBI) and evaluated its psychometric properties. The SITBI is a structured interview that assesses the presence, frequency, and characteristics of a wide range of self-injurious thoughts and behaviors, including suicidal ideation, suicide plans, suicide gestures, suicide…
Descriptors: Psychometrics, Young Adults, Validity, Interrater Reliability
Wuttke, Eveline; Wolf, Karsten D. – European Journal of Vocational Training, 2007
Increasing people's ability to solve complex problems is more and more often being seen as an integral part of vocational education. While there have been numerous empirically-based approaches to the didactic structuring of teaching and learning arrangements by which students' ability to solve problems can be increased, knowledge of how to…
Descriptors: Problem Solving, Followup Studies, Vocational Education, Teaching Methods
Campbell, Peter – Phi Delta Kappan, 2007
In this rejoinder to John Chubb's reply to "Edison Is the Symptom, NCLB Is the Disease," the author argues that Edison offers feel-good measures without really solving any of the problems of schools in poverty. Defending his original argument, the author cites a RAND study that questions the results Chubb claims. The study indicates the…
Descriptors: Reader Response, Academic Achievement, Educationally Disadvantaged, Data Interpretation
Chou, Amy; Shih, Janet – Journal for Learning through the Arts, 2010
The main goal of this research study is to explore the interconnection between museum learning and theatre learning. We will begin this exploratory process by analyzing the functions of role-playing and improvisation as teaching and learning strategies, and we will then expand this analysis to the idea of storytelling as a link between learning in…
Descriptors: Museums, Theater Arts, Role Playing, Creative Activities
Reed, Deborah Kay – ProQuest LLC, 2010
This measurement study examined the construct validity of the retell component of the Texas Middle School Fluency Assessment (Texas Education Agency, University of Houston, & The University of Texas System, 2008a) within a confirmatory factor analysis framework. The role of retell, provided after a one-minute oral reading fluency measure, was…
Descriptors: Reading Fluency, Construct Validity, Interrater Reliability, Identification
Cecchetti, Alfred A. – ProQuest LLC, 2009
Objective: This dissertation developed an automatic classification procedure, as an example of a novel tool for an informationist, which extracts information from published abstracts, classifies abstracts into their "fields of study," and then determines the researcher's "field of study" and "level of activity." …
Descriptors: Medical Research, Medical Schools, Medicine, Classification
Daniel, Mark; Cargo, Margaret; Marks, Elisabeth; Paquet, Catherine; Simmons, David; Williams, Margaret; Rowley, Kevin; O'Dea, Kerin – Social Indicators Research, 2009
This study reports on the development and evaluation of a rating tool to assess the scientific utility and cultural appropriateness of community-level indicators for application with Indigenous populations. Indicator criteria proposed by the U.S. Institute of Medicine were culturally adapted through reviewing the literature and consultations with…
Descriptors: Research Design, Indigenous Populations, Public Health, Content Validity
Collishaw, Stephan; Goodman, Robert; Ford, Tamsin; Rabe-Hesketh, Sophia; Pickles, Andrew – Journal of Child Psychology and Psychiatry, 2009
Background: Assessments of child psychopathology commonly rely on multiple informants, e.g., parents, teachers and children. Informants often disagree about the presence or absence of symptoms, reflecting reporter bias, situation-specific behaviour, or random variation in measurement. However, few studies have systematically tested how far…
Descriptors: Psychopathology, Interrater Reliability, Children, Parents
Froman, Richard L., Jr. – 1988
The reliability of a taxonomy of humor was tested in two studies. The first study involved rater identification of nine categories for humorous incidents excerpted from television comedy programs (wordplay, exaggeration/understatement, contrast, audience knowledge, aggression, emotion, taboo, pratfall/slapstick, and repetition). The second study,…
Descriptors: Classification, Humor, Interrater Reliability, Psychometrics
Brown, R. L. – 1987
This paper explores the use of K. G. Joreskog's (1970) congeneric modeling approach to reliability using censored quantitative variables. Two Monte Carlo studies were conducted. The first explored the robustness of Normal Theory Generalized Least-Squares (NTGLS) estimates for a single-factor congeneric model across several sample sizes…
Descriptors: Interrater Reliability, Monte Carlo Methods, Sample Size
Peer reviewed: Whitehurst, Grover J. – American Psychologist, 1984
Holds that interrater agreement for journal manuscript reviews has seemed unacceptably low because it has been assessed using techniques such as the intraclass correlation, which compares error variance with the variance due to manuscripts. Describes and recommends an alternative approach for computing interrater agreement. (GC)
Descriptors: Interrater Reliability, Periodicals, Psychological Studies, Statistical Analysis
Peer reviewed: Collis, Glyn M. – Educational and Psychological Measurement, 1985
Some suggestions for measuring marginal symmetry in agreement matrices for categorical data are discussed, together with measures of item-by-item agreement conditional on marginal asymmetry. Connections with intraclass correlations for dichotomous data are noted. (Author)
Descriptors: Correlation, Interrater Reliability, Item Analysis, Matrices
Peer reviewed: Li, Mao-Neng Fred; Lautenschlager, Gary – Educational and Psychological Measurement, 1997
Illustrates a link between the multiple-rater kappa of J. Fleiss (1971) or other analogues and the generalizability (G) coefficient for a single facet design, and discusses the use and interpretation of G theory in the study of interrater agreement when data are measured on a nominal scale. (SLD)
Descriptors: Classification, Generalizability Theory, Interrater Reliability, Research Design

