Publication Date
| In 2026 | 0 |
| Since 2025 | 13 |
| Since 2022 (last 5 years) | 48 |
| Since 2017 (last 10 years) | 151 |
| Since 2007 (last 20 years) | 301 |
Descriptor
| Interrater Reliability | 503 |
| Test Reliability | 503 |
| Test Validity | 260 |
| Test Construction | 106 |
| Foreign Countries | 103 |
| Psychometrics | 91 |
| Evaluation Methods | 90 |
| Scores | 67 |
| Correlation | 62 |
| Scoring | 61 |
| Rating Scales | 58 |
Author
| Epstein, Michael H. | 7 |
| Johnson, Evelyn S. | 4 |
| Matson, Johnny L. | 4 |
| Tasse, Marc J. | 4 |
| Aman, Michael G. | 3 |
| Canivez, Gary L. | 3 |
| Capie, William | 3 |
| Conroy, Maureen A. | 3 |
| Crawford, Angela R. | 3 |
| Lecavalier, Luc | 3 |
| McLeod, Bryce D. | 3 |
Audience
| Researchers | 41 |
| Practitioners | 8 |
| Administrators | 3 |
| Teachers | 3 |
| Counselors | 1 |
Location
| Turkey | 11 |
| Canada | 10 |
| Australia | 9 |
| United Kingdom | 9 |
| Pennsylvania | 7 |
| Florida | 6 |
| Netherlands | 6 |
| Sweden | 5 |
| United Kingdom (England) | 5 |
| China | 4 |
| Illinois | 4 |
Laws, Policies, & Programs
| Individuals with Disabilities… | 2 |
| No Child Left Behind Act 2001 | 1 |
| Pell Grant Program | 1 |
Peer reviewed: Hogan, Andrew – Evaluation and the Health Professions, 1986
This study derives the economic costs of misclassification in nursing home patient classification systems. These costs are then used as weights to estimate the reliability of a functional assessment instrument. Results suggest that reliability must be redefined and remeasured with each substantively new application of an assessment instrument.…
Descriptors: Classification, Correlation, Cost Effectiveness, Diagnostic Tests
Peer reviewed: Epstein, Michael H.; Nieminen, Gayla S. – School Psychology Review, 1983
Teachers and classroom aides of learning disabled pupils were asked to complete the Conners Abbreviated Teacher Rating Scale (CATRS) on two separate occasions, one month apart. Inter-rater reliability for teachers (.866) and for aides (.602), and reliability across time for teachers (.866) and aides (.603) achieved acceptable levels. (Author/BW)
Descriptors: Elementary Education, Elementary School Teachers, Hyperactivity, Interrater Reliability
Peer reviewed: Vrancic, Daniela; Nanclares, Valeria; Soares, Delfina; Kulesz, Analia; Mordzinski, Claudia; Plebst, Christian; Starkstein, Sergio – Journal of Autism and Developmental Disorders, 2002
A study involving 30 Argentineans with autism evaluated the validity of the Autism Diagnostic Inventory-Telephone Screening in Spanish (ADI-TSS). The final version of the ADI-TSS could be assessed in 20 to 40 minutes and demonstrated a high validity, high interrater reliability, and high internal consistency. (Contains references.) (Author/CR)
Descriptors: Adults, Autism, Disability Identification, Foreign Countries
Peer reviewed: Anderson, Stephen A. – Michigan Reading Journal, 2002
Considers the development of an inter-rater reliability correlation comparing the judgments, or scores, of each judge to see if their observations are similar. Presents a case study of the Northville Public Schools' data for the 2000 MEAP (Michigan Educational Assessment Program) Writing Test. Concludes that in this case study the state fails both…
Descriptors: Case Studies, Elementary Education, Evaluation Research, Interrater Reliability
Peer reviewed: Lewis, Kerry E. – American Journal of Speech-Language Pathology, 1995
An examination of the extent to which scores on the Stuttering Severity Instrument (SSI) for Children and Adults, Third Edition, accurately reflect 10 judges' observations of stuttering behaviors found that SSI scores obscured the wide range of judges' raw counts and did not accurately reflect the observational data from which they were derived.…
Descriptors: Adults, Children, Evaluation Methods, Interrater Reliability
Peer reviewed: Simpson, Robert G. – Behavioral Disorders, 1991
The behavior of each of 120 students in grades 9-12 was rated by 2 of the student's teachers using the Revised Behavior Problem Checklist. Results indicated a generally low to moderate degree of relationship among teacher ratings. It is recommended that clinicians collect behavioral ratings from many raters before reaching diagnostic conclusions.…
Descriptors: Behavior Problems, Check Lists, Clinical Diagnosis, Interrater Reliability
Mabry, Linda – Phi Delta Kappan, 1999
Education remains heavily shackled by punitive, test-driven reform. Despite reasonable alternatives, testing increasingly drives educational accountability and reform. Standardization of direct writing assessments promotes scoring reliability and facilitates educational comparisons and rankings. However, standardized writing is not good writing,…
Descriptors: Elementary Secondary Education, Interrater Reliability, Performance Based Assessment, Scoring Rubrics
Peer reviewed: Nordin, Viviann; Gillberg, Christopher; Nyden, Agneta – Journal of Autism and Developmental Disorders, 1998
This study assessed the interrater reliability of a Swedish version of the Childhood Autism Rating Scale (CARS), an instrument for screening and diagnosis of autism. The CARS was used for rating autistic behavior by two investigators in 25 children. Results indicated fair to excellent agreement. Aspects of validity and reliability are discussed.…
Descriptors: Autism, Behavior Rating Scales, Clinical Diagnosis, Disability Identification
Matson, Johnny L.; Laud, Rinita B.; Gonzalez, Melissa L.; Malone, Carrie J.; Swender, Stephen L. – Research in Developmental Disabilities: A Multidisciplinary Journal, 2005
The use of anti-epileptic medications (AEDs) is much higher in individuals with intellectual disabilities than in the general population. As many of these individuals rely on such medications, clinicians should consider psychometrically sound instruments for assessing adverse side effects of these medications as one aspect of routine clinical…
Descriptors: Evaluation Methods, Seizures, Epilepsy, Developmental Disabilities
Assessing the Evidence: Different Types of NVQ Evidence and Their Impact on Reliability and Fairness
Greatorex, Jackie – Journal of Vocational Education and Training, 2005
The research literature reveals that there are many factors that influence the consistency of assessors' or examiners' judgements. One issue that has not been considered is whether National Vocational Qualifications assessors' consistency of judgement is affected by different types of evidence. In this article, 15 Customer Service and 12 Assessor…
Descriptors: Qualifications, Examiners, Interrater Reliability, Job Applicants
Ford, Julian D.; Trestman, Robert L.; Wiesbrock, Valerie; Zhang, Wanli – Assessment, 2007
The authors report the development and initial psychometric evaluation of gender-specific brief screening instruments to identify undetected psychiatric impairment on incarceration. Women and men completed the Correctional Mental Health Screen (CMHS), a 56-item screen derived from validated measures. Representative subsamples completed structured…
Descriptors: Psychometrics, Mental Health, Test Validity, Mental Disorders
Paden, Patricia A. – 1986
Two factors which may affect the ratings assigned to an essay test are investigated: (1) context effects; and (2) score level effects. Context effects exist in essay scoring if an essay is rated higher when preceded by poor quality essays than when preceded by high quality essays. A score level effect is defined as a change in the score (value)…
Descriptors: Context Effect, Essay Tests, Holistic Evaluation, Interrater Reliability
Lehmann, Rainer H. – 1987
A total of 1,487 eleventh grade students from the Hamburg (West Germany) school system were asked to complete four writing assignments used in an International Association for the Evaluation of Educational Achievement (IEA) study of writing assessment. In analyzing the writing samples, the study focused on: (1) between-rater effects; (2)…
Descriptors: Evaluation Problems, Foreign Countries, High Schools, International Programs
Aydin, Selami – Online Submission, 2006
This research aimed to investigate the effect of computers on the test and inter-rater reliability of writing test scores of ESL learners. Writing samples of 20 pen-paper and 20 computer group students were scored in analytic scoring method by two scorers, and then the scores were analyzed in Alpha (Cronbach) model. The results showed that the…
Descriptors: Writing Tests, Interrater Reliability, Test Reliability, English (Second Language)
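Illustrative note: the abstract above mentions analyzing two scorers' ratings with Cronbach's alpha. The following is a minimal sketch of that coefficient, treating each scorer as an "item"; it is not the study's own procedure. The function name `cronbach_alpha` and the sample scores are hypothetical and do not come from the record.

```python
from statistics import pvariance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of scores.

    ratings: list of rows, one row per writing sample; each row holds
    one score per rater (every row must have the same length, >= 2).
    """
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least two raters and equal-length rows")

    # Variance of each rater's scores across all samples.
    rater_columns = list(zip(*ratings))
    sum_rater_var = sum(pvariance(col) for col in rater_columns)

    # Variance of the per-sample total (sum over raters).
    total_var = pvariance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")

    # alpha = k/(k-1) * (1 - sum of rater variances / variance of totals)
    return (k / (k - 1)) * (1 - sum_rater_var / total_var)


if __name__ == "__main__":
    # Hypothetical scores: five writing samples, two scorers each.
    sample_scores = [[14, 15], [10, 11], [18, 17], [12, 12], [16, 18]]
    print(round(cronbach_alpha(sample_scores), 3))
```

Population variance is used for both the numerator and the denominator; because the formula depends only on their ratio, using sample variance throughout would give the same value.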
Crehan, Kevin D. – 1997
Writing fits well within the realm of outcomes suitable for observation by performance assessments. Studies of the reliability of performance assessments have suggested that interrater reliability can be consistently high. Scoring consistency, however, is only one aspect of quality in decisions based on assessment results. Another is…
Descriptors: Evaluation Methods, Feedback, Generalizability Theory, Interrater Reliability

