Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Gadesmann, Miriam; Miller, Nick – International Journal of Language & Communication Disorders, 2008
Background: Measures of articulatory diadochokinesis (DDK) are widely used in the assessment of motor speech disorders and they play a role in detecting abnormality, monitoring speech performance changes and classifying syndromes. Although in clinical practice DDK is generally measured perceptually, without support from instrumental methods that…
Descriptors: Speech Impairments, Audio Equipment, Clinical Experience, Language Impairments
Matson, Johnny L.; Gonzalez, Melissa L.; Wilkins, Jonathan; Rivet, Tessa T. – Research in Autism Spectrum Disorders, 2008
The reliability of a new scale to assess Autistic Disorder, Pervasive Developmental Disorder, Not Otherwise Specified (PDD-NOS), and Asperger's Disorder in children was examined. Parents or other caregivers rated symptoms of 207 children between 2 and 16 years of age. The scale, which had 40 items in the final version, correlated highly with…
Descriptors: Autism, Interrater Reliability, Criteria, Psychopathology
Solano-Flores, Guillermo; Li, Min – Assessment for Effective Intervention, 2008
The dependability of academic achievement measures for English language learners (ELLs) is influenced by three facts: (a) Each ELL has unique strengths and weaknesses in each language mode (listening, speaking, reading, and writing) both in English and in his or her first language, (b) each test item poses a different set of linguistic demands…
Descriptors: Generalizability Theory, Test Items, Dialects, Academic Achievement
Bunton, Kate; Kent, Raymond D.; Duffy, Joseph R.; Rosenbek, John C.; Kent, Jane F. – Journal of Speech, Language, and Hearing Research, 2007
Purpose: Darley, Aronson, and Brown (1969a, 1969b) detailed methods and results of auditory-perceptual assessment for speakers with dysarthrias of varying etiology. They reported adequate listener reliability for use of the rating system as a tool for differential diagnosis, but several more recent studies have raised concerns about listener…
Descriptors: Auditory Perception, Speech Impairments, Interrater Reliability, Measures (Individuals)
Lyneham, Heidi J.; Abbott, Maree J.; Rapee, Ronald M. – Journal of the American Academy of Child & Adolescent Psychiatry, 2007
Objective: The present study determined interrater agreement on diagnoses achieved using the parent and child versions of the Anxiety Disorders Interview Schedule for Children for DSM-IV (ADIS-C/P) and examined informant, age, and gender influences on reliability. Method: Diagnoses established for 153 seven- to 16-year-old children during live…
Descriptors: Parents, Anxiety, Interrater Reliability, Evaluation Methods
Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1988 (Peer reviewed)
Cohen's kappa statistic is frequently used to measure agreement between two observers using categorical polytomies. Cohen's statistic is shown to be inherently multivariate in nature, expanded to analyze ordinal and interval data, and extended to more than two observers. A non-asymptotic test of significance is provided for the generalized statistic.…
Descriptors: Equations (Mathematics), Interrater Reliability, Multivariate Analysis
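For orientation, the basic two-observer, categorical form of Cohen's kappa mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard textbook definition, kappa = (p_o - p_e) / (1 - p_e), and not the generalized multivariate statistic that Berry and Mielke develop. The function name `cohen_kappa` and the sample ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels.

    Hypothetical helper for illustration; it covers only the classic
    two-rater nominal case, not the generalized statistic in the abstract.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions per category.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Made-up ratings: the raters agree on 3 of 4 items.
ratings_a = ["yes", "yes", "no", "no"]
ratings_b = ["yes", "no", "no", "no"]
print(cohen_kappa(ratings_a, ratings_b))
```

With these sample ratings the observed agreement is 0.75 and the chance agreement is 0.5, so kappa comes out at 0.5.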
Lindell, Michael K. – Applied Psychological Measurement, 2001 (Peer reviewed)
Developed an index for assessing interrater agreement with respect to a single target using a multi-item rating scale. The variance of rater mean scale scores is used as the numerator of the agreement index. Studied four variants of a disattenuated agreement index that vary in the random response term used as the denominator. (SLD)
Descriptors: Evaluation Methods, Interrater Reliability, Rating Scales
Fan, Xitao; Chen, Michael – Educational and Psychological Measurement, 2000 (Peer reviewed)
Provides a sample of seven published studies in different disciplines that inappropriately generalized reliability coefficients involving several raters to scores generated by a single rater. Score reliability when only one rater is used for scoring is lower than score reliability when two raters are used. (SLD)
Descriptors: Interrater Reliability, Research Reports, Scores, Scoring
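One common way to see why single-rater reliability is lower than multi-rater reliability is the Spearman-Brown relation, sketched below. That this formula matches the analysis Fan and Chen actually used is an assumption; the abstract does not name a method. The function names and the sample values are hypothetical.

```python
def spearman_brown(single_rater_reliability, k):
    """Reliability of the average of k raters, given single-rater reliability.

    Standard Spearman-Brown prophecy formula; applying it to this abstract
    is an illustrative assumption, not the authors' stated method.
    """
    r = single_rater_reliability
    return k * r / (1 + (k - 1) * r)


def single_from_composite(composite_reliability, k):
    """Invert Spearman-Brown: step a k-rater coefficient down to one rater."""
    r_k = composite_reliability
    return r_k / (k - (k - 1) * r_k)


# Made-up example: a coefficient of 0.75 reported for two raters
# corresponds to a lower reliability for a single rater's scores.
print(single_from_composite(0.75, 2))
print(spearman_brown(0.6, 2))
```

In this made-up case a two-rater coefficient of 0.75 steps down to 0.6 for one rater, which is the direction of the discrepancy the abstract describes.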
Huang, Chiungjung – Educational and Psychological Measurement, 2009
This study examined the percentage of task-sampling variability in performance assessment via a meta-analysis. In total, 50 studies containing 130 independent data sets were analyzed. Overall results indicate that the percentage of variance for (a) differential difficulty of task was roughly 12% and (b) examinee's differential performance of the…
Descriptors: Test Bias, Research Design, Performance Based Assessment, Performance Tests
Hay, Peter J.; Macdonald, Doune – Assessment in Education: Principles, Policy & Practice, 2008
This paper draws on semi-structured interview data and participant observations of senior secondary Physical Education (PE) teachers and students at two school sites across 20 weeks of the school year. The data indicated that the teachers in this study made progressive judgements about students' level of achievement across each unit of work…
Descriptors: Secondary School Teachers, Evaluative Thinking, Physical Education, Secondary School Students
Petscher, Erin Seligson; Bailey, Jon S. – Behavior Modification, 2008
This study evaluated the effects and collateral effects of extinction (EXT) and differential reinforcement of alternative behavior (DRA) interventions with inappropriate vocalizations and work refusal. Both interventions have been used frequently to reduce problem behaviors. The benefits of these interventions have been established yet may be…
Descriptors: Behavior Modification, Reinforcement, Intervention, Behavior Problems
Rodriguez-Campos, Liliana; Rincones-Gomez, Rigoberto; Shen, Jianping – Frontiers of Education in China, 2008
Structural Equation Modeling (SEM) was used in this study to determine the extent to which teachers, principals, and superintendents perceive the leadership construct in the same way. The researchers found that the two-factor model fits the principal group and particularly the superintendent group better than does the four-factor model. The…
Descriptors: Structural Equation Models, Superintendents, Principals, Teacher Attitudes
Eriks-Brophy, Alice; Quittenbaum, Jacqueline; Anderson, Deborah; Nelson, Tina – Clinical Linguistics & Phonetics, 2008
The current article describes the results, inter-scorer reliability, and potential sources of bias in conducting speech-language assessments with Aboriginal children in remote Ontario communities using videoconferencing. A main focus of this pilot study was to examine scoring bias, an issue that might arise with videoconferencing for any…
Descriptors: Articulation (Speech), Indigenous Populations, Children, Language Tests
Black, Beth; Bramley, Tom – Research Papers in Education, 2008
A new judgemental method of equating raw scores on two tests, based on rank-ordering scripts from both tests, has been developed by Bramley. The rank-ordering method has potential application as a judgemental standard-maintaining mechanism, because given a mark on one test (e.g. the A grade boundary mark), the equivalent mark (i.e. at the same…
Descriptors: Foreign Countries, Equated Scores, Test Theory, Evaluative Thinking
Tennessee Department of Education, 2012
In the summer of 2011, the Tennessee Department of Education contracted with the National Institute for Excellence in Teaching (NIET) to provide a four-day training for all evaluators across the state. NIET trained more than 5,000 evaluators intensively in the state model (districts using alternative instruments delivered their own training).…
Descriptors: Video Technology, Feedback (Response), Evaluators, Interrater Reliability