Publication Date
| In 2026 | 0 |
| Since 2025 | 58 |
| Since 2022 (last 5 years) | 284 |
| Since 2017 (last 10 years) | 780 |
| Since 2007 (last 20 years) | 2042 |
Descriptor
| Interrater Reliability | 3124 |
| Foreign Countries | 655 |
| Test Reliability | 503 |
| Evaluation Methods | 502 |
| Test Validity | 410 |
| Correlation | 401 |
| Scoring | 347 |
| Comparative Analysis | 327 |
| Scores | 324 |
| Validity | 310 |
| Student Evaluation | 308 |
Audience
| Researchers | 130 |
| Practitioners | 42 |
| Teachers | 22 |
| Administrators | 11 |
| Counselors | 3 |
| Policymakers | 2 |
Location
| Australia | 56 |
| Turkey | 53 |
| United Kingdom | 46 |
| Canada | 45 |
| Netherlands | 40 |
| China | 38 |
| California | 37 |
| United States | 30 |
| United Kingdom (England) | 25 |
| Taiwan | 23 |
| Germany | 22 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 3 |
| Meets WWC Standards with or without Reservations | 3 |
| Does not meet standards | 3 |
Lane, Kathleen Lynne; Kalberg, Jemma Robertson; Parks, Robin J.; Carter, Erik W. – Journal of Emotional and Behavioral Disorders, 2008
This article presents findings from a study of the reliability and validity of the "Student Risk Screening Scale" for use with high school students (N = 674). Results revealed high internal consistency, test-retest stability, interrater reliability, and convergent validity with the "Strengths and Difficulties Questionnaire". Predictive validity…
Descriptors: Grade Point Average, Antisocial Behavior, Predictive Validity, Behavior Disorders
Holbrook, Allyson; Bourke, Sid; Lovat, Terry; Fairbairn, Hedy – Australian Journal of Education, 2008
This is a mixed methods investigation of consistency in PhD examination. At its core is the quantification of the content and conceptual analysis of examiner reports for 804 Australian theses. First, the level of consistency between what examiners say in their reports and the recommendation they provide for a thesis is explored, followed by an…
Descriptors: Academic Standards, Examiners, Student Evaluation, Foreign Countries
Brennan, David J. – Higher Education Research and Development, 2008
This paper provides an overview of the issue of student anonymity in the summative assessment of student work in higher education. It considers both theoretical literature pertaining to bias in the evaluation of the work of others and the limited empirical work undertaken on this issue in higher education. It then describes the experience of three…
Descriptors: Higher Education, Student Evaluation, Interrater Reliability, Test Bias
Lannie, Amanda L.; Martens, Brian K. – Journal of Behavioral Education, 2008
Four fifth-grade students were presented with frustration-level math probes while three performance dimensions were measured (i.e., percent intervals on-task, percent correct digits, and digits correct per minute (DCM)). Using a multiple baseline design across participants, students were trained to self-monitor time on-task, accuracy, and…
Descriptors: Intervals, Interrater Reliability, Rewards, Grade 5
Hitt, Austin M.; Helms, Emory C. – Professional Educator, 2009
This paper discusses an instructional approach designed to help preservice teachers understand how assessments can be influenced by personal biases. In order to achieve this objective, we developed an analogy-based activity called "The Dog Show Analogy." After participating in the activity, we have observed that the participating preservice…
Descriptors: Preservice Teachers, Student Evaluation, Teacher Education Programs, Experimenter Characteristics
Rossiter, Marian J. – Canadian Modern Language Review, 2009
This article explores perceptions of the speaking fluency of 24 adult ESL learners (11 men, 13 women) who narrated picture stories at Time 1 and again 10 weeks later at Time 2. One-minute excerpts from each rendition were randomized and played to 15 novice and six expert native speakers of English (undergraduate education students and experienced…
Descriptors: Native Speakers, English (Second Language), Adult Students, Student Attitudes
New York State Education Department, 2014
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and stakeholder involvement in those processes. The purpose of this report is to document the technical aspects of the 2013-14 NYSAA.…
Descriptors: Alternative Assessment, Educational Assessment, State Departments of Education, Student Evaluation
Peer reviewed: Rae, Gordon – Educational and Psychological Measurement, 1988
Using the Gini-Light-Margolin concept of "partitioning" variance for qualitative data, correspondences are established between various kappa statistics and intraclass correlation coefficients under general conditions (multiple raters and polychotomous category systems). A measure of marginal symmetry for multiple ratings. (Author/TJH)
Descriptors: Correlation, Interrater Reliability, Qualitative Research
Peer reviewed: Schuster, Christof – Journal of Educational and Behavioral Statistics, 2001
If two raters assign targets to categories, the ratings can be arranged in a two-dimensional contingency table. This article presents a model for the frequencies in such a contingency table for which Cohen's kappa is a parameter. Illustrates the model using data from a study of the psychobiology of depression. (Author/SLD)
Descriptors: Depression (Psychology), Interrater Reliability, Models
Peer reviewed: Lunz, Mary E. – Popular Measurement, 1999
Describes a study of judge leniency and consistency using a Rasch approach and involving 4,683 candidates and 53 judges. (SLD)
Descriptors: Interrater Reliability, Judges, Longitudinal Studies
Peer reviewed: Janson, Harald; Olsson, Ulf – Educational and Psychological Measurement, 2001
Proposes a generalization of Cohen's kappa coefficient (J. Cohen, 1960) to address the problem of accounting for overall chance-corrected interobserver agreement among the multivariate ratings of several judges. The statistic's metric is conventional and in the univariate case it is equivalent to existing extensions of the kappa coefficient to…
Descriptors: Interrater Reliability, Judges, Multivariate Analysis
Jeffery, Jill V. – ProQuest LLC, 2010
"Voice" is widely considered to be a feature of effective writing. It's no surprise, then, that voice criteria frequently appear on rubrics used to score student essays in large-scale writing assessments. However, composition theorists hold vastly different views regarding voice and how it should be applied in the evaluation of student writing, if…
Descriptors: Expository Writing, Evaluators, Writing Evaluation, Writing Tests
Griffin, Merilee; Falberg, Amy; Krygier, Gigi – Teaching English in the Two-Year College, 2010
The 2006 Spellings Commission report, "A Test of Leadership," stated that substandard high school preparation is compounded by poor alignment between high schools and colleges, which often creates an "expectations gap" between what colleges require and what high schools produce. This gap results in an annual expenditure of roughly one billion…
Descriptors: High Schools, Writing Teachers, Secondary School Teachers, At Risk Students
Carnahan, Christi; Basham, James; Musti-Rao, Shobana – Exceptionality, 2009
Active engagement is critical to promote learning for students with autism. Although evidence-based strategies exist for promoting engagement for individual students with autism, there are few strategies designed for use with small groups. This study used an ABCAC design to assess the effects of a low-technology use strategy, namely interactive…
Descriptors: Small Group Instruction, Autism, Educational Strategies, Learner Engagement
Epstein, Michael H.; Synhorst, Lori – Journal of Child and Family Studies, 2008
The Preschool Behavioral and Emotional Rating Scale (PreBERS) is a standardized, norm-referenced instrument that assesses the emotional and behavioral strengths of preschool children. Two studies that investigated the test-retest and inter-rater reliability of the PreBERS are reported. In the first study, teachers rated preschool children (N = 96)…
Descriptors: Interrater Reliability, Preschool Children, Behavior Rating Scales, Measures (Individuals)

