Showing all 10 results
Peer reviewed
Wesolowski, Brian C.; Wind, Stefanie A. – Journal of Educational Measurement, 2019
Rater-mediated assessments are a common methodology for measuring persons, investigating rater behavior, and/or defining latent constructs. The purpose of this article is to provide a pedagogical framework for examining rater variability in the context of rater-mediated assessments using three distinct models. The first model is the observation…
Descriptors: Interrater Reliability, Models, Observation, Measurement
Peer reviewed
Cropley, David H.; Kaufman, James C. – Journal of Creative Behavior, 2012
The Creative Solution Diagnosis Scale (CSDS) is a 30-item scale based on a core of four criteria: Relevance & Effectiveness, Novelty, Elegance, and Genesis. The CSDS offers potential for the consensual assessment of functional product creativity. This article describes an empirical study in which non-expert judges rated a series of mousetrap…
Descriptors: Expertise, Creativity, Identification, Measures (Individuals)
Goe, Laura; Holdheide, Lynn; Miller, Tricia – National Comprehensive Center for Teacher Quality, 2011
Across the nation, states and districts are in the process of building better teacher evaluation systems that not only identify highly effective teachers but also systematically provide data and feedback that can be used to improve teacher practice. "A Practical Guide to Designing Comprehensive Teacher Evaluation Systems" is a tool…
Descriptors: Feedback (Response), Teacher Effectiveness, Evaluators, Teacher Evaluation
Peer reviewed
Schuster, Christof – Journal of Educational and Behavioral Statistics, 2001
If two raters assign targets to categories, the ratings can be arranged in a two-dimensional contingency table. This article presents a model for the frequencies in such a contingency table for which Cohen's kappa is a parameter. Illustrates the model using data from a study of the psychobiology of depression. (Author/SLD)
Descriptors: Depression (Psychology), Interrater Reliability, Models
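The Schuster (2001) entry above treats Cohen's kappa as a parameter of a model for a two-rater contingency table. For readers unfamiliar with the statistic, here is a minimal sketch of computing kappa from such a table; the toy counts are invented for illustration and are not data from the article:

```python
def cohens_kappa(table):
    """Cohen's kappa for a square contingency table where
    table[i][j] counts targets that rater A put in category i
    and rater B put in category j."""
    n = sum(sum(row) for row in table)
    k = len(table)
    # Observed agreement: proportion of targets on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum of products of the marginal proportions.
    row_m = [sum(table[i][j] for j in range(k)) / n for i in range(k)]
    col_m = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_e = sum(row_m[i] * col_m[i] for i in range(k))
    return (p_o - p_e) / (1 - p_e)

# Toy 2x2 table: raters agree on 45 + 30 of 100 targets.
table = [[45, 10],
         [15, 30]]
print(round(cohens_kappa(table), 3))  # → 0.49
```

Kappa corrects the raw 75% agreement for the 51% agreement expected by chance from the marginals, which is why it comes out well below the observed proportion.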
Peer reviewed
Schumacker, Randall E.; Smith, Everett V., Jr. – Educational and Psychological Measurement, 2007
Measurement error is a common theme in classical measurement models used in testing and assessment. In classical measurement models, the definition of measurement error and the subsequent reliability coefficients differ on the basis of the test administration design. Internal consistency reliability specifies error due primarily to poor item…
Descriptors: Measurement Techniques, Error of Measurement, Item Sampling, Item Response Theory
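The internal-consistency reliability mentioned in the Schumacker and Smith (2007) abstract is most commonly estimated with Cronbach's alpha. A minimal sketch follows; the item scores are invented and this is not code from the article:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a test.
    items: one inner list of scores per item, all over the
    same examinees in the same order."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per examinee across all items.
    totals = [sum(item[p] for item in items) for p in range(n)]
    item_var = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var / var(totals))

# Three items scored for four examinees (toy data).
items = [[2, 4, 3, 5],
         [3, 5, 4, 5],
         [2, 4, 4, 4]]
print(round(cronbach_alpha(items), 3))  # → 0.939
```

Alpha rises when items covary (examinees who score high on one item score high on the others), which is exactly the sense in which it indexes error due to poor item sampling rather than rater effects.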
Peer reviewed
Mariano, Louis T.; Junker, Brian W. – Journal of Educational and Behavioral Statistics, 2007
When constructed response test items are scored by more than one rater, the repeated ratings allow for the consideration of individual rater bias and variability in estimating student proficiency. Several hierarchical models based on item response theory have been introduced to model such effects. In this article, the authors demonstrate how these…
Descriptors: Test Items, Item Response Theory, Rating Scales, Scoring
Peer reviewed
Aleong, Chandra – Journal of College Teaching & Learning, 2007
This paper discusses whether differences in strategy produce differences in performance. First, an attempt was made to determine whether the institution had a strategy and, if so, whether it followed a particular model. Major models of strategy are the industry-analysis approach, the resource-based view (RBV) model, and the more recent…
Descriptors: Strategic Planning, Higher Education, Institutional Evaluation, Case Studies
Peer reviewed
Longford, N. T. – Journal of Educational and Behavioral Statistics, 1994
Presents a model-based approach to rater reliability for essays read by multiple raters. The approach is motivated by generalizability theory, and variation of rater severity and rater inconsistency is considered in the presence of between-examinee variations. Illustrates methods with data from standardized educational tests. (Author/SLD)
Descriptors: Educational Testing, Essay Tests, Generalizability Theory, Interrater Reliability
Peer reviewed
Edwards, Alison L. – Modern Language Journal, 1996
Examined the validity of the pragmatic approach to test difficulty put forward by Child (1987). This study investigated whether the Child discourse-type hierarchy predicts text difficulty for second-language readers. Results suggested that this hierarchy may provide a sound basis for developing foreign-language tests when it is applied by trained…
Descriptors: Adult Students, Analysis of Variance, French, Interrater Reliability
Novak, Carl D. – 1985
The evaluation team of the Lincoln Public Schools (Nebraska) used the multi-attribute utility technology (MAUT) approach to prioritize potential evaluation projects. The priorities were used to allocate resources to the district's most important projects, and to eliminate or scale down less important projects. The problem was caused initially…
Descriptors: Elementary Secondary Education, Evaluation Criteria, Evaluation Methods, Evaluation Needs