Showing 1 to 15 of 24 results
Peer reviewed
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
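As a minimal sketch of the rescore comparison this abstract describes (the statistic following "proportion of" is truncated, so treating it as the proportion of exact score matches is an assumption; the data below are invented), one common check computes exact and adjacent agreement between the Time A originals and the Time B rescores of the same papers:

```python
# Hypothetical rescore data: each pair is (Time A score, Time B rescore)
# for the same sampled paper. Values are illustrative only.
pairs = [(3, 3), (2, 2), (4, 3), (1, 1), (3, 4), (2, 2), (4, 4), (3, 3)]

exact = sum(1 for a, b in pairs if a == b) / len(pairs)
adjacent = sum(1 for a, b in pairs if abs(a - b) <= 1) / len(pairs)

print(f"exact agreement:    {exact:.2f}")     # 0.75
print(f"adjacent agreement: {adjacent:.2f}")  # 1.00
```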
Peer reviewed
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
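The metrics this abstract names are standard and easy to reproduce; a brief Python sketch using scikit-learn's implementations (toy labels only, not the study's data or detectors):

```python
# Illustrative only: toy binary labels standing in for a detector of a
# discrete student state (e.g., 1 = "off-task", 0 = "on-task").
from sklearn.metrics import cohen_kappa_score, precision_score, recall_score, f1_score

y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # hypothetical ground-truth codes
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]   # hypothetical model predictions

print("kappa    :", cohen_kappa_score(y_true, y_pred))  # ~0.58
print("precision:", precision_score(y_true, y_pred))    # 0.75
print("recall   :", recall_score(y_true, y_pred))       # 0.75
print("F1       :", f1_score(y_true, y_pred))           # 0.75
```

With skewed class distributions these metrics can diverge sharply, which is the kind of disagreement among metrics the study examines.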
Peer reviewed
West, Brady T.; Li, Dan – Sociological Methods & Research, 2019
In face-to-face surveys, interviewer observations are a cost-effective source of paradata for nonresponse adjustment of survey estimates and responsive survey designs. Unfortunately, recent studies have suggested that the accuracy of these observations can vary substantially among interviewers, even after controlling for household-, area-, and…
Descriptors: Observation, Interviews, Error of Measurement, Accuracy
Peer reviewed
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
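The effect of category marginals that Conger describes can be illustrated numerically: two 2x2 agreement tables with identical observed agreement but different marginals yield very different kappa values. A minimal sketch with invented counts (not taken from the article):

```python
# kappa = (p_o - p_e) / (1 - p_e), where chance agreement p_e depends on
# each rater's category marginals.
def kappa_2x2(a, b, c, d):
    """2x2 agreement table: a = both 'yes', d = both 'no', b/c = disagreements."""
    n = a + b + c + d
    p_o = (a + d) / n                                # observed agreement
    p_yes1, p_yes2 = (a + b) / n, (a + c) / n        # rater 'yes' marginals
    p_e = p_yes1 * p_yes2 + (1 - p_yes1) * (1 - p_yes2)
    return (p_o - p_e) / (1 - p_e)

print(kappa_2x2(45, 5, 5, 45))  # balanced marginals -> kappa = 0.80
print(kappa_2x2(85, 5, 5, 5))   # skewed marginals   -> kappa ~ 0.44
```

Both tables show 90% raw agreement, yet kappa drops from 0.80 to roughly 0.44 as the marginals become skewed, because the expected chance agreement rises.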
Peer reviewed
Chang, Briana L.; Cromley, Jennifer G.; Tran, Nhi – International Journal of Science and Mathematics Education, 2016
Coordination of multiple representations (CMR) is widely recognized as a critical skill in mathematics and is frequently demanded in reform calculus textbooks. However, little is known about the prevalence of coordination tasks in such textbooks. We coded 707 instances of CMR in a widely used reform calculus textbook and analyzed the distributions…
Descriptors: Calculus, Textbooks, Teaching Methods, Mathematics Instruction
Peer reviewed
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, a problem remains in peer assessment: reliability depends on rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
Peer reviewed
Gugiu, Mihaiela R.; Gugiu, Paul C.; Baldus, Robert – Journal of MultiDisciplinary Evaluation, 2012
Background: Educational researchers have long espoused the virtues of writing with regard to student cognitive skills. However, research on the reliability of the grades assigned to written papers reveals a high degree of contradiction, with some researchers concluding that the grades assigned are very reliable and others suggesting that they…
Descriptors: Grades (Scholastic), Grading, Scoring Rubrics, Research Design
Peer reviewed
Isaacs, Talia; Thomson, Ron I. – Language Assessment Quarterly, 2013
This mixed-methods study examines the effects of rating scale length and rater experience on listeners' judgments of second-language (L2) speech. Twenty experienced and 20 novice raters, who were randomly assigned to 5-point or 9-point rating scale conditions, judged speech samples of 38 newcomers to Canada on numerical rating scales for…
Descriptors: Foreign Countries, Adults, Second Language Learning, English (Second Language)
Rui, Ning; Feldman, Jill M. – Online Submission, 2012
Notwithstanding the broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item- and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded Striving Readers…
Descriptors: Classroom Observation Techniques, Interrater Reliability, Correlation, Psychometrics
Peer reviewed
Gorard, Stephen – International Journal of Research & Method in Education, 2009
The author previously published a paper discussing how to conduct an analysis based on a cluster sample. In that paper, the author outlined several widely adopted alternative approaches and pointed out that such approaches are, in any case, not needed for population figures and not possible for non-probability samples. Thus, the author queried the…
Descriptors: Probability, Misconceptions, Reader Response, Research Methodology
Peer reviewed
Alt, Mary; Meyers, Christina; Figueroa, Cecilia – Journal of Speech, Language, and Hearing Research, 2013
Purpose: The purpose of this study was to determine whether children exposed to 2 languages would benefit from the phonotactic probability cues of a single language in the same way as monolingual peers and to determine whether crosslinguistic influence would be present in a fast-mapping task. Method: Two groups of typically developing children…
Descriptors: Regression (Statistics), Spanish, Cues, Task Analysis
Peer reviewed
Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
A FORTRAN subroutine is presented to calculate a generalized measure of agreement between multiple raters and a set of correct responses at any level of measurement and among multiple responses, along with the associated probability value, under the null hypothesis. (Author)
Descriptors: Computer Software, Interrater Reliability, Measurement Techniques, Probability
Peer reviewed
Pfeiffer, Steven; Petscher, Yaacov; Kumtepe, Alper – Roeper Review, 2008
This study examined the internal consistency and validity of a new rating scale to identify gifted students, the Gifted Rating Scales-School Form (GRS-S). The study explored the effect of gender, race/ethnicity, age, and rater familiarity on GRS-S ratings. One hundred twenty-two students in first to eighth grade from elementary and middle schools…
Descriptors: Ethnicity, Middle Schools, Academically Gifted, Talent
van der Linden, Wim J.; Vos, Hans J.; Chang, Lei – 2000
In judgmental standard setting experiments, it may be difficult to specify subjective probabilities that adequately take the properties of the items into account. As a result, these probabilities are not consistent with each other in the sense that they do not refer to the same borderline level of performance. Methods to check standard setting…
Descriptors: Interrater Reliability, Judges, Probability, Standard Setting
Peer reviewed
Towstopiat, Olga – Contemporary Educational Psychology, 1984
The present article reviews the procedures that have been developed for measuring the reliability of human observers' judgments when making direct observations of behavior. These include the percentage of agreement, Cohen's Kappa, phi, and univariate and multivariate agreement measures that are based on quasi-equiprobability and quasi-independence…
Descriptors: Interrater Reliability, Mathematical Models, Multivariate Analysis, Observation