Showing all 13 results
Peer reviewed
Full-text PDF available on ERIC
Donoghue, John R.; McClellan, Catherine A.; Hess, Melinda R. – ETS Research Report Series, 2022
When constructed-response items are administered for a second time, it is necessary to evaluate whether the current Time B administration's raters have drifted from the scoring of the original administration at Time A. To study this, Time A papers are sampled and rescored by Time B scorers. Commonly the scores are compared using the proportion of…
Descriptors: Item Response Theory, Test Construction, Scoring, Testing
Peer reviewed
Full-text PDF available on ERIC
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F1 are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
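For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how Cohen's kappa, precision, recall, and F1 are conventionally computed from paired labels. It is not taken from the article; the function name `agreement_metrics` and the "on"/"off" labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def agreement_metrics(truth, pred, positive):
    """Return precision, recall, F1, and Cohen's kappa for paired labels.

    `truth` and `pred` are equal-length sequences of discrete labels;
    `positive` is the label treated as the target class for
    precision, recall, and F1. (Hypothetical helper, illustration only.)
    """
    n = len(truth)
    if n == 0 or n != len(pred):
        raise ValueError("truth and pred must be non-empty and equal length")

    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance from the two sets of marginal label frequencies.
    observed = sum(t == p for t, p in pairs) / n
    t_counts, p_counts = Counter(truth), Counter(pred)
    expected = sum(t_counts[c] * p_counts[c] for c in t_counts) / n**2
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "kappa": kappa}


# Made-up labels, not data from the study.
truth = ["on", "on", "on", "off", "off", "off", "off", "off"]
pred = ["on", "on", "off", "on", "off", "off", "off", "off"]
print(agreement_metrics(truth, pred, positive="on"))
```

In this toy example precision, recall, and F1 coincide while kappa is lower, because kappa discounts the agreement that the marginal frequencies alone would produce.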
Peer reviewed
West, Brady T.; Li, Dan – Sociological Methods & Research, 2019
In face-to-face surveys, interviewer observations are a cost-effective source of paradata for nonresponse adjustment of survey estimates and responsive survey designs. Unfortunately, recent studies have suggested that the accuracy of these observations can vary substantially among interviewers, even after controlling for household-, area-, and…
Descriptors: Observation, Interviews, Error of Measurement, Accuracy
Peer reviewed
Conger, Anthony J. – Educational and Psychological Measurement, 2017
Drawing parallels to classical test theory, this article clarifies the difference between rater accuracy and reliability and demonstrates how category marginal frequencies affect rater agreement and Cohen's kappa. Category assignment paradigms are developed: comparing raters to a standard (index) versus comparing two raters to one another…
Descriptors: Interrater Reliability, Evaluators, Accuracy, Statistical Analysis
Peer reviewed
Chang, Briana L.; Cromley, Jennifer G.; Tran, Nhi – International Journal of Science and Mathematics Education, 2016
Coordination of multiple representations (CMR) is widely recognized as a critical skill in mathematics and is frequently demanded in reform calculus textbooks. However, little is known about the prevalence of coordination tasks in such textbooks. We coded 707 instances of CMR in a widely used reform calculus textbook and analyzed the distributions…
Descriptors: Calculus, Textbooks, Teaching Methods, Mathematics Instruction
Peer reviewed
Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016
As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…
Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation
Peer reviewed
Gugiu, Mihaiela R.; Gugiu, Paul C.; Baldus, Robert – Journal of MultiDisciplinary Evaluation, 2012
Background: Educational researchers have long espoused the virtues of writing with regard to student cognitive skills. However, research on the reliability of the grades assigned to written papers reveals a high degree of contradiction, with some researchers concluding that the grades assigned are very reliable whereas others suggest that they…
Descriptors: Grades (Scholastic), Grading, Scoring Rubrics, Research Design
Peer reviewed
Isaacs, Talia; Thomson, Ron I. – Language Assessment Quarterly, 2013
This mixed-methods study examines the effects of rating scale length and rater experience on listeners' judgments of second-language (L2) speech. Twenty experienced and 20 novice raters, who were randomly assigned to 5-point or 9-point rating scale conditions, judged speech samples of 38 newcomers to Canada on numerical rating scales for…
Descriptors: Foreign Countries, Adults, Second Language Learning, English (Second Language)
Rui, Ning; Feldman, Jill M. – Online Submission, 2012
Notwithstanding broad utility of COPs (classroom observation protocols), there has been limited documentation of the psychometric properties of even the most popular COPs. This study attempted to fill this void by closely examining the item and domain-level IRR (inter-rater reliability) of a COP that was used in a federally funded striving readers…
Descriptors: Classroom Observation Techniques, Interrater Reliability, Correlation, Psychometrics
Peer reviewed
Gorard, Stephen – International Journal of Research & Method in Education, 2009
The author previously published a paper discussing how to conduct an analysis based on a cluster sample. In that paper, the author outlined several widely adopted alternative approaches, and pointed out that such approaches are anyway not needed for population figures, and not possible for non-probability samples. Thus, the author queried the…
Descriptors: Probability, Misconceptions, Reader Response, Research Methodology
Peer reviewed
Alt, Mary; Meyers, Christina; Figueroa, Cecilia – Journal of Speech, Language, and Hearing Research, 2013
Purpose: The purpose of this study was to determine whether children exposed to 2 languages would benefit from the phonotactic probability cues of a single language in the same way as monolingual peers and to determine whether crosslinguistic influence would be present in a fast-mapping task. Method: Two groups of typically developing children…
Descriptors: Regression (Statistics), Spanish, Cues, Task Analysis
Peer reviewed
Pfeiffer, Steven; Petscher, Yaacov; Kumtepe, Alper – Roeper Review, 2008
This study examined the internal consistency and validity of a new rating scale to identify gifted students, the Gifted Rating Scales-School Form (GRS-S). The study explored the effect of gender, race/ethnicity, age, and rater familiarity on GRS-S ratings. One hundred twenty-two students in first to eighth grade from elementary and middle schools…
Descriptors: Ethnicity, Middle Schools, Academically Gifted, Talent
International Association for Development of the Information Society, 2012
The IADIS CELDA 2012 Conference aimed to address the main issues concerned with evolving learning processes and supporting pedagogies and applications in the digital age. Advances in both cognitive psychology and computing have affected the educational arena. The convergence of these two disciplines is increasing at a…
Descriptors: Academic Achievement, Academic Persistence, Academic Support Services, Access to Computers