NotesFAQContact Us
Collection
Advanced
Search Tips
Audience
Researchers1
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 1 to 15 of 19 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed Peer reviewed
Direct linkDirect link
Fromm, Davida; Katta, Saketh; Paccione, Mason; Hecht, Sophia; Greenhouse, Joel; MacWhinney, Brian; Schnur, Tatiana T. – Journal of Speech, Language, and Hearing Research, 2021
Purpose: Analysis of connected speech in the field of adult neurogenic communication disorders is essential for research and clinical purposes, yet time and expertise are often cited as limiting factors. The purpose of this project was to create and evaluate an automated program to score and compute the measures from the Quantitative Production…
Descriptors: Speech, Automation, Statistical Analysis, Adults
Peer reviewed Peer reviewed
Direct linkDirect link
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Cohen, Yoav; Levi, Effi; Ben-Simon, Anat – Applied Measurement in Education, 2018
In the current study, two pools of 250 essays, all written as a response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to the essay's true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By…
Descriptors: Test Validity, Automation, Scoring, Computer Assisted Testing
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Rios, Joseph A.; Sparks, Jesse R.; Zhang, Mo; Liu, Ou Lydia – ETS Research Report Series, 2017
Proficiency with written communication (WC) is critical for success in college and careers. As a result, institutions face a growing challenge to accurately evaluate their students' writing skills to obtain data that can support demands of accreditation, accountability, or curricular improvement. Many current standardized measures, however, lack…
Descriptors: Test Construction, Test Validity, Writing Tests, College Outcomes Assessment
Peer reviewed Peer reviewed
Direct linkDirect link
Prevost, Luanna B.; Smith, Michelle K.; Knight, Jennifer K. – CBE - Life Sciences Education, 2016
Previous work has shown that students have persistent difficulties in understanding how central dogma processes can be affected by a stop codon mutation. To explore these difficulties, we modified two multiple-choice questions from the Genetics Concept Assessment into three open-ended questions that asked students to write about how a stop codon…
Descriptors: Science Instruction, Genetics, Scientific Concepts, Scoring
Peer reviewed Peer reviewed
Direct linkDirect link
Midgette, Ekaterina; Haria, Priti – Reading Psychology, 2016
The purpose of the study was to investigate the effects of two comprehensive argumentative writing interventions--Text Structure Instruction (TSI) and Text Structure Revision Instruction (TSRI)--on the eighth-grade students' ability to compose convincing essays that include structural elements of argumentative discourse. Both treatment groups…
Descriptors: Persuasive Discourse, Essays, Text Structure, Writing Instruction
Peer reviewed Peer reviewed
Direct linkDirect link
Casey, Laura B.; Miller, Neal D.; Stockton, Michelle B.; Justice, William V. – Language Assessment Quarterly, 2016
Many students struggle with writing; however, curriculum-based measures (CBM) of writing often use assessment criteria that focus primarily on mechanics. When academic development is assessed in this way, more complex aspects of a student's writing, such as the expression and development of ideas, may be neglected. The current study was a…
Descriptors: Elementary School Students, Writing (Composition), Writing Evaluation, Curriculum Based Assessment
Alkahtani, Saif F. – ProQuest LLC, 2012
The principal aim of the present study was to better guide the Quranic recitation appraisal practice by presenting an application of Generalizability theory and Many-facet Rasch Measurement Model for assessing the dependability and fit of two suggested rubrics. Recitations of 93 students were rated holistically and analytically by 3 independent…
Descriptors: Generalizability Theory, Item Response Theory, Verbal Tests, Islam
Peer reviewed Peer reviewed
Direct linkDirect link
Haudek, Kevin C.; Prevost, Luanna B.; Moscarella, Rosa A.; Merrill, John; Urban-Lurain, Mark – CBE - Life Sciences Education, 2012
Students' writing can provide better insight into their thinking than can multiple-choice questions. However, resource constraints often prevent faculty from using writing assessments in large undergraduate science courses. We investigated the use of computer software to analyze student writing and to uncover student ideas about chemistry in an…
Descriptors: Chemistry, Biology, Introductory Courses, Science Instruction
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Wang, Ping – English Language Teaching, 2009
This paper makes a study of the rater reliability in scoring composition in the test of English as a foreign language (EFL) and focuses on the inter-rater reliability as well as several interactions between raters and the other facets involved (that is examinees, rating criteria and rating methods). Results showed that raters were fairly…
Descriptors: Interrater Reliability, Scoring, Writing (Composition), English (Second Language)
Burmester, Kristen O'Rourke – ProQuest LLC, 2011
Classrooms are a primary site of evidence about learning. Yet classroom proceedings often occur behind closed doors and hence evidence of student learning is observable only to the classroom teacher. The informal and undocumented nature of this information means that it is rarely included in statistical models or quantifiable analyses. This…
Descriptors: Evidence, Student Evaluation, Educational Research, Validity
Braun, Henry I.; Wainer, Howard – 1989
A desirable goal would be to develop a methodology for scoring essays so that the final grades are less affected by when or by whom each essay was read. It seems sensible to derive such grades by somehow adjusting the ratings originally given by each reader. This essay describes a solution that relies on statistical adjustment, using the context…
Descriptors: Essay Tests, Estimation (Mathematics), Interrater Reliability, Scoring
Peer reviewed Peer reviewed
Blackman, Nicole J-M.; Koval, John J. – Applied Psychological Measurement, 1993
Four indexes of agreement between ratings of a person that correct for chance and are interpretable as intraclass correlation coefficients for different analysis of variance models are investigated. Relationships among the estimators are established for finite samples, and the equivalence of these estimators in large samples is demonstrated. (SLD)
Descriptors: Analysis of Variance, Equations (Mathematics), Estimation (Mathematics), Interrater Reliability
Previous Page | Next Page »
Pages: 1  |  2