ERIC - Search Results

Showing all 12 results

A Model-Data-Fit-Informed Approach to Score Resolution in Performance Assessments

Peer reviewed

Wind, Stefanie A.; Walker, A. Adrienne – Educational Measurement: Issues and Practice, 2021

Many large-scale performance assessments include score resolution procedures for resolving discrepancies in rater judgments. The goal of score resolution is conceptually similar to person fit analyses: To identify students for whom observed scores may not accurately reflect their achievement. Previously, researchers have observed that…

Descriptors: Goodness of Fit, Performance Based Assessment, Evaluators, Decision Making

Metrics for Discrete Student Models: Chance Levels, Comparisons, and Use Cases

Peer reviewed

Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018

Metrics including Cohen's kappa, precision, recall, and F₁ are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…

Descriptors: Models, Comparative Analysis, Prediction, Probability

Using Clinical Simulation to Assess MSW Students' Engagement Skills

Peer reviewed

Sacristan, Dolly; Martinez, Colleen D. – Journal of Teaching in Social Work, 2023

Social work educators are compelled to use reliable and valid methods to assess student learning outcomes. This study adapted a clinical simulation by integrating traditional role-play of case scenarios and elements of the Objective Structured Clinical Examination, which is often used to assess students' practice skills. Master of Social Work…

Descriptors: Graduate Students, Counselor Training, Masters Programs, Clinical Experience

Exploring Differences in Measurement and Reporting of Classroom Observation Inter-Rater Reliability

Peer reviewed

Wilhelm, Anne Garrison; Gillespie Rouse, Amy; Jones, Francesca – Practical Assessment, Research & Evaluation, 2018

Although inter-rater reliability is an important aspect of using observational instruments, it has received little theoretical attention. In this article, we offer some guidance for practitioners and consumers of classroom observations so that they can make decisions about inter-rater reliability, both for study design and in the reporting of data…

Descriptors: Interrater Reliability, Measurement, Observation, Educational Research

The Impact of Rater Variability on Relationships among Different Effect-Size Indices for Inter-Rater Agreement between Human and Automated Essay Scoring

Yun, Jiyeo – ProQuest LLC, 2017

Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…

Descriptors: Interrater Reliability, Essays, Scoring, Evaluators

Item Response Theory for Peer Assessment

Peer reviewed

Uto, Masaki; Ueno, Maomi – IEEE Transactions on Learning Technologies, 2016

As an assessment method based on a constructivist approach, peer assessment has become popular in recent years. However, in peer assessment, a problem remains that reliability depends on the rater characteristics. For this reason, some item response models that incorporate rater parameters have been proposed. Those models are expected to improve…

Descriptors: Item Response Theory, Peer Evaluation, Bayesian Statistics, Simulation

Estimating Item Difficulty with Comparative Judgments. Research Report. ETS RR-14-39

Peer reviewed

Attali, Yigal; Saldivia, Luis; Jackson, Carol; Schuppan, Fred; Wanamaker, Wilbur – ETS Research Report Series, 2014

Previous investigations of the ability of content experts and test developers to estimate item difficulty have, for the most part, produced disappointing results. These investigations were based on a noncomparative method of independently rating the difficulty of items. In this article, we argue that, by eliciting comparative judgments of…

Descriptors: Test Items, Difficulty Level, Comparative Analysis, College Entrance Examinations

Stimulated Recall Interviews for Describing Pragmatic Epistemology

Peer reviewed

Shubert, Christopher W.; Meredith, Dawn C. – Physical Review Special Topics - Physics Education Research, 2015

Students' epistemologies affect how and what they learn: do they believe physics is a list of equations, or a coherent and sensible description of the physical world? In order to study these epistemologies as part of curricular assessment, we adopt the resources framework, which posits that students have many productive epistemological resources…

Descriptors: Epistemology, Recall (Psychology), Physics, Educational Environment

A Simulation Study of Rater Agreement Measures with 2x2 Contingency Tables

Peer reviewed

Ato, Manuel; Lopez, Juan Jose; Benavente, Ana – Psicologica: International Journal of Methodology and Experimental Psychology, 2011

A comparison between six rater agreement measures obtained using three different approaches was achieved by means of a simulation study. Rater coefficients suggested by Bennet's σ (1954), Scott's π (1955), Cohen's κ (1960) and Gwet's γ (2008) were selected to represent the classical, descriptive approach, α agreement…

Descriptors: Interrater Reliability, Measurement, Comparative Analysis, Statistical Analysis

Investigating the Suitability of Implementing the "e-rater"® Scoring Engine in a Large-Scale English Language Testing Program. Research Report. ETS RR-13-36

Peer reviewed

Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013

In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…

Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests

Assessing the Reliability of Simulation Evaluation Instruments Used in Nursing Education: A Test of Concept Study

Adamson, Katie Anne – ProQuest LLC, 2011

Human patient simulation (HPS) provides experiential learning opportunities for student nurses and may be used as a supplement or alternative to traditional clinical education. The body of evidence supporting HPS as a teaching strategy is growing. However, challenges associated with measuring student learning and performance in HPS activities…

Descriptors: Video Technology, Nursing Students, Nursing Education, Medical Evaluation

Tutor versus Peer Group Assessment of Student Performance in a Simulation Training Exercise

Peer reviewed

Kwan, Kam-por; Leung, Roberta – Assessment & Evaluation in Higher Education, 1996

Performance in a simulation exercise of 96 third-year college students studying the hotel and tourism industries was assessed separately by teacher and peers using an identical checklist. Although results showed some agreement between teacher and peers, when averaged marks were converted into grades, agreement occurred in under half the cases.…

Descriptors: Comparative Analysis, Evaluation Criteria, Evaluation Methods, Higher Education
