Showing 1 to 15 of 515 results
Peer reviewed
Direct link
Pearson, Terry – FORUM: for promoting 3-19 comprehensive education, 2023
Ofsted has frequently defended the judgements made during inspections by claiming that inspection ratings are reliable, as shown by the results from the collection of studies the inspectorate has conducted. I outline the inspectorate's view of reliability and problematise the studies that it has carried out, noting that these provide insufficient…
Descriptors: Inspection, Interrater Reliability, Decision Making, Value Judgment
Peer reviewed
Direct link
Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…
Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training
Peer reviewed
Direct link
Bonett, Douglas G. – Journal of Educational and Behavioral Statistics, 2022
The limitations of Cohen's kappa are reviewed, and an alternative G-index is recommended for assessing nominal-scale agreement. Maximum likelihood estimates, standard errors, and confidence intervals for a two-rater G-index are derived for one-group and two-group designs. A new G-index of agreement for multirater designs is proposed. Statistical…
Descriptors: Statistical Inference, Statistical Data, Interrater Reliability, Design
Peer reviewed
Direct link
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
Peer reviewed
Direct link
Burkhardt, Amy; Lottridge, Susan; Woolf, Sherri – Educational Measurement: Issues and Practice, 2021
For some students, standardized tests serve as a conduit to disclose sensitive issues of harm or distress that may otherwise go unreported. By detecting this writing, known as "crisis papers," testing programs have a unique opportunity to assist in mitigating the risk of harm to these students. The use of machine learning to…
Descriptors: Scoring Rubrics, Identification, At Risk Students, Standardized Tests
Peer reviewed
Direct link
Silva, Thanuci; Santos, Regiane dos; Mallet, Débora – Journal of Education for Business, 2023
Assuring the quality of education is a concern of learning institutions. To do so, it is necessary to have assertive learning management, with consistent data on students' outcomes. This research provides associate deans and researchers with a roadmap for gathering evidence to improve the quality of open-ended assessments. Based on statistical…
Descriptors: Student Evaluation, Evaluation Methods, Business Education, Higher Education
Peer reviewed
Direct link
Ole J. Kemi – Advances in Physiology Education, 2025
Students are assessed by coursework and/or exams, all of which are marked by assessors (markers). Student and marker performances are then subject to end-of-session handling and analysis by a board of examiners. This occurs annually and is the basis for evaluating students but also the wider learning and teaching efficiency of an academic institution.…
Descriptors: Undergraduate Students, Evaluation Methods, Evaluation Criteria, Academic Standards
Peer reviewed
Direct link
Thorne, Casey Lee – Journal of Dance Education, 2022
The research outlined in this article offers a systematic training methodology for students and licensed Traditional Chinese Medicine (TCM) practitioners to learn the clinical art and science of pulsology through dance. One of the greatest hurdles in learning pulse palpation is a TCM practitioner's inability to feel the pulse with a degree of…
Descriptors: Dance Education, Metabolism, Medicine, Asian Culture
Peer reviewed
PDF on ERIC: Download full text
Doewes, Afrizal; Kurdhi, Nughthoh Arfawi; Saxena, Akrati – International Educational Data Mining Society, 2023
Automated Essay Scoring (AES) tools aim to improve the efficiency and consistency of essay scoring by using machine learning algorithms. In the existing research work on this topic, most researchers agree that human-automated score agreement remains the benchmark for assessing the accuracy of machine-generated scores. To measure the performance of…
Descriptors: Essays, Writing Evaluation, Evaluators, Accuracy
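The Doewes et al. abstract above notes that human-automated score agreement is the benchmark for AES accuracy; this agreement is conventionally reported as quadratic weighted kappa (QWK), which penalizes disagreements by the squared distance between score levels. A self-contained sketch (the score data are invented for illustration, and this is one common formulation, not necessarily the exact metric used in the paper):

```python
def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """QWK: chance-corrected agreement with quadratic distance penalties."""
    scores = list(range(min_score, max_score + 1))
    k = len(scores)
    idx = {s: i for i, s in enumerate(scores)}
    n = len(human)
    # Observed joint distribution of (human, machine) score pairs
    O = [[0.0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        O[idx[h]][idx[m]] += 1 / n
    # Expected distribution: outer product of the two marginal histograms
    hist_h = [human.count(s) / n for s in scores]
    hist_m = [machine.count(s) / n for s in scores]
    num = den = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            num += w * O[i][j]
            den += w * hist_h[i] * hist_m[j]
    return 1 - num / den

human = [1, 2, 3, 3, 2]    # human rater scores on a 1-3 rubric
machine = [1, 2, 3, 2, 2]  # automated scores for the same essays
print(round(quadratic_weighted_kappa(human, machine, 1, 3), 3))  # → 0.8
```

QWK equals 1 for perfect agreement and 0 for chance-level agreement, which is why testing programs often set an operational threshold (e.g., machine-human QWK not much below human-human QWK).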
Peer reviewed
PDF on ERIC: Download full text
McCarthy, Kathryn S.; Magliano, Joseph P.; Snyder, Jacob O.; Kenney, Elizabeth A.; Newton, Natalie N.; Perret, Cecile A.; Knezevic, Melanie; Allen, Laura K.; McNamara, Danielle S. – Grantee Submission, 2021
The objective in the current paper is to examine the processes of how our research team negotiated meaning using an iterative design approach as we established, developed, and refined a rubric to capture comprehension processes and strategies evident in students' verbal protocols. The overarching project comprises multiple data sets, multiple…
Descriptors: Scoring Rubrics, Interrater Reliability, Design, Learning Processes
Peer reviewed
Direct link
Bimpeh, Yaw; Pointer, William; Smith, Ben Alexander; Harrison, Liz – Applied Measurement in Education, 2020
Many high-stakes examinations in the United Kingdom (UK) use both constructed-response items and selected-response items. We need to evaluate the inter-rater reliability for constructed-response items that are scored by humans. While there are a variety of methods for evaluating rater consistency across ratings in the psychometric literature, we…
Descriptors: Scoring, Generalizability Theory, Interrater Reliability, Foreign Countries
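The Bimpeh et al. abstract above frames inter-rater reliability for human-scored constructed-response items in generalizability-theory terms. A minimal one-facet crossed G-study (persons × raters) can be sketched with random-effects ANOVA variance components; the data and function name here are invented for illustration, and this is the textbook decomposition rather than the authors' specific method:

```python
def g_study(scores):
    """One-facet crossed G-study: variance components for persons, raters,
    and residual, plus the relative generalizability coefficient."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((scores[p][r] - grand) ** 2
                 for p in range(n_p) for r in range(n_r))
    ss_res = ss_tot - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    # Expected-mean-squares solutions for the variance components
    var_res = ms_res
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    # Relative G coefficient: person variance over person + relative error
    g_coef = var_p / (var_p + var_res / n_r)
    return var_p, var_r, var_res, g_coef

# 4 examinees each scored by the same 2 raters
scores = [[4, 5], [2, 3], [5, 5], [1, 2]]
var_p, var_r, var_res, g = g_study(scores)
print(round(g, 3))  # → 0.977
```

The rater component `var_r` captures systematic severity differences, while `var_res` absorbs rank-order disagreement; the G coefficient projects how reliable the average over the rater panel is.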
Peer reviewed
Direct link
Marion Heron; Helen Donaghue; Kieran Balloo – Teaching in Higher Education, 2024
The aim of teaching observations and post observation feedback in higher education is to support teachers to reflect on and improve their teaching. Yet, our understanding of tutors' (observers') and teachers' (observees') capacities for capitalising on these feedback opportunities is limited and there is little empirically derived advice for…
Descriptors: Feedback (Response), Classroom Observation Techniques, Teacher Evaluation, Multiple Literacies
Peer reviewed
Direct link
Jin, Kuan-Yu; Wang, Wen-Chung – Journal of Educational Measurement, 2018
The Rasch facets model was developed to account for facet data, such as student essays graded by raters, but it accounts for only one kind of rater effect (severity). In practice, raters may exhibit various tendencies such as using middle or extreme scores in their ratings, which is referred to as the rater centrality/extremity response style. To…
Descriptors: Scoring, Models, Interrater Reliability, Computation
Peer reviewed
Direct link
Gitomer, Drew H.; Martínez, José Felipe; Battey, Dan; Hyland, Nora E. – American Educational Research Journal, 2021
The Educative Teacher Performance Assessment (edTPA) is a system of standardized portfolio assessments of teaching performance mandated for use by educator preparation programs in 18 states, and approved in 21 others, as part of initial certification for preservice teachers. Because of the high stakes involved for examinees, it is critical that…
Descriptors: Evaluation, Performance Based Assessment, Test Reliability, Test Validity
Peer reviewed
Direct link
Knoch, Ute; Chapelle, Carol A. – Language Testing, 2018
Argument-based validation requires test developers and researchers to specify what is entailed in test interpretation and use. Doing so has been shown to yield advantages (Chapelle, Enright, & Jamieson, 2010), but it also requires an analysis of how the concerns of language testers can be conceptualized in the terms used to construct a…
Descriptors: Test Validity, Language Tests, Evaluation Research, Rating Scales