NotesFAQContact Us
Collection
Advanced
Search Tips
Showing 1 to 15 of 2,537 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Hui Jin; Cynthia Lima; Limin Wang – Educational Measurement: Issues and Practice, 2025
Although AI transformer models have demonstrated notable capability in automated scoring, it is difficult to examine how and why these models fall short in scoring some responses. This study investigated how transformer models' language processing and quantification processes can be leveraged to enhance the accuracy of automated scoring. Automated…
Descriptors: Automation, Scoring, Artificial Intelligence, Accuracy
Peer reviewed Peer reviewed
Direct linkDirect link
Pearson, Terry – FORUM: for promoting 3-19 comprehensive education, 2023
Ofsted has frequently defended the judgements made during inspections by claiming that inspection ratings are reliable, as shown by the results from the collection of studies the inspectorate has conducted. I outline the inspectorate's view of reliability and problematise the studies that it has carried out, noting that these provide insufficient…
Descriptors: Inspection, Interrater Reliability, Decision Making, Value Judgment
Peer reviewed Peer reviewed
Direct linkDirect link
Kristen Bottema-Beutel; Shannon Crowley LaPoint; So Yoon Kim; Sarah Mohiuddin; Qun Yu; Rachael McKinnon – Exceptional Children, 2024
In this secondary analysis of a previously conducted systematic review, we analyze social validity assessments in intervention research for transition-age autistic youth. Social validity is concerned with the acceptability of the intervention goals, the acceptability and feasibility of the intervention procedures, and the perceived importance of…
Descriptors: Autism Spectrum Disorders, Intervention, Validity, Psychometrics
Peer reviewed Peer reviewed
Direct linkDirect link
Susan K. Johnsen – Gifted Child Today, 2025
The author provides information about reliability and areas that educators should examine in determining if an assessment is consistent and trustworthy for use, and how it should be interpreted in making decisions about students. Reliability areas that are discussed in the column include internal consistency, test-retest or stability, inter-scorer…
Descriptors: Test Reliability, Academically Gifted, Student Evaluation, Error of Measurement
Peer reviewed Peer reviewed
Direct linkDirect link
Heena Suthar; Krisha Thiagarajah; Ibraheem Karaye; Zayra Teresa Lopez-Ixta; Trishnee Bhurosy – Journal of American College Health, 2025
Objective: To measure the interrater reliability of assessing the frequency of vegetable intake using mobile photos and descriptions. Design: Repeated measures design. Setting: A Midwestern university. Participants: Undergraduate students (N = 165). Measurable Outcome/Analysis: Number of times each of these vegetable subgroups were consumed daily:…
Descriptors: Interrater Reliability, Incidence, Food, Eating Habits
Peer reviewed Peer reviewed
Direct linkDirect link
Lottridge, Susan; Woolf, Sherri; Young, Mackenzie; Jafari, Amir; Ormerod, Chris – Journal of Computer Assisted Learning, 2023
Background: Deep learning methods, where models do not use explicit features and instead rely on implicit features estimated during model training, suffer from an explainability problem. In text classification, saliency maps that reflect the importance of words in prediction are one approach toward explainability. However, little is known about…
Descriptors: Documentation, Learning Strategies, Models, Prediction
Peer reviewed Peer reviewed
Direct linkDirect link
Stefan K. Schauber; Anne O. Olsen; Erik L. Werner; Morten Magelssen – Advances in Health Sciences Education, 2024
Introduction: Research in various areas indicates that expert judgment can be highly inconsistent. However, expert judgment is indispensable in many contexts. In medical education, experts often function as examiners in rater-based assessments. Here, disagreement between examiners can have far-reaching consequences. The literature suggests that…
Descriptors: Medical Students, Performance Based Assessment, Expertise, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Yangmeng Xu; Stefanie A. Wind – Educational Measurement: Issues and Practice, 2025
Double-scoring constructed-response items is a common but costly practice in mixed-format assessments. This study explored the impacts of Targeted Double-Scoring (TDS) and random double-scoring procedures on the quality of psychometric outcomes, including student achievement estimates, person fit, and student classifications under various…
Descriptors: Academic Achievement, Psychometrics, Scoring, Evaluation Methods
Peer reviewed Peer reviewed
Direct linkDirect link
Samuel D'Emanuele; Francesca Nardello; Fabrizio Garau; Diego Campaci; Federico Schena; Cantor Tarperi – Measurement in Physical Education and Exercise Science, 2025
The agreement between a wearable inertial sensor (GYKO, G) and the force platform (P) was assessed by evaluating "test-retest" and "inter-rater reliability." Thirty-eight subjects were enrolled; the selected indices of balance were investigated over foot positions and (un)stable conditions. Intraclass correlation coefficient…
Descriptors: Human Posture, Measurement Equipment, Interrater Reliability, Measurement Techniques
Peer reviewed Peer reviewed
Direct linkDirect link
Alexandra Jackson; Cheryl Bodnar; Elise Barrella; Juan Cruz; Krista Kecskemety – Journal of STEM Education: Innovations and Research, 2025
Recent curricular interventions in engineering education have focused on encouraging students to develop an entrepreneurial mindset (EM) to equip them with the skills needed to generate innovative ideas and address complex global problems upon entering the workforce. Methods to evaluate these interventions have been inconsistent due to the lack of…
Descriptors: Engineering Education, Entrepreneurship, Concept Mapping, Student Evaluation
Peer reviewed Peer reviewed
Direct linkDirect link
Danwei Cai; Ben Naismith; Maria Kostromitina; Zhongwei Teng; Kevin P. Yancey; Geoffrey T. LaFlair – Language Learning, 2025
Globalization and increases in the numbers of English language learners have led to a growing demand for English proficiency assessments of spoken language. In this paper, we describe the development of an automatic pronunciation scorer built on state-of-the-art deep neural network models. The model is trained on a bespoke human-rated dataset that…
Descriptors: Automation, Scoring, Pronunciation, Speech Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Jennifer Manning; Jeffrey Baldwin; Natasha Powell – Innovations in Education and Teaching International, 2025
As ChatGPT continues to reshape student engagement and instructional design, it is crucial to examine its practical implications. This study aims to evaluate the effectiveness of ChatGPT3.5 and ChatGPT4 as potential automated essay scoring (AES) systems. Fifty authentic, student-written annotated bibliographies were evaluated by three human raters…
Descriptors: Foreign Countries, Essays, Writing Evaluation, Artificial Intelligence
Peer reviewed Peer reviewed
Direct linkDirect link
Tavares, Walter; Kinnear, Benjamin; Schumacher, Daniel J.; Forte, Milena – Advances in Health Sciences Education, 2023
In this perspective, the authors critically examine "rater training" as it has been conceptualized and used in medical education. By "rater training," they mean the educational events intended to "improve" rater performance and contributions during assessment events. Historically, rater training programs have focused…
Descriptors: Medical Education, Interrater Reliability, Evaluation Methods, Training
Peer reviewed Peer reviewed
Direct linkDirect link
Victoria Reyes; Elizabeth Bogumil; Levin Elias Welch – Sociological Methods & Research, 2024
Transparency is once again a central issue of debate across types of qualitative research. Work on how to conduct qualitative data analysis, on the other hand, walks us through the step-by-step process on how to code and understand the data we've collected. Although there are a few exceptions, less focus is on transparency regarding…
Descriptors: Qualitative Research, Data Analysis, Guides, Databases
Peer reviewed Peer reviewed
Direct linkDirect link
Stefanie A. Wind; Yuan Ge – Measurement: Interdisciplinary Research and Perspectives, 2024
Mixed-format assessments made up of multiple-choice (MC) items and constructed response (CR) items that are scored using rater judgments include unique psychometric considerations. When these item types are combined to estimate examinee achievement, information about the psychometric quality of each component can depend on that of the other. For…
Descriptors: Interrater Reliability, Test Bias, Multiple Choice Tests, Responses
Previous Page | Next Page ยป
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11  |  ...  |  170