NotesFAQContact Us
Collection
Advanced
Search Tips
What Works Clearinghouse Rating
Does not meet standards1
Showing 16 to 30 of 327 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Peer reviewed Peer reviewed
Direct linkDirect link
Kinnear, George; Bennett, Max; Binnie, Rachel; Bolt, Róisín; Zheng, Yinglan – Teaching Mathematics and Its Applications, 2020
The MATH taxonomy classifies questions according to the mathematical skills required to answer them. It was created to aid the development of more balanced assessments in undergraduate mathematics and has since been used to compare different assessment regimes across school and university. To date, there has been no systematic investigation of the…
Descriptors: Taxonomy, Mathematics Instruction, Teaching Methods, Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024
To determine the impacts of using ChatGPT to assist English as a foreign language (EFL) English college majors in revising essays and the possibility of leading to higher scores and potentially causing unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)
Neitzel, Jennifer; Early, Diane; Sideris, John; LaForrett, Doré; Abel, Michael B.; Soli, Margaret; Davidson, Dawn L.; Haboush-Deloye, Amanda; Hestenes, Linda L.; Jenson, Denise; Johnson, Cindy; Kalas, Jennifer; Mamrak, Angela; Masterson, Marie L.; Mims, Sharon U.; Oya, Patti; Philson, Bobbi; Showalter, Megan; Warner-Richter, Mallory; Kortright Wood, Jill – Journal of Early Childhood Research, 2019
The Early Childhood Environment Rating Scales, including the "Early Childhood Environment Rating Scale--Revised" (Harms et al., 2005) and the "Early Childhood Environment Rating Scale, Third Edition" (Harms et al., 2015) are the most widely used observational assessments in early childhood learning environments. The most recent…
Descriptors: Rating Scales, Early Childhood Education, Educational Quality, Scoring
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Jönsson, Anders; Balan, Andreia – Practical Assessment, Research & Evaluation, 2018
Research on teachers' grading has shown that there is great variability among teachers regarding both the process and product of grading, resulting in low comparability and issues of inequality when using grades for selection purposes. Despite this situation, not much is known about the merits or disadvantages of different models for grading. In…
Descriptors: Grading, Models, Reliability, Validity
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Polat, Murat; Turhan, Nihan Sölpük – International Journal of Curriculum and Instruction, 2021
Scoring language learners' speaking skills is open to a number of measurement errors since raters' personal judgements could involve in the process. Different grading designs in which raters score a student's whole speaking skills or a specific dimension of the speaking performance could be settled to control and minimize the amount of the error…
Descriptors: Language Tests, Scoring, Speech Communication, State Universities
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Manzano, Dexter L. – International Journal of Language Testing, 2022
The increasing popularity of self-assessment prompted several scholars to investigate its effectiveness and accuracy in relation to teacher assessment. However, most of these studies focused only on the consistency estimate perspective. Thus, the current study investigated the interrater reliability between self- and teacher assessment of…
Descriptors: Oral Language, Self Evaluation (Individuals), College Students, Interrater Reliability
Peer reviewed Peer reviewed
Direct linkDirect link
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Peer reviewed Peer reviewed
Direct linkDirect link
Al-Salmani, Fatema; Thacker, Beth – Physical Review Physics Education Research, 2021
We designed a rubric to assess free-response exam problems in order to compare thinking skills evidenced in exams in classes taught by different pedagogies. The rubric was designed based on Bloom's taxonomy and then used to code exam problems. We have analyzed historical and recent exam problems in both algebra-based and calculus-based exams. In…
Descriptors: Inquiry, Thinking Skills, Scoring Rubrics, Algebra
Peer reviewed Peer reviewed
Direct linkDirect link
Wang, Zhen; Zechner, Klaus; Sun, Yu – Language Testing, 2018
As automated scoring systems for spoken responses are increasingly used in language assessments, testing organizations need to analyze their performance, as compared to human raters, across several dimensions, for example, on individual items or based on subgroups of test takers. In addition, there is a need in testing organizations to establish…
Descriptors: Automation, Scoring, Speech Tests, Language Tests
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Bosch, Nigel; Paquette, Luc – Journal of Learning Analytics, 2018
Metrics including Cohen's kappa, precision, recall, and F[subscript 1] are common measures of performance for models of discrete student states, such as a student's affect or behaviour. This study examined discrete model metrics for previously published student model examples to identify situations where metrics provided differing perspectives on…
Descriptors: Models, Comparative Analysis, Prediction, Probability
Peer reviewed Peer reviewed
Direct linkDirect link
Bronkhorst, Hugo; Roorda, Gerrit; Suhre, Cor; Goedhart, Martin – Research in Mathematics Education, 2022
Logical reasoning as part of critical thinking is becoming more and more important to prepare students for their future life in society, work, and study. This article presents the results of a quasi-experimental study with a pre-test-post-test control group design focusing on the effective use of formalisations to support logical reasoning. The…
Descriptors: Mathematics Instruction, Teaching Methods, Logical Thinking, Critical Thinking
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Moeller, Julia; Viljaranta, Jaana; Kracke, Bärbel; Dietrich, Julia – Frontline Learning Research, 2020
This article proposes a study design developed to disentangle the objective characteristics of a learning situation from individuals' subjective perceptions of that situation. The term objective characteristics refers to the agreement across students, whereas subjective perceptions refers to inter-individual heterogeneity. We describe a novel…
Descriptors: Student Attitudes, College Students, Lecture Method, Student Interests
Peer reviewed Peer reviewed
PDF on ERIC Download full text
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
Pages: 1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9  |  10  |  11  |  ...  |  22