Showing 1 to 15 of 56 results
Peer reviewed
Jiangang Hao; Alina A. von Davier; Victoria Yaneva; Susan Lottridge; Matthias von Davier; Deborah J. Harris – Educational Measurement: Issues and Practice, 2024
The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely,…
Descriptors: Evaluation Methods, Artificial Intelligence, Educational Change, Computer Software
Peer reviewed
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
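A minimal sketch of the kind of agreement analysis described above, assuming each trial carries a categorical risk-of-bias label from both the tool and a human reviewer; the labels, data, and helper function below are illustrative, not the study's.

```python
# Hypothetical sketch: chance-corrected agreement between automated and
# human risk-of-bias judgements, in the spirit of the study above.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Compute Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters were independent.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Invented labels for five trials (not from the paper).
robot = ["low", "high", "unclear", "low", "high"]
human = ["low", "high", "low",     "low", "unclear"]
print(f"kappa = {cohens_kappa(robot, human):.2f}")
```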
Peer reviewed
Mike Perkins; Jasper Roe; Binh H. Vu; Darius Postma; Don Hickerson; James McGaughran; Huy Q. Khuat – International Journal of Educational Technology in Higher Education, 2024
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content modified to evade detection (n = 805). We compare these detectors to assess their reliability in identifying AI-generated text in educational settings, where they are increasingly used to address academic integrity…
Descriptors: Artificial Intelligence, Inclusion, Computer Software, Word Processing
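The reliability comparison described above reduces to confusion-matrix arithmetic. The sketch below assumes a detector is any callable returning True for text it flags as machine-generated; the detector and samples are invented placeholders, not the six tools evaluated in the study.

```python
# Hypothetical sketch of detector evaluation on labelled samples:
# each sample is (text, is_ai_generated).
def evaluate_detector(detector, samples):
    """Return accuracy and false-positive rate for one detector."""
    tp = fp = tn = fn = 0
    for text, is_ai in samples:
        flagged = detector(text)
        if flagged and is_ai:
            tp += 1
        elif flagged and not is_ai:
            fp += 1
        elif not flagged and not is_ai:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(samples)
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, false_positive_rate

def naive_detector(text):
    """Toy stand-in detector (not one of the tools in the study)."""
    return "as an ai language model" in text.lower()

samples = [("As an AI language model, I cannot...", True),
           ("My essay about the summer holidays.", False)]
print(evaluate_detector(naive_detector, samples))
```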
Peer reviewed
Full text available on ERIC (PDF)
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
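One hedged way to summarise pairwise "is the AI reply as good as the teacher's?" judgements, in the Bayesian spirit of the descriptors above, is a Beta posterior over the preference probability. The counts, prior, and interval method below are invented for illustration, not the paper's test.

```python
# Hedged sketch (not the paper's method): Beta posterior over the probability
# that the AI teacher's reply is preferred in a pairwise judgement.
import random

def preference_posterior(ai_preferred, human_preferred, draws=10_000, seed=0):
    """Beta(1, 1) prior updated with observed preference counts."""
    rng = random.Random(seed)
    alpha, beta = 1 + ai_preferred, 1 + human_preferred
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = alpha / (alpha + beta)
    lower, upper = samples[int(0.025 * draws)], samples[int(0.975 * draws)]
    return mean, (lower, upper)

# Invented counts: 18 judgements preferred the AI reply, 42 the human one.
print(preference_posterior(ai_preferred=18, human_preferred=42))
```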
Peer reviewed
Kimbell, Richard – International Journal of Technology and Design Education, 2022
Conventional approaches to assessment involve teachers and examiners judging the quality of learners' work by reference to lists of criteria or other 'outcome' statements. This paper explores a quite different method of assessment using 'Adaptive Comparative Judgement' (ACJ) that was developed within a research project at Goldsmiths University of…
Descriptors: Student Evaluation, Evaluation Methods, Alternative Assessment, Value Judgment
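Comparative judgement turns many pairwise "which piece of work is better?" decisions into a quality scale, commonly via a Bradley-Terry style model. The sketch below fits such a model with a simple iterative update; the judgements are invented, and the adaptive pair-selection step that gives ACJ its name is omitted.

```python
# Minimal Bradley-Terry sketch for comparative judgement: pairwise
# "A beat B" decisions become a quality score per piece of work.
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """comparisons: list of (winner, loser). Returns a strength per item."""
    items = {x for pair in comparisons for x in pair}
    strength = {i: 1.0 for i in items}
    wins, pairs = defaultdict(int), defaultdict(int)
    for w, l in comparisons:
        wins[w] += 1
        pairs[frozenset((w, l))] += 1
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(
                pairs[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items if j != i and frozenset((i, j)) in pairs
            )
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())
        strength = {i: v * len(items) / total for i, v in new.items()}  # keep mean at 1
    return strength

# Invented judgements between three essays.
judgements = [("essay_A", "essay_B"), ("essay_A", "essay_C"),
              ("essay_B", "essay_C"), ("essay_C", "essay_B")]
print(bradley_terry(judgements))
```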
Sturgis, Paul W.; Marchand, Leslie; Miller, M. David; Xu, Wei; Castiglioni, Analia – Association for Institutional Research, 2022
This article introduces generalizability theory (G-theory) to institutional research and assessment practitioners, and explains how it can be utilized to evaluate the reliability of assessment procedures in order to improve student learning outcomes. The fundamental concepts associated with G-theory are briefly discussed, followed by a discussion…
Descriptors: Generalizability Theory, Institutional Research, Reliability, Computer Software
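For readers new to G-theory, the core computation is a variance-component decomposition. The sketch below assumes a fully crossed persons × raters design with one score per cell and reports a relative generalizability coefficient; the ratings are invented, and the formulas are the standard ANOVA expected-mean-square estimates rather than anything specific to the article.

```python
# Minimal G-theory sketch for a crossed persons x raters design (one score per cell).
def g_study(scores):
    """scores[p][r] = rating of person p by rater r."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((scores[p][r] - grand) ** 2 for p in range(n_p) for r in range(n_r))
    ss_pr = ss_tot - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    var_pr = ms_pr                   # residual (interaction + error)
    var_p = (ms_p - ms_pr) / n_r     # universe-score variance
    var_r = (ms_r - ms_pr) / n_p     # rater variance
    # Relative generalizability coefficient for the observed number of raters.
    g_coef = var_p / (var_p + var_pr / n_r)
    return var_p, var_r, var_pr, g_coef

ratings = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [3, 2, 2]]  # 4 persons x 3 raters (invented)
print(g_study(ratings))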
Peer reviewed
Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024
This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…
Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction
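A hedged sketch of LLM-assisted dialogue coding along the lines described above: `call_llm` is a placeholder for whatever chat-completion client is available, and the coding scheme is invented rather than the paper's.

```python
# Hypothetical sketch: assign one code per teacher turn using an LLM.
CODES = ["open question", "closed question", "explanation", "feedback", "other"]

PROMPT = (
    "You are coding classroom talk. Assign exactly one code from this list "
    f"to the teacher turn below: {', '.join(CODES)}.\n\nTurn: {{turn}}\n\nCode:"
)

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call (e.g. to GPT-4)."""
    raise NotImplementedError("plug in your LLM client here")

def code_dialogue(turns):
    """Return one code per teacher turn, falling back to 'other'."""
    coded = []
    for turn in turns:
        reply = call_llm(PROMPT.format(turn=turn)).strip().lower()
        coded.append(reply if reply in CODES else "other")
    return coded
```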
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Peer reviewed
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Peer reviewed
Katia Ciampa; Zora Wolfe; Meagan Hensley – Technology, Pedagogy and Education, 2025
This study explores the role of artificial intelligence (AI) in K-12 student assessment practices, focusing on educators' use of AI tools. Through content analysis of active Facebook groups dedicated to AI in education, the authors examined how educators integrate AI into assessment across various grade levels and subjects. Using the Technology…
Descriptors: Artificial Intelligence, Computer Software, Technology Integration, Kindergarten
Peer reviewed
Beasley, Zachariah J.; Piegl, Les A.; Rosen, Paul – IEEE Transactions on Learning Technologies, 2021
Accurately grading open-ended assignments in large or massive open online courses is nontrivial. Peer review is a promising solution but can be unreliable due to few reviewers and an unevaluated review form. To date, no work has leveraged sentiment analysis in the peer-review process to inform or validate grades or utilized aspect extraction to…
Descriptors: Case Studies, Online Courses, Assignments, Peer Evaluation
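A minimal sketch of the idea of using sentiment to sanity-check peer-review grades, assuming NLTK's VADER analyser is available; the reviews, thresholds, and consistency rule below are invented, not the authors' pipeline.

```python
# Sketch: flag peer reviews whose comment sentiment disagrees with the grade awarded.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyser = SentimentIntensityAnalyzer()

def flag_inconsistent(reviews):
    """reviews: list of (comment, grade) with grade normalised to [0, 1]."""
    flagged = []
    for comment, grade in reviews:
        sentiment = analyser.polarity_scores(comment)["compound"]  # in [-1, 1]
        if (grade >= 0.8 and sentiment < -0.05) or (grade <= 0.4 and sentiment > 0.05):
            flagged.append((comment, grade, sentiment))
    return flagged

# Invented example: a high grade paired with a clearly negative comment.
reviews = [("Clear, well organised and convincing work.", 0.9),
           ("Confusing structure and several factual errors.", 0.95)]
print(flag_inconsistent(reviews))
```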
Peer reviewed
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Peer reviewed
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
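The two item statistics named above can be sketched directly: item difficulty as the proportion of translators credited on a parsing item, and a d-index as the upper-minus-lower group difference. The response matrix and group fraction below are invented and stand in for whatever scoring the CPIE method actually prescribes.

```python
# Sketch of classical item statistics: difficulty (p) and discrimination (d-index).
def item_statistics(responses, group_fraction=0.27):
    """responses[person][item] = 1 if the parsing item was rendered correctly."""
    n_persons, n_items = len(responses), len(responses[0])
    ranked = sorted(range(n_persons), key=lambda p: sum(responses[p]), reverse=True)
    k = max(1, round(group_fraction * n_persons))
    upper, lower = ranked[:k], ranked[-k:]
    stats = []
    for item in range(n_items):
        p_value = sum(responses[p][item] for p in range(n_persons)) / n_persons
        d_index = (sum(responses[p][item] for p in upper)
                   - sum(responses[p][item] for p in lower)) / k
        stats.append((p_value, d_index))
    return stats

matrix = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]  # 4 translators x 3 items (invented)
print(item_statistics(matrix))
```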
Peer reviewed
Full text available on ERIC (PDF)
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
Peer reviewed
Raj, Gaurav; Mahajan, Manish; Singh, Dheerendra – International Journal of Web-Based Learning and Teaching Technologies, 2020
In secure web application development, web services cannot remain in use if they are not trustworthy. Retaining customers is one of the major challenges when services are not reliable and trustworthy. This article proposes a trust evaluation and decision model in which the authors define an indirect attribute, trust,…
Descriptors: Trust (Psychology), Models, Decision Making, Computer Software
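Purely as illustration of scoring trust over service attributes (the authors' actual model and attributes are not specified here), a weighted combination might look like the following; every attribute and weight is invented.

```python
# Illustrative only: combine hypothetical quality attributes of a web service
# into a single weighted trust score in [0, 1].
def trust_score(attributes, weights):
    """attributes and weights are dicts keyed by attribute name; values in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[k] * attributes.get(k, 0.0) for k in weights) / total_weight

service = {"reliability": 0.92, "availability": 0.98, "response_time": 0.70}
weights = {"reliability": 0.5, "availability": 0.3, "response_time": 0.2}
print(f"trust = {trust_score(service, weights):.2f}")
```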