Showing 1 to 15 of 56 results
Peer reviewed
Jiangang Hao; Alina A. von Davier; Victoria Yaneva; Susan Lottridge; Matthias von Davier; Deborah J. Harris – Educational Measurement: Issues and Practice, 2024
The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely,…
Descriptors: Evaluation Methods, Artificial Intelligence, Educational Change, Computer Software
Peer reviewed
Yuan Tian; Xi Yang; Suhail A. Doi; Luis Furuya-Kanamori; Lifeng Lin; Joey S. W. Kwong; Chang Xu – Research Synthesis Methods, 2024
RobotReviewer is a tool for automatically assessing the risk of bias in randomized controlled trials, but there is limited evidence of its reliability. We evaluated the agreement between RobotReviewer and humans regarding the risk of bias assessment based on 1955 randomized controlled trials. The risk of bias in these trials was assessed via two…
Descriptors: Risk, Randomized Controlled Trials, Classification, Robotics
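A minimal sketch of the kind of agreement analysis described above, assuming each trial carries a categorical risk-of-bias label from both the tool and a human reviewer; the labels, data, and helper function below are illustrative, not the study's.

```python
# Hypothetical sketch: chance-corrected agreement between automated and
# human risk-of-bias judgements, in the spirit of the study above.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Compute Cohen's kappa for two equal-length label sequences."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the two raters were independent.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Invented labels for five trials (not from the paper).
robot = ["low", "high", "unclear", "low", "high"]
human = ["low", "high", "low",     "low", "unclear"]
print(f"kappa = {cohens_kappa(robot, human):.2f}")
```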
Peer reviewed
Mike Perkins; Jasper Roe; Binh H. Vu; Darius Postma; Don Hickerson; James McGaughran; Huy Q. Khuat – International Journal of Educational Technology in Higher Education, 2024
This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content modified to evade detection (n = 805). We compare these detectors to assess their reliability in identifying AI-generated text in educational settings, where they are increasingly used to address academic integrity…
Descriptors: Artificial Intelligence, Inclusion, Computer Software, Word Processing
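The reliability comparison described above reduces to confusion-matrix arithmetic. The sketch below assumes a detector is any callable returning True for text it flags as machine-generated; the detector and samples are invented placeholders, not the six tools evaluated in the study.

```python
# Hypothetical sketch of detector evaluation on labelled samples:
# each sample is (text, is_ai_generated).
def evaluate_detector(detector, samples):
    """Return accuracy and false-positive rate for one detector."""
    tp = fp = tn = fn = 0
    for text, is_ai in samples:
        flagged = detector(text)
        if flagged and is_ai:
            tp += 1
        elif flagged and not is_ai:
            fp += 1
        elif not flagged and not is_ai:
            tn += 1
        else:
            fn += 1
    accuracy = (tp + tn) / len(samples)
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, false_positive_rate

def naive_detector(text):
    """Toy stand-in detector (not one of the tools in the study)."""
    return "as an ai language model" in text.lower()

samples = [("As an AI language model, I cannot...", True),
           ("My essay about the summer holidays.", False)]
print(evaluate_detector(naive_detector, samples))
```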
Peer reviewed
Full text available on ERIC (PDF)
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
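One hedged way to summarise pairwise "is the AI reply as good as the teacher's?" judgements, in the Bayesian spirit of the descriptors above, is a Beta posterior over the preference probability. The counts, prior, and interval method below are invented for illustration, not the paper's test.

```python
# Hedged sketch (not the paper's method): Beta posterior over the probability
# that the AI teacher's reply is preferred in a pairwise judgement.
import random

def preference_posterior(ai_preferred, human_preferred, draws=10_000, seed=0):
    """Beta(1, 1) prior updated with observed preference counts."""
    rng = random.Random(seed)
    alpha, beta = 1 + ai_preferred, 1 + human_preferred
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    mean = alpha / (alpha + beta)
    lower, upper = samples[int(0.025 * draws)], samples[int(0.975 * draws)]
    return mean, (lower, upper)

# Invented counts: 18 judgements preferred the AI reply, 42 the human one.
print(preference_posterior(ai_preferred=18, human_preferred=42))
```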
Peer reviewed
Kimbell, Richard – International Journal of Technology and Design Education, 2022
Conventional approaches to assessment involve teachers and examiners judging the quality of learners' work by reference to lists of criteria or other 'outcome' statements. This paper explores a quite different method of assessment using 'Adaptive Comparative Judgement' (ACJ) that was developed within a research project at Goldsmiths University of…
Descriptors: Student Evaluation, Evaluation Methods, Alternative Assessment, Value Judgment
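Comparative judgement turns many pairwise "which piece of work is better?" decisions into a quality scale, commonly via a Bradley-Terry style model. The sketch below fits such a model with a simple iterative update; the judgements are invented, and the adaptive pair-selection step that gives ACJ its name is omitted.

```python
# Minimal Bradley-Terry sketch for comparative judgement: pairwise
# "A beat B" decisions become a quality score per piece of work.
from collections import defaultdict

def bradley_terry(comparisons, iters=200):
    """comparisons: list of (winner, loser). Returns a strength per item."""
    items = {x for pair in comparisons for x in pair}
    strength = {i: 1.0 for i in items}
    wins, pairs = defaultdict(int), defaultdict(int)
    for w, l in comparisons:
        wins[w] += 1
        pairs[frozenset((w, l))] += 1
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(
                pairs[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items if j != i and frozenset((i, j)) in pairs
            )
            new[i] = wins[i] / denom if denom else strength[i]
        total = sum(new.values())
        strength = {i: v * len(items) / total for i, v in new.items()}  # keep mean at 1
    return strength

# Invented judgements between three essays.
judgements = [("essay_A", "essay_B"), ("essay_A", "essay_C"),
              ("essay_B", "essay_C"), ("essay_C", "essay_B")]
print(bradley_terry(judgements))
```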
Sturgis, Paul W.; Marchand, Leslie; Miller, M. David; Xu, Wei; Castiglioni, Analia – Association for Institutional Research, 2022
This article introduces generalizability theory (G-theory) to institutional research and assessment practitioners, and explains how it can be utilized to evaluate the reliability of assessment procedures in order to improve student learning outcomes. The fundamental concepts associated with G-theory are briefly discussed, followed by a discussion…
Descriptors: Generalizability Theory, Institutional Research, Reliability, Computer Software
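For readers new to G-theory, the core computation is a variance-component decomposition. The sketch below assumes a fully crossed persons × raters design with one score per cell and reports a relative generalizability coefficient; the ratings are invented, and the formulas are the standard ANOVA expected-mean-square estimates rather than anything specific to the article.

```python
# Minimal G-theory sketch for a crossed persons x raters design (one score per cell).
def g_study(scores):
    """scores[p][r] = rating of person p by rater r."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[p][r] for p in range(n_p)) / n_p for r in range(n_r)]
    ss_p = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_tot = sum((scores[p][r] - grand) ** 2 for p in range(n_p) for r in range(n_r))
    ss_pr = ss_tot - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_pr = ss_pr / ((n_p - 1) * (n_r - 1))
    var_pr = ms_pr                   # residual (interaction + error)
    var_p = (ms_p - ms_pr) / n_r     # universe-score variance
    var_r = (ms_r - ms_pr) / n_p     # rater variance
    # Relative generalizability coefficient for the observed number of raters.
    g_coef = var_p / (var_p + var_pr / n_r)
    return var_p, var_r, var_pr, g_coef

ratings = [[4, 5, 4], [2, 3, 3], [5, 5, 4], [3, 2, 2]]  # 4 persons x 3 raters (invented)
print(g_study(ratings))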
Peer reviewed
Yun Long; Haifeng Luo; Yu Zhang – npj Science of Learning, 2024
This study explores the use of Large Language Models (LLMs), specifically GPT-4, in analysing classroom dialogue--a key task for teaching diagnosis and quality improvement. Traditional qualitative methods are both knowledge- and labour-intensive. This research investigates the potential of LLMs to streamline and enhance this process. Using…
Descriptors: Classroom Communication, Computational Linguistics, Chinese, Mathematics Instruction
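A hedged sketch of LLM-assisted dialogue coding along the lines described above: `call_llm` is a placeholder for whatever chat-completion client is available, and the coding scheme is invented rather than the paper's.

```python
# Hypothetical sketch: assign one code per teacher turn using an LLM.
CODES = ["open question", "closed question", "explanation", "feedback", "other"]

PROMPT = (
    "You are coding classroom talk. Assign exactly one code from this list "
    f"to the teacher turn below: {', '.join(CODES)}.\n\nTurn: {{turn}}\n\nCode:"
)

def call_llm(prompt: str) -> str:
    """Placeholder for a real chat-completion call (e.g. to GPT-4)."""
    raise NotImplementedError("plug in your LLM client here")

def code_dialogue(turns):
    """Return one code per teacher turn, falling back to 'other'."""
    coded = []
    for turn in turns:
        reply = call_llm(PROMPT.format(turn=turn)).strip().lower()
        coded.append(reply if reply in CODES else "other")
    return coded
```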
Leech, Tony; Chambers, Lucy – Research Matters, 2022
Two of the central issues in comparative judgement (CJ), which are perhaps underexplored compared to questions of the method's reliability and technical quality, are "what processes do judges use to make their decisions" and "what features do they focus on when making their decisions?" This article discusses both, in the…
Descriptors: Comparative Analysis, Decision Making, Evaluators, Reliability
Peer reviewed
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as to address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Peer reviewed
Katia Ciampa; Zora Wolfe; Meagan Hensley – Technology, Pedagogy and Education, 2025
This study explores the role of artificial intelligence (AI) in K-12 student assessment practices, focusing on educators' use of AI tools. Through content analysis of active Facebook groups dedicated to AI in education, the authors examined how educators integrate AI into assessment across various grade levels and subjects. Using the Technology…
Descriptors: Artificial Intelligence, Computer Software, Technology Integration, Kindergarten
Peer reviewed
Beasley, Zachariah J.; Piegl, Les A.; Rosen, Paul – IEEE Transactions on Learning Technologies, 2021
Accurately grading open-ended assignments in large or massive open online courses is nontrivial. Peer review is a promising solution but can be unreliable due to few reviewers and an unevaluated review form. To date, no work has leveraged sentiment analysis in the peer-review process to inform or validate grades or utilized aspect extraction to…
Descriptors: Case Studies, Online Courses, Assignments, Peer Evaluation
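A minimal sketch of the idea of using sentiment to sanity-check peer-review grades, assuming NLTK's VADER analyser is available; the reviews, thresholds, and consistency rule below are invented, not the authors' pipeline.

```python
# Sketch: flag peer reviews whose comment sentiment disagrees with the grade awarded.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyser = SentimentIntensityAnalyzer()

def flag_inconsistent(reviews):
    """reviews: list of (comment, grade) with grade normalised to [0, 1]."""
    flagged = []
    for comment, grade in reviews:
        sentiment = analyser.polarity_scores(comment)["compound"]  # in [-1, 1]
        if (grade >= 0.8 and sentiment < -0.05) or (grade <= 0.4 and sentiment > 0.05):
            flagged.append((comment, grade, sentiment))
    return flagged

# Invented example: a high grade paired with a clearly negative comment.
reviews = [("Clear, well organised and convincing work.", 0.9),
           ("Confusing structure and several factual errors.", 0.95)]
print(flag_inconsistent(reviews))
```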
Peer reviewed
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
Peer reviewed
Akbari, Alireza; Shahnazari, Mohammadtaghi – Language Testing in Asia, 2019
The present research paper introduces a translation evaluation method called Calibrated Parsing Items Evaluation (CPIE hereafter). This evaluation method maximizes translators' performance through identifying the parsing items with an optimal p-docimology and d-index (item discrimination). This method checks all the possible parses (annotations)…
Descriptors: Test Items, Translation, Computer Software, Evaluators
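The two item statistics named above can be sketched directly: item difficulty as the proportion of translators credited on a parsing item, and a d-index as the upper-minus-lower group difference. The response matrix and group fraction below are invented and stand in for whatever scoring the CPIE method actually prescribes.

```python
# Sketch of classical item statistics: difficulty (p) and discrimination (d-index).
def item_statistics(responses, group_fraction=0.27):
    """responses[person][item] = 1 if the parsing item was rendered correctly."""
    n_persons, n_items = len(responses), len(responses[0])
    ranked = sorted(range(n_persons), key=lambda p: sum(responses[p]), reverse=True)
    k = max(1, round(group_fraction * n_persons))
    upper, lower = ranked[:k], ranked[-k:]
    stats = []
    for item in range(n_items):
        p_value = sum(responses[p][item] for p in range(n_persons)) / n_persons
        d_index = (sum(responses[p][item] for p in upper)
                   - sum(responses[p][item] for p in lower)) / k
        stats.append((p_value, d_index))
    return stats

matrix = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]  # 4 translators x 3 items (invented)
print(item_statistics(matrix))
```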
Peer reviewed
Full text available on ERIC (PDF)
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
Peer reviewed
Raj, Gaurav; Mahajan, Manish; Singh, Dheerendra – International Journal of Web-Based Learning and Teaching Technologies, 2020
In secure web application development, web services cannot remain in use if they are not trustworthy. Retaining customers is one of the major challenges when services are not reliable and trustworthy. This article proposes a trust evaluation and decision model in which the authors define an indirect attribute, trust,…
Descriptors: Trust (Psychology), Models, Decision Making, Computer Software
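Purely as illustration of scoring trust over service attributes (the authors' actual model and attributes are not specified here), a weighted combination might look like the following; every attribute and weight is invented.

```python
# Illustrative only: combine hypothetical quality attributes of a web service
# into a single weighted trust score in [0, 1].
def trust_score(attributes, weights):
    """attributes and weights are dicts keyed by attribute name; values in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[k] * attributes.get(k, 0.0) for k in weights) / total_weight

service = {"reliability": 0.92, "availability": 0.98, "response_time": 0.70}
weights = {"reliability": 0.5, "availability": 0.3, "response_time": 0.2}
print(f"trust = {trust_score(service, weights):.2f}")
```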