| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 15 |
| Descriptor | Records |
| --- | --- |
| Computer Software | 21 |
| Evaluation Methods | 21 |
| Interrater Reliability | 21 |
| Computer Assisted Testing | 9 |
| Comparative Analysis | 8 |
| Educational Technology | 6 |
| Foreign Countries | 6 |
| Scores | 6 |
| Scoring | 6 |
| Second Language Learning | 6 |
| Correlation | 5 |
| Author | Records |
| --- | --- |
| Abedi, Jamal | 1 |
| Armijo-Olivo, Susan | 1 |
| Bahreini, Kiavash | 1 |
| Bejar, Isaac I. | 1 |
| Berry, Kenneth J. | 1 |
| Buelin-Biesecker, Jennifer… | 1 |
| Burk, John | 1 |
| Campbell, Sandy | 1 |
| Carlson, Sybil B. | 1 |
| Clariana, Roy B. | 1 |
| Coniam, David | 1 |
| Education Level | Records |
| --- | --- |
| Higher Education | 5 |
| Postsecondary Education | 5 |
| Secondary Education | 4 |
| Elementary Secondary Education | 3 |
| Elementary Education | 1 |
| Grade 6 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
| High Schools | 1 |
| Intermediate Grades | 1 |
| Junior High Schools | 1 |
| Audience | Records |
| --- | --- |
| Researchers | 1 |
| Location | Records |
| --- | --- |
| Israel | 2 |
| Netherlands | 2 |
| Asia | 1 |
| Australia | 1 |
| Brazil | 1 |
| China | 1 |
| Connecticut | 1 |
| Cuba | 1 |
| Denmark | 1 |
| Egypt | 1 |
| Estonia | 1 |
| Assessments and Surveys | Records |
| --- | --- |
| Test of English as a Foreign… | 2 |
| Graduate Record Examinations | 1 |
| National Assessment of… | 1 |
| Torrance Tests of Creative… | 1 |
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal creativity measure to applying it in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…
Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automated computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Bahreini, Kiavash; Nadolski, Rob; Westera, Wim – Education and Information Technologies, 2016
This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learners' vocal intonations and facial expressions in order…
Descriptors: Affective Behavior, Emotional Response, Electronic Learning, Recognition (Psychology)
Naude, Kevin A.; Greyling, Jean H.; Vogts, Dieter – Computers & Education, 2010
We present a novel approach to the automated marking of student programming assignments. Our technique quantifies the structural similarity between unmarked student submissions and marked solutions, and is the basis by which we assign marks. This is accomplished through an efficient novel graph similarity measure ("AssignSim"). Our experiments…
Descriptors: Grading, Assignments, Correlation, Interrater Reliability
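The paper above defines its own graph similarity measure ("AssignSim"), whose details are not given here. As a hedged illustration of the general idea of scoring submissions by structural overlap with a marked solution, a minimal sketch using Jaccard similarity over edge sets (the edges below are hypothetical):

```python
def edge_jaccard(graph_a, graph_b):
    """Jaccard similarity over edge sets -- a simple stand-in for a
    structural graph similarity measure (not the paper's AssignSim)."""
    edges_a, edges_b = set(graph_a), set(graph_b)
    if not edges_a and not edges_b:
        return 1.0  # two empty graphs are trivially identical
    return len(edges_a & edges_b) / len(edges_a | edges_b)

# Hypothetical control-flow edges for a marked solution and a submission
solution = [("start", "loop"), ("loop", "check"), ("check", "end")]
submission = [("start", "loop"), ("loop", "check"),
              ("check", "loop"), ("check", "end")]
print(edge_jaccard(solution, submission))  # 3 shared edges of 4 total -> 0.75
```

A mark could then be assigned in proportion to the similarity against the best-matching marked solution.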
Mudford, Oliver C.; Taylor, Sarah Ann; Martin, Neil T. – Journal of Applied Behavior Analysis, 2009
We reviewed all research articles in 10 recent volumes of the "Journal of Applied Behavior Analysis" (JABA): Vol. 28(3), 1995, through Vol. 38(2), 2005. Continuous recording was used in the majority (55%) of the 168 articles reporting data on free-operant human behaviors. Three methods for reporting interobserver agreement (exact agreement,…
Descriptors: Interrater Reliability, Behavioral Science Research, Literature Reviews, Observation
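Exact agreement, the first of the reporting methods named in the abstract above, can be sketched in a few lines (the per-interval counts are hypothetical):

```python
def exact_agreement(obs_a, obs_b):
    """Percentage of observation intervals in which two observers
    recorded exactly the same count of the target behavior."""
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same number of intervals")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * matches / len(obs_a)

# Hypothetical per-interval behavior counts from two independent observers
observer_1 = [2, 0, 1, 3, 0, 2]
observer_2 = [2, 0, 1, 2, 0, 2]
print(exact_agreement(observer_1, observer_2))  # 5 of 6 intervals match
```

Exact agreement is the strictest of the common interval-based indices, since a near-miss count in an interval scores the same as a complete disagreement.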
Buelin-Biesecker, Jennifer Katherine – ProQuest LLC, 2012
This study compared the creative outcomes in student work resulting from two pedagogical approaches to creative problem solving activities. A secondary goal was to validate the Consensual Assessment Technique (CAT) as a means of assessing creativity. Linear models for problem solving and design processes serve as the current paradigm in classroom…
Descriptors: Technology Education, Creativity, Problem Solving, Teaching Methods
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively it would be appealing to allow students the choice of handwriting or typing, but this raises a concern that perhaps this might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Pare, D. E.; Joordens, S. – Journal of Computer Assisted Learning, 2008
As class sizes increase, methods of assessment shift from costly traditional approaches (e.g. expert-graded writing assignments) to more economic and logistically feasible methods (e.g. multiple-choice testing, computer-automated scoring, or peer assessment). While each method of assessment has its merits, it is peer assessment in particular,…
Descriptors: Writing Assignments, Undergraduate Students, Teaching Assistants, Peer Evaluation
Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
Describes a FORTRAN software program that calculates the probability of an observed difference between agreement measures obtained from two independent sets of raters. An example illustrates the use of the DIFFER program in evaluating undergraduate essays. (Author/SLD)
Descriptors: Comparative Analysis, Computer Software, Evaluation Methods, Higher Education
Abedi, Jamal – Multivariate Behavioral Research, 1996
The Interrater/Test Reliability System (ITRS) is described. The ITRS is a comprehensive computer tool used to address questions of interrater reliability that computes several different indices of interrater reliability and the generalizability coefficient over raters and topics. The system is available in IBM compatible or Macintosh format. (SLD)
Descriptors: Computer Software, Computer Software Evaluation, Evaluation Methods, Evaluators
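The specific indices computed by ITRS are not listed in the snippet above. One widely used index of interrater reliability that tools of this kind typically include is Cohen's kappa, which corrects observed agreement for chance; a minimal sketch (the essay ratings are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: agreement between two raters, corrected for the
    agreement expected by chance from each rater's category frequencies."""
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    freq_1, freq_2 = Counter(rater_1), Counter(rater_2)
    expected = sum(freq_1[c] * freq_2[c] for c in freq_1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail essay ratings from two raters
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.667
```

Here observed agreement is 5/6 while chance agreement is 0.5, so kappa is 2/3: substantial, but well below what raw percent agreement alone would suggest.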
Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
