| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 1 |
| Since 2017 (last 10 years) | 5 |
| Since 2007 (last 20 years) | 15 |
| Descriptor | Records |
| --- | --- |
| Computer Software | 21 |
| Evaluation Methods | 21 |
| Interrater Reliability | 21 |
| Computer Assisted Testing | 9 |
| Comparative Analysis | 8 |
| Educational Technology | 6 |
| Foreign Countries | 6 |
| Scores | 6 |
| Scoring | 6 |
| Second Language Learning | 6 |
| Correlation | 5 |
| Author | Records |
| --- | --- |
| Abedi, Jamal | 1 |
| Armijo-Olivo, Susan | 1 |
| Bahreini, Kiavash | 1 |
| Bejar, Isaac I. | 1 |
| Berry, Kenneth J. | 1 |
| Buelin-Biesecker, Jennifer… | 1 |
| Burk, John | 1 |
| Campbell, Sandy | 1 |
| Carlson, Sybil B. | 1 |
| Clariana, Roy B. | 1 |
| Coniam, David | 1 |
| Education Level | Records |
| --- | --- |
| Higher Education | 5 |
| Postsecondary Education | 5 |
| Secondary Education | 4 |
| Elementary Secondary Education | 3 |
| Elementary Education | 1 |
| Grade 6 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
| High Schools | 1 |
| Intermediate Grades | 1 |
| Junior High Schools | 1 |
| Audience | Records |
| --- | --- |
| Researchers | 1 |
| Location | Records |
| --- | --- |
| Israel | 2 |
| Netherlands | 2 |
| Asia | 1 |
| Australia | 1 |
| Brazil | 1 |
| China | 1 |
| Connecticut | 1 |
| Cuba | 1 |
| Denmark | 1 |
| Egypt | 1 |
| Estonia | 1 |
| Assessments and Surveys | Records |
| --- | --- |
| Test of English as a Foreign… | 2 |
| Graduate Record Examinations | 1 |
| National Assessment of… | 1 |
| Torrance Tests of Creative… | 1 |
Armijo-Olivo, Susan; Craig, Rodger; Campbell, Sandy – Research Synthesis Methods, 2020
Background: Evidence from new health technologies is growing, along with demands for evidence to inform policy decisions, creating challenges in completing health technology assessments (HTAs)/systematic reviews (SRs) in a timely manner. Software can decrease the time and burden by automating the process, but evidence validating such software is…
Descriptors: Comparative Analysis, Computer Software, Decision Making, Randomized Controlled Trials
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Sumner, Josh – Research-publishing.net, 2021
Comparative Judgement (CJ) has emerged as a technique that typically makes use of holistic judgement to assess difficult-to-specify constructs such as production (speaking and writing) in Modern Foreign Languages (MFL). In traditional approaches, markers assess candidates' work one-by-one in an absolute manner, assigning scores to different…
Descriptors: Holistic Approach, Student Evaluation, Comparative Analysis, Decision Making
Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal creativity measure to applying it in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…
Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between one automated computer rater and five expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning
Bahreini, Kiavash; Nadolski, Rob; Westera, Wim – Education and Information Technologies, 2016
This paper presents the voice emotion recognition part of the FILTWAM framework for real-time emotion recognition in affective e-learning settings. FILTWAM (Framework for Improving Learning Through Webcams And Microphones) intends to offer timely and appropriate online feedback based upon learners' vocal intonations and facial expressions in order…
Descriptors: Affective Behavior, Emotional Response, Electronic Learning, Recognition (Psychology)
Naude, Kevin A.; Greyling, Jean H.; Vogts, Dieter – Computers & Education, 2010
We present a novel approach to the automated marking of student programming assignments. Our technique quantifies the structural similarity between unmarked student submissions and marked solutions, and is the basis by which we assign marks. This is accomplished through an efficient novel graph similarity measure ("AssignSim"). Our experiments…
Descriptors: Grading, Assignments, Correlation, Interrater Reliability
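The paper above defines its own graph similarity measure ("AssignSim"), whose details are not given here. As a hedged illustration of the general idea of scoring submissions by structural overlap with a marked solution, a minimal sketch using Jaccard similarity over edge sets (the edges below are hypothetical):

```python
def edge_jaccard(graph_a, graph_b):
    """Jaccard similarity over edge sets -- a simple stand-in for a
    structural graph similarity measure (not the paper's AssignSim)."""
    edges_a, edges_b = set(graph_a), set(graph_b)
    if not edges_a and not edges_b:
        return 1.0  # two empty graphs are trivially identical
    return len(edges_a & edges_b) / len(edges_a | edges_b)

# Hypothetical control-flow edges for a marked solution and a submission
solution = [("start", "loop"), ("loop", "check"), ("check", "end")]
submission = [("start", "loop"), ("loop", "check"),
              ("check", "loop"), ("check", "end")]
print(edge_jaccard(solution, submission))  # 3 shared edges of 4 total -> 0.75
```

A mark could then be assigned in proportion to the similarity against the best-matching marked solution.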
Mudford, Oliver C.; Taylor, Sarah Ann; Martin, Neil T. – Journal of Applied Behavior Analysis, 2009
We reviewed all research articles in 10 recent volumes of the "Journal of Applied Behavior Analysis" (JABA): Vol. 28(3), 1995, through Vol. 38(2), 2005. Continuous recording was used in the majority (55%) of the 168 articles reporting data on free-operant human behaviors. Three methods for reporting interobserver agreement (exact agreement,…
Descriptors: Interrater Reliability, Behavioral Science Research, Literature Reviews, Observation
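Exact agreement, the first of the reporting methods named in the abstract above, can be sketched in a few lines (the per-interval counts are hypothetical):

```python
def exact_agreement(obs_a, obs_b):
    """Percentage of observation intervals in which two observers
    recorded exactly the same count of the target behavior."""
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same number of intervals")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100.0 * matches / len(obs_a)

# Hypothetical per-interval behavior counts from two independent observers
observer_1 = [2, 0, 1, 3, 0, 2]
observer_2 = [2, 0, 1, 2, 0, 2]
print(exact_agreement(observer_1, observer_2))  # 5 of 6 intervals match
```

Exact agreement is the strictest of the common interval-based indices, since a near-miss count in an interval scores the same as a complete disagreement.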
Buelin-Biesecker, Jennifer Katherine – ProQuest LLC, 2012
This study compared the creative outcomes in student work resulting from two pedagogical approaches to creative problem solving activities. A secondary goal was to validate the Consensual Assessment Technique (CAT) as a means of assessing creativity. Linear models for problem solving and design processes serve as the current paradigm in classroom…
Descriptors: Technology Education, Creativity, Problem Solving, Teaching Methods
Mogey, Nora; Paterson, Jessie; Burk, John; Purcell, Michael – ALT-J: Research in Learning Technology, 2010
Students at the University of Edinburgh do almost all their work on computers, but at the end of the semester they are examined by handwritten essays. Intuitively it would be appealing to allow students the choice of handwriting or typing, but this raises a concern that perhaps this might not be "fair"--that the choice a student makes,…
Descriptors: Handwriting, Essay Tests, Interrater Reliability, Grading
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
Pare, D. E.; Joordens, S. – Journal of Computer Assisted Learning, 2008
As class sizes increase, methods of assessment shift from costly traditional approaches (e.g. expert-graded writing assignments) to more economic and logistically feasible methods (e.g. multiple-choice testing, computer-automated scoring, or peer assessment). While each method of assessment has its merits, it is peer assessment in particular,…
Descriptors: Writing Assignments, Undergraduate Students, Teaching Assistants, Peer Evaluation
Berry, Kenneth J.; Mielke, Paul W., Jr. – Educational and Psychological Measurement, 1997
Describes a FORTRAN software program that calculates the probability of an observed difference between agreement measures obtained from two independent sets of raters. An example illustrates the use of the DIFFER program in evaluating undergraduate essays. (Author/SLD)
Descriptors: Comparative Analysis, Computer Software, Evaluation Methods, Higher Education
Abedi, Jamal – Multivariate Behavioral Research, 1996
The Interrater/Test Reliability System (ITRS) is described. The ITRS is a comprehensive computer tool used to address questions of interrater reliability that computes several different indices of interrater reliability and the generalizability coefficient over raters and topics. The system is available in IBM compatible or Macintosh format. (SLD)
Descriptors: Computer Software, Computer Software Evaluation, Evaluation Methods, Evaluators
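The specific indices computed by ITRS are not listed in the snippet above. One widely used index of interrater reliability that tools of this kind typically include is Cohen's kappa, which corrects observed agreement for chance; a minimal sketch (the essay ratings are hypothetical):

```python
from collections import Counter

def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: agreement between two raters, corrected for the
    agreement expected by chance from each rater's category frequencies."""
    n = len(rater_1)
    observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    freq_1, freq_2 = Counter(rater_1), Counter(rater_2)
    expected = sum(freq_1[c] * freq_2[c] for c in freq_1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail essay ratings from two raters
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.667
```

Here observed agreement is 5/6 while chance agreement is 0.5, so kappa is 2/3: substantial, but well below what raw percent agreement alone would suggest.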
Page, Ellis Batten – Journal of Experimental Education, 1994
National Assessment of Educational Progress writing sample essays from 1988 and 1990 (495 and 599 essays) were subjected to computerized grading and human ratings. Cross-validation suggests that computer scoring is superior to a two-judge panel, a finding encouraging for large programs of essay evaluation. (SLD)
Descriptors: Computer Assisted Testing, Computer Software, Essays, Evaluation Methods
