Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Swapna Haresh Teckwani; Amanda Huee-Ping Wong; Nathasha Vihangi Luke; Ivan Cherh Chiet Low – Advances in Physiology Education, 2024
The advent of artificial intelligence (AI), particularly large language models (LLMs) like ChatGPT and Gemini, has significantly impacted the educational landscape, offering unique opportunities for learning and assessment. In the realm of written assessment grading, traditionally viewed as a laborious and subjective process, this study sought to…
Descriptors: Accuracy, Reliability, Computational Linguistics, Standards
Park, Yeonggwang; Cádiz, Manuel Díaz; Nagle, Kathleen F.; Stepp, Cara E. – Journal of Speech, Language, and Hearing Research, 2020
Purpose: Assessment of strained voice quality is difficult due to the weak reliability of auditory-perceptual evaluation and lack of strong acoustic correlates. This study evaluated the contributions of relative fundamental frequency (RFF) and mid-to-high frequency noise to the perception of strain. Method: Stimuli were created using recordings of…
Descriptors: Acoustics, Audio Equipment, Auditory Perception, Correlation
Wang, Qiao – Education and Information Technologies, 2022
This study searched for open-source semantic similarity tools and evaluated their effectiveness in automated content scoring of fact-based essays written by English-as-a-Foreign-Language (EFL) learners. Fifty writing samples under a fact-based writing task from an academic English course in a Japanese university were collected and a gold standard…
Descriptors: English (Second Language), Second Language Learning, Second Language Instruction, Scoring
Yun, Jiyeo – ProQuest LLC, 2017
Since researchers investigated automatic scoring systems in writing assessments, they have dealt with relationships between human and machine scoring, and then have suggested evaluation criteria for inter-rater agreement. The main purpose of my study is to investigate the magnitudes of and relationships among indices for inter-rater agreement used…
Descriptors: Interrater Reliability, Essays, Scoring, Evaluators
Diener, Marissa L.; Wright, Cheryl A.; Smith, Katherine N.; Wright, Scott D. – Creativity Research Journal, 2014
The goal of this study was to develop a measure of creativity that builds on the strengths of youth with autism spectrum disorders (ASD). The assessment of creativity focused on the visual-spatial abilities of these youth using 3D modeling software. One of the objectives of the research was to develop a measure of creativity in an authentic…
Descriptors: Autism, Pervasive Developmental Disorders, Creativity, Creativity Tests
Granfeldt, Jonas; Ågren, Malin – Language Testing, 2014
One core area of research in Second Language Acquisition is the identification and definition of developmental stages in different L2s. For L2 French, Bartning and Schlyter (2004) presented a model of six morphosyntactic stages of development in the shape of grammatical profiles. The model formed the basis for the computer program Direkt Profil…
Descriptors: Second Language Learning, Language Tests, French, Language Teachers
Zhang, Mo; Breyer, F. Jay; Lorenz, Florian – ETS Research Report Series, 2013
In this research, we investigated the suitability of implementing "e-rater"® automated essay scoring in a high-stakes large-scale English language testing program. We examined the effectiveness of generic scoring and 2 variants of prompt-based scoring approaches. Effectiveness was evaluated on a number of dimensions, including agreement…
Descriptors: Computer Assisted Testing, Computer Software, Scoring, Language Tests
Naude, Kevin A.; Greyling, Jean H.; Vogts, Dieter – Computers & Education, 2010
We present a novel approach to the automated marking of student programming assignments. Our technique quantifies the structural similarity between unmarked student submissions and marked solutions, and is the basis by which we assign marks. This is accomplished through an efficient novel graph similarity measure ("AssignSim"). Our experiments…
Descriptors: Grading, Assignments, Correlation, Interrater Reliability
Coniam, David – ReCALL, 2009
This paper describes a study of the computer essay-scoring program BETSY. While the use of computers in rating written scripts has been criticised in some quarters for lacking transparency or lack of fit with how human raters rate written scripts, a number of essay rating programs are available commercially, many of which claim to offer comparable…
Descriptors: Writing Tests, Scoring, Foreign Countries, Interrater Reliability
A Multi-Component Model for Assessing Learning Objects: The Learning Object Evaluation Metric (LOEM)
Kay, Robin H.; Knaack, Liesel – Australasian Journal of Educational Technology, 2008
While discussion of the criteria needed to assess learning objects has been extensive, a formal, systematic model for evaluation has yet to be thoroughly tested. The purpose of the following study was to develop and assess a multi-component model for evaluating learning objects. The Learning Object Evaluation Metric (LOEM) was developed from a…
Descriptors: Foreign Countries, Models, Measurement Techniques, Evaluation Criteria
Erkens, Gijsbert; Janssen, Jeroen – International Journal of Computer-Supported Collaborative Learning, 2008
Although protocol analysis can be an important tool for researchers to investigate the process of collaboration and communication, the use of this method of analysis can be time consuming. Hence, an automatic coding procedure for coding dialogue acts was developed. This procedure helps to determine the communicative function of messages in online…
Descriptors: Protocol Analysis, Validity, Cooperation, Coding
Heath, Edward M.; Coleman, Karen J.; Lensegrav, Tera L.; Fallon, Jennifer A. – Research Quarterly for Exercise and Sport, 2006
The System for Observing Fitness Instruction Time (SOFIT) is a direct observation system specifically developed for use during physical education (PE; McKenzie, 1991; McKenzie, Sallis, & Nader, 1991). The purpose of this study was to validate the estimates of time spent in various physical activity intensities obtained with the paper and pencil…
Descriptors: Validity, Physical Activities, Physical Education, Physical Fitness
Clariana, Roy B.; Wallace, Patricia – Journal of Educational Computing Research, 2007
This proof-of-concept investigation describes a computer-based approach for deriving the knowledge structure of individuals and of groups from their written essays, and considers the convergent criterion-related validity of the computer-based scores relative to human rater essay scores and multiple-choice test scores. After completing a…
Descriptors: Computer Assisted Testing, Multiple Choice Tests, Construct Validity, Cognitive Structures
Carlson, Sybil B.; And Others – 1985
Four writing samples were obtained from 638 foreign college applicants who represented three major foreign language groups (Arabic, Chinese, and Spanish), and from 60 native English speakers. All four were scored holistically, two were also scored for sentence-level and discourse-level skills, and some were scored by the Writer's Workbench…
Descriptors: Arabic, Chinese, College Entrance Examinations, Computer Software
