Publication Date
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 9 |
| Since 2017 (last 10 years) | 17 |
| Since 2007 (last 20 years) | 31 |
Descriptor
| Computer Software | 34 |
| Scores | 34 |
| Reliability | 16 |
| Foreign Countries | 14 |
| Comparative Analysis | 13 |
| English (Second Language) | 11 |
| Evaluation Methods | 11 |
| Interrater Reliability | 11 |
| Second Language Learning | 11 |
| Test Reliability | 10 |
| Correlation | 9 |
Education Level
| Higher Education | 10 |
| Postsecondary Education | 8 |
| Elementary Education | 4 |
| Secondary Education | 4 |
| Elementary Secondary Education | 2 |
| High Schools | 2 |
| Grade 3 | 1 |
| Grade 4 | 1 |
| Intermediate Grades | 1 |
Location
| Australia | 2 |
| China | 2 |
| Egypt | 2 |
| Florida | 2 |
| Israel | 2 |
| Pakistan | 2 |
| Philippines | 2 |
| Turkey | 2 |
| Asia | 1 |
| Brazil | 1 |
| Connecticut | 1 |
Assessments and Surveys
| Graduate Record Examinations | 1 |
| International English… | 1 |
| Test of English as a Foreign… | 1 |
| Torrance Tests of Creative… | 1 |
Kuan-Yu Jin; Wai-Lok Siu – Journal of Educational Measurement, 2025
Educational tests often have a cluster of items linked by a common stimulus ("testlet"). In such a design, the dependencies caused between items are called "testlet effects." In particular, the directional testlet effect (DTE) refers to a recursive influence whereby responses to earlier items can positively or negatively affect…
Descriptors: Models, Test Items, Educational Assessment, Scores
The AI Teacher Test: Measuring the Pedagogical Ability of Blender and GPT-3 in Educational Dialogues
Tack, Anaïs; Piech, Chris – International Educational Data Mining Society, 2022
How can we test whether state-of-the-art generative models, such as Blender and GPT-3, are good AI teachers, capable of replying to a student in an educational dialogue? Designing an AI teacher test is challenging: although evaluation methods are much-needed, there is no off-the-shelf solution to measuring pedagogical ability. This paper reports…
Descriptors: Artificial Intelligence, Dialogs (Language), Bayesian Statistics, Decision Making
Yubin Xu; Lin Liu; Jianwen Xiong; Guangtian Zhu – Journal of Baltic Science Education, 2025
As the development and application of large language models (LLMs) in physics education progress, the well-known AI-based chatbot ChatGPT4 has presented numerous opportunities for educational assessment. Investigating the potential of AI tools in practical educational assessment carries profound significance. This study explored the comparative…
Descriptors: Physics, Artificial Intelligence, Computer Software, Accuracy
Shin, Jinnie; Gierl, Mark J. – Language Testing, 2021
Automated essay scoring (AES) has emerged as a secondary or as a sole marker for many high-stakes educational assessments, in native and non-native testing, owing to remarkable advances in feature engineering using natural language processing, machine learning, and deep-neural algorithms. The purpose of this study is to compare the effectiveness…
Descriptors: Scoring, Essays, Writing Evaluation, Computer Software
Paquot, Magali; Rubin, Rachel; Vandeweerd, Nathan – Language Learning, 2022
The main objective of this Methods Showcase Article is to show how the technique of adaptive comparative judgment, coupled with a crowdsourcing approach, can offer practical solutions to reliability issues as well as address the time and cost difficulties associated with a text-based approach to proficiency assessment in L2 research. We…
Descriptors: Comparative Analysis, Decision Making, Language Proficiency, Reliability
Hassan Saleh Mahdi; Ahmed Alkhateeb – International Journal of Computer-Assisted Language Learning and Teaching, 2025
This study aims to develop a robust rubric for evaluating artificial intelligence (AI)-assisted essay writing in English as a Foreign Language (EFL) contexts. Employing a modified Delphi technique, we conducted a comprehensive literature review and administered Likert scale questionnaires. This process yielded nine key evaluation criteria,…
Descriptors: Scoring Rubrics, Essays, Writing Evaluation, Artificial Intelligence
Teker, Gülsen Tasdelen; Güler, Nese – International Journal of Assessment Tools in Education, 2019
One of the important theories in education and psychology is Generalizability (G) Theory and various properties distinguish it from the other measurement theories. To better understand methodological trends of G theory, a thematic content analysis was conducted. This study analyzes the studies using generalizability theory in the field of…
Descriptors: Generalizability Theory, Content Analysis, Foreign Countries, Education
Chung-You Tsai; Yi-Ti Lin; Iain Kelsall Brown – Education and Information Technologies, 2024
This study aimed to determine the impact of using ChatGPT to help English as a foreign language (EFL) college English majors revise essays, and whether such assistance could lead to higher scores and potential unfairness. A prospective, double-blinded, paired-comparison study was conducted in Feb. 2023. A total of 44 students provided 44 original essays…
Descriptors: Artificial Intelligence, Computer Software, Technology Uses in Education, English (Second Language)
Beasley, Zachariah J.; Piegl, Les A.; Rosen, Paul – IEEE Transactions on Learning Technologies, 2021
Accurately grading open-ended assignments in large or massive open online courses is nontrivial. Peer review is a promising solution but can be unreliable due to few reviewers and an unevaluated review form. To date, no work has leveraged sentiment analysis in the peer-review process to inform or validate grades or utilized aspect extraction to…
Descriptors: Case Studies, Online Courses, Assignments, Peer Evaluation
Maghfiroh, Anissa; Kuswanto, Heru – International Journal of Instruction, 2022
This research aims to reveal the effectiveness of the use of Kofie GeBoL media in improving (1) vector representation ability and (2) critical thinking ability in physics instruction. It is a descriptive quantitative study with a quasi-experimental design. It was conducted in two stages: an empirical tryout and implementation of Kofie GeboL to see…
Descriptors: Physics, Instructional Effectiveness, Critical Thinking, Thinking Skills
Samosa, Resty C.; Barribal, Jemie; Cupan, Roschelle S.; Pagulayan, Jane R.; Tampipi, Jessalyn N. – Online Submission, 2021
This study evaluated the effectiveness of an online-merge-offline Jamboard application as an innovation in teaching word problems to Grade 4 learners. The research evaluates a learner's level of mathematical word problem skills in terms of understanding, devising a plan, solving a problem, and interpreting, as well as…
Descriptors: Teaching Methods, Mathematics Instruction, Grade 4, Elementary School Students
de Ruiter, Laura E.; Bers, Marina U. – Computer Science Education, 2022
Background and Context: Despite the increasing implementation of coding in early curricula, there are few valid and reliable assessments of coding abilities for young children. This impedes studying learning outcomes and the development and evaluation of curricula. Objective: Developing and validating a new instrument for assessing young…
Descriptors: Programming Languages, Computer Software, Coding, Computer Science Education
Botarleanu, Robert-Mihai; Dascalu, Mihai; Watanabe, Micah; Crossley, Scott Andrew; McNamara, Danielle S. – Grantee Submission, 2022
Age of acquisition (AoA) is a measure of word complexity which refers to the age at which a word is typically learned. AoA measures have shown strong correlations with reading comprehension, lexical decision times, and writing quality. AoA scores based on both adult and child data have limitations that allow for error in measurement, and increase…
Descriptors: Age Differences, Vocabulary Development, Correlation, Reading Comprehension
Kovalkov, Anastasia; Paassen, Benjamin; Segal, Avi; Gal, Kobi; Pinkwart, Niels – International Educational Data Mining Society, 2021
Promoting creativity is considered an important goal of education, but creativity is notoriously hard to define and measure. In this paper, we make the journey from defining a formal measure of creativity to applying the measure in a practical domain. The measure relies on core theoretical concepts in creativity theory, namely fluency, flexibility, and…
Descriptors: Creativity, Theory Practice Relationship, Evaluators, Specialists
Linlin, Cao – English Language Teaching, 2020
Through Many-Facet Rasch analysis, this study explores the rating differences between 1 automatic computer rater and 5 expert teacher raters in scoring 119 students on a computerized English listening-speaking test. Results indicate that both the automatic rater and the teacher raters demonstrate good inter-rater reliability, though the automatic rater…
Descriptors: Language Tests, Computer Assisted Testing, English (Second Language), Second Language Learning

