Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 4 |
| Since 2022 (last 5 years) | 17 |
| Since 2017 (last 10 years) | 24 |
| Since 2007 (last 20 years) | 31 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Natural Language Processing | 33 |
| Test Items | 33 |
| Computer Assisted Testing | 15 |
| Automation | 14 |
| Artificial Intelligence | 11 |
| Foreign Countries | 11 |
| Multiple Choice Tests | 11 |
| Test Construction | 11 |
| Models | 9 |
| Semantics | 8 |
| Accuracy | 6 |
Author
| Author | Count |
| --- | --- |
| Deane, Paul | 2 |
| Futagi, Yoko | 2 |
| Goldhammer, Frank | 2 |
| Olney, Andrew M. | 2 |
| Saha, Sujan Kumar | 2 |
| Sälzer, Christine | 2 |
| Zehner, Fabian | 2 |
| Aldabe, Itziar | 1 |
| Andreea Dutulescu | 1 |
| Andrew M. Olney | 1 |
| Araya, Roberto | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Reports - Research | 33 |
| Journal Articles | 24 |
| Speeches/Meeting Papers | 6 |
| Numerical/Quantitative Data | 1 |
| Tests/Questionnaires | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Higher Education | 6 |
| Secondary Education | 6 |
| Postsecondary Education | 5 |
| Elementary Education | 3 |
| Junior High Schools | 3 |
| Middle Schools | 3 |
| Early Childhood Education | 1 |
| Grade 4 | 1 |
| Grade 5 | 1 |
| Grade 7 | 1 |
| Grade 8 | 1 |
Location
| Location | Count |
| --- | --- |
| Germany | 3 |
| Africa | 1 |
| Alabama | 1 |
| Arizona | 1 |
| Arkansas | 1 |
| California | 1 |
| Canada | 1 |
| China | 1 |
| Connecticut | 1 |
| Georgia | 1 |
| Ghana | 1 |
Assessments and Surveys
| Assessment | Count |
| --- | --- |
| Program for International… | 2 |
| Graduate Record Examinations | 1 |
| National Assessment of… | 1 |
| Remote Associates Test | 1 |
Po-Chun Huang; Ying-Hong Chan; Ching-Yu Yang; Hung-Yuan Chen; Yao-Chung Fan – IEEE Transactions on Learning Technologies, 2024
The question generation (QG) task plays a crucial role in adaptive learning. While significant advances in QG performance have been reported, existing QG studies remain far from practical use. One point that needs strengthening is the generation of question groups, which remains untouched. For forming a question group, intrafactors…
Descriptors: Automation, Test Items, Computer Assisted Testing, Test Construction
A Method for Generating Course Test Questions Based on Natural Language Processing and Deep Learning
Hei-Chia Wang; Yu-Hung Chiang; I-Fan Chen – Education and Information Technologies, 2024
Assessment is viewed as an important means of understanding learners' performance in the learning process. A good assessment method is based on high-quality examination questions. However, manually generating high-quality examination questions is time-consuming for teachers, and it is not easy for students to obtain question banks. To solve…
Descriptors: Natural Language Processing, Test Construction, Test Items, Models
Owen Henkel; Libby Hills; Bill Roberts; Joshua McGrane – International Journal of Artificial Intelligence in Education, 2025
Formative assessment plays a critical role in improving learning outcomes by providing feedback on student mastery. Open-ended questions, which require students to produce multi-word, nontrivial responses, are a popular tool for formative assessment as they provide more specific insights into what students do and do not know. However, grading…
Descriptors: Artificial Intelligence, Grading, Reading Comprehension, Natural Language Processing
Andreea Dutulescu; Stefan Ruseti; Denis Iorga; Mihai Dascalu; Danielle S. McNamara – Grantee Submission, 2025
Automated multiple-choice question (MCQ) generation is valuable for scalable assessment and enhanced learning experiences. However, existing MCQ generation methods face challenges in ensuring plausible distractors and maintaining answer consistency. This paper introduces a method for MCQ generation that integrates reasoning-based explanations…
Descriptors: Automation, Computer Assisted Testing, Multiple Choice Tests, Natural Language Processing
Wesley Morris; Langdon Holmes; Joon Suh Choi; Scott Crossley – International Journal of Artificial Intelligence in Education, 2025
Recent developments in the field of artificial intelligence allow for improved performance in the automated assessment of extended response items in mathematics, potentially allowing for the scoring of these items cheaply and at scale. This study details the grand prize-winning approach to developing large language models (LLMs) to automatically…
Descriptors: Automation, Computer Assisted Testing, Mathematics Tests, Scoring
Olney, Andrew M. – Grantee Submission, 2022
Multi-angle question answering models have recently been proposed that promise to perform related tasks like question generation. However, performance on related tasks has not been thoroughly studied. We investigate a leading model called Macaw on the task of multiple choice question generation and evaluate its performance on three angles that…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Models
Lae Lae Shwe; Sureena Matayong; Suntorn Witosurapot – Education and Information Technologies, 2024
Multiple Choice Questions (MCQs) are an important evaluation technique for both examinations and learning activities. However, the manual creation of questions is time-consuming and challenging for teachers. Hence, there is a notable demand for an Automatic Question Generation (AQG) system. Several systems have been created for this aim, but the…
Descriptors: Difficulty Level, Computer Assisted Testing, Adaptive Testing, Multiple Choice Tests
Condor, Aubrey; Litster, Max; Pardos, Zachary – International Educational Data Mining Society, 2021
We explore how different components of an Automatic Short Answer Grading (ASAG) model affect the model's ability to generalize to questions outside of those used for training. For supervised automatic grading models, human ratings are primarily used as ground truth labels. Producing such ratings can be resource heavy, as subject matter experts…
Descriptors: Automation, Grading, Test Items, Generalization
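The generalization question this abstract raises (does a grader trained on some questions transfer to questions it has never seen?) can be made concrete with a question-level data split. The sketch below is illustrative only: the data is invented, and a TF-IDF plus logistic-regression grader stands in for the paper's actual ASAG model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold
from sklearn.pipeline import make_pipeline

answers = [
    "gravity pulls the ball down",   # question 0, rated correct
    "because it is heavy",           # question 0, rated incorrect
    "mitochondria produce ATP",      # question 1, rated correct
    "cells have a nucleus",          # question 1, rated incorrect
]
ratings = [1, 0, 1, 0]        # human ratings serve as ground-truth labels
question_ids = [0, 0, 1, 1]   # which question each answer responds to

grader = make_pipeline(TfidfVectorizer(), LogisticRegression())

# Split at the question level, so every test answer belongs to a question
# the grader never saw during training.
for train_idx, test_idx in GroupKFold(n_splits=2).split(answers, ratings, question_ids):
    grader.fit([answers[i] for i in train_idx], [ratings[i] for i in train_idx])
    preds = grader.predict([answers[i] for i in test_idx])
    print("predictions on held-out question:", preds)
```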
Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024
Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. They are now commonly used to assist with simple tasks such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui & Yee, 2023) estimate that by implementing LLMs,…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Becker, Kirk A.; Kao, Shu-chuan – Journal of Applied Testing Technology, 2022
Natural Language Processing (NLP) offers methods for understanding and quantifying the similarity between written documents. Within the testing industry these methods have been used for automatic item generation, automated scoring of text and speech, modeling item characteristics, automatic question answering, machine translation, and automated…
Descriptors: Item Banks, Natural Language Processing, Computer Assisted Testing, Scoring
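As a rough illustration of the document-similarity methods this abstract refers to, the sketch below uses a TF-IDF-plus-cosine pipeline; that particular pipeline is an assumption for illustration, not necessarily what the authors used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

items = [
    "Which organ pumps blood through the body?",
    "What organ is responsible for circulating blood?",
    "Name the largest planet in the solar system.",
]

# Vectorize each item, then compute the pairwise cosine-similarity matrix.
vectors = TfidfVectorizer().fit_transform(items)
sims = cosine_similarity(vectors)

# Items 0 and 1 paraphrase each other, so sims[0, 1] should be much higher
# than sims[0, 2]; flagging high-similarity pairs is one way to detect
# near-duplicate items in a bank.
print(sims.round(2))
```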
Andrew M. Olney – Grantee Submission, 2023
Multiple choice questions are traditionally expensive to produce. Recent advances in large language models (LLMs) have led to fine-tuned LLMs that generate questions competitive with human-authored questions. However, the relative capabilities of ChatGPT-family models have not yet been established for this task. We present a carefully controlled…
Descriptors: Test Construction, Multiple Choice Tests, Test Items, Algorithms
Urrutia, Felipe; Araya, Roberto – Journal of Educational Computing Research, 2024
Written answers to open-ended questions can have a higher long-term effect on learning than multiple-choice questions. However, it is critical that teachers immediately review the answers and ask students to redo those that are incoherent. This can be a difficult and time-consuming task for teachers. A possible solution is to automate the detection…
Descriptors: Elementary School Students, Grade 4, Elementary School Mathematics, Mathematics Tests
Mead, Alan D.; Zhou, Chenxuan – Journal of Applied Testing Technology, 2022
This study fit a Naïve Bayesian classifier to the words of exam items to predict the Bloom's taxonomy level of the items. We addressed five research questions, showing that reasonably good prediction of Bloom's level was possible, but accuracy varied across levels. In our study, performance for Level 2 was poor (Level 2 items were misclassified…
Descriptors: Artificial Intelligence, Prediction, Taxonomy, Natural Language Processing
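A minimal sketch of the setup this abstract describes, a Naïve Bayes classifier over the words of exam items predicting a Bloom's taxonomy level, might look like the following. The items and labels are invented placeholders; a real model would need far more training data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder exam items labeled with Bloom's levels (1 = remember,
# 2 = understand, 3 = apply).
items = [
    "Define the term photosynthesis.",
    "Explain why the seasons change over the year.",
    "Calculate the area of a 3 cm by 4 cm rectangle.",
]
bloom_levels = [1, 2, 3]

# Bag-of-words features feed a multinomial Naive Bayes classifier.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(items, bloom_levels)

print(classifier.predict(["Explain how vaccines protect a population."]))
```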
C. H., Dhawaleswar Rao; Saha, Sujan Kumar – IEEE Transactions on Learning Technologies, 2023
Multiple-choice questions (MCQs) play a significant role in educational assessment. Automatic MCQ generation has been an active research area for years, and many systems have been developed for MCQ generation. Still, we could not find any system that generates accurate MCQs from school-level textbook contents that are useful in real examinations…
Descriptors: Multiple Choice Tests, Computer Assisted Testing, Automation, Test Items
Peter Organisciak; Selcuk Acar; Denis Dumas; Kelly Berthiaume – Grantee Submission, 2023
Automated scoring for divergent thinking (DT) seeks to overcome a key obstacle to creativity measurement: the effort, cost, and reliability of scoring open-ended tests. For a common test of DT, the Alternate Uses Task (AUT), the primary automated approach casts the problem as a semantic distance between a prompt and the resulting idea in a text…
Descriptors: Automation, Computer Assisted Testing, Scoring, Creative Thinking
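The semantic-distance approach this abstract describes can be sketched as the distance between a prompt embedding and an idea embedding. The sentence-transformers library and the all-MiniLM-L6-v2 model below are illustrative assumptions, not the paper's actual models.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative embedding model; the paper's actual models differ.
model = SentenceTransformer("all-MiniLM-L6-v2")

prompt = "brick"
ideas = ["build a wall", "use as a paperweight", "grind into pigment for paint"]

emb_prompt = model.encode(prompt, convert_to_tensor=True)
emb_ideas = model.encode(ideas, convert_to_tensor=True)

# Score each idea by semantic distance (1 - cosine similarity) from the
# prompt; a higher distance is read as a more original response.
for idea, sim in zip(ideas, util.cos_sim(emb_prompt, emb_ideas)[0]):
    print(f"{idea!r}: distance = {1 - float(sim):.2f}")
```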
