Publication Date
| Range | Records |
|---|---|
| In 2026 | 0 |
| Since 2025 | 2 |
| Since 2022 (last 5 years) | 13 |
| Since 2017 (last 10 years) | 18 |
| Since 2007 (last 20 years) | 26 |
Descriptor
| Descriptor | Records |
|---|---|
| Automation | 32 |
| Test Format | 32 |
| Test Items | 15 |
| Computer Assisted Testing | 14 |
| Scoring | 11 |
| Test Construction | 8 |
| Artificial Intelligence | 7 |
| Grading | 5 |
| Models | 5 |
| Comparative Analysis | 4 |
| Educational Technology | 4 |
Author
| Author | Records |
|---|---|
| van der Linden, Wim J. | 3 |
| Diao, Qi | 2 |
| Gierl, Mark J. | 2 |
| Martinez, Michael E. | 2 |
| Rumshisky, Anna | 1 |
| Sayin, Ayfer | 1 |
| Bennett, Randy Elliot | 1 |
| Bhowmick, Plaban Kumar | 1 |
| Tan, Bin | 1 |
| Boyer, Michelle | 1 |
| Clauser, Brian E. | 1 |
Education Level
| Level | Records |
|---|---|
| Higher Education | 7 |
| Postsecondary Education | 6 |
| Elementary Education | 2 |
| Grade 12 | 1 |
| Grade 4 | 1 |
| High Schools | 1 |
| Intermediate Grades | 1 |
| Secondary Education | 1 |
| Two Year Colleges | 1 |
Audience
| Audience | Records |
|---|---|
| Practitioners | 2 |
| Researchers | 1 |
Assessments and Surveys
| Assessment | Records |
|---|---|
| National Assessment of… | 1 |
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
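The LLM-driven AIG workflow that reviews like this one survey typically reduces to prompting a model with an item specification and parsing the draft item it returns. A minimal sketch, assuming the OpenAI Python client; the model name and prompt wording are illustrative placeholders, not taken from the reviewed studies:

```python
# Minimal LLM-based automatic item generation (AIG) sketch.
# Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Write one multiple-choice item testing the concept below. "
    "Return the stem, four options (A-D), and the correct option letter.\n"
    "Concept: {concept}"
)

def generate_item(concept: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(concept=concept)}],
    )
    return response.choices[0].message.content

print(generate_item("reliability of a test score"))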
Harpreet Auby; Namrata Shivagunde; Vijeta Deshpande; Anna Rumshisky; Milo D. Koretsky – Journal of Engineering Education, 2025
Background: Analyzing student short-answer written justifications to conceptually challenging questions has proven helpful for understanding student thinking and improving conceptual understanding. However, qualitative analyses are limited by the burden of analyzing large amounts of text. Purpose: We apply dense and sparse Large Language Models (LLMs)…
Descriptors: Student Evaluation, Thinking Skills, Test Format, Cognitive Processes
Brian E. Clauser; Victoria Yaneva; Peter Baldwin; Le An Ha; Janet Mee – Applied Measurement in Education, 2024
Multiple-choice questions have become ubiquitous in educational measurement because the format allows for efficient and accurate scoring. Nonetheless, interest in constructed-response formats persists. This interest has driven efforts to develop computer-based scoring procedures that can accurately and efficiently score these items.…
Descriptors: Computer Uses in Education, Artificial Intelligence, Scoring, Responses
Zesch, Torsten; Horbach, Andrea; Zehner, Fabian – Educational Measurement: Issues and Practice, 2023
In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced…
Descriptors: Influences, Academic Achievement, Feasibility Studies, Automation
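The linguistic variance the authors identify as the main driver of scoring performance can be made concrete with a crude proxy: the share of distinct surface forms in a pool of responses. A sketch under that assumption; the normalization steps are illustrative, not the authors' method:

```python
# Illustrative proxy for the "linguistic variance" of short responses:
# the share of distinct normalized answers in the response pool.
import re

def normalize(response: str) -> str:
    # Lowercase and strip punctuation so trivial surface differences collapse.
    return re.sub(r"[^a-z0-9 ]+", "", response.lower().strip())

def surface_variance(responses: list[str]) -> float:
    distinct = {normalize(r) for r in responses}
    return len(distinct) / len(responses)

answers = ["Photosynthesis.", "photosynthesis", "It makes glucose from light."]
print(surface_variance(answers))  # 2/3 ≈ 0.67: two distinct forms among three
```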
Ulrike Padó; Yunus Eryilmaz; Larissa Kirschner – International Journal of Artificial Intelligence in Education, 2024
Short-Answer Grading (SAG) is a time-consuming task for teachers that automated SAG models have long promised to make easier. However, there are three challenges for their broad-scale adoption: A technical challenge regarding the need for high-quality models, which is exacerbated for languages with fewer resources than English; a usability…
Descriptors: Grading, Automation, Test Format, Computer Assisted Testing
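A similarity-to-reference baseline is the simplest shape an automated SAG model can take. A minimal sketch, assuming TF-IDF cosine similarity and an illustrative credit threshold; neither is a value from the paper:

```python
# Minimal short-answer grading (SAG) baseline: lexical similarity between
# the student answer and a reference answer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def score(reference: str, student_answer: str, threshold: float = 0.5) -> int:
    vectors = TfidfVectorizer().fit_transform([reference, student_answer])
    similarity = cosine_similarity(vectors[0], vectors[1])[0, 0]
    return 1 if similarity >= threshold else 0  # 1 = credit, 0 = no credit

print(score("Osmosis moves water across a membrane",
            "Water crosses the membrane by osmosis"))
```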
Filip Moons; Paola Iannone; Ellen Vandervieren – ZDM: Mathematics Education, 2024
Handwritten tasks are better suited than digital ones to assess higher-order mathematics skills, as students can express themselves more freely. However, maintaining reliability and providing feedback can be challenging when assessing high-stakes, handwritten mathematics exams involving multiple assessors. This paper discusses a new semi-automated…
Descriptors: Grading, Mathematics Tests, Handwriting, Test Format
McCaffrey, Daniel F.; Casabianca, Jodi M.; Ricker-Pedley, Kathryn L.; Lawless, René R.; Wendler, Cathy – ETS Research Report Series, 2022
This document describes a set of best practices for developing, implementing, and maintaining the critical process of scoring constructed-response tasks. These practices address both the use of human raters and automated scoring systems as part of the scoring process and cover the scoring of written, spoken, performance, or multimodal responses.…
Descriptors: Best Practices, Scoring, Test Format, Computer Assisted Testing
Han, Chao – Language Testing, 2022
Over the past decade, testing and assessing spoken-language interpreting has garnered an increasing amount of attention from stakeholders in interpreter education, professional certification, and interpreting research. This is because in these fields assessment results provide a critical evidential basis for high-stakes decisions, such as the…
Descriptors: Translation, Language Tests, Testing, Evaluation Methods
Filipe Manuel Vidal Falcão; Daniela S.M. Pereira; José Miguel Pêgo; Patrício Costa – Education and Information Technologies, 2024
Progress tests (PT) are a popular type of longitudinal assessment used for evaluating clinical knowledge retention and lifelong learning in health professions education. Most PTs consist of multiple-choice questions (MCQs) whose development is costly and time-consuming. Automatic Item Generation (AIG) generates test items through algorithms,…
Descriptors: Automation, Test Items, Progress Monitoring, Medical Education
Ivan D. Mardini G.; Christian G. Quintero M.; César A. Viloria N.; Winston S. Percybrooks B.; Heydy S. Robles N.; Karen Villalba R. – Education and Information Technologies, 2024
Today, reading comprehension is considered an essential skill in modern life; therefore, higher education students require more specific skills to understand, interpret, and evaluate texts effectively. Short answer questions (SAQs) are a relevant and appropriate tool for assessing reading comprehension skills. Unlike multiple-choice questions,…
Descriptors: Reading Comprehension, Reading Tests, Learning Strategies, Grading
Sahu, Archana; Bhowmick, Plaban Kumar – IEEE Transactions on Learning Technologies, 2020
In this paper, we studied different automatic short answer grading (ASAG) systems to provide a comprehensive view of the feature spaces explored by previous works. While the performance reported in previous works has been encouraging, a systematic study of the features is lacking. Apart from providing systematic feature space exploration, we also…
Descriptors: Automation, Grading, Test Format, Artificial Intelligence
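The hand-crafted feature spaces such surveys compare usually start from lexical overlap and length statistics. A sketch with an invented three-feature set, not the specific space analyzed in the paper:

```python
# Toy ASAG feature extractor: overlap and length features computed
# between a reference answer and a student answer.
def asag_features(reference: str, answer: str) -> dict[str, float]:
    ref_tokens = set(reference.lower().split())
    ans_tokens = set(answer.lower().split())
    overlap = len(ref_tokens & ans_tokens)
    return {
        "overlap_ratio": overlap / len(ref_tokens) if ref_tokens else 0.0,
        "precision": overlap / len(ans_tokens) if ans_tokens else 0.0,
        "length_ratio": len(ans_tokens) / len(ref_tokens) if ref_tokens else 0.0,
    }

print(asag_features("gravity pulls objects toward earth",
                    "objects fall because gravity pulls them"))
```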
Paiva, José Carlos; Leal, José Paulo; Figueira, Álvaro – ACM Transactions on Computing Education, 2022
Practical programming competencies are critical to success in computer science (CS) education and to the market readiness of fresh graduates. Acquiring the required level of skills is a long journey of discovery, trial and error, and optimization seeking through a broad range of programming activities that learners must perform themselves. It is not…
Descriptors: Automation, Computer Assisted Testing, Student Evaluation, Computer Science Education
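Automated assessment of programming activities is typically built on running a submission against input/output test cases. A minimal sketch, where the test data, file name, and timeout are illustrative assumptions:

```python
# Toy autograder: run a Python submission in a subprocess and compare
# its stdout against expected outputs for each test case.
import subprocess

TESTS = [("2 3\n", "5\n"), ("10 -4\n", "6\n")]  # (stdin, expected stdout)

def grade(submission: str) -> float:
    passed = 0
    for stdin, expected in TESTS:
        try:
            result = subprocess.run(
                ["python", submission], input=stdin,
                capture_output=True, text=True,
                timeout=5,  # guard against infinite loops
            )
            passed += result.stdout == expected
        except subprocess.TimeoutExpired:
            pass  # a hung submission scores zero on this test
    return passed / len(TESTS)

print(grade("sum_two_numbers.py"))  # hypothetical submission file
```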
Ayfer Sayin; Sabiha Bozdag; Mark J. Gierl – International Journal of Assessment Tools in Education, 2023
The purpose of this study is to generate non-verbal items for a visual reasoning test using template-based automatic item generation (AIG). The fundamental research method involved following the three stages of template-based AIG. An item from the 2016 4th-grade entrance exam of the Science and Art Center (known as BILSEM) was chosen as the…
Descriptors: Test Items, Test Format, Nonverbal Tests, Visual Measures
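Template-based AIG instantiates an item model over sets of slot values. A toy sketch with an invented arithmetic template, not the BILSEM item used in the study:

```python
# Template-based AIG in miniature: fill an item-model template over the
# cross product of slot value sets, computing the key for each instance.
from itertools import product

TEMPLATE = "A box holds {n} rows of {m} marbles. How many marbles in total?"

def generate_items(ns, ms):
    for n, m in product(ns, ms):
        yield {"stem": TEMPLATE.format(n=n, m=m), "key": n * m}

for item in generate_items([3, 4], [5, 6]):
    print(item["stem"], "->", item["key"])
```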
Shin, Jinnie; Gierl, Mark J. – International Journal of Testing, 2022
Over the last five years, tremendous strides have been made in advancing the AIG methodology required to produce items in diverse content areas. However, the one content area where enormous problems remain unsolved is language arts, generally, and reading comprehension, more specifically. While reading comprehension test items can be created using…
Descriptors: Reading Comprehension, Test Construction, Test Items, Natural Language Processing
Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018
The final step of the typical process of developing educational and psychological tests is to place the selected test items into a formatted test form. This step involves grouping and ordering the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…
Descriptors: Programming, Automation, Test Items, Test Format
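MIP formulations of this formatting step assign items to positions or pages via binary decision variables. A toy PuLP sketch, assuming invented page capacities and stimulus groupings rather than the paper's constraint set:

```python
# Toy MIP for test formatting: place items on pages so that page capacity
# holds and items sharing a stimulus land on the same page.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary, PULP_CBC_CMD

items = range(6)
pages = range(2)
capacity = 3                  # max items per page (illustrative)
groups = [(0, 1), (4, 5)]     # item pairs sharing a stimulus (illustrative)

prob = LpProblem("test_formatting", LpMinimize)
x = {(i, p): LpVariable(f"x_{i}_{p}", cat=LpBinary) for i in items for p in pages}

prob += lpSum(p * x[i, p] for i in items for p in pages)  # prefer early pages
for i in items:                                           # each item placed once
    prob += lpSum(x[i, p] for p in pages) == 1
for p in pages:                                           # respect page capacity
    prob += lpSum(x[i, p] for i in items) <= capacity
for a, b in groups:                                       # keep pairs together
    for p in pages:
        prob += x[a, p] == x[b, p]

prob.solve(PULP_CBC_CMD(msg=False))
print({i: next(p for p in pages if x[i, p].value() == 1) for i in items})
```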

