Showing all 15 results
Peer reviewed | PDF on ERIC (full text)
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
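As a rough illustration of the pipeline such studies describe, the sketch below prompts an LLM for a structured multiple-choice item and validates the result before human review. The call_llm helper, prompt wording, and JSON schema are hypothetical placeholders, not drawn from any study in the review.

```python
# Minimal sketch of LLM-based automatic item generation (AIG).
# call_llm() is a hypothetical stand-in for any chat-completion API;
# the prompt template and JSON schema are illustrative only.
import json


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM provider's completion endpoint."""
    raise NotImplementedError("wire this to your provider of choice")


def generate_mcq(topic: str, difficulty: str) -> dict:
    prompt = (
        f"Write one multiple-choice question on {topic} at {difficulty} "
        "difficulty. Return JSON with keys: stem, options (list of 4 "
        "strings), key (index of the correct option), rationale."
    )
    item = json.loads(call_llm(prompt))
    # Structural checks before the item enters human review.
    assert len(item["options"]) == 4 and 0 <= item["key"] < 4
    return item
```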
Peer reviewed | Direct link
Filipe Manuel Vidal Falcão; Daniela S.M. Pereira; José Miguel Pêgo; Patrício Costa – Education and Information Technologies, 2024
Progress tests (PT) are a popular type of longitudinal assessment used for evaluating clinical knowledge retention and lifelong learning in health professions education. Most PTs consist of multiple-choice questions (MCQs) whose development is costly and time-consuming. Automatic Item Generation (AIG) generates test items through algorithms,…
Descriptors: Automation, Test Items, Progress Monitoring, Medical Education
Peer reviewed | PDF on ERIC (full text)
Ayfer Sayin; Sabiha Bozdag; Mark J. Gierl – International Journal of Assessment Tools in Education, 2023
The purpose of this study is to generate non-verbal items for a visual reasoning test using template-based automatic item generation (AIG). The research method followed the three stages of template-based AIG. An item from the 2016 4th-grade entrance exam of the Science and Art Center (known as BILSEM) was chosen as the…
Descriptors: Test Items, Test Format, Nonverbal Tests, Visual Measures
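The three-stage template-based workflow the study follows (select a parent item, abstract it into an item model, instantiate the model) can be sketched generically as below; the template, variable values, and constraint are invented for illustration and do not reproduce the BILSEM item.

```python
# Rough sketch of template-based AIG: a parent item is abstracted into an
# item model (template + variable values + constraints), which is then
# instantiated exhaustively. All content here is an invented placeholder.
from itertools import product

TEMPLATE = "Which figure continues the pattern: {a}, {b}, {a}, {b}, {a}, ?"
VARIABLES = {"a": ["circle", "square", "triangle"], "b": ["triangle", "star"]}


def instantiate(template: str, variables: dict) -> list[str]:
    names = list(variables)
    items = []
    for combo in product(*(variables[n] for n in names)):
        values = dict(zip(names, combo))
        if values["a"] != values["b"]:  # constraint: the two shapes differ
            items.append(template.format(**values))
    return items


for item in instantiate(TEMPLATE, VARIABLES):
    print(item)
```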
Peer reviewed | Direct link
Shin, Jinnie; Gierl, Mark J. – International Journal of Testing, 2022
Over the last five years, tremendous strides have been made in advancing the AIG methodology required to produce items in diverse content areas. However, the one content area where enormous problems remain unsolved is language arts, generally, and reading comprehension, more specifically. While reading comprehension test items can be created using…
Descriptors: Reading Comprehension, Test Construction, Test Items, Natural Language Processing
Peer reviewed | Direct link
Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018
The final step of the typical process of developing educational and psychological tests is to place the selected items into a formatted test form. This step involves grouping and ordering the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…
Descriptors: Programming, Automation, Test Items, Test Format
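A toy version of such a formatting MIP, written with the open-source PuLP solver, is sketched below. The constraints (a fixed opening item, long items kept non-adjacent) are invented for illustration and are not the paper's formulation.

```python
# Toy mixed-integer program for item formatting with PuLP: assign 4 items
# to 4 positions, opening with item 0 and keeping the two "long" items
# out of adjacent positions. Constraints are illustrative only.
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum, PULP_CBC_CMD

items, positions = range(4), range(4)
long_items = [1, 3]

prob = LpProblem("item_formatting", LpMinimize)
x = {(i, p): LpVariable(f"x_{i}_{p}", cat=LpBinary) for i in items for p in positions}

prob += lpSum(p * x[i, p] for i in long_items for p in positions)  # long items early
for i in items:                                   # each item placed exactly once
    prob += lpSum(x[i, p] for p in positions) == 1
for p in positions:                               # each position filled exactly once
    prob += lpSum(x[i, p] for i in items) == 1
prob += x[0, 0] == 1                              # item 0 opens the form
for p in range(len(positions) - 1):               # no two long items adjacent
    prob += lpSum(x[i, p] + x[i, p + 1] for i in long_items) <= 1

prob.solve(PULP_CBC_CMD(msg=False))
print([i for p in positions for i in items if x[i, p].value() == 1])
```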
Peer reviewed | Direct link
Liu, Ou Lydia; Rios, Joseph A.; Heilman, Michael; Gerard, Libby; Linn, Marcia C. – Journal of Research in Science Teaching, 2016
Constructed response items can both measure the coherence of student ideas and serve as reflective experiences to strengthen instruction. We report on new automated scoring technologies that can reduce the cost and complexity of scoring constructed-response items. This study explored the accuracy of c-rater-ML, an automated scoring engine…
Descriptors: Science Tests, Scoring, Automation, Validity
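c-rater-ML is ETS's proprietary engine, so the sketch below is only a generic stand-in showing the usual shape of such scorers: text features mapped to human rubric scores by a supervised model. The scikit-learn pipeline and toy training pairs are illustrative assumptions.

```python
# Generic short-answer scoring sketch (NOT c-rater-ML): TF-IDF features
# feed a ridge regressor that maps response text to a rubric score.
# Training pairs below are toy data for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

responses = [
    "energy flows from producers to consumers",
    "plants eat sunlight",
    "producers convert light energy which consumers then use",
    "i dont know",
]
scores = [2.0, 1.0, 2.0, 0.0]  # human rubric scores (toy)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(responses, scores)
print(model.predict(["consumers get energy from producers"]))
```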
Peer reviewed | Direct link
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
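One common comparison procedure (though, as the paper argues, not the only defensible one) is to compute quadratic-weighted kappa against human scores item by item and average across items; a minimal sketch with toy data:

```python
# Rank several automated raters by mean quadratic-weighted kappa against
# human scores, computed per item and averaged. All scores are toy data;
# other ranking procedures can and do produce different orderings.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = {"item1": [0, 1, 2, 2, 1], "item2": [2, 2, 1, 0, 1]}
engines = {
    "engine_a": {"item1": [0, 1, 2, 1, 1], "item2": [2, 1, 1, 0, 1]},
    "engine_b": {"item1": [1, 1, 2, 2, 0], "item2": [2, 2, 0, 0, 1]},
}

for name, preds in engines.items():
    kappas = [cohen_kappa_score(human[it], preds[it], weights="quadratic")
              for it in human]
    print(name, round(float(np.mean(kappas)), 3))
```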
Peer reviewed | PDF on ERIC (full text)
Papasalouros, Andreas; Chatzigiannakou, Maria – International Association for Development of the Information Society, 2018
Automating the production of questions for assessment and self-assessment has recently become an active field of study. The use of Semantic Web technologies has certain advantages over other methods for question generation and thus is one of the most important lines of research for this problem. The aim of this paper is to provide an overview of…
Descriptors: Computer Assisted Testing, Web 2.0 Technologies, Test Format, Multiple Choice Tests
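A minimal sketch of the ontology-driven approach this line of work surveys: facts live in an RDF graph, a SPARQL query retrieves them, and same-class values become distractors. The tiny ontology below is invented for illustration.

```python
# Ontology-driven MCQ generation sketch with rdflib: query an RDF graph
# for (country, capital) facts and reuse same-class capitals as
# distractors. The mini-ontology is an invented example.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
for country, capital in [("France", "Paris"), ("Spain", "Madrid"),
                         ("Italy", "Rome"), ("Greece", "Athens")]:
    g.add((EX[country], RDF.type, EX.Country))
    g.add((EX[country], EX.hasCapital, Literal(capital)))

rows = g.query(
    "SELECT ?c ?cap WHERE { ?c a ex:Country ; ex:hasCapital ?cap }",
    initNs={"ex": EX},
)
facts = sorted((str(c).rsplit("/", 1)[-1], str(cap)) for c, cap in rows)
stem_country, key = facts[0]
distractors = [cap for name, cap in facts if name != stem_country][:3]
print(f"What is the capital of {stem_country}?", [key] + distractors)
```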
Carter, Kelli Patrice – ProQuest LLC, 2019
There has been a call from the national community of biologists and biology educators to increase biological literacy of undergraduate students, including understanding and application of core concepts. The structure and function relationship is a core concept identified by the wider biology community and by physiology faculty. Understanding of…
Descriptors: Concept Formation, Comprehension, Scientific Concepts, Formative Evaluation
Peer reviewed | Direct link
van der Linden, Wim J.; Diao, Qi – Journal of Educational Measurement, 2011
In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
Descriptors: Test Items, Test Format, Test Construction, Item Banks
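The selection core of ATA can be shown as a small mixed-integer program; the sketch below (again with the open-source PuLP solver, an assumption rather than the paper's software) maximizes Fisher information at a target ability point under length and content constraints, with toy numbers throughout.

```python
# Toy ATA model with PuLP: choose 3 of 6 bank items to maximize Fisher
# information at theta = 0, requiring at least one algebra item.
# Information values and content flags are toy numbers.
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum, PULP_CBC_CMD

info = [0.42, 0.55, 0.31, 0.48, 0.60, 0.25]       # I_i(theta = 0), toy
algebra = [0, 1, 1, 0, 1, 0]                      # content flags, toy

prob = LpProblem("test_assembly", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(6)]
prob += lpSum(info[i] * x[i] for i in range(6))           # maximize information
prob += lpSum(x) == 3                                     # fixed test length
prob += lpSum(algebra[i] * x[i] for i in range(6)) >= 1   # content coverage
prob.solve(PULP_CBC_CMD(msg=False))
print([i for i in range(6) if x[i].value() == 1])
```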
Peer reviewed | PDF on ERIC (full text)
Graf, Edith Aurora – ETS Research Report Series, 2008
Quantitative item models are item structures that may be expressed in terms of mathematical variables and constraints. An item model may be developed as a computer program from which large numbers of items are automatically generated. Item models can be used to produce large numbers of items for use in traditional, large-scale assessments. But…
Descriptors: Test Items, Models, Diagnostic Tests, Statistical Analysis
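A quantitative item model in miniature: variables with ranges plus a constraint define an item family, and a short program instantiates items and computes the key. The stem and number ranges below are illustrative, not taken from the report.

```python
# Miniature quantitative item model: sample variable values under a
# constraint (whole-number answer) and compute the scoring key.
# Wording and ranges are invented placeholders.
import random

STEM = "A train travels {d} km in {t} hours. What is its average speed in km/h?"


def generate_item(rng: random.Random) -> dict:
    while True:
        d = rng.randrange(60, 300, 10)   # distance in km
        t = rng.randrange(2, 6)          # time in hours
        if d % t == 0:                   # constraint: integer key
            return {"stem": STEM.format(d=d, t=t), "key": d // t}


rng = random.Random(7)
for _ in range(3):
    print(generate_item(rng))
```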
Martinez, Michael E.; And Others – 1990
Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Descriptors: Automation, Computer Assisted Testing, Educational Technology, Multiple Choice Tests
Peer reviewed
Martinez, Michael E.; Bennett, Randy Elliot – Applied Measurement in Education, 1992
New developments in the use of automatically scorable constructed response item types for large-scale assessment are reviewed for five domains: (1) mathematical reasoning; (2) algebra problem solving; (3) computer science; (4) architecture; and (5) natural language. Ways in which these technologies are likely to shape testing are considered. (SLD)
Descriptors: Algebra, Architecture, Automation, Computer Science
Peer reviewed | Direct link
Embretson, Susan E. – Measurement: Interdisciplinary Research and Perspectives, 2004
The last century was marked by dazzling changes in many areas, such as technology and communications. Predictions into the second century of testing are seemingly difficult in such a context. Yet, looking back to the turn of the last century, Kirkpatrick (1900), in his American Psychological Association presidential address, presented fundamental…
Descriptors: Ability, Testing, Futures (of Society), Psychometrics
Leuba, Richard J. – Engineering Education, 1986
Promotes the use of machine-scored tests in basic engineering science classes. Discusses some principles and practices of machine-scored testing. Provides several example test items. Argues that such tests can be used to enhance basic understanding of concepts and problem solving skills. (TW)
Descriptors: Automation, College Science, Engineering Education, Evaluation Methods