Showing all 15 results
Peer reviewed | PDF on ERIC (full text)
Bin Tan; Nour Armoush; Elisabetta Mazzullo; Okan Bulut; Mark J. Gierl – International Journal of Assessment Tools in Education, 2025
This study reviews existing research on the use of large language models (LLMs) for automatic item generation (AIG). We performed a comprehensive literature search across seven research databases, selected studies based on predefined criteria, and summarized 60 relevant studies that employed LLMs in the AIG process. We identified the most commonly…
Descriptors: Artificial Intelligence, Test Items, Automation, Test Format
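As a rough illustration of the pipeline such studies describe, the sketch below prompts an LLM for a structured multiple-choice item and validates the result before human review. The call_llm helper, prompt wording, and JSON schema are hypothetical placeholders, not drawn from any study in the review.

```python
# Minimal sketch of LLM-based automatic item generation (AIG).
# call_llm() is a hypothetical stand-in for any chat-completion API;
# the prompt template and JSON schema are illustrative only.
import json


def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around an LLM provider's completion endpoint."""
    raise NotImplementedError("wire this to your provider of choice")


def generate_mcq(topic: str, difficulty: str) -> dict:
    prompt = (
        f"Write one multiple-choice question on {topic} at {difficulty} "
        "difficulty. Return JSON with keys: stem, options (list of 4 "
        "strings), key (index of the correct option), rationale."
    )
    item = json.loads(call_llm(prompt))
    # Structural checks before the item enters human review.
    assert len(item["options"]) == 4 and 0 <= item["key"] < 4
    return item
```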
Peer reviewed | Direct link
Filipe Manuel Vidal Falcão; Daniela S.M. Pereira; José Miguel Pêgo; Patrício Costa – Education and Information Technologies, 2024
Progress tests (PT) are a popular type of longitudinal assessment used for evaluating clinical knowledge retention and lifelong learning in health professions education. Most PTs consist of multiple-choice questions (MCQs) whose development is costly and time-consuming. Automatic Item Generation (AIG) generates test items through algorithms,…
Descriptors: Automation, Test Items, Progress Monitoring, Medical Education
Peer reviewed | PDF on ERIC (full text)
Ayfer Sayin; Sabiha Bozdag; Mark J. Gierl – International Journal of Assessment Tools in Education, 2023
The purpose of this study is to generate non-verbal items for a visual reasoning test using template-based automatic item generation (AIG). The research method followed the three stages of template-based AIG. An item from the 2016 4th-grade entrance exam of the Science and Art Center (known as BILSEM) was chosen as the…
Descriptors: Test Items, Test Format, Nonverbal Tests, Visual Measures
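The three-stage template-based workflow the study follows (select a parent item, abstract it into an item model, instantiate the model) can be sketched generically as below; the template, variable values, and constraint are invented for illustration and do not reproduce the BILSEM item.

```python
# Rough sketch of template-based AIG: a parent item is abstracted into an
# item model (template + variable values + constraints), which is then
# instantiated exhaustively. All content here is an invented placeholder.
from itertools import product

TEMPLATE = "Which figure continues the pattern: {a}, {b}, {a}, {b}, {a}, ?"
VARIABLES = {"a": ["circle", "square", "triangle"], "b": ["triangle", "star"]}


def instantiate(template: str, variables: dict) -> list[str]:
    names = list(variables)
    items = []
    for combo in product(*(variables[n] for n in names)):
        values = dict(zip(names, combo))
        if values["a"] != values["b"]:  # constraint: the two shapes differ
            items.append(template.format(**values))
    return items


for item in instantiate(TEMPLATE, VARIABLES):
    print(item)
```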
Peer reviewed | Direct link
Shin, Jinnie; Gierl, Mark J. – International Journal of Testing, 2022
Over the last five years, tremendous strides have been made in advancing the AIG methodology required to produce items in diverse content areas. However, the one content area where enormous problems remain unsolved is language arts, generally, and reading comprehension, more specifically. While reading comprehension test items can be created using…
Descriptors: Reading Comprehension, Test Construction, Test Items, Natural Language Processing
Peer reviewed | Direct link
Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018
The final step of the typical process of developing educational and psychological tests is to place the selected items into a formatted test form. This step involves grouping and ordering the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…
Descriptors: Programming, Automation, Test Items, Test Format
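A toy version of such a formatting MIP, written with the open-source PuLP solver, is sketched below. The constraints (a fixed opening item, long items kept non-adjacent) are invented for illustration and are not the paper's formulation.

```python
# Toy mixed-integer program for item formatting with PuLP: assign 4 items
# to 4 positions, opening with item 0 and keeping the two "long" items
# out of adjacent positions. Constraints are illustrative only.
from pulp import LpBinary, LpMinimize, LpProblem, LpVariable, lpSum, PULP_CBC_CMD

items, positions = range(4), range(4)
long_items = [1, 3]

prob = LpProblem("item_formatting", LpMinimize)
x = {(i, p): LpVariable(f"x_{i}_{p}", cat=LpBinary) for i in items for p in positions}

prob += lpSum(p * x[i, p] for i in long_items for p in positions)  # long items early
for i in items:                                   # each item placed exactly once
    prob += lpSum(x[i, p] for p in positions) == 1
for p in positions:                               # each position filled exactly once
    prob += lpSum(x[i, p] for i in items) == 1
prob += x[0, 0] == 1                              # item 0 opens the form
for p in range(len(positions) - 1):               # no two long items adjacent
    prob += lpSum(x[i, p] + x[i, p + 1] for i in long_items) <= 1

prob.solve(PULP_CBC_CMD(msg=False))
print([i for p in positions for i in items if x[i, p].value() == 1])
```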
Peer reviewed | Direct link
Liu, Ou Lydia; Rios, Joseph A.; Heilman, Michael; Gerard, Libby; Linn, Marcia C. – Journal of Research in Science Teaching, 2016
Constructed response items can both measure the coherence of student ideas and serve as reflective experiences to strengthen instruction. We report on new automated scoring technologies that can reduce the cost and complexity of scoring constructed-response items. This study explored the accuracy of c-rater-ML, an automated scoring engine…
Descriptors: Science Tests, Scoring, Automation, Validity
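c-rater-ML is ETS's proprietary engine, so the sketch below is only a generic stand-in showing the usual shape of such scorers: text features mapped to human rubric scores by a supervised model. The scikit-learn pipeline and toy training pairs are illustrative assumptions.

```python
# Generic short-answer scoring sketch (NOT c-rater-ML): TF-IDF features
# feed a ridge regressor that maps response text to a rubric score.
# Training pairs below are toy data for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

responses = [
    "energy flows from producers to consumers",
    "plants eat sunlight",
    "producers convert light energy which consumers then use",
    "i dont know",
]
scores = [2.0, 1.0, 2.0, 0.0]  # human rubric scores (toy)

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge(alpha=1.0))
model.fit(responses, scores)
print(model.predict(["consumers get energy from producers"]))
```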
Peer reviewed | Direct link
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
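One common comparison procedure (though, as the paper argues, not the only defensible one) is to compute quadratic-weighted kappa against human scores item by item and average across items; a minimal sketch with toy data:

```python
# Rank several automated raters by mean quadratic-weighted kappa against
# human scores, computed per item and averaged. All scores are toy data;
# other ranking procedures can and do produce different orderings.
import numpy as np
from sklearn.metrics import cohen_kappa_score

human = {"item1": [0, 1, 2, 2, 1], "item2": [2, 2, 1, 0, 1]}
engines = {
    "engine_a": {"item1": [0, 1, 2, 1, 1], "item2": [2, 1, 1, 0, 1]},
    "engine_b": {"item1": [1, 1, 2, 2, 0], "item2": [2, 2, 0, 0, 1]},
}

for name, preds in engines.items():
    kappas = [cohen_kappa_score(human[it], preds[it], weights="quadratic")
              for it in human]
    print(name, round(float(np.mean(kappas)), 3))
```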
Peer reviewed | PDF on ERIC (full text)
Papasalouros, Andreas; Chatzigiannakou, Maria – International Association for Development of the Information Society, 2018
Automating the production of questions for assessment and self-assessment has recently become an active field of study. The use of Semantic Web technologies has certain advantages over other methods for question generation and thus is one of the most important lines of research for this problem. The aim of this paper is to provide an overview of…
Descriptors: Computer Assisted Testing, Web 2.0 Technologies, Test Format, Multiple Choice Tests
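A minimal sketch of the ontology-driven approach this line of work surveys: facts live in an RDF graph, a SPARQL query retrieves them, and same-class values become distractors. The tiny ontology below is invented for illustration.

```python
# Ontology-driven MCQ generation sketch with rdflib: query an RDF graph
# for (country, capital) facts and reuse same-class capitals as
# distractors. The mini-ontology is an invented example.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/")
g = Graph()
for country, capital in [("France", "Paris"), ("Spain", "Madrid"),
                         ("Italy", "Rome"), ("Greece", "Athens")]:
    g.add((EX[country], RDF.type, EX.Country))
    g.add((EX[country], EX.hasCapital, Literal(capital)))

rows = g.query(
    "SELECT ?c ?cap WHERE { ?c a ex:Country ; ex:hasCapital ?cap }",
    initNs={"ex": EX},
)
facts = sorted((str(c).rsplit("/", 1)[-1], str(cap)) for c, cap in rows)
stem_country, key = facts[0]
distractors = [cap for name, cap in facts if name != stem_country][:3]
print(f"What is the capital of {stem_country}?", [key] + distractors)
```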
Carter, Kelli Patrice – ProQuest LLC, 2019
There has been a call from the national community of biologists and biology educators to increase biological literacy of undergraduate students, including understanding and application of core concepts. The structure and function relationship is a core concept identified by the wider biology community and by physiology faculty. Understanding of…
Descriptors: Concept Formation, Comprehension, Scientific Concepts, Formative Evaluation
Peer reviewed | Direct link
van der Linden, Wim J.; Diao, Qi – Journal of Educational Measurement, 2011
In automated test assembly (ATA), the methodology of mixed-integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different…
Descriptors: Test Items, Test Format, Test Construction, Item Banks
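The selection core of ATA can be shown as a small mixed-integer program; the sketch below (again with the open-source PuLP solver, an assumption rather than the paper's software) maximizes Fisher information at a target ability point under length and content constraints, with toy numbers throughout.

```python
# Toy ATA model with PuLP: choose 3 of 6 bank items to maximize Fisher
# information at theta = 0, requiring at least one algebra item.
# Information values and content flags are toy numbers.
from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum, PULP_CBC_CMD

info = [0.42, 0.55, 0.31, 0.48, 0.60, 0.25]       # I_i(theta = 0), toy
algebra = [0, 1, 1, 0, 1, 0]                      # content flags, toy

prob = LpProblem("test_assembly", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(6)]
prob += lpSum(info[i] * x[i] for i in range(6))           # maximize information
prob += lpSum(x) == 3                                     # fixed test length
prob += lpSum(algebra[i] * x[i] for i in range(6)) >= 1   # content coverage
prob.solve(PULP_CBC_CMD(msg=False))
print([i for i in range(6) if x[i].value() == 1])
```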
Peer reviewed | PDF on ERIC (full text)
Graf, Edith Aurora – ETS Research Report Series, 2008
Quantitative item models are item structures that may be expressed in terms of mathematical variables and constraints. An item model may be developed as a computer program from which large numbers of items are automatically generated. Item models can be used to produce large numbers of items for use in traditional, large-scale assessments. But…
Descriptors: Test Items, Models, Diagnostic Tests, Statistical Analysis
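A quantitative item model in miniature: variables with ranges plus a constraint define an item family, and a short program instantiates items and computes the key. The stem and number ranges below are illustrative, not taken from the report.

```python
# Miniature quantitative item model: sample variable values under a
# constraint (whole-number answer) and compute the scoring key.
# Wording and ranges are invented placeholders.
import random

STEM = "A train travels {d} km in {t} hours. What is its average speed in km/h?"


def generate_item(rng: random.Random) -> dict:
    while True:
        d = rng.randrange(60, 300, 10)   # distance in km
        t = rng.randrange(2, 6)          # time in hours
        if d % t == 0:                   # constraint: integer key
            return {"stem": STEM.format(d=d, t=t), "key": d // t}


rng = random.Random(7)
for _ in range(3):
    print(generate_item(rng))
```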
Martinez, Michael E.; And Others – 1990
Large-scale testing is dominated by the multiple-choice question format. Widespread use of the format is due, in part, to the ease with which multiple-choice items can be scored automatically. This paper examines automatic scoring procedures for an alternative item type: figural response. Figural response items call for the completion or…
Descriptors: Automation, Computer Assisted Testing, Educational Technology, Multiple Choice Tests
Peer reviewed
Martinez, Michael E.; Bennett, Randy Elliot – Applied Measurement in Education, 1992
New developments in the use of automatically scorable constructed response item types for large-scale assessment are reviewed for five domains: (1) mathematical reasoning; (2) algebra problem solving; (3) computer science; (4) architecture; and (5) natural language. Ways in which these technologies are likely to shape testing are considered. (SLD)
Descriptors: Algebra, Architecture, Automation, Computer Science
Peer reviewed | Direct link
Embretson, Susan E. – Measurement: Interdisciplinary Research and Perspectives, 2004
The last century was marked by dazzling changes in many areas, such as technology and communications. Predictions into the second century of testing are seemingly difficult in such a context. Yet, looking back to the turn of the last century, Kirkpatrick (1900), in his American Psychological Association presidential address, presented fundamental…
Descriptors: Ability, Testing, Futures (of Society), Psychometrics
Leuba, Richard J. – Engineering Education, 1986
Promotes the use of machine-scored tests in basic engineering science classes. Discusses some principles and practices of machine-scored testing. Provides several example test items. Argues that such tests can be used to enhance basic understanding of concepts and problem solving skills. (TW)
Descriptors: Automation, College Science, Engineering Education, Evaluation Methods