| Publication Date | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 7 |
| Since 2022 (last 5 years) | 44 |
| Since 2017 (last 10 years) | 109 |
| Since 2007 (last 20 years) | 273 |
| Descriptor | Count |
| --- | --- |
| Comparative Analysis | 670 |
| Test Construction | 670 |
| Test Items | 178 |
| Foreign Countries | 170 |
| Test Validity | 167 |
| Test Reliability | 132 |
| Higher Education | 96 |
| Item Analysis | 86 |
| Language Tests | 85 |
| Scores | 80 |
| Computer Assisted Testing | 72 |
| Author | Count |
| --- | --- |
| Benson, Jeri | 4 |
| Hambleton, Ronald K. | 4 |
| Linn, Robert L. | 4 |
| Sireci, Stephen G. | 4 |
| Weiss, David J. | 4 |
| Ebel, Robert L. | 3 |
| Haladyna, Tom | 3 |
| Subkoviak, Michael J. | 3 |
| Wu, Margaret | 3 |
| Beaton, Albert E. | 2 |
| Bogner, Franz X. | 2 |
| Audience | Count |
| --- | --- |
| Practitioners | 15 |
| Teachers | 13 |
| Researchers | 10 |
| Administrators | 7 |
| Counselors | 2 |
| Policymakers | 2 |
| Media Staff | 1 |
| Parents | 1 |
| Support Staff | 1 |
| Location | Count |
| --- | --- |
| United States | 21 |
| Australia | 17 |
| Japan | 12 |
| China | 9 |
| Germany | 9 |
| Indonesia | 9 |
| Taiwan | 8 |
| California | 7 |
| France | 7 |
| Israel | 7 |
| Spain | 7 |
| Laws, Policies, & Programs | Count |
| --- | --- |
| No Child Left Behind Act 2001 | 2 |
| Race to the Top | 1 |
| What Works Clearinghouse Rating | Count |
| --- | --- |
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
Kaja Haugen; Cecilie Hamnes Carlsen; Christine Möller-Omrani – Language Awareness, 2025
This article presents the process of constructing and validating a test of metalinguistic awareness (MLA) for young school children (age 8-10). The test was developed between 2021 and 2023 as part of the MetaLearn research project, financed by The Research Council of Norway. The research team defines MLA as using metalinguistic knowledge at a…
Descriptors: Language Tests, Test Construction, Elementary School Students, Metalinguistics
Moses, Tim – Journal of Educational Measurement, 2022
One result of recent changes in testing is that previously established linking frameworks may not adequately address challenges in current linking situations. Test linking through equating, concordance, vertical scaling or battery scaling may not represent linkings for the scores of tests developed to measure constructs differently for different…
Descriptors: Measures (Individuals), Educational Assessment, Test Construction, Comparative Analysis
Peter Moss; Mathias Urban – Contemporary Issues in Early Childhood, 2024
This colloquium brings information about a second cycle of OECD's International Early Learning and Well-being Study (IELS) to the early childhood community, and offers a further critique of the approach to comparative research that the IELS embodies.
Descriptors: Test Construction, Early Childhood Education, Educational Research, Research Problems
Eray Selçuk; Ergül Demir – International Journal of Assessment Tools in Education, 2024
This research aims to compare the ability and item parameter estimations of Item Response Theory according to Maximum likelihood and Bayesian approaches in different Monte Carlo simulation conditions. For this purpose, depending on the changes in the priori distribution type, sample size, test length, and logistics model, the ability and item…
Descriptors: Item Response Theory, Item Analysis, Test Items, Simulation
Jeff Allen; Jay Thomas; Stacy Dreyer; Scott Johanningmeier; Dana Murano; Ty Cruce; Xin Li; Edgar Sanchez – ACT Education Corp., 2025
This report describes the process of developing and validating the enhanced ACT. The report describes the changes made to the test content and the processes by which these design decisions were implemented. The authors describe how they shared the overall scope of the enhancements, including the initial blueprints, with external expert panels,…
Descriptors: College Entrance Examinations, Testing, Change, Test Construction
Kate E. Walton; Cristina Anguiano-Carrasco – ACT, Inc., 2024
Large language models (LLMs), such as ChatGPT, are becoming increasingly prominent. Their use is becoming more and more popular to assist with simple tasks, such as summarizing documents, translating languages, rephrasing sentences, or answering questions. Reports like McKinsey's (Chui & Yee, 2023) estimate that by implementing LLMs,…
Descriptors: Artificial Intelligence, Man Machine Systems, Natural Language Processing, Test Construction
Shaw, Stuart; Crisp, Victoria; Hughes, Sarah – Research Matters, 2020
The credibility of an Awarding Organisation's products is partly reliant upon the claims it makes about its assessments and on the evidence it can provide to support such claims. Some such claims relate to comparability. For example, for syllabuses with options, such as the choice to conduct coursework or to take an alternative exam testing…
Descriptors: Alternative Assessment, Comparative Analysis, Standards, Evaluation Criteria
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
Maïano, Christophe; Morin, Alexandre J. S.; Tietjens, Maike; Bastos, Tânia; Luiggi, Maxime; Corredeira, Rui; Griffet, Jean; Sánchez-Oliva, David – Measurement in Physical Education and Exercise Science, 2023
The present study sought to examine the psychometric properties of new German, Portuguese, and Spanish versions of the Revised Short Form of the Physical Self-Inventory (PSI-S-"R"), and to contrast these properties against those from the original French version of this instrument. Participants (n = 1802) were 288 French youth, 177 German…
Descriptors: German, Portuguese, Spanish, Test Construction
Christophe O. Soulage; Fabien Van Coppenolle; Fitsum Guebre-Egziabher – Advances in Physiology Education, 2024
Artificial intelligence (AI) has gained massive interest with the public release of the conversational AI "ChatGPT," but it also has become a matter of concern for academia as it can easily be misused. We performed a quantitative evaluation of the performance of ChatGPT on a medical physiology university examination. Forty-one answers…
Descriptors: Medical Students, Medical Education, Artificial Intelligence, Computer Software
Clemens, Nathan H.; Fuchs, Douglas – Reading Research Quarterly, 2022
Many seem to believe that researcher-made tests are unnecessary, if not inappropriate, for evaluating reading comprehension interventions. We suggest that this view reflects a zeitgeist in which researcher-made (proximal) tests that align with the researchers' interventions are closely scrutinized and often devalued, whereas commercially developed…
Descriptors: Reading Tests, Reading Comprehension, Intervention, Test Construction
Karine Molvinger – Journal of Chemical Education, 2024
This article focuses on the learning of the Lewis representation at the transition from high school to university in France. This notion is taught both in high school and in the first year of higher education, but with different methods, which appears to hinder learners. In this work, we observe 11th grade and higher education classes…
Descriptors: Secondary Education, Higher Education, Secondary School Science, College Science
Erdem-Kara, Basak; Dogan, Nuri – International Journal of Assessment Tools in Education, 2022
Recently, adaptive test approaches have become a viable alternative to traditional fixed-item tests. The main advantage of adaptive tests is that they reach desired measurement precision with fewer items. However, fewer items mean that each item has a more significant effect on ability estimation and therefore those tests are open to more…
Descriptors: Item Analysis, Computer Assisted Testing, Test Items, Test Construction
Kyung-Mi O. – Language Testing in Asia, 2024
This study examines the efficacy of artificial intelligence (AI) in creating parallel test items compared to human-made ones. Two test forms were developed: one consisting of 20 existing human-made items and another with 20 new items generated with ChatGPT assistance. Expert reviews confirmed the content parallelism of the two test forms.…
Descriptors: Comparative Analysis, Artificial Intelligence, Computer Software, Test Items
Roger Young; Emily Courtney; Alexander Kah; Mariah Wilkerson; Yi-Hsin Chen – Teaching of Psychology, 2025
Background: Multiple-choice item (MCI) assessments are burdensome for instructors to develop. Artificial intelligence (AI, e.g., ChatGPT) can streamline the process without sacrificing quality. The quality of AI-generated MCIs is comparable to that of items written by human experts. However, whether the quality of AI-generated MCIs is equally good across various domain-…
Descriptors: Item Response Theory, Multiple Choice Tests, Psychology, Textbooks

