| Publication Date | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 6 |
| Since 2022 (last 5 years) | 25 |
| Since 2017 (last 10 years) | 58 |
| Since 2007 (last 20 years) | 77 |
| Descriptor | Count |
| --- | --- |
| Accuracy | 77 |
| Test Format | 77 |
| Test Items | 35 |
| Computer Assisted Testing | 23 |
| Foreign Countries | 23 |
| Item Response Theory | 23 |
| Comparative Analysis | 19 |
| Classification | 15 |
| Language Tests | 15 |
| Item Analysis | 13 |
| Multiple Choice Tests | 12 |
| Author | Count |
| --- | --- |
| Lee, Won-Chan | 3 |
| Hambleton, Ronald K. | 2 |
| Kalender, Ilker | 2 |
| Kim, Sooyeon | 2 |
| Kim, Stella Y. | 2 |
| Agus, Mirian | 1 |
| Ahmadi, Alireza | 1 |
| Aizawa, Kazumi | 1 |
| Akbar, Maruf | 1 |
| Alamri, Aeshah | 1 |
| Alasgarova, Gunel A. | 1 |
| Location | Count |
| --- | --- |
| Turkey | 3 |
| Indonesia | 2 |
| Iran | 2 |
| Asia | 1 |
| Australia | 1 |
| Austria | 1 |
| Azerbaijan | 1 |
| China | 1 |
| Florida | 1 |
| Germany | 1 |
| Italy (Milan) | 1 |
| Assessments and Surveys | Count |
| --- | --- |
| Program for International… | 2 |
| Test of English as a Foreign… | 2 |
| Trends in International… | 2 |
| Advanced Placement… | 1 |
| National Assessment Program… | 1 |
| Test of English for… | 1 |
| Torrance Tests of Creative… | 1 |
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
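The linear equating that Benton's paper extends can be sketched in a few lines. This is a generic mean-sigma linear equating function, not the paper's proposed extension; the score data are invented for illustration.

```python
# Linear (mean-sigma) equating: map a score x on Form X onto the scale of
# Form Y so that equated scores match Form Y's mean and standard deviation.
# Illustrative data only, not from the paper.
import statistics

def linear_equate(x, form_x_scores, form_y_scores):
    mu_x = statistics.mean(form_x_scores)
    mu_y = statistics.mean(form_y_scores)
    slope = statistics.stdev(form_y_scores) / statistics.stdev(form_x_scores)
    return mu_y + slope * (x - mu_x)

form_x = [10, 12, 14, 16, 18]   # scores observed on Form X
form_y = [12, 14, 16, 18, 20]   # scores observed on Form Y (2 points easier)
print(linear_equate(14, form_x, form_y))  # the Form X mean maps to the Form Y mean: 16.0
```

The paper's scenario of students taking different combinations of forms requires extending this pairwise idea; the sketch only shows the two-form baseline.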
Uk Hyun Cho – ProQuest LLC, 2024
The present study investigates the influence of multidimensionality on linking and equating within a unidimensional IRT framework. Two hypothetical multidimensional scenarios are explored under a nonequivalent-group common-item equating design. The first scenario examines test forms designed to measure multiple constructs, while the second scenario examines a…
Descriptors: Item Response Theory, Classification, Correlation, Test Format
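IRT linking of the kind studied here estimates a slope and intercept that place one form's parameters on the other form's scale. A minimal mean-sigma sketch using only the common items' difficulty parameters; the values are invented, and the dissertation's multidimensional scenarios are not modeled.

```python
# Mean-sigma IRT linking: estimate the scale transformation
# theta_base = A * theta_new + B from the difficulty (b) parameters
# of common items calibrated on both forms. Invented values.
import statistics

def mean_sigma(b_base, b_new):
    # b_base ≈ A * b_new + B for the common items
    A = statistics.stdev(b_base) / statistics.stdev(b_new)
    B = statistics.mean(b_base) - A * statistics.mean(b_new)
    return A, B

b_base = [-1.0, 0.0, 1.0, 2.0]   # common items on the base scale
b_new = [-0.5, 0.0, 0.5, 1.0]    # same items, new calibration
A, B = mean_sigma(b_base, b_new)
print(A, B)   # A = 2.0, B = 0.0: the new calibration is compressed by half
```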
Harpreet Auby; Namrata Shivagunde; Vijeta Deshpande; Anna Rumshisky; Milo D. Koretsky – Journal of Engineering Education, 2025
Background: Analyzing student short-answer written justifications to conceptually challenging questions has proven helpful to understand student thinking and improve conceptual understanding. However, qualitative analyses are limited by the burden of analyzing large amounts of text. Purpose: We apply dense and sparse Large Language Models (LLMs)…
Descriptors: Student Evaluation, Thinking Skills, Test Format, Cognitive Processes
Selcuk Acar; Peter Organisciak; Denis Dumas – Journal of Creative Behavior, 2025
In this three-study investigation, we applied various approaches to score drawings created in response to both Form A and Form B of the Torrance Tests of Creative Thinking-Figural (broadly TTCT-F) as well as the Multi-Trial Creative Ideation task (MTCI). We focused on TTCT-F in Study 1, and utilizing a random forest classifier, we achieved 79% and…
Descriptors: Scoring, Computer Assisted Testing, Models, Correlation
Ting Sun; Stella Yun Kim – Educational and Psychological Measurement, 2024
Equating is a statistical procedure used to adjust for the difference in form difficulty such that scores on those forms can be used and interpreted comparably. In practice, however, equating methods are often implemented without considering the extent to which two forms differ in difficulty. The study aims to examine the effect of the magnitude…
Descriptors: Difficulty Level, Data Interpretation, Equated Scores, High School Students
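One way to see why the magnitude of the difficulty difference matters: when forms barely differ, equating and simply using raw scores (identity equating) give nearly the same result, and equating error may outweigh the adjustment. A mean-equating sketch with invented scores, not the study's data or conditions:

```python
# Compare identity equating (raw score used as-is) with mean equating
# (shift by the difference in form means). Invented forms.
import statistics

def mean_equate(x, form_x_scores, form_y_scores):
    shift = statistics.mean(form_y_scores) - statistics.mean(form_x_scores)
    return x + shift

hard_form = [18, 20, 22, 24, 26]   # Form X, 2 points harder on average
easy_form = [20, 22, 24, 26, 28]   # Form Y

x = 22
print(x, mean_equate(x, hard_form, easy_form))  # identity keeps 22; mean equating gives 24
```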
Hasibe Yahsi Sari; Hulya Kelecioglu – International Journal of Assessment Tools in Education, 2025
The aim of the study is to examine the effect of the ratio of polytomous items on ability estimation under different conditions in mixed-format multistage tests (MST). The study is simulation-based. In the PISA 2018 application, the ability parameters of the individuals and the item pool were created using the item parameters estimated from…
Descriptors: Test Items, Test Format, Accuracy, Test Length
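Ability estimation in simulations like this is commonly done with EAP (expected a posteriori) scoring over a quadrature grid. A minimal dichotomous-2PL sketch; the polytomous case the study varies would replace the two-category likelihood with per-category probabilities. All item parameters are invented.

```python
# EAP ability estimate under a 2PL model, using a standard-normal prior
# evaluated on a fixed quadrature grid. Items are invented.
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap(responses, items):
    # responses: list of 0/1; items: list of (a, b) pairs
    grid = [i / 10 for i in range(-40, 41)]   # theta from -4.0 to 4.0
    num = den = 0.0
    for theta in grid:
        like = math.exp(-0.5 * theta * theta)  # normal prior kernel
        for u, (a, b) in zip(responses, items):
            p = p_2pl(theta, a, b)
            like *= p if u == 1 else (1.0 - p)
        num += theta * like
        den += like
    return num / den

items = [(1.2, -1.0), (1.0, 0.0), (0.8, 0.5), (1.5, 1.0)]
print(eap([1, 1, 0, 0], items))   # a middling response pattern: estimate near 0
```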
Jing Ma – ProQuest LLC, 2024
This study investigated the impact of scoring polytomous items later on measurement precision, classification accuracy, and test security in mixed-format adaptive testing. Utilizing the shadow-test approach, a simulation study was conducted across various test designs, test lengths, and numbers and locations of polytomous items. Results showed that while…
Descriptors: Scoring, Adaptive Testing, Test Items, Classification
Anna Filighera; Sebastian Ochs; Tim Steuer; Thomas Tregel – International Journal of Artificial Intelligence in Education, 2024
Automatic grading models are valued for the time and effort saved during the instruction of large student bodies. Especially with the increasing digitization of education and interest in large-scale standardized testing, the popularity of automatic grading has risen to the point where commercial solutions are widely available and used. However,…
Descriptors: Cheating, Grading, Form Classes (Languages), Computer Software
Yang Du; Susu Zhang – Journal of Educational and Behavioral Statistics, 2025
Item compromise has long posed challenges in educational measurement, jeopardizing both test validity and test security of continuous tests. Detecting compromised items is therefore crucial to address this concern. The present literature on compromised item detection reveals two notable gaps: First, the majority of existing methods are based upon…
Descriptors: Item Response Theory, Item Analysis, Bayesian Statistics, Educational Assessment
Dambha, Tasneem; Swanepoel, De Wet; Mahomed-Asmail, Faheema; De Sousa, Karina C.; Graham, Marien A.; Smits, Cas – Journal of Speech, Language, and Hearing Research, 2022
Purpose: This study compared the test characteristics, test-retest reliability, and test efficiency of three novel digits-in-noise (DIN) test procedures to a conventional antiphasic 23-trial adaptive DIN (D23). Method: One hundred twenty participants with an average age of 42 years (SD = 19) were included. Participants were tested and retested…
Descriptors: Auditory Tests, Screening Tests, Efficiency, Test Format
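Adaptive digits-in-noise procedures like the 23-trial test compared here typically run a one-up one-down staircase on signal-to-noise ratio. A schematic sketch with a toy simulated listener; the step size, trial count, and threshold formula are placeholders, not the study's protocol.

```python
# One-up one-down adaptive staircase on SNR: a correct response makes the
# next trial harder (lower SNR), an incorrect one makes it easier.
# The listener below is a toy psychometric-function model.
import random

def run_staircase(trials=23, start_snr=0.0, step=2.0, true_srt=-8.0, seed=1):
    rng = random.Random(seed)
    snr = start_snr
    history = []
    for _ in range(trials):
        # Toy listener: more likely correct when SNR exceeds its threshold.
        p_correct = 1.0 / (1.0 + 10 ** (-(snr - true_srt) / 4.0))
        correct = rng.random() < p_correct
        history.append(snr)
        snr = snr - step if correct else snr + step
    # Estimate the speech reception threshold as the mean SNR of later trials.
    return sum(history[4:]) / len(history[4:])

print(run_staircase())   # converges near the simulated listener's -8 dB threshold
```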
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
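Characteristic-curve linking methods of the family this paper extends choose constants (A, B) minimizing the distance between common-item characteristic curves across forms. A Haebara-style loss with a coarse grid search; the information weighting the paper proposes would enter as per-theta weights, and all parameters here are invented.

```python
# Haebara-style characteristic-curve linking: find (A, B) minimizing the
# squared difference between common-item ICCs after rescaling the new
# form's parameters (a -> a/A, b -> A*b + B). Invented parameters.
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def haebara_loss(A, B, items_new, items_old, grid):
    loss = 0.0
    for (a_n, b_n), (a_o, b_o) in zip(items_new, items_old):
        for theta in grid:
            diff = p_2pl(theta, a_n / A, A * b_n + B) - p_2pl(theta, a_o, b_o)
            loss += diff * diff
    return loss

# Old-form parameters, and the same items expressed on a shifted/stretched scale
items_old = [(1.2, -0.5), (0.9, 0.3), (1.5, 1.1)]
A_true, B_true = 1.25, 0.4
items_new = [(a * A_true, (b - B_true) / A_true) for a, b in items_old]

grid = [i / 4 for i in range(-12, 13)]
best = min(((haebara_loss(A, B, items_new, items_old, grid), A, B)
            for A in [1.0, 1.25, 1.5] for B in [0.0, 0.2, 0.4]),
           key=lambda t: t[0])
print(best[1], best[2])   # grid search recovers A = 1.25, B = 0.4
```

Real implementations minimize this loss with a numerical optimizer rather than a lattice; the grid keeps the sketch self-contained.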
Cerullo, Enzo; Jones, Hayley E.; Carter, Olivia; Quinn, Terry J.; Cooper, Nicola J.; Sutton, Alex J. – Research Synthesis Methods, 2022
Standard methods for the meta-analysis of medical tests, without assuming a gold standard, are limited to dichotomous data. Multivariate probit models are used to analyse correlated dichotomous data, and can be extended to model ordinal data. Within the context of an imperfect gold standard, they have previously been used for the analysis of…
Descriptors: Meta Analysis, Test Format, Medicine, Standards
Ozge Ersan Cinar – ProQuest LLC, 2022
In educational tests, a group of questions related to a shared stimulus is called a testlet (e.g., a reading passage with multiple related questions). Use of testlets is very common in educational tests. Additionally, computerized adaptive testing (CAT) is a mode of testing where the test forms are created in real time tailoring to the test…
Descriptors: Test Items, Computer Assisted Testing, Adaptive Testing, Educational Testing
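The item-selection core of CAT, which testlet designs complicate, picks at each step the unused item with maximum Fisher information at the current ability estimate. A minimal 2PL sketch ignoring testlet structure, content constraints, and exposure control; the pool is invented.

```python
# Maximum-information item selection for one 2PL CAT step. Under 2PL,
# Fisher information at theta is a^2 * P * (1 - P). Invented item pool.
import math

def p_2pl(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_info(theta, a, b):
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next(theta, pool, administered):
    # pool: dict item_id -> (a, b); administered: set of used item_ids
    candidates = [(fisher_info(theta, *pool[i]), i)
                  for i in pool if i not in administered]
    return max(candidates)[1]

pool = {"q1": (0.8, -2.0), "q2": (1.6, 0.1), "q3": (1.0, 2.5)}
print(pick_next(0.0, pool, set()))  # "q2": discriminating and near theta = 0
```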
McGuire, Michael J. – International Journal for the Scholarship of Teaching and Learning, 2023
College students in a lower-division psychology course made metacognitive judgments by predicting and postdicting performance for true-false, multiple-choice, and fill-in-the-blank question sets on each of three exams. This study investigated which question format would result in the most accurate metacognitive judgments. Extending Koriat's (1997)…
Descriptors: Metacognition, Multiple Choice Tests, Accuracy, Test Format
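Metacognitive accuracy in designs like this one is often summarized as calibration bias: mean predicted performance minus mean actual performance, computed per question format. A sketch with invented per-student proportions, not the study's data:

```python
# Calibration bias of metacognitive judgments: positive values indicate
# overconfidence (predictions exceed performance). Invented data.
def calibration_bias(predicted, actual):
    # predicted/actual: proportion-correct per student, in the same order
    return sum(p - a for p, a in zip(predicted, actual)) / len(predicted)

# Invented per-format results (predicted vs. actual proportion correct)
true_false = calibration_bias([0.9, 0.8, 0.85], [0.7, 0.6, 0.65])
fill_in_blank = calibration_bias([0.6, 0.5, 0.55], [0.6, 0.5, 0.55])
print(round(true_false, 2), round(fill_in_blank, 2))  # 0.2 (overconfident) vs 0.0 (calibrated)
```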
Lu, Ru; Kim, Sooyeon – ETS Research Report Series, 2021
This study evaluated the impact of subgroup weighting for equating through a common-item anchor. We used data from a single test form to create two research forms for which the equating relationship was known. The results showed that equating was most accurate when the new form and reference form samples were weighted to be similar to the target…
Descriptors: Equated Scores, Weighted Scores, Raw Scores, Test Items
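The subgroup weighting evaluated here is essentially poststratification: reweight each sample so its subgroup proportions match the target population before computing the moments used in equating. A minimal sketch with invented subgroup means and proportions:

```python
# Poststratification-style subgroup weighting: recompute a sample mean
# with subgroup proportions forced to match the target population.
# All numbers are invented for illustration.
def weighted_mean(subgroup_means, target_props):
    return sum(m * t for m, t in zip(subgroup_means, target_props))

means = [30.0, 20.0]          # subgroup mean scores in the sample
sample_props = [0.8, 0.2]     # subgroup proportions in the observed sample
target_props = [0.5, 0.5]     # subgroup proportions in the target population

unweighted = sum(m * s for m, s in zip(means, sample_props))
weighted = weighted_mean(means, target_props)
print(unweighted, weighted)   # 28.0 unweighted vs 25.0 after matching the target
```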

