Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 3 |
| Since 2022 (last 5 years) | 5 |
| Since 2017 (last 10 years) | 45 |
| Since 2007 (last 20 years) | 128 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Statistical Analysis | 128 |
| Test Format | 128 |
| Foreign Countries | 58 |
| Test Items | 53 |
| Comparative Analysis | 50 |
| Scores | 33 |
| Multiple Choice Tests | 31 |
| Correlation | 28 |
| Item Response Theory | 25 |
| College Students | 21 |
| Test Reliability | 21 |
Author
| Author | Count |
| --- | --- |
| Aizawa, Kazumi | 2 |
| Ali, Usama S. | 2 |
| Bande, Rhodora A. | 2 |
| Bendulo, Hermabeth O. | 2 |
| Iso, Tatsuo | 2 |
| Lee, Yi-Hsuan | 2 |
| Macalinao, Myrna L. | 2 |
| Menold, Natalja | 2 |
| Oyzon, Voltaire Q. | 2 |
| Tibus, Erlinda D. | 2 |
| Abramzon, Andrea | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Reports - Research | 117 |
| Journal Articles | 107 |
| Tests/Questionnaires | 10 |
| Speeches/Meeting Papers | 8 |
| Dissertations/Theses -… | 7 |
| Reports - Evaluative | 4 |
| Numerical/Quantitative Data | 2 |
| Information Analyses | 1 |
Location
| Location | Count |
| --- | --- |
| Turkey | 7 |
| Germany | 6 |
| Japan | 5 |
| Australia | 4 |
| Iran | 4 |
| Philippines | 3 |
| Sweden | 3 |
| China | 2 |
| Czech Republic | 2 |
| Florida | 2 |
| Netherlands | 2 |
Laws, Policies, & Programs
Assessments and Surveys
What Works Clearinghouse Rating
Tom Benton – Practical Assessment, Research & Evaluation, 2025
This paper proposes an extension of linear equating that may be useful in one of two fairly common assessment scenarios. One is where different students have taken different combinations of test forms. This might occur, for example, where students have some free choice over the exam papers they take within a particular qualification. In this…
Descriptors: Equated Scores, Test Format, Test Items, Computation
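Background for the Benton (2025) entry above: the extension it proposes builds on standard linear equating, which maps form-X scores onto the form-Y scale by matching means and standard deviations. A minimal sketch of that baseline method (not the paper's extension), with hypothetical data for illustration:

```python
import numpy as np

def linear_equate(x_scores, y_scores):
    """Standard linear equating: return a function mapping a form-X score onto
    the form-Y scale so equated scores match Y's mean and standard deviation."""
    mu_x, sd_x = np.mean(x_scores), np.std(x_scores, ddof=1)
    mu_y, sd_y = np.mean(y_scores), np.std(y_scores, ddof=1)
    return lambda x: mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical example: equate a raw score of 30 on form X to the form-Y scale.
rng = np.random.default_rng(0)
form_x = rng.normal(28, 6, size=500).round()
form_y = rng.normal(31, 7, size=500).round()
equate = linear_equate(form_x, form_y)
print(equate(30))
```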
Huiming Ding; Matt Homer – Advances in Health Sciences Education, 2025
Summative assessments are often underused for feedback, despite being rich in data on students' applied knowledge and clinical and professional skills. To better inform teaching and student support, this study aims to gain insights from summative assessments by profiling students' performance patterns and identifying those students…
Descriptors: Summative Evaluation, Profiles, Statistical Analysis, Outcomes of Education
Christian Berggren; Bengt Gerdin; Solmaz Filiz Karabag – Journal of Academic Ethics, 2025
The exposure of scientific scandals and the increase of dubious research practices have generated a stream of studies on Questionable Research Practices (QRPs), such as failure to acknowledge co-authors, selective presentation of findings, or removal of data not supporting desired outcomes. In contrast to high-profile fraud cases, QRPs can be…
Descriptors: Test Construction, Test Bias, Test Format, Response Style (Tests)
Jiajing Huang – ProQuest LLC, 2022
The nonequivalent-groups anchor-test (NEAT) data-collection design is commonly used in large-scale assessments. Under this design, different test groups take different test forms. Each test form has its own unique items and all test forms share a set of common items. If item response theory (IRT) models are applied to analyze the test data, the…
Descriptors: Item Response Theory, Test Format, Test Items, Test Construction
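For context on the Huang (2022) entry: under the NEAT design, IRT parameter estimates from different forms are usually placed on a common scale through the shared anchor items. One standard approach is mean-sigma linking, sketched below; this is general background and may not be the method used in the dissertation.

```latex
% Mean-sigma linking constants from the anchor items' difficulty estimates:
A = \frac{\sigma\!\left(\hat{b}^{\,Y}_{\mathrm{anchor}}\right)}{\sigma\!\left(\hat{b}^{\,X}_{\mathrm{anchor}}\right)},
\qquad
B = \mu\!\left(\hat{b}^{\,Y}_{\mathrm{anchor}}\right) - A\,\mu\!\left(\hat{b}^{\,X}_{\mathrm{anchor}}\right)
% Rescale the form-X estimates onto the form-Y scale:
b^{*} = A\,b + B, \qquad a^{*} = a / A, \qquad \theta^{*} = A\,\theta + B
```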
Inga Laukaityte; Marie Wiberg – Practical Assessment, Research & Evaluation, 2024
The overall aim was to examine effects of differences in group ability and features of the anchor test form on equating bias and the standard error of equating (SEE) using both real and simulated data. Chained kernel equating, poststratification kernel equating, and circle-arc equating were studied. A college admissions test with four different…
Descriptors: Ability Grouping, Test Items, College Entrance Examinations, High Stakes Tests
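The Laukaityte and Wiberg (2024) entry evaluates equating bias and the SEE. As a rough illustration only (not their chained kernel, poststratification kernel, or circle-arc procedures), the SEE at a single score point can be approximated by bootstrapping examinee groups around any equating method wrapped as `equate_fn(x_scores, y_scores, score_point)`:

```python
import numpy as np

def bootstrap_see(x_scores, y_scores, equate_fn, score_point, n_boot=1000, seed=0):
    """Approximate the standard error of equating (SEE) at one score point by
    resampling both examinee groups and re-estimating the equating function."""
    rng = np.random.default_rng(seed)
    reps = []
    for _ in range(n_boot):
        xb = rng.choice(x_scores, size=len(x_scores), replace=True)
        yb = rng.choice(y_scores, size=len(y_scores), replace=True)
        reps.append(equate_fn(xb, yb, score_point))
    return np.std(reps, ddof=1)
```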
Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020
The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…
Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level
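For orientation on the Lim and Lee (2020) entry: under a random groups design with number-correct scoring, a common alternative to linear equating is equipercentile equating, which matches percentile ranks across forms. A deliberately simplified sketch (it omits the continuity correction and smoothing used operationally):

```python
import numpy as np

def equipercentile_equate(x_scores, y_scores, x):
    """Simplified equipercentile equating under a random groups design:
    map a form-X score x to the form-Y score with the same percentile rank."""
    p = np.mean(np.asarray(x_scores) <= x)   # percentile rank of x on form X
    return np.quantile(y_scores, p)          # form-Y score at that percentile
```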
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2021
This study investigated whether item position effects lead to DIF when different test booklets are used. Lord's chi-square and Raju's unsigned area methods were applied with the 3PL model, both with and without item purification. When the performance of the methods was compared, it was revealed that…
Descriptors: Item Response Theory, Test Bias, Test Items, Comparative Analysis
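For the Soysal and Yilmaz Kogar (2021) entry: Raju's area methods quantify DIF as the area between the reference- and focal-group item characteristic curves. The sketch below uses a simple numerical approximation over a bounded ability range rather than Raju's closed-form expressions, with hypothetical item parameters:

```python
import numpy as np

def icc_3pl(theta, a, b, c):
    """Three-parameter logistic item characteristic curve."""
    return c + (1 - c) / (1 + np.exp(-1.7 * a * (theta - b)))

def unsigned_area(params_ref, params_focal, lo=-4.0, hi=4.0, n=2001):
    """Approximate the unsigned area between reference- and focal-group ICCs
    (a DIF effect size in the spirit of Raju's method) by a Riemann sum."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_3pl(theta, *params_ref) - icc_3pl(theta, *params_focal))
    return np.sum(gap) * (hi - lo) / (n - 1)

# Hypothetical item: equal discrimination and guessing, difficulty shifted for the focal group.
print(unsigned_area((1.2, 0.0, 0.2), (1.2, 0.5, 0.2)))
```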
Tingir, Seyfullah – ProQuest LLC, 2019
Educators use various statistical techniques to explain relationships between latent and observable variables. One way to model these relationships is to use Bayesian networks as a scoring model. However, adjusting the conditional probability tables (CPT-parameters) to fit a set of observations is still a challenge when using Bayesian networks. A…
Descriptors: Bayesian Statistics, Statistical Analysis, Scoring, Probability
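For readers unfamiliar with the scoring model referenced in the Tingir (2019) entry: a Bayesian network scores examinees by combining a prior over a latent proficiency with conditional probability tables (CPTs) for the observed responses. A toy, hand-specified example (not the dissertation's model or its CPT-adjustment method):

```python
# Toy Bayesian-network scoring: one binary latent proficiency, two observed items.
prior = {"master": 0.5, "nonmaster": 0.5}

# CPTs: probability of a correct response on each item, given the latent state.
cpt = {
    "item1": {"master": 0.85, "nonmaster": 0.30},
    "item2": {"master": 0.75, "nonmaster": 0.20},
}

def posterior(responses):
    """P(latent state | observed responses) via Bayes' rule, assuming responses
    are conditionally independent given the latent state."""
    joint = {}
    for state, p in prior.items():
        like = 1.0
        for item, correct in responses.items():
            p_correct = cpt[item][state]
            like *= p_correct if correct else (1 - p_correct)
        joint[state] = p * like
    total = sum(joint.values())
    return {state: v / total for state, v in joint.items()}

print(posterior({"item1": True, "item2": False}))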
Harbaugh, Allen G.; Liu, Min – AERA Online Paper Repository, 2017
This research examines the effects of single-value response style contamination on measures of model fit and model convergence issues. A simulation study examines the effects of the percentage of contamination, the number of manifest variables, the number of reverse-coded items, the magnitude of standardized factor loadings, response scale granularity, and…
Descriptors: Goodness of Fit, Sample Size, Statistical Analysis, Test Format
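To make the data-generation idea in the Harbaugh and Liu (2017) entry concrete, here is a minimal sketch of simulating single-value response style contamination in Likert-type data; the parameter values and generation model are illustrative assumptions, not the study's actual design:

```python
import numpy as np

def simulate_responses(n_persons=500, n_items=10, contamination=0.10,
                       scale_points=5, seed=0):
    """Simulate Likert-type responses where a given proportion of respondents
    answer with one constant category on every item (single-value response style)."""
    rng = np.random.default_rng(seed)
    # Ordinary respondents: item responses vary around each person's latent trait.
    latent = rng.normal(loc=(scale_points + 1) / 2, scale=0.8, size=(n_persons, 1))
    data = np.clip(np.round(latent + rng.normal(scale=1.0, size=(n_persons, n_items))),
                   1, scale_points)
    # Contaminated respondents: replace their rows with a single constant category.
    n_bad = int(contamination * n_persons)
    data[:n_bad, :] = rng.integers(1, scale_points + 1, size=(n_bad, 1))
    return data
```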
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis
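The two statistical targets named in the Debeer, Ali, and van Rijn (2017) entry have standard IRT definitions. Shown below for a 2PL model with items $i = 1, \dots, n$; these are the general formulas, not the paper's specific assembly models:

```latex
% Test characteristic curve: expected number-correct score at ability \theta
\mathrm{TCC}(\theta) = \sum_{i=1}^{n} P_i(\theta),
\qquad
P_i(\theta) = \frac{1}{1 + \exp\{-a_i(\theta - b_i)\}}
% Test information function: sum of the item information functions
\mathrm{TIF}(\theta) = \sum_{i=1}^{n} a_i^{2}\, P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr)
```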
Neuert, Cornelia E. – Field Methods, 2017
Previous research has shown that check-all-that-apply (CATA) and forced-choice (FC) question formats do not produce comparable results. The cognitive processes underlying respondents' answers to both types of formats still require clarification. This study contributes to filling this gap by using eye-tracking data. Both formats are compared by…
Descriptors: Measurement Techniques, Test Format, Eye Movements, Cognitive Processes
Kieftenbeld, Vincent; Boyer, Michelle – Applied Measurement in Education, 2017
Automated scoring systems are typically evaluated by comparing the performance of a single automated rater item-by-item to human raters. This presents a challenge when the performance of multiple raters needs to be compared across multiple items. Rankings could depend on specifics of the ranking procedure; observed differences could be due to…
Descriptors: Automation, Scoring, Comparative Analysis, Test Items
Menold, Natalja; Raykov, Tenko – Educational and Psychological Measurement, 2016
This article examines the possible dependency of composite reliability on presentation format of the elements of a multi-item measuring instrument. Using empirical data and a recent method for interval estimation of group differences in reliability, we demonstrate that the reliability of an instrument need not be the same when polarity of the…
Descriptors: Test Reliability, Test Format, Test Items, Differences
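Background for the Menold and Raykov (2016) entry: composite reliability for a set of congeneric items is conventionally expressed in terms of the factor loadings $\lambda_i$ and error variances $\theta_{ii}$. This is the general coefficient, not the paper's interval-estimation method:

```latex
% Composite (omega) reliability of the sum score of k congeneric items
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^{2}}
              {\left(\sum_{i=1}^{k} \lambda_i\right)^{2} + \sum_{i=1}^{k} \theta_{ii}}
```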
Sinharay, Sandip – Grantee Submission, 2018
Tatsuoka (1984) suggested several extended caution indices and their standardized versions that have been used as person-fit statistics by researchers such as Drasgow, Levine, and McLaughlin (1987), Glas and Meijer (2003), and Molenaar and Hoijtink (1990). However, these indices are only defined for tests with dichotomous items. This paper extends…
Descriptors: Test Format, Goodness of Fit, Item Response Theory, Error Patterns
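Related background for the Sinharay (2018) entry: for dichotomous items, standardized person-fit statistics are built from the log-likelihood of the observed response vector. A commonly cited example is the $l_z$ statistic, shown here for orientation; it is not itself one of Tatsuoka's extended caution indices:

```latex
% Log-likelihood of responses u_i given item response probabilities P_i(\theta)
l_0 = \sum_{i=1}^{n} \bigl[ u_i \ln P_i(\theta) + (1 - u_i)\ln\{1 - P_i(\theta)\} \bigr]
% Standardized person-fit statistic
l_z = \frac{l_0 - \mathrm{E}(l_0)}{\sqrt{\mathrm{Var}(l_0)}},
\qquad
\mathrm{E}(l_0) = \sum_{i=1}^{n}\bigl[P_i \ln P_i + (1 - P_i)\ln(1 - P_i)\bigr],
\quad
\mathrm{Var}(l_0) = \sum_{i=1}^{n} P_i(1 - P_i)\left[\ln\frac{P_i}{1 - P_i}\right]^{2}
```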
Wang, Lu; Steedle, Jeffrey – ACT, Inc., 2020
In recent ACT mode comparability studies, students testing on laptop or desktop computers earned slightly higher scores on average than students who tested on paper, especially on the ACT® reading and English tests (Li et al., 2017). Equating procedures adjust for such "mode effects" to make ACT scores comparable regardless of testing…
Descriptors: Test Format, Reading Tests, Language Tests, English
