ERIC - Search Results

Showing 1 to 15 of 70 results

Information Functions of Rank-2PL Models for Forced-Choice Questionnaires

Peer reviewed

Fu, Jianbin; Tan, Xuan; Kyllonen, Patrick C. – Journal of Educational Measurement, 2024

This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's…

Descriptors: Questionnaires, Test Items, Item Response Theory, Models

Constructing a Robust Score Scale from IRT Scores with Informed Boundaries

Peer reviewed

Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022

In operational testing, item response theory (IRT) models for dichotomous responses are popular for measuring a single latent construct θ, such as cognitive ability in a content domain. Estimates of θ, also called IRT scores or θ̂, can be computed using estimators based on the likelihood function, such as maximum likelihood…

Descriptors: Scores, Item Response Theory, Test Items, Test Format

Incorporating Test-Taking Engagement into Multistage Adaptive Testing Design for Large-Scale Assessments

Peer reviewed

Bulut, Okan; Gorgun, Guher; Karamese, Hacer – Journal of Educational Measurement, 2025

The use of multistage adaptive testing (MST) has gradually increased in large-scale testing programs as MST achieves a balanced compromise between linear test design and item-level adaptive testing. MST works on the premise that each examinee gives their best effort when attempting the items, and their responses truly reflect what they know or can…

Descriptors: Response Style (Tests), Testing Problems, Testing Accommodations, Measurement

Score Comparability between Online Proctored and In-Person Credentialing Exams

Peer reviewed

Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022

This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two test-center (TC) cohorts (TC1 and TC2) and one online-proctored (OP) cohort (OP1) matched on their pool-based scale score distributions. The…

Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing

Historical Perspectives on Score Comparability Issues Raised by Innovations in Testing

Peer reviewed

Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022

While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…

Descriptors: Scoring, Testing, Test Items, Test Format

Examining the Impacts of Ignoring Rater Effects in Mixed-Format Tests

Peer reviewed

Guo, Wenjing; Wind, Stefanie A. – Journal of Educational Measurement, 2021

The use of mixed-format tests made up of multiple-choice (MC) items and constructed response (CR) items is popular in large-scale testing programs, including the National Assessment of Educational Progress (NAEP) and many district- and state-level assessments in the United States. Rater effects, or raters' scoring tendencies that result in…

Descriptors: Test Format, Multiple Choice Tests, Scoring, Test Items

A Comparison of Constraint Programming and Mixed-Integer Programming for Automated Test-Form Generation

Peer reviewed

Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018

The final step of the typical process of developing educational and psychological tests is to place the selected test items in a formatted form. The step involves the grouping and ordering of the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…

Descriptors: Programming, Automation, Test Items, Test Format

IRT Approaches to Modeling Scores on Mixed-Format Tests

Peer reviewed

Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020

This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…

Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests

Gender Bias in Test Item Formats: Evidence from PISA 2009, 2012, and 2015 Math and Reading Tests

Peer reviewed

Shear, Benjamin R. – Journal of Educational Measurement, 2023

Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…

Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests

On-the-Fly Constraint-Controlled Assembly Methods for Multistage Adaptive Testing for Cognitive Diagnosis

Peer reviewed

Liu, Shuchang; Cai, Yan; Tu, Dongbo – Journal of Educational Measurement, 2018

This study applied the mode of on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index…

Descriptors: Adaptive Testing, Monte Carlo Methods, Computer Security, Clinical Diagnosis

Evaluating Statistical Targets for Assembling Parallel Mixed-Format Test Forms

Peer reviewed

Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017

Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…

Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis

A Comparison of Strategies for Smoothing Parameter Selection for Mixed-Format Tests under the Random Groups Design

Peer reviewed

Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2018

Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed-format pseudo tests under the…

Descriptors: Comparative Analysis, Accuracy, Models, Sample Size

Does Maximizing Information at the Cut Score Always Maximize Classification Accuracy and Consistency?

Peer reviewed

Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2016

A common suggestion made in the psychometric literature for fixed-length classification tests is that one should design tests so that they have maximum information at the cut score. Designing tests in this way is believed to maximize the classification accuracy and consistency of the assessment. This article uses simulated examples to illustrate…

Descriptors: Cutting Scores, Psychometrics, Test Construction, Classification

Parameter Estimation in Rasch Models for Examinee-Selected Items

Peer reviewed

Liu, Chen-Wei; Wang, Wen-Chung – Journal of Educational Measurement, 2017

The examinee-selected-item (ESI) design, in which examinees are required to respond to a fixed number of items in a given set of items (e.g., choose one item to respond from a pair of items), always yields incomplete data (i.e., only the selected items are answered and the others have missing data) that are likely nonignorable. Therefore, using…

Descriptors: Item Response Theory, Models, Maximum Likelihood Statistics, Data Analysis

Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design

Peer reviewed

Wang, Shiyu; Lin, Haiyan; Chang, Hua-Hua; Douglas, Jeff – Journal of Educational Measurement, 2016

Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large-scale computer-based sequential testing. Though most designs of CAT and MST exhibit strengths and weaknesses in recent large-scale implementations, there is no simple answer to the question of which design is better because different…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Format, Sequential Approach
