Showing all 11 results
Peer reviewed
Jianbin Fu; Xuan Tan; Patrick C. Kyllonen – Journal of Educational Measurement, 2024
This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's…
Descriptors: Questionnaires, Test Items, Item Response Theory, Models
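As a rough illustration of the pairwise building block behind models of this kind (a sketch of one common forced-choice formulation, not necessarily the authors' exact Rank-2PLM equations; all parameter values are invented), the preference probability for a pair can be built from each statement's 2PL endorsement probability, and the pair's Fisher information evaluated numerically:

```python
import numpy as np

def p_2pl(theta, a, b):
    """2PL probability of endorsing a single statement."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def p_prefer(theta_i, theta_k, item_i, item_k):
    """Probability of preferring statement i over statement k in a pair,
    assuming the common 'i endorsed and k not endorsed' formulation."""
    pi = p_2pl(theta_i, *item_i)
    pk = p_2pl(theta_k, *item_k)
    return pi * (1 - pk) / (pi * (1 - pk) + (1 - pi) * pk)

def pair_information(theta_i, theta_k, item_i, item_k, eps=1e-4):
    """Numerical Fisher information of the pair with respect to theta_i,
    holding theta_k fixed (binary-response approximation)."""
    p = p_prefer(theta_i, theta_k, item_i, item_k)
    dp = (p_prefer(theta_i + eps, theta_k, item_i, item_k)
          - p_prefer(theta_i - eps, theta_k, item_i, item_k)) / (2 * eps)
    return dp**2 / (p * (1 - p))

# Illustrative (a, b) parameters for the two statements in a pair
print(pair_information(0.0, 0.5, (1.2, -0.3), (0.9, 0.4)))
```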
Peer reviewed
Choe, Edison M.; Han, Kyung T. – Journal of Educational Measurement, 2022
In operational testing, item response theory (IRT) models for dichotomous responses are popular for measuring a single latent construct θ, such as cognitive ability in a content domain. Estimates of θ, also called IRT scores or θ̂, can be computed using estimators based on the likelihood function, such as maximum likelihood…
Descriptors: Scores, Item Response Theory, Test Items, Test Format
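The abstract refers to likelihood-based θ estimators such as maximum likelihood. A minimal sketch of ML scoring for a dichotomous 2PL response pattern (generic, not tied to this paper's specific estimators; item parameters and responses are made up):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_likelihood(theta, responses, a, b):
    """Negative log-likelihood of a dichotomous response pattern under the 2PL."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))

# Illustrative item parameters and one response pattern
a = np.array([1.0, 1.4, 0.8, 1.2])
b = np.array([-0.5, 0.0, 0.5, 1.0])
x = np.array([1, 1, 0, 1])

# Maximum likelihood estimate of theta (bounded search keeps the estimate
# finite for all-correct or all-incorrect patterns)
result = minimize_scalar(neg_log_likelihood, bounds=(-4, 4), method="bounded",
                         args=(x, a, b))
print(round(result.x, 3))
```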
Peer reviewed
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
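The abstract describes calibrating the same item pool separately within matched cohorts and comparing the resulting scales. A generic way to inspect mode effects of this kind (a sketch assuming item difficulty estimates from two separate calibrations are available; the numbers below are fabricated placeholders) is to compare the paired estimates item by item:

```python
import numpy as np

# Hypothetical difficulty estimates for the same items calibrated separately
# in the two delivery-mode cohorts
b_mode1 = np.array([-1.2, -0.4, 0.1, 0.6, 1.3])
b_mode2 = np.array([-1.1, -0.5, 0.3, 0.7, 1.6])

diff = b_mode2 - b_mode1
print("mean shift:", diff.mean())                 # overall displacement between modes
print("per-item shifts:", diff)                   # large shifts flag mode-sensitive items
print("correlation:", np.corrcoef(b_mode1, b_mode2)[0, 1])
```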
Peer reviewed
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
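As background to "common (or randomly equivalent) examinees or items," a standard common-item linking step is a mean-sigma transformation that places one form's IRT parameter estimates on another form's scale. A small sketch, assuming 2PL-type difficulty estimates for a set of anchor items on both forms (values are illustrative only):

```python
import numpy as np

# Hypothetical anchor-item difficulty estimates on the old and new form scales
b_old = np.array([-1.0, -0.2, 0.4, 1.1])
b_new = np.array([-0.8, 0.0, 0.7, 1.5])

# Mean-sigma linking constants: theta_old = A * theta_new + B
A = b_old.std(ddof=1) / b_new.std(ddof=1)
B = b_old.mean() - A * b_new.mean()

# Transform new-form difficulties onto the old-form scale
# (discriminations would be divided by A)
b_new_on_old = A * b_new + B
print(A, B, b_new_on_old)
```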
Peer reviewed
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
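The abstract points to differential functioning of item types across gender groups. One widely used screen for questions like this (not necessarily the method used in the paper) is the Mantel-Haenszel DIF statistic computed within matched total-score strata; a small sketch with fabricated counts:

```python
import numpy as np

# Fabricated 2x2 tables per total-score stratum:
# rows = (reference, focal) group, columns = (correct, incorrect)
strata = [
    np.array([[40, 20], [35, 25]]),
    np.array([[60, 10], [55, 15]]),
    np.array([[30, 30], [25, 35]]),
]

num = sum(t[0, 0] * t[1, 1] / t.sum() for t in strata)
den = sum(t[0, 1] * t[1, 0] / t.sum() for t in strata)
alpha_mh = num / den                      # Mantel-Haenszel common odds ratio
mh_d_dif = -2.35 * np.log(alpha_mh)       # ETS delta-scale DIF index
print(round(alpha_mh, 3), round(mh_d_dif, 3))
```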
Peer reviewed
Debeer, Dries; Janssen, Rianne – Journal of Educational Measurement, 2013
Changing the order of items between alternate test forms to prevent copying and to enhance test security is a common practice in achievement testing. However, these changes in item order may affect item and test characteristics. Several procedures have been proposed for studying these item-order effects. The present study explores the use of…
Descriptors: Item Response Theory, Test Items, Test Format, Models
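Item-position effects of the sort studied here are often examined with an explanatory IRT setup in which the linear predictor includes a position term. The sketch below is a crude logistic-regression approximation with a simulated position effect, not the authors' models; all quantities are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Simulate responses under a Rasch-like model with a linear position effect:
# logit P(correct) = theta_p - b_i + delta * position
n_persons, n_items, delta = 500, 20, -0.03   # items get slightly harder when placed later
theta = rng.normal(0, 1, n_persons)
b = rng.normal(0, 1, n_items)
pos = np.arange(n_items)

logits = theta[:, None] - b[None, :] + delta * pos[None, :]
y = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-logits))).astype(int)

# Crude check: regress responses on position alone (person and item effects
# omitted for brevity), which should recover the sign of the position effect
X = np.tile(pos, n_persons).reshape(-1, 1)
fit = LogisticRegression().fit(X, y.ravel())
print(fit.coef_)   # a negative coefficient suggests a fatigue-type order effect
```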
Peer reviewed
van der Ark, L. Andries; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational Measurement, 2008
Two types of answer-copying statistics for detecting copiers in small-scale examinations are proposed. One statistic identifies the "copier-source" pair, and the other in addition suggests who is copier and who is source. Both types of statistics can be used when the examination has alternate test forms. A simulation study shows that the…
Descriptors: Cheating, Statistics, Test Format, Measures (Individuals)
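The paper defines formal copier-source statistics; the abstract does not give their form. As a generic illustration of the kind of evidence such statistics aggregate, here is a naive count of matching incorrect answers for every pair of examinees (purely a sketch, not the authors' statistics; answer data are invented):

```python
import numpy as np

def matching_incorrect(responses, key):
    """For each pair of examinees, count identical incorrect answers,
    a naive precursor of formal answer-copying statistics."""
    responses = np.asarray(responses)
    wrong = responses != np.asarray(key)              # answered incorrectly
    same = responses[:, None, :] == responses[None, :, :]
    return (same & wrong[:, None, :] & wrong[None, :, :]).sum(axis=2)

key = ["A", "C", "B", "D", "A"]
answers = [["A", "C", "B", "D", "A"],   # all correct
           ["B", "C", "C", "D", "B"],   # shares several wrong answers with the next examinee
           ["B", "C", "C", "A", "B"]]
print(matching_incorrect(answers, key))   # off-diagonal cells flag suspicious pairs
```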
Peer reviewed
Pommerich, Mary; Nicewander, W. Alan; Hanson, Bradley A. – Journal of Educational Measurement, 1999
Studied whether a group's average percent correct in a content domain could be accurately estimated for groups taking a single test form and not the entire domain of items. Evaluated six Item Response Theory-based domain score estimation methods through simulation and concluded that they performed better than the observed score on the form taken. (SLD)
Descriptors: Estimation (Mathematics), Groups, Item Response Theory, Scores
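A standard IRT-based domain score estimate, which methods like those studied here build on, is the expected proportion correct over the full item domain evaluated at the estimated ability. A minimal sketch under the 2PL, with invented parameters:

```python
import numpy as np

def expected_domain_score(theta, a, b):
    """Expected proportion correct over all domain items under the 2PL,
    evaluated at ability theta (scalar or array of examinees)."""
    theta = np.atleast_1d(theta)[:, None]
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return p.mean(axis=1)

# Invented domain of six items (a, b) and a small group of theta estimates
a = np.array([1.0, 1.2, 0.8, 1.5, 0.9, 1.1])
b = np.array([-1.0, -0.3, 0.0, 0.4, 0.9, 1.5])
group_thetas = np.array([-0.5, 0.0, 0.8])

# Group-level domain score: average the individual expected proportions correct
print(expected_domain_score(group_thetas, a, b).mean())
```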
Peer reviewed
Wainer, Howard; And Others – Journal of Educational Measurement, 1994
The comparability of scores on test forms that are constructed through examinee item choice is examined in an item response theory framework. The approach is illustrated with data from the College Board's Advanced Placement Test in Chemistry taken by over 18,000 examinees. (SLD)
Descriptors: Advanced Placement, Chemistry, Comparative Analysis, Constructed Response
Peer reviewed
Bridgeman, Brent – Journal of Educational Measurement, 1992
Examinees in a regular administration of the quantitative portion of the Graduate Record Examination responded to particular items in a machine-scannable multiple-choice format. Volunteers (n=364) used a computer to answer open-ended counterparts of these items. Scores for both formats demonstrated similar correlational patterns. (SLD)
Descriptors: Answer Sheets, College Entrance Examinations, College Students, Comparative Testing
Peer reviewed
Wise, Steven L.; And Others – Journal of Educational Measurement, 1992
Performance of 156 undergraduate and 48 graduate students on a self-adapted test (SFAT)--students choose the difficulty level of their test items--was compared with performance on a computer-adapted test (CAT). Those taking the SFAT obtained higher ability scores and reported lower posttest state anxiety than did CAT takers. (SLD)
Descriptors: Adaptive Testing, Comparative Testing, Computer Assisted Testing, Difficulty Level