Showing 1 to 15 of 132 results
Peer reviewed
van der Linden, Wim J.; Belov, Dmitry I. – Journal of Educational Measurement, 2023
A test of item compromise is presented which combines the test takers' responses and response times (RTs) into a statistic defined as the number of correct responses on the item for test takers with RTs flagged as suspicious. The test has null and alternative distributions belonging to the well-known family of compound binomial distributions, is…
Descriptors: Item Response Theory, Reaction Time, Test Items, Item Analysis
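The statistic described in this abstract is easy to illustrate. The sketch below (variable names, the flagging rule, and the data are assumptions for illustration, not the authors' implementation) counts correct responses on a target item among test takers whose response times are flagged as suspiciously fast, and compares the count against a plain binomial null as a stand-in for the compound binomial distribution derived in the article.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

# Illustrative data for one item: scored responses (1 = correct) and log response times.
n_examinees = 500
responses = rng.integers(0, 2, size=n_examinees)
log_rts = rng.normal(loc=4.0, scale=0.5, size=n_examinees)

# Flag suspiciously fast responses; a simple z-score rule stands in for the
# model-based RT residuals used in the article.
z = (log_rts - log_rts.mean()) / log_rts.std(ddof=1)
flagged = z < -1.96

# Test statistic: number of correct responses among the flagged test takers.
statistic = int(responses[flagged].sum())
n_flagged = int(flagged.sum())

# Under a simple null where flagged test takers answer correctly at the item's
# overall proportion correct, the statistic is binomial; the article works with
# the more exact compound binomial form.
p_null = responses.mean()
p_value = binom.sf(statistic - 1, n_flagged, p_null)
print(f"{statistic} of {n_flagged} flagged examinees correct, p = {p_value:.3f}")
```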
Peer reviewed
Huelmann, Thorben; Debelak, Rudolf; Strobl, Carolin – Journal of Educational Measurement, 2020
This study addresses the topic of how anchoring methods for differential item functioning (DIF) analysis can be used in multigroup scenarios. The direct approach would be to combine anchoring methods developed for two-group scenarios with multigroup DIF-detection methods. Alternatively, multiple tests could be carried out. The results of these…
Descriptors: Test Items, Test Bias, Equated Scores, Item Analysis
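As a rough illustration of the multiple-testing alternative mentioned in this abstract, the sketch below (hypothetical data; not the authors' procedure) runs a Mantel-Haenszel DIF test for one studied item over every pair of groups, anchoring on the total score, and applies a Bonferroni correction across the pairwise tests.

```python
import numpy as np
from itertools import combinations
from scipy.stats import chi2

def mantel_haenszel_dif(item, total, group, g_ref, g_foc):
    """MH chi-square for one item and two groups, stratified on total score."""
    obs_a = exp_a = var_a = 0.0
    for s in np.unique(total):
        in_s = total == s
        ref, foc = in_s & (group == g_ref), in_s & (group == g_foc)
        n_r, n_f = ref.sum(), foc.sum()
        T = n_r + n_f
        if n_r == 0 or n_f == 0 or T < 2:
            continue
        a = item[ref].sum()                       # correct in reference group
        m1 = a + item[foc].sum()                  # correct in the stratum
        m0 = T - m1
        obs_a += a
        exp_a += n_r * m1 / T
        var_a += n_r * n_f * m1 * m0 / (T**2 * (T - 1))
    stat = (abs(obs_a - exp_a) - 0.5) ** 2 / var_a
    return stat, chi2.sf(stat, df=1)

# Hypothetical data: 3 groups, one studied item showing DIF against group 2.
rng = np.random.default_rng(1)
group = rng.integers(0, 3, size=900)
theta = rng.normal(size=900)
anchor = (rng.random((900, 20)) < 1 / (1 + np.exp(-theta[:, None]))).sum(axis=1)
item = (rng.random(900) < 1 / (1 + np.exp(-(theta - 0.2 * (group == 2))))).astype(int)
total = anchor + item

pairs = list(combinations(range(3), 2))
alpha = 0.05 / len(pairs)   # Bonferroni correction over all group pairs
for g1, g2 in pairs:
    stat, p = mantel_haenszel_dif(item, total, group, g1, g2)
    print(f"groups {g1} vs {g2}: MH chi2 = {stat:.2f}, p = {p:.3f}, flag = {p < alpha}")
```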
Peer reviewed
van der Linden, Wim J.; Choi, Seung W. – Journal of Educational Measurement, 2020
One of the methods of controlling test security in adaptive testing is imposing random item-ineligibility constraints on the selection of the items with probabilities automatically updated to maintain a predetermined upper bound on the exposure rates. Three major improvements of the method are presented. First, a few modifications to improve the…
Descriptors: Adaptive Testing, Item Response Theory, Feedback (Response), Item Analysis
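The bookkeeping behind random item-ineligibility constraints can be sketched in a few lines. This is a simplified illustration of the general idea only; the exact update rules, fading factors, and the improvements proposed in the article differ. After each test, the eligibility probability of an over-exposed item is scaled down so its exposure rate drifts back toward the bound r_max, while under-exposed items recover toward full eligibility.

```python
import numpy as np

rng = np.random.default_rng(2)

n_items, test_length, r_max = 200, 30, 0.25    # pool size, test length, exposure bound
attractiveness = rng.normal(size=n_items)      # stand-in for item information: some items
                                               # are always preferred by the selection rule
p_eligible = np.ones(n_items)                  # eligibility probabilities, start at 1
exposures = np.zeros(n_items)

n_tests = 2000
for t in range(1, n_tests + 1):
    eligible = rng.random(n_items) < p_eligible          # Bernoulli eligibility experiment
    pool = np.flatnonzero(eligible)
    # Greedy "maximum-information" selection restricted to the eligible items.
    used = pool[np.argsort(attractiveness[pool])[::-1][:test_length]]
    exposures[used] += 1
    rates = exposures / t
    # Simplified multiplicative update: items exposed above r_max become less likely
    # to be eligible on the next test; under-exposed items drift back toward 1.
    adjust = np.where(rates > 0, r_max / rates, 2.0)
    p_eligible = np.minimum(1.0, p_eligible * np.clip(adjust, 0.5, 2.0))

print(f"max exposure rate: {exposures.max() / n_tests:.3f} (bound {r_max})")
```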
Peer reviewed
DeCarlo, Lawrence T. – Journal of Educational Measurement, 2023
A conceptualization of multiple-choice exams in terms of signal detection theory (SDT) leads to simple measures of item difficulty and item discrimination that are closely related to, but also distinct from, those used in classical item analysis (CIA). The theory defines a "true split," depending on whether or not examinees know an item,…
Descriptors: Multiple Choice Tests, Test Items, Item Analysis, Test Wiseness
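To see how signal-detection-style indices sit next to classical item analysis, the sketch below computes the classical difficulty (proportion correct) and upper-lower discrimination for one item, and then a d'-type index from the same upper/lower split. The split on total score is only a stand-in; the article's "true split" is defined by whether examinees actually know the item, so these are illustrative quantities, not DeCarlo's measures.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)

# Hypothetical scored responses: one studied item plus a total score for splitting.
n = 1000
theta = rng.normal(size=n)
total = (rng.random((n, 40)) < 1 / (1 + np.exp(-theta[:, None]))).sum(axis=1)
item = (rng.random(n) < 1 / (1 + np.exp(-(1.2 * theta - 0.3)))).astype(int)

# Classical item analysis: difficulty = proportion correct,
# discrimination = upper-group minus lower-group proportion correct (27% groups).
upper = total >= np.quantile(total, 0.73)
lower = total <= np.quantile(total, 0.27)
p_difficulty = item.mean()
d_classical = item[upper].mean() - item[lower].mean()

# SDT-style index: treat "correct in the upper group" as a hit rate and
# "correct in the lower group" as a false-alarm rate, then d' = z(H) - z(F).
clip = lambda p: np.clip(p, 1 / n, 1 - 1 / n)   # keep the z-transform finite
d_prime = norm.ppf(clip(item[upper].mean())) - norm.ppf(clip(item[lower].mean()))

print(f"difficulty p = {p_difficulty:.2f}, U-L discrimination = {d_classical:.2f}, "
      f"d'-type index = {d_prime:.2f}")
```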
Peer reviewed
Wang, Shaojie; Zhang, Minqiang; Lee, Won-Chan; Huang, Feifei; Li, Zonglong; Li, Yixing; Yu, Sufang – Journal of Educational Measurement, 2022
Traditional IRT characteristic curve linking methods ignore parameter estimation errors, which may undermine the accuracy of estimated linking constants. Two new linking methods are proposed that take into account parameter estimation errors. The item- (IWCC) and test-information-weighted characteristic curve (TWCC) methods employ weighting…
Descriptors: Item Response Theory, Error of Measurement, Accuracy, Monte Carlo Methods
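The characteristic-curve linking idea the weighted methods build on can be sketched as follows: find linking constants A and B minimizing a (possibly weighted) squared distance between the target-form test characteristic curve and the transformed source-form curve. The uniform weights below are a placeholder; the IWCC and TWCC methods replace them with item- and test-information weights that reflect parameter estimation error.

```python
import numpy as np
from scipy.optimize import minimize

def icc(theta, a, b):
    """2PL item characteristic curve over a grid of theta values."""
    return 1 / (1 + np.exp(-a * (theta[:, None] - b)))

# Hypothetical 2PL estimates for common items on two forms.
rng = np.random.default_rng(4)
a_x, b_x = rng.uniform(0.8, 2.0, 15), rng.normal(0, 1, 15)    # source form (X)
A_true, B_true = 1.1, 0.3
a_y, b_y = a_x / A_true, A_true * b_x + B_true                # target form (Y) scale

theta_grid = np.linspace(-4, 4, 41)
weights = np.ones_like(theta_grid)   # placeholder; IWCC/TWCC use information-based weights

def loss(params):
    A, B = params
    # Transform X-scale parameters onto the Y scale, compare test characteristic curves.
    tcc_x_on_y = icc(theta_grid, a_x / A, A * b_x + B).sum(axis=1)
    tcc_y = icc(theta_grid, a_y, b_y).sum(axis=1)
    return np.sum(weights * (tcc_y - tcc_x_on_y) ** 2)

result = minimize(loss, x0=[1.0, 0.0], method="Nelder-Mead")
print("recovered A, B:", np.round(result.x, 3), "(true:", A_true, B_true, ")")
```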
Peer reviewed
Wyse, Adam E.; Babcock, Ben – Journal of Educational Measurement, 2019
One common phenomenon in Angoff standard setting is that panelists regress their ratings in toward the middle of the probability scale. This study describes two indices based on taking ratios of standard deviations that can be utilized with a scatterplot of item ratings versus expected probabilities of success to identify whether ratings are…
Descriptors: Item Analysis, Standard Setting, Probability, Feedback (Response)
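The kind of index described here can be computed in a few lines. In this sketch (the numbers are invented), the ratio of the standard deviation of one panelist's Angoff ratings to the standard deviation of the items' expected probabilities of success flags ratings that are compressed toward the middle of the probability scale; the slope from the accompanying scatterplot tells the same story.

```python
import numpy as np

# Hypothetical data for one panelist: Angoff ratings and model-based expected
# probabilities of success for the same items at the proposed cut score.
ratings = np.array([0.55, 0.60, 0.50, 0.65, 0.45, 0.58, 0.52, 0.62])
expected = np.array([0.30, 0.75, 0.20, 0.85, 0.15, 0.65, 0.40, 0.90])

# SD-ratio index: values well below 1 indicate ratings regressed toward the
# middle of the probability scale relative to the expected probabilities.
sd_ratio = ratings.std(ddof=1) / expected.std(ddof=1)
print(f"SD ratio = {sd_ratio:.2f}")   # about 0.22 here: heavy regression toward the middle

# The slope of ratings on expected probabilities (well below 1) carries the same signal.
slope = np.polyfit(expected, ratings, 1)[0]
print(f"slope of ratings on expected probabilities = {slope:.2f}")
```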
Peer reviewed
Jang, Yoonsun; Kim, Seock-Ho; Cohen, Allan S. – Journal of Educational Measurement, 2018
This study investigates the effect of multidimensionality on extraction of latent classes in mixture Rasch models. In this study, two-dimensional data were generated under varying conditions. The two-dimensional data sets were analyzed with one- to five-class mixture Rasch models. Results of the simulation study indicate the mixture Rasch model…
Descriptors: Item Response Theory, Simulation, Correlation, Multidimensional Scaling
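The kind of two-dimensional data examined in this study can be generated with a short simulation. In the sketch below (dimension sizes, correlation, and difficulty spread are arbitrary assumptions), correlated abilities are drawn for two dimensions and each item is scored from a Rasch model on its own dimension; fitting one- to five-class mixture Rasch models to such data is a separate estimation step not shown here.

```python
import numpy as np

rng = np.random.default_rng(5)

n_persons, n_items_per_dim = 1000, 15
rho = 0.5                                    # correlation between the two ability dimensions

# Correlated abilities for the two dimensions.
cov = np.array([[1.0, rho], [rho, 1.0]])
theta = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_persons)

# Item difficulties; the first block of items loads on dimension 1, the second on dimension 2.
difficulties = rng.uniform(-2, 2, size=2 * n_items_per_dim)
dim_of_item = np.repeat([0, 1], n_items_per_dim)

# Rasch response probabilities and scored 0/1 responses.
logits = theta[:, dim_of_item] - difficulties
prob = 1 / (1 + np.exp(-logits))
responses = (rng.random(prob.shape) < prob).astype(int)

print(responses.shape, "responses; mean proportion correct =", round(prob.mean(), 2))
```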
Peer reviewed
Berger, Stéphanie; Verschoor, Angela J.; Eggen, Theo J. H. M.; Moser, Urs – Journal of Educational Measurement, 2019
Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that…
Descriptors: Simulation, Computer Assisted Testing, Test Items, Difficulty Level
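A targeted multistage calibration design can be thought of as routing each student, after a short routing module, to the calibration module whose item difficulties best match the student's provisional performance. The sketch below (module lengths, cut points, and difficulty ranges are invented) shows only that routing logic, not the calibration designs evaluated in the article.

```python
import numpy as np

rng = np.random.default_rng(6)

def rasch_responses(theta, difficulties):
    prob = 1 / (1 + np.exp(-(theta[:, None] - difficulties)))
    return (rng.random(prob.shape) < prob).astype(int)

# Stage 1: a common routing module of medium difficulty.
n_students = 500
theta = rng.normal(size=n_students)
routing_items = np.linspace(-1, 1, 10)
nc_score = rasch_responses(theta, routing_items).sum(axis=1)

# Stage 2: three calibration modules targeted at low / medium / high ability.
modules = {
    "easy":   np.linspace(-2.5, -0.5, 15),   # uncalibrated items needing responses
    "medium": np.linspace(-1.0,  1.0, 15),
    "hard":   np.linspace( 0.5,  2.5, 15),
}
# Number-correct routing rule: low scores see the easy module, high scores the hard one.
assignment = np.where(nc_score <= 3, "easy", np.where(nc_score <= 6, "medium", "hard"))

for name, items in modules.items():
    group = assignment == name
    stage2 = rasch_responses(theta[group], items)
    # Targeting keeps proportions correct away from 0 and 1, where Rasch item
    # parameters would be estimated poorly.
    print(f"{name:6s}: n = {group.sum():3d}, mean proportion correct = {stage2.mean():.2f}")
```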
Peer reviewed
LaHuis, David M.; Bryant-Lees, Kinsey B.; Hakoyama, Shotaro; Barnes, Tyler; Wiemann, Andrea – Journal of Educational Measurement, 2018
Person reliability parameters (PRPs) model temporary changes in individuals' attribute level perceptions when responding to self-report items (higher levels of PRPs represent less fluctuation). PRPs could be useful in measuring careless responding and traitedness. However, it is unclear how well current procedures for estimating PRPs can recover…
Descriptors: Comparative Analysis, Reliability, Error of Measurement, Measurement Techniques
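The fluctuation idea behind person reliability parameters can be illustrated with a small data-generating sketch: each response is driven not by the person's stable trait level but by a momentarily perturbed level, with the size of the perturbation inversely related to that person's reliability parameter. The model form, names, and values here are illustrative assumptions, not the estimation procedures compared in the article.

```python
import numpy as np

rng = np.random.default_rng(11)

n_persons, n_items = 500, 20
trait = rng.normal(size=n_persons)                          # stable attribute levels
prp = rng.lognormal(mean=0.0, sigma=0.5, size=n_persons)    # person reliability parameters

# Higher PRP -> smaller within-person fluctuation of the perceived attribute level.
fluctuation_sd = 1.0 / prp
item_thresholds = np.linspace(-2, 2, n_items)

# Momentary attribute level for each person-item encounter, then a dichotomous response.
momentary = trait[:, None] + rng.normal(size=(n_persons, n_items)) * fluctuation_sd[:, None]
prob = 1 / (1 + np.exp(-(momentary - item_thresholds)))
responses = (rng.random((n_persons, n_items)) < prob).astype(int)

# Low-PRP (high-fluctuation) respondents look less consistent: their responses
# track their own trait-implied expected responses less closely.
low, high = prp < np.median(prp), prp >= np.median(prp)
consistency = np.array([
    np.corrcoef(responses[i], 1 / (1 + np.exp(-(trait[i] - item_thresholds))))[0, 1]
    for i in range(n_persons)
])
print(f"mean response-trait consistency: low PRP {np.nanmean(consistency[low]):.2f}, "
      f"high PRP {np.nanmean(consistency[high]):.2f}")
```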
Peer reviewed
Zhang, Zhonghua; Zhao, Mingren – Journal of Educational Measurement, 2019
The present study evaluated the multiple imputation method, a procedure that is similar to the one suggested by Li and Lissitz (2004), and compared the performance of this method with that of the bootstrap method and the delta method in obtaining the standard errors for the estimates of the parameter scale transformation coefficients in item…
Descriptors: Item Response Theory, Error Patterns, Item Analysis, Simulation
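Of the three variance-estimation approaches compared in this study, the bootstrap is the easiest to sketch. The snippet below (mean/sigma linking on hypothetical common-item difficulty estimates; not the article's multiple imputation procedure) resamples common items and recomputes the slope and intercept of the scale transformation to obtain their standard errors.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical difficulty estimates for the same common items on two forms,
# related by a true transformation b_new = A * b_old + B plus estimation error.
A_true, B_true, n_common = 1.1, 0.25, 20
b_old = rng.normal(0, 1, n_common)
b_new = A_true * b_old + B_true + rng.normal(0, 0.1, n_common)

def mean_sigma(b_old, b_new):
    """Mean/sigma scale transformation coefficients."""
    A = b_new.std(ddof=1) / b_old.std(ddof=1)
    B = b_new.mean() - A * b_old.mean()
    return A, B

# Bootstrap: resample common items, recompute the coefficients each time.
n_boot = 2000
boot = np.array([
    mean_sigma(b_old[idx], b_new[idx])
    for idx in (rng.integers(0, n_common, n_common) for _ in range(n_boot))
])

A_hat, B_hat = mean_sigma(b_old, b_new)
se_A, se_B = boot.std(axis=0, ddof=1)
print(f"A = {A_hat:.3f} (SE {se_A:.3f}), B = {B_hat:.3f} (SE {se_B:.3f})")
```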
Peer reviewed
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Peer reviewed
Clauser, Brian E.; Kane, Michael; Clauser, Jerome C. – Journal of Educational Measurement, 2020
An Angoff standard setting study generally yields judgments on a number of items by a number of judges (who may or may not be nested in panels). Variability associated with judges (and possibly panels) contributes error to the resulting cut score. The variability associated with items plays a more complicated role. To the extent that the mean item…
Descriptors: Cutting Scores, Generalization, Decision Making, Standard Setting
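The role of the different variance components can be made concrete with a standard generalizability-theory style error calculation for a judges-by-items Angoff design. In this sketch (the variance component values are invented), the cut score is the mean rating over judges and items; judge variance and judge-by-item interaction variance contribute error that shrinks with more judges and items, while the item main effect enters only if items are treated as sampled from a domain, which is one sense in which its role is more complicated.

```python
import numpy as np

# Invented variance-component estimates from a judges x items Angoff study
# (proportions on the 0-1 Angoff rating scale).
var_judge = 0.0040        # systematic judge leniency/severity
var_item = 0.0250         # item difficulty differences
var_judge_item = 0.0120   # judge-by-item interaction (plus residual)

n_judges, n_items = 12, 60

# Error variance of the cut score (the grand mean rating) when judges are sampled
# and the items define the test form (item main effect treated as fixed):
error_var_fixed_items = var_judge / n_judges + var_judge_item / (n_judges * n_items)

# If the items are instead treated as a sample from a domain, the item component
# also contributes error to the cut score:
error_var_random_items = error_var_fixed_items + var_item / n_items

print(f"SE of cut score, items fixed:  {np.sqrt(error_var_fixed_items):.4f}")
print(f"SE of cut score, items random: {np.sqrt(error_var_random_items):.4f}")
```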
Peer reviewed
Wang, Wenyi; Song, Lihong; Chen, Ping; Ding, Shuliang – Journal of Educational Measurement, 2019
Most of the existing classification accuracy indices of attribute patterns lose effectiveness when the response data is absent in diagnostic testing. To handle this issue, this article proposes new indices to predict the correct classification rate of a diagnostic test before administering the test under the deterministic noise input…
Descriptors: Cognitive Tests, Classification, Accuracy, Diagnostic Tests
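To give a feel for predicting classification accuracy before any responses are collected, the sketch below works under the deterministic-input, noisy-"and"-gate (DINA) model named in the abstract: given item parameters, a Q-matrix, and an assumed attribute-pattern prior, it simulates responses and checks how often maximum a posteriori classification recovers the true pattern. The indices proposed in the article are analytic rather than simulation-based; this only illustrates the quantity they aim to predict. Q-matrix, slip, and guess values are invented.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(8)

# DINA setup: 2 attributes, 10 items, invented Q-matrix and slip/guess parameters.
Q = np.array([[1, 0]] * 3 + [[0, 1]] * 3 + [[1, 1]] * 4)
slip = rng.uniform(0.05, 0.20, len(Q))
guess = rng.uniform(0.05, 0.20, len(Q))
patterns = np.array(list(product([0, 1], repeat=2)))   # all attribute patterns
prior = np.full(len(patterns), 1 / len(patterns))      # assumed uniform prior

def p_correct(pattern):
    """DINA probability of a correct response to each item for a given pattern."""
    eta = np.all(pattern >= Q, axis=1)                 # mastered all required attributes?
    return np.where(eta, 1 - slip, guess)

# Simulate examinees from the prior and classify them by maximum a posteriori.
n_sim = 5000
item_probs = np.array([p_correct(p) for p in patterns])   # patterns x items
true_idx = rng.choice(len(patterns), size=n_sim, p=prior)
correct = 0
for idx in true_idx:
    x = (rng.random(len(Q)) < item_probs[idx]).astype(int)
    loglik = (x * np.log(item_probs) + (1 - x) * np.log(1 - item_probs)).sum(axis=1)
    post = loglik + np.log(prior)
    correct += int(np.argmax(post) == idx)

print(f"predicted correct classification rate: {correct / n_sim:.3f}")
```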
Peer reviewed
Svetina, Dubravka; Liaw, Yuan-Ling; Rutkowski, Leslie; Rutkowski, David – Journal of Educational Measurement, 2019
This study investigates the effect of several design and administration choices on item exposure and person/item parameter recovery under a multistage test (MST) design. In a simulation study, we examine whether number-correct (NC) or item response theory (IRT) methods are differentially effective at routing students to the correct next stage(s)…
Descriptors: Measurement, Item Analysis, Test Construction, Item Response Theory
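The routing choice studied here comes down to how the interim decision is made after each stage. The sketch below (module composition and cut points are invented) contrasts a number-correct rule with an IRT rule that routes on a provisional maximum-likelihood ability estimate computed from the same stage-1 responses.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(9)

# Stage-1 routing module: 2PL items with known (calibrated) parameters.
a1 = rng.uniform(0.8, 1.8, 12)
b1 = np.linspace(-1.5, 1.5, 12)

def p2pl(theta, a, b):
    return 1 / (1 + np.exp(-a * (theta - b)))

def route_nc(responses, cut=6):
    """Number-correct routing: the total score decides the next module."""
    return "hard" if responses.sum() >= cut else "easy"

def route_irt(responses, cut_theta=0.0):
    """IRT routing: a provisional ML estimate of theta decides the next module."""
    def neg_loglik(theta):
        p = p2pl(theta, a1, b1)
        return -np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
    theta_hat = minimize_scalar(neg_loglik, bounds=(-4, 4), method="bounded").x
    return "hard" if theta_hat >= cut_theta else "easy"

# Compare how often the two rules send the same simulated examinee to the same module.
theta = rng.normal(size=1000)
agree = 0
for th in theta:
    resp = (rng.random(12) < p2pl(th, a1, b1)).astype(int)
    agree += route_nc(resp) == route_irt(resp)
print(f"routing agreement between NC and IRT rules: {agree / 1000:.2%}")
```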
Peer reviewed
Feuerstahler, Leah; Wilson, Mark – Journal of Educational Measurement, 2019
Scores estimated from multidimensional item response theory (IRT) models are not necessarily comparable across dimensions. In this article, the concept of aligned dimensions is formalized in the context of Rasch models, and two methods are described--delta dimensional alignment (DDA) and logistic regression alignment (LRA)--to transform estimated…
Descriptors: Item Response Theory, Models, Scores, Comparative Analysis
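One way to implement the general idea behind delta dimensional alignment is sketched below: item difficulty estimates from each dimension of the multidimensional calibration are linearly rescaled so that their mean and spread match those of the same items in a unidimensional (composite) calibration, and person estimates are carried through the same transformation. This is a hedged illustration of the alignment idea under those assumptions, not the exact DDA or LRA procedures from the article.

```python
import numpy as np

rng = np.random.default_rng(10)

# Hypothetical item difficulty estimates for items grouped by dimension:
# one set from a multidimensional calibration, one from a unidimensional
# calibration of all items together (used here as the common reference scale).
dims = {"dim1": slice(0, 15), "dim2": slice(15, 30)}
delta_multi = np.concatenate([rng.normal(0.0, 0.8, 15), rng.normal(0.4, 1.3, 15)])
delta_uni = rng.normal(0.1, 1.0, 30)             # reference-scale estimates, same items

aligned = delta_multi.copy()
transforms = {}
for name, idx in dims.items():
    # Linear transformation matching the mean and SD of this dimension's difficulties
    # to the same items' difficulties on the reference scale.
    slope = delta_uni[idx].std(ddof=1) / delta_multi[idx].std(ddof=1)
    intercept = delta_uni[idx].mean() - slope * delta_multi[idx].mean()
    aligned[idx] = slope * delta_multi[idx] + intercept
    transforms[name] = (slope, intercept)        # apply the same transform to person estimates

for name, (slope, intercept) in transforms.items():
    print(f"{name}: aligned difficulty = {slope:.2f} * estimate + {intercept:.2f}")
```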