ERIC - Search Results

Showing all 6 results

How Often Is the Misfit of Item Response Theory Models Practically Significant?

Peer reviewed

Sinharay, Sandip; Haberman, Shelby J. – Educational Measurement: Issues and Practice, 2014

Standard 3.9 of the Standards for Educational and Psychological Testing (1999) demands evidence of model fit when item response theory (IRT) models are applied to data from tests. Hambleton and Han (2005) and Sinharay (2005) recommended the assessment of practical significance of misfit of IRT models, but…

Descriptors: Item Response Theory, Goodness of Fit, Models, Tests

An NCME Instructional Module on Subscores

Peer reviewed

Sinharay, Sandip; Puhan, Gautam; Haberman, Shelby J. – Educational Measurement: Issues and Practice, 2011

The purpose of this ITEMS module is to provide an introduction to subscores. First, examples of subscores from an operational test are provided. Then, a review of methods that can be used to examine if subscores have adequate psychometric quality is provided. It is demonstrated, using results from operational and simulated data, that subscores…

Descriptors: Scores, Psychometrics, Tests, Data

How Often Do Subscores Have Added Value? Results from Operational and Simulated Data

Peer reviewed

Sinharay, Sandip – Journal of Educational Measurement, 2010

Recently, there has been an increasing level of interest in subscores for their potential diagnostic value. Haberman suggested a method based on classical test theory to determine whether subscores have added value over total scores. In this article I first provide a rich collection of results regarding when subscores were found to have added…

Descriptors: Scores, Test Theory, Simulation, Reliability

Using Past Data to Enhance Small Sample DIF Estimation: A Bayesian Approach

Peer reviewed

Sinharay, Sandip; Dorans, Neil J.; Grant, Mary C.; Blew, Edwin O. – Journal of Educational and Behavioral Statistics, 2009

Test administrators often face the challenge of detecting differential item functioning (DIF) with samples of size smaller than that recommended by experts. A Bayesian approach can incorporate, in the form of a prior distribution, existing information on the inference problem at hand, which yields more stable estimation, especially for small…

Descriptors: Test Bias, Computation, Bayesian Statistics, Data

The Missing Data Assumptions of the Nonequivalent Groups with Anchor Test (NEAT) Design and Their Implications for Test Equating. Research Report. ETS RR-09-16

Sinharay, Sandip; Holland, Paul W. – Educational Testing Service, 2008

The nonequivalent groups with anchor test (NEAT) design involves missing data that are missing by design. Three popular equating methods that can be used with a NEAT design are the poststratification equating method, the chain equipercentile equating method, and the item-response-theory observed-score-equating method. These three methods each…

Descriptors: Equated Scores, Test Items, Item Response Theory, Data

Model Diagnostics for Bayesian Networks. Research Report. ETS RR-04-17

Peer reviewed

Sinharay, Sandip – ETS Research Report Series, 2004

Assessing fit of psychometric models has always been an issue of enormous interest, but there exists no unanimously agreed upon item fit diagnostic for the models. Bayesian networks, frequently used in educational assessments (see, for example, Mislevy, Almond, Yan, & Steinberg, 2001) primarily for learning about students' knowledge and…

Descriptors: Bayesian Statistics, Networks, Models, Goodness of Fit