ERIC - Search Results

Descriptor: Error of Measurement

Showing 31 to 45 of 78 results

Comparing the Performance of Five Multidimensional CAT Selection Procedures with Different Stopping Rules

Peer reviewed

Yao, Lihua – Applied Psychological Measurement, 2013

Through simulated data, five multidimensional computerized adaptive testing (MCAT) selection procedures with varying test lengths are examined and compared using different stopping rules. Fixed item exposure rates are used for all the items, and the Priority Index (PI) method is used for the content constraints. Two stopping rules, standard error…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Mixed-Format Test Score Equating: Effect of Item-Type Multidimensionality, Length and Composition of Common-Item Set, and Group Ability Difference

Wang, Wei – ProQuest LLC, 2013

Mixed-format tests containing both multiple-choice (MC) items and constructed-response (CR) items are now widely used in many testing programs. Mixed-format tests often are considered to be superior to tests containing only MC items although the use of multiple item formats leads to measurement challenges in the context of equating conducted under…

Descriptors: Equated Scores, Test Format, Test Items, Test Length

Test Length and Decision Quality in Personnel Selection: When Is Short Too Short?

Peer reviewed

Kruyen, Peter M.; Emons, Wilco H. M.; Sijtsma, Klaas – International Journal of Testing, 2012

Personnel selection shows an enduring need for short stand-alone tests consisting of, say, 5 to 15 items. Despite their efficiency, short tests are more vulnerable to measurement error than longer test versions. Consequently, the question arises to what extent reducing test length deteriorates decision quality due to increased impact of…

Descriptors: Measurement, Personnel Selection, Decision Making, Error of Measurement

An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

Peer reviewed

Han, Kyung T. – Journal of Educational Measurement, 2012

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Selection

Evaluating EIV, OLS, and SEM Estimators of Group Slope Differences in the Presence of Measurement Error: The Single-Indicator Case

Peer reviewed

Culpepper, Steven Andrew – Applied Psychological Measurement, 2012

Measurement error significantly biases interaction effects and distorts researchers' inferences regarding interactive hypotheses. This article focuses on the single-indicator case and shows how to accurately estimate group slope differences by disattenuating interaction effects with errors-in-variables (EIV) regression. New analytic findings were…

Descriptors: Evidence, Test Length, Interaction, Regression (Statistics)

Marginal Maximum A Posteriori Item Parameter Estimation for the Generalized Graded Unfolding Model

Peer reviewed

Roberts, James S.; Thompson, Vanessa M. – Applied Psychological Measurement, 2011

A marginal maximum a posteriori (MMAP) procedure was implemented to estimate item parameters in the generalized graded unfolding model (GGUM). Estimates from the MMAP method were compared with those derived from marginal maximum likelihood (MML) and Markov chain Monte Carlo (MCMC) procedures in a recovery simulation that varied sample size,…

Descriptors: Statistical Analysis, Markov Processes, Computation, Monte Carlo Methods

Formulation of a DIMTEST Effect Size Measure (DESM) and Evaluation of the DESM Estimator Bias

Peer reviewed

Seo, Minhee; Roussos, Louis A. – Journal of Educational Measurement, 2010

DIMTEST is a widely used and studied method for testing the hypothesis of test unidimensionality as represented by local item independence. However, DIMTEST does not report the amount of multidimensionality that exists in data when rejecting its null. To provide more information regarding the degree to which data depart from unidimensionality, a…

Descriptors: Effect Size, Statistical Bias, Computation, Test Length

Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores: Theory and Applications

Peer reviewed

Yao, Lihua – Psychometrika, 2012

Multidimensional computer adaptive testing (MCAT) can provide higher precision and reliability or reduce test length when compared with unidimensional CAT or with the paper-and-pencil test. This study compared five item selection procedures in the MCAT framework for both domain scores and overall scores through simulation by varying the structure…

Descriptors: Item Banks, Test Length, Simulation, Adaptive Testing

A Comparison of Item Fit Statistics for Mixed IRT Models

Peer reviewed

Chon, Kyong Hee; Lee, Won-Chan; Dunbar, Stephen B. – Journal of Educational Measurement, 2010

In this study we examined procedures for assessing model-data fit of item response theory (IRT) models for mixed format data. The model fit indices used in this study include PARSCALE's G², Orlando and Thissen's S-X² and S-G², and Stone's χ²* and G²*. To investigate the…

Descriptors: Test Length, Goodness of Fit, Item Response Theory, Simulation

Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

Peer reviewed

Sueiro, Manuel J.; Abad, Francisco J. – Educational and Psychological Measurement, 2011

The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…

Descriptors: Goodness of Fit, Item Response Theory, Nonparametric Statistics, Probability

Evaluation of Methods to Compute Complex Sample Standard Errors in Latent Regression Models. Research Report. ETS RR-09-49

Peer reviewed

Oranje, Andreas; Li, Deping; Kandathil, Mathew – ETS Research Report Series, 2009

Several complex sample standard error estimators based on linearization and resampling for the latent regression model of the National Assessment of Educational Progress (NAEP) are studied with respect to design choices such as number of items, number of regressors, and the efficiency of the sample. This paper provides an evaluation of the extent…

Descriptors: Error of Measurement, Computation, Regression (Statistics), National Competency Tests

Correcting Fallacies in Validity, Reliability, and Classification

Peer reviewed

Sijtsma, Klaas – International Journal of Testing, 2009

This article reviews three topics from test theory that continue to raise discussion and controversy and capture test theorists' and constructors' interest. The first topic concerns the discussion of the methodology of investigating and establishing construct validity; the second topic concerns reliability and its misuse, alternative definitions…

Descriptors: Construct Validity, Reliability, Classification, Test Theory

Variable-Length Computerized Adaptive Testing: Adaptation of the A-Stratified Strategy in Item Selection with Content Balancing

Huo, Yan – ProQuest LLC, 2009

Variable-length computerized adaptive testing (CAT) can provide examinees with tailored test lengths. With the fixed standard error of measurement (SEM) termination rule, variable-length CAT can achieve predetermined measurement precision by using relatively shorter tests compared to fixed-length CAT. To explore the application of…

Descriptors: Test Length, Test Items, Adaptive Testing, Item Analysis

Modification of the Mantel-Haenszel and Logistic Regression DIF Procedures to Incorporate the SIBTEST Regression Correction

Peer reviewed

DeMars, Christine E. – Journal of Educational and Behavioral Statistics, 2009

The Mantel-Haenszel (MH) and logistic regression (LR) differential item functioning (DIF) procedures have inflated Type I error rates when there are large mean group differences, short tests, and large sample sizes. When there are large group differences in mean score, groups matched on the observed number-correct score differ on true score,…

Descriptors: Regression (Statistics), Test Bias, Error of Measurement, True Scores

A Model Fit Statistic for Generalized Partial Credit Model

Peer reviewed

Liang, Tie; Wells, Craig S. – Educational and Psychological Measurement, 2009

Investigating the fit of a parametric model is an important part of the measurement process when implementing item response theory (IRT), but research examining it is limited. A general nonparametric approach for detecting model misfit, introduced by J. Douglas and A. S. Cohen (2001), has exhibited promising results for the two-parameter logistic…

Descriptors: Sample Size, Nonparametric Statistics, Item Response Theory, Goodness of Fit
