ERIC - Search Results

Showing all 15 results

The Reliability and Precision of Total Scores and IRT Estimates as a Function of Polytomous IRT Parameters and Latent Trait Distribution

Peer reviewed

Culpepper, Steven Andrew – Applied Psychological Measurement, 2013

A classic topic in the fields of psychometrics and measurement has been the impact of the number of scale categories on test score reliability. This study builds on previous research by further articulating the relationship between item response theory (IRT) and classical test theory (CTT). Equations are presented for comparing the reliability and…

Descriptors: Item Response Theory, Reliability, Scores, Error of Measurement

Coefficient Alpha Bootstrap Confidence Interval under Nonnormality

Peer reviewed

Padilla, Miguel A.; Divers, Jasmin; Newton, Matthew – Applied Psychological Measurement, 2012

Three different bootstrap methods for estimating confidence intervals (CIs) for coefficient alpha were investigated. In addition, the bootstrap methods were compared with the most promising coefficient alpha CI estimation methods reported in the literature. The CI methods were assessed through a Monte Carlo simulation utilizing conditions…

Descriptors: Intervals, Monte Carlo Methods, Computation, Sampling
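Illustrative note: the abstract does not say which three bootstrap methods were studied, so the sketch below shows only the generic percentile bootstrap for coefficient alpha as one plausible instance. The function names, the toy data, and the defaults (`n_boot`, `level`, `seed`) are assumptions made for this example and are not taken from the article.

```python
import random
from statistics import variance


def coefficient_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def percentile_bootstrap_ci(rows, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for alpha, resampling respondents with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = []
    for _ in range(n_boot):
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(coefficient_alpha(sample))
        except ZeroDivisionError:
            # Degenerate resample (zero total-score variance): skip it.
            continue
    estimates.sort()
    lo_idx = int((1 - level) / 2 * len(estimates))
    hi_idx = int((1 + level) / 2 * len(estimates)) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical toy data: 6 respondents by 4 items.
data = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
]
alpha = coefficient_alpha(data)
ci_low, ci_high = percentile_bootstrap_ci(data)
```

Resampling whole respondents (rows) rather than individual item scores keeps the inter-item covariance structure intact, which is what alpha depends on.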

A Comparison of Bias Correction Adjustments for the DETECT Procedure

Peer reviewed

Nandakumar, Ratna; Yu, Feng; Zhang, Yanwei – Applied Psychological Measurement, 2011

DETECT is a nonparametric methodology to identify the dimensional structure underlying test data. The associated DETECT index, D_max, denotes the degree of multidimensionality in data. Conditional covariances (CCOV) are the building blocks of this index. In specifying population CCOVs, the latent test composite θ_TT…

Descriptors: Nonparametric Statistics, Statistical Analysis, Tests, Data

Coefficient Alpha and Reliability of Scale Scores

Peer reviewed

Almehrizi, Rashid S. – Applied Psychological Measurement, 2013

The majority of large-scale assessments develop various score scales that are either linear or nonlinear transformations of raw scores for better interpretations and uses of assessment results. The current formula for coefficient alpha (α; the commonly used reliability coefficient) only provides internal consistency reliability estimates of raw…

Descriptors: Raw Scores, Scaling, Reliability, Computation

An Empirical Evaluation of the Slip Correction in the Four Parameter Logistic Models with Computerized Adaptive Testing

Peer reviewed

Yen, Yung-Chin; Ho, Rong-Guey; Laio, Wen-Wei; Chen, Li-Ju; Kuo, Ching-Chin – Applied Psychological Measurement, 2012

In a selected response test, aberrant responses such as careless errors and lucky guesses might cause error in ability estimation because these responses do not actually reflect the knowledge that examinees possess. In a computerized adaptive test (CAT), these aberrant responses could further cause serious estimation error due to dynamic item…

Descriptors: Computer Assisted Testing, Adaptive Testing, Test Items, Response Style (Tests)
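Illustrative note: for orientation, the four-parameter logistic model is commonly written as below. This is the standard textbook form; the symbols are assumptions for this note, and the article's own parameterization may differ.

```latex
% Probability that examinee i answers item j correctly under the 4PL model.
% a_j: discrimination, b_j: difficulty,
% c_j: lower asymptote (lucky guesses), d_j: upper asymptote (d_j < 1 allows careless slips).
P(X_{ij} = 1 \mid \theta_i) = c_j + \frac{d_j - c_j}{1 + \exp\bigl(-a_j(\theta_i - b_j)\bigr)}
```

Setting d_j = 1 recovers the three-parameter logistic model, in which high-ability examinees are assumed never to slip.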

Recovery of Graded Response Model Parameters: A Comparison of Marginal Maximum Likelihood and Markov Chain Monte Carlo Estimation

Peer reviewed

Kieftenbeld, Vincent; Natesan, Prathiba – Applied Psychological Measurement, 2012

Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…

Descriptors: Test Length, Markov Processes, Item Response Theory, Monte Carlo Methods

A Binary Programming Approach to Automated Test Assembly for Cognitive Diagnosis Models

Peer reviewed

Finkelman, Matthew D.; Kim, Wonsuk; Roussos, Louis; Verschoor, Angela – Applied Psychological Measurement, 2010

Automated test assembly (ATA) has been an area of prolific psychometric research. Although ATA methodology is well developed for unidimensional models, its application alongside cognitive diagnosis models (CDMs) is a burgeoning topic. Two suggested procedures for combining ATA and CDMs are to maximize the cognitive diagnostic index and to use a…

Descriptors: Automation, Test Construction, Programming, Models

An Extension of Least Squares Estimation of IRT Linking Coefficients for the Graded Response Model

Peer reviewed

Kim, Seonghoon – Applied Psychological Measurement, 2010

The three types (generalized, unweighted, and weighted) of least squares methods, proposed by Ogasawara, for estimating item response theory (IRT) linking coefficients under dichotomous models are extended to the graded response model. A simulation study was conducted to confirm the accuracy of the extended formulas, and a real data study was…

Descriptors: Least Squares Statistics, Computation, Item Response Theory, Models

A Modified Frequency Estimation Equating Method for the Common-Item Nonequivalent Groups Design

Peer reviewed

Wang, Tianyou; Brennan, Robert L. – Applied Psychological Measurement, 2009

Frequency estimation, also called poststratification, is an equating method used under the common-item nonequivalent groups design. A modified frequency estimation method is proposed here, based on altering one of the traditional assumptions in frequency estimation in order to correct for equating bias. A simulation study was carried out to…

Descriptors: Computation, Bias, Comparative Analysis, Statistical Analysis

Detection of Answer Copying Based on the Structure of a High-Stakes Test

Peer reviewed

Belov, Dmitry I. – Applied Psychological Measurement, 2011

This article presents the Variable Match Index (VM-Index), a new statistic for detecting answer copying. The power of the VM-Index relies on two-dimensional conditioning as well as the structure of the test. The asymptotic distribution of the VM-Index is analyzed by reduction to Poisson trials. A computational study comparing the VM-Index with the…

Descriptors: Cheating, Journal Articles, Computation, Comparative Analysis

Item Response Theory with Estimation of the Latent Density Using Davidian Curves

Peer reviewed

Woods, Carol M.; Lin, Nan – Applied Psychological Measurement, 2009

Davidian-curve item response theory (DC-IRT) is introduced, evaluated with simulations, and illustrated using data from the Schedule for Nonadaptive and Adaptive Personality Entitlement scale. DC-IRT is a method for fitting unidimensional IRT models with maximum marginal likelihood estimation, in which the latent density is estimated,…

Descriptors: Item Response Theory, Personality Measures, Computation, Simulation

Likelihood-Ratio DIF Testing: Effects of Nonnormality

Peer reviewed

Woods, Carol M. – Applied Psychological Measurement, 2008

Differential item functioning (DIF) occurs when an item has different measurement properties for members of one group versus another. Likelihood-ratio (LR) tests for DIF based on item response theory (IRT) involve statistically comparing IRT models that vary with respect to their constraints. A simulation study evaluated how violation of the…

Descriptors: Simulation, Item Response Theory, Comparative Analysis, Statistics

Three Classes of Nonparametric Differential Step Functioning Effect Estimators

Peer reviewed

Penfield, Randall D. – Applied Psychological Measurement, 2008

The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…

Descriptors: Simulation, Nonparametric Statistics, Item Response Theory, Computation

The Multiple-Choice Model: Some Solutions for Estimation of Parameters in the Presence of Omitted Responses

Peer reviewed

Abad, Francisco J.; Olea, Julio; Ponsoda, Vicente – Applied Psychological Measurement, 2009

This article deals with some of the problems that have hindered the application of Samejima's and Thissen and Steinberg's multiple-choice models: (a) parameter estimation difficulties owing to the large number of parameters involved, (b) parameter identifiability problems in the Thissen and Steinberg model, and (c) their treatment of omitted…

Descriptors: Multiple Choice Tests, Models, Computation, Simulation

Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

Peer reviewed

Cui, Zhongmin; Kolen, Michael J. – Applied Psychological Measurement, 2008

This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…

Descriptors: Test Length, Test Content, Simulation, Computation
