Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 4 |
| Since 2017 (last 10 years) | 20 |
| Since 2007 (last 20 years) | 56 |
Descriptor
| Computation | 62 |
| Test Length | 62 |
| Item Response Theory | 40 |
| Test Items | 29 |
| Sample Size | 26 |
| Accuracy | 20 |
| Simulation | 19 |
| Maximum Likelihood Statistics | 15 |
| Bayesian Statistics | 14 |
| Error of Measurement | 14 |
| Correlation | 12 |
Author
| Wang, Wen-Chung | 4 |
| Cheng, Ying | 3 |
| He, Wei | 2 |
| Kilic, Abdullah Faruk | 2 |
| Lathrop, Quinn N. | 2 |
| Lee, Yi-Hsuan | 2 |
| Liu, Chen-Wei | 2 |
| Paek, Insu | 2 |
| Zhang, Jinming | 2 |
| de la Torre, Jimmy | 2 |
| Atar, Burcu | 1 |
Publication Type
| Journal Articles | 52 |
| Reports - Research | 39 |
| Reports - Evaluative | 14 |
| Dissertations/Theses -… | 8 |
| Reports - Descriptive | 1 |
Education Level
| Higher Education | 2 |
| Postsecondary Education | 2 |
| Secondary Education | 2 |
| Early Childhood Education | 1 |
| Elementary Secondary Education | 1 |
| High Schools | 1 |
| Preschool Education | 1 |
Assessments and Surveys
| National Assessment of… | 1 |
| National Longitudinal Study… | 1 |
| Program for International… | 1 |
| Trends in International… | 1 |
Yildiz, Mustafa – ProQuest LLC, 2017
Student misconceptions have been studied for decades from a curricular/instructional perspective and from the assessment/test level perspective. Numerous misconception assessment tools have been developed in order to measure students' misconceptions relative to the correct content. Often, these tools are used to make a variety of educational…
Descriptors: Misconceptions, Students, Item Response Theory, Models
Paek, Insu – Educational and Psychological Measurement, 2016
The effect of guessing on the point estimate of coefficient alpha has been studied in the literature, but the impact of guessing and its interactions with other test characteristics on the interval estimators for coefficient alpha has not been fully investigated. This study examined the impact of guessing and its interactions with other test…
Descriptors: Guessing (Tests), Computation, Statistical Analysis, Test Length
Sinharay, Sandip – Applied Measurement in Education, 2017
Karabatsos compared the power of 36 person-fit statistics using receiver operating characteristics curves and found the "H^T" statistic to be the most powerful in identifying aberrant examinees. He found three statistics, "C", "MCI", and "U3", to be the next most powerful. These four statistics,…
Descriptors: Nonparametric Statistics, Goodness of Fit, Simulation, Comparative Analysis
Byram, Jessica N.; Seifert, Mark F.; Brooks, William S.; Fraser-Cotlin, Laura; Thorp, Laura E.; Williams, James M.; Wilson, Adam B. – Anatomical Sciences Education, 2017
With integrated curricula and multidisciplinary assessments becoming more prevalent in medical education, there is a continued need for educational research to explore the advantages, consequences, and challenges of integration practices. This retrospective analysis investigated the number of items needed to reliably assess anatomical knowledge in…
Descriptors: Anatomy, Science Tests, Test Items, Test Reliability
Lee, HyeSun – Applied Measurement in Education, 2018
The current simulation study examined the effects of Item Parameter Drift (IPD) occurring in a short scale on parameter estimates in multilevel models where scores from a scale were employed as a time-varying predictor to account for outcome scores. Five factors, including three decisions about IPD, were considered for simulation conditions. It…
Descriptors: Test Items, Hierarchical Linear Modeling, Predictor Variables, Scores
Lee, Yi-Hsuan; Zhang, Jinming – International Journal of Testing, 2017
Simulations were conducted to examine the effect of differential item functioning (DIF) on measurement consequences such as total scores, item response theory (IRT) ability estimates, and test reliability in terms of the ratio of true-score variance to observed-score variance and the standard error of estimation for the IRT ability parameter. The…
Descriptors: Test Bias, Test Reliability, Performance, Scores
Gelfand, Jessica T.; Christie, Robert E.; Gelfand, Stanley A. – Journal of Speech, Language, and Hearing Research, 2014
Purpose: Speech recognition may be analyzed in terms of recognition probabilities for perceptual wholes (e.g., words) and parts (e.g., phonemes), where j or the j-factor reveals the number of independent perceptual units required for recognition of the whole (Boothroyd, 1968b; Boothroyd & Nittrouer, 1988; Nittrouer & Boothroyd, 1990). For…
Descriptors: Phonemes, Word Recognition, Vowels, Syllables
Lathrop, Quinn N.; Cheng, Ying – Journal of Educational Measurement, 2014
When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Descriptors: Cutting Scores, Classification, Computation, Nonparametric Statistics
Lathrop, Quinn N.; Cheng, Ying – Applied Psychological Measurement, 2013
Within the framework of item response theory (IRT), there are two recent lines of work on the estimation of classification accuracy (CA) rate. One approach estimates CA when decisions are made based on total sum scores, the other based on latent trait estimates. The former is referred to as the Lee approach, and the latter, the Rudner approach,…
Descriptors: Item Response Theory, Accuracy, Classification, Computation
Wu, Yi-Fang – ProQuest LLC, 2015
Item response theory (IRT) uses a family of statistical models for estimating stable characteristics of items and examinees and defining how these characteristics interact in describing item and test performance. With a focus on the three-parameter logistic IRT (Birnbaum, 1968; Lord, 1980) model, the current study examines the accuracy and…
Descriptors: Item Response Theory, Test Items, Accuracy, Computation
Lei, Pui-Wa; Zhao, Yu – Applied Psychological Measurement, 2012
Vertical scaling is necessary to facilitate comparison of scores from test forms of different difficulty levels. It is widely used to enable the tracking of student growth in academic performance over time. Most previous studies on vertical scaling methods assume relatively long tests and large samples. Little is known about their performance when…
Descriptors: Scaling, Item Response Theory, Test Length, Sample Size
Svetina, Dubravka – Educational and Psychological Measurement, 2013
The purpose of this study was to investigate the effect of complex structure on dimensionality assessment in noncompensatory multidimensional item response models using dimensionality assessment procedures based on DETECT (dimensionality evaluation to enumerate contributing traits) and NOHARM (normal ogive harmonic analysis robust method). Five…
Descriptors: Item Response Theory, Statistical Analysis, Computation, Test Length
Pfeiffer, Nils; Hagemann, Dirk; Backenstrass, Matthias – Educational and Psychological Measurement, 2011
In response to the low standards in short form development, Smith, McCarthy, and Anderson (2000) introduced a set of guidelines for the construction and evaluation of short forms of psychological tests. One of their recommendations requires researchers to show that the variance overlap between the short form and its long form is adequate. This…
Descriptors: Psychological Testing, Computation, Test Length, Undergraduate Students
Wang, Chun – Educational and Psychological Measurement, 2013
Cognitive diagnostic computerized adaptive testing (CD-CAT) purports to combine the strengths of both CAT and cognitive diagnosis. Cognitive diagnosis models aim at classifying examinees into the correct mastery profile group so as to pinpoint the strengths and weaknesses of each examinee whereas CAT algorithms choose items to determine those…
Descriptors: Computer Assisted Testing, Adaptive Testing, Cognitive Tests, Diagnostic Tests
Wang, Wen-Chung; Liu, Chen-Wei; Wu, Shiu-Lien – Applied Psychological Measurement, 2013
The random-threshold generalized unfolding model (RTGUM) was developed by treating the thresholds in the generalized unfolding model as random effects rather than fixed effects to account for the subjective nature of the selection of categories in Likert items. The parameters of the new model can be estimated with the JAGS (Just Another Gibbs…
Descriptors: Computer Assisted Testing, Adaptive Testing, Models, Bayesian Statistics

