| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 5 |
| Since 2017 (last 10 years) | 22 |
| Since 2007 (last 20 years) | 36 |
| Descriptor | Records |
| --- | --- |
| Computation | 42 |
| Item Response Theory | 23 |
| Test Items | 21 |
| Scores | 10 |
| Comparative Analysis | 9 |
| Accuracy | 8 |
| Error of Measurement | 8 |
| Difficulty Level | 7 |
| Statistical Analysis | 7 |
| Classification | 6 |
| Evaluation Methods | 6 |
| Source | Records |
| --- | --- |
| Applied Measurement in Education | 42 |
| Author | Records |
| --- | --- |
| Haberman, Shelby | 2 |
| Lee, Won-Chan | 2 |
| Paek, Insu | 2 |
| Sinharay, Sandip | 2 |
| Alahmadi, Sarah | 1 |
| Almehrizi, Rashid S. | 1 |
| Andrich, David | 1 |
| Ansley, Timothy N. | 1 |
| Barry, Carol L. | 1 |
| Beretvas, S. Natasha | 1 |
| Bjermo, Jonas | 1 |
| Publication Type | Records |
| --- | --- |
| Journal Articles | 42 |
| Reports - Research | 37 |
| Reports - Evaluative | 5 |
| Education Level | Records |
| --- | --- |
| Secondary Education | 6 |
| Middle Schools | 5 |
| Elementary Education | 4 |
| Junior High Schools | 4 |
| Early Childhood Education | 3 |
| Grade 8 | 3 |
| Elementary Secondary Education | 2 |
| Grade 3 | 2 |
| Grade 4 | 2 |
| Grade 5 | 2 |
| Grade 6 | 2 |
| Audience | Records |
| --- | --- |
| Practitioners | 2 |
| Researchers | 1 |
| Location | Records |
| --- | --- |
| New York | 2 |
| Australia | 1 |
| California (Los Angeles) | 1 |
| Colorado | 1 |
| Florida | 1 |
| Iran | 1 |
| Netherlands | 1 |
| North Carolina | 1 |
| Oman | 1 |
| Tennessee | 1 |
| Texas | 1 |
| Assessments and Surveys | Records |
| --- | --- |
| Trends in International… | 3 |
| Program for International… | 2 |
| Advanced Placement… | 1 |
| Bar Examinations | 1 |
| Iowa Tests of Basic Skills | 1 |
| National Assessment of… | 1 |
| Progress in International… | 1 |
Yue Liu; Zhen Li; Hongyun Liu; Xiaofeng You – Applied Measurement in Education, 2024
Low test-taking effort of examinees has been considered a source of construct-irrelevant variance in item response modeling, leading to serious consequences for parameter estimation. This study aims to investigate how non-effortful response (NER) influences the estimation of item and person parameters in item-pool scale linking (IPSL) and whether…
Descriptors: Item Response Theory, Computation, Simulation, Responses
Bayesian Logistic Regression: A New Method to Calibrate Pretest Items in Multistage Adaptive Testing
TsungHan Ho – Applied Measurement in Education, 2023
An operational multistage adaptive test (MST) requires the development of a large item bank and ongoing effort to replenish it, owing to long-term concerns about test security and validity. New items should be pretested and linked to the item bank before being used operationally. The linking item volume fluctuations in…
Descriptors: Bayesian Statistics, Regression (Statistics), Test Items, Pretesting
Zhan, Peida; Liu, Yaohui; Yu, Zhaohui; Pan, Yanfang – Applied Measurement in Education, 2023
Many educational and psychological studies have shown that students' development generally proceeds step by step (i.e., ordinal development) toward a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students' ordinal development in learning. Using the concept of polytomous attributes…
Descriptors: Skill Development, Cognitive Measurement, Models, Educational Diagnosis
Daniel Jurich; Chunyan Liu – Applied Measurement in Education, 2023
Screening items for parameter drift helps protect against serious validity threats and ensure score comparability when equating forms. Although many high-stakes credentialing examinations operate with small sample sizes, few studies have investigated methods to detect drift in small sample equating. This study demonstrates that several newly…
Descriptors: High Stakes Tests, Sample Size, Item Response Theory, Equated Scores
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that, conditional on the values of the facets (such as person ability, item difficulty, and rater severity), the observed responses within each facet are independent. This requirement has often been discussed for Rasch, 2PL, and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
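As background for the DeMars (2021) entry above, a common textbook form of the many-facets Rasch model is sketched below; the symbols θ (person ability), δ (item difficulty), γ (rater severity), and τ (category threshold) are standard notation assumed here, not reproduced from the article.

```latex
% Many-facets Rasch model (standard form; notation assumed, not taken from the article):
% log-odds that rater j awards person n category k rather than k-1 on item i
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \gamma_j - \tau_k
```

The local independence requirement discussed in the abstract means that, once θ_n, δ_i, and γ_j are conditioned on, the observed responses are assumed to be statistically independent.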
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
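One widely used drift-screening statistic in Rasch common-item equating contexts like the one described in the Alahmadi et al. (2023) entry is the robust z statistic. The form below is the standard one from the small-sample equating literature and is given only for orientation; the snippet does not specify which detection methods the article actually compares.

```latex
% Robust z for item parameter drift (standard form; not necessarily the article's method):
% d_i = difference in a common item's difficulty estimate between the two administrations
z_i^{\mathrm{rob}} = \frac{d_i - \operatorname{median}(d)}{0.74 \times \operatorname{IQR}(d)}
```

Common items whose |z| exceeds a chosen cutoff are flagged as drifting and removed from the anchor set before equating.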
Wyse, Adam E. – Applied Measurement in Education, 2020
This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage-correct and pass-rate questions to calculate cut scores…
Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
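For readers unfamiliar with the compromise methods named in the Wyse (2020) entry, the classic Hofstee procedure can be summarized as follows; the notation is the usual textbook presentation and is assumed here, not taken from the article.

```latex
% Hofstee compromise (standard presentation; notation assumed):
% Panelists supply k_min, k_max (lowest/highest acceptable cut scores) and
% f_min, f_max (lowest/highest acceptable fail rates). With F(c) the observed
% fail rate at cut score c, the cut score solves F(c) = f along the line
% through (k_min, f_max) and (k_max, f_min):
f = f_{\max} - \frac{f_{\max} - f_{\min}}{k_{\max} - k_{\min}}\,(c - k_{\min})
```

The Beuk method resolves the same two judgments (acceptable cut score and acceptable pass rate) by a weighted compromise rather than a graphical intersection.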
Tang, Xiaodan; Karabatsos, George; Chen, Haiqin – Applied Measurement in Education, 2020
In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses…
Descriptors: Item Response Theory, Test Items, Models, Computation
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension, coefficient alpha (α), give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 gives the reliability estimate for summed scores on dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
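For reference, the standard forms of KR-21 and coefficient alpha mentioned in the Almehrizi (2021) entry are given below, where k is the number of items, X̄ and σ_X² are the mean and variance of the summed scores, and σ_i² is the variance of item i. These are the usual textbook expressions, not formulas reproduced from the article.

```latex
% KR-21 (dichotomous items) and coefficient alpha (its extension); standard forms:
\mathrm{KR}\text{-}21 = \frac{k}{k-1}\left(1 - \frac{\bar{X}\,(k - \bar{X})}{k\,\sigma_X^{2}}\right),
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)
```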
Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021
The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…
Descriptors: Bayesian Statistics, Computation, Learning, Testing
Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021
In recent years, interest in measuring growth in student ability in various subjects between grades has increased, so good precision in the estimated growth is important. This paper aims to compare estimation methods and test designs with respect to the precision and bias of the estimated growth of mean ability…
Descriptors: Scaling, Ability, Computation, Test Items
Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation
Zhang, Zhonghua – Applied Measurement in Education, 2020
The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…
Descriptors: Error of Measurement, Computation, Equated Scores, True Scores
Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2019
In applications of item response theory (IRT), fixed parameter calibration (FPC) has been used to estimate the item parameters of a new test form on the existing ability scale of an item pool. The present paper presents an application of FPC to multiple examinee groups test data that are linked to the item pool via anchor items, and investigates…
Descriptors: Item Response Theory, Item Banks, Test Items, Computation
Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020
Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…
Descriptors: Growth Models, Reliability, Scores, Error Patterns

