| Publication Date | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 5 |
| Since 2017 (last 10 years) | 22 |
| Since 2007 (last 20 years) | 36 |
| Descriptor | Records |
| --- | --- |
| Computation | 42 |
| Item Response Theory | 23 |
| Test Items | 21 |
| Scores | 10 |
| Comparative Analysis | 9 |
| Accuracy | 8 |
| Error of Measurement | 8 |
| Difficulty Level | 7 |
| Statistical Analysis | 7 |
| Classification | 6 |
| Evaluation Methods | 6 |
| Source | Records |
| --- | --- |
| Applied Measurement in Education | 42 |
| Author | Records |
| --- | --- |
| Haberman, Shelby | 2 |
| Lee, Won-Chan | 2 |
| Paek, Insu | 2 |
| Sinharay, Sandip | 2 |
| Alahmadi, Sarah | 1 |
| Almehrizi, Rashid S. | 1 |
| Andrich, David | 1 |
| Ansley, Timothy N. | 1 |
| Barry, Carol L. | 1 |
| Beretvas, S. Natasha | 1 |
| Bjermo, Jonas | 1 |
| Publication Type | Records |
| --- | --- |
| Journal Articles | 42 |
| Reports - Research | 37 |
| Reports - Evaluative | 5 |
| Education Level | Records |
| --- | --- |
| Secondary Education | 6 |
| Middle Schools | 5 |
| Elementary Education | 4 |
| Junior High Schools | 4 |
| Early Childhood Education | 3 |
| Grade 8 | 3 |
| Elementary Secondary Education | 2 |
| Grade 3 | 2 |
| Grade 4 | 2 |
| Grade 5 | 2 |
| Grade 6 | 2 |
| Audience | Records |
| --- | --- |
| Practitioners | 2 |
| Researchers | 1 |
| Location | Records |
| --- | --- |
| New York | 2 |
| Australia | 1 |
| California (Los Angeles) | 1 |
| Colorado | 1 |
| Florida | 1 |
| Iran | 1 |
| Netherlands | 1 |
| North Carolina | 1 |
| Oman | 1 |
| Tennessee | 1 |
| Texas | 1 |
| Assessments and Surveys | Records |
| --- | --- |
| Trends in International… | 3 |
| Program for International… | 2 |
| Advanced Placement… | 1 |
| Bar Examinations | 1 |
| Iowa Tests of Basic Skills | 1 |
| National Assessment of… | 1 |
| Progress in International… | 1 |
Yue Liu; Zhen Li; Hongyun Liu; Xiaofeng You – Applied Measurement in Education, 2024
Low test-taking effort of examinees has been considered a source of construct-irrelevant variance in item response modeling, leading to serious consequences for parameter estimation. This study aims to investigate how non-effortful response (NER) influences the estimation of item and person parameters in item-pool scale linking (IPSL) and whether…
Descriptors: Item Response Theory, Computation, Simulation, Responses
Bayesian Logistic Regression: A New Method to Calibrate Pretest Items in Multistage Adaptive Testing
TsungHan Ho – Applied Measurement in Education, 2023
An operational multistage adaptive test (MST) requires the development of a large item bank and ongoing effort to replenish it, owing to long-term concerns about test security and validity. New items should be pretested and linked to the item bank before being used operationally. The linking item volume fluctuations in…
Descriptors: Bayesian Statistics, Regression (Statistics), Test Items, Pretesting
Zhan, Peida; Liu, Yaohui; Yu, Zhaohui; Pan, Yanfang – Applied Measurement in Education, 2023
Many educational and psychological studies have shown that students' development generally proceeds step by step (i.e., ordinal development) toward a specific level. This study proposed a novel longitudinal learning diagnosis model with polytomous attributes to track students' ordinal development in learning. Using the concept of polytomous attributes…
Descriptors: Skill Development, Cognitive Measurement, Models, Educational Diagnosis
Daniel Jurich; Chunyan Liu – Applied Measurement in Education, 2023
Screening items for parameter drift helps protect against serious validity threats and ensure score comparability when equating forms. Although many high-stakes credentialing examinations operate with small sample sizes, few studies have investigated methods to detect drift in small sample equating. This study demonstrates that several newly…
Descriptors: High Stakes Tests, Sample Size, Item Response Theory, Equated Scores
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that, conditional on the values of the facets (such as person ability, item difficulty, and rater severity), the observed responses within each facet are independent. This requirement has often been discussed for Rasch, 2PL, and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
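As background for the DeMars (2021) entry above, a common textbook form of the many-facets Rasch model is sketched below; the symbols θ (person ability), δ (item difficulty), γ (rater severity), and τ (category threshold) are standard notation assumed here, not reproduced from the article.

```latex
% Many-facets Rasch model (standard form; notation assumed, not taken from the article):
% log-odds that rater j awards person n category k rather than k-1 on item i
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \gamma_j - \tau_k
```

The local independence requirement discussed in the abstract means that, once θ_n, δ_i, and γ_j are conditioned on, the observed responses are assumed to be statistically independent.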
Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023
Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…
Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items
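One widely used drift-screening statistic in Rasch common-item equating contexts like the one described in the Alahmadi et al. (2023) entry is the robust z statistic. The form below is the standard one from the small-sample equating literature and is given only for orientation; the snippet does not specify which detection methods the article actually compares.

```latex
% Robust z for item parameter drift (standard form; not necessarily the article's method):
% d_i = difference in a common item's difficulty estimate between the two administrations
z_i^{\mathrm{rob}} = \frac{d_i - \operatorname{median}(d)}{0.74 \times \operatorname{IQR}(d)}
```

Common items whose |z| exceeds a chosen cutoff are flagged as drifting and removed from the anchor set before equating.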
Wyse, Adam E. – Applied Measurement in Education, 2020
This article compares cut scores from two variations of the Hofstee and Beuk methods, which determine cut scores by resolving inconsistencies in panelists' judgments about cut scores and pass rates, with the Angoff method. The first variation uses responses to the Hofstee and Beuk percentage-correct and pass-rate questions to calculate cut scores…
Descriptors: Cutting Scores, Evaluation Methods, Standard Setting (Scoring), Equations (Mathematics)
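For readers unfamiliar with the compromise methods named in the Wyse (2020) entry, the classic Hofstee procedure can be summarized as follows; the notation is the usual textbook presentation and is assumed here, not taken from the article.

```latex
% Hofstee compromise (standard presentation; notation assumed):
% Panelists supply k_min, k_max (lowest/highest acceptable cut scores) and
% f_min, f_max (lowest/highest acceptable fail rates). With F(c) the observed
% fail rate at cut score c, the cut score solves F(c) = f along the line
% through (k_min, f_max) and (k_max, f_min):
f = f_{\max} - \frac{f_{\max} - f_{\min}}{k_{\max} - k_{\min}}\,(c - k_{\min})
```

The Beuk method resolves the same two judgments (acceptable cut score and acceptable pass rate) by a weighted compromise rather than a graphical intersection.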
Tang, Xiaodan; Karabatsos, George; Chen, Haiqin – Applied Measurement in Education, 2020
In applications of item response theory (IRT) models, it is known that empirical violations of the local independence (LI) assumption can significantly bias parameter estimates. To address this issue, we propose a threshold-autoregressive item response theory (TAR-IRT) model that additionally accounts for order dependence among the item responses…
Descriptors: Item Response Theory, Test Items, Models, Computation
Almehrizi, Rashid S. – Applied Measurement in Education, 2021
KR-21 reliability and its extension, coefficient alpha (α), give the reliability estimate of test scores under the assumption of tau-equivalent forms. KR-21 gives the reliability estimate for summed scores on dichotomous items when items are randomly sampled from an infinite pool of similar items (randomly parallel forms). The article…
Descriptors: Test Reliability, Scores, Scoring, Computation
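For reference, the standard forms of KR-21 and coefficient alpha mentioned in the Almehrizi (2021) entry are given below, where k is the number of items, X̄ and σ_X² are the mean and variance of the summed scores, and σ_i² is the variance of item i. These are the usual textbook expressions, not formulas reproduced from the article.

```latex
% KR-21 (dichotomous items) and coefficient alpha (its extension); standard forms:
\mathrm{KR}\text{-}21 = \frac{k}{k-1}\left(1 - \frac{\bar{X}\,(k - \bar{X})}{k\,\sigma_X^{2}}\right),
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right)
```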
Lozano, José H.; Revuelta, Javier – Applied Measurement in Education, 2021
The present study proposes a Bayesian approach for estimating and testing the operation-specific learning model, a variant of the linear logistic test model that allows for the measurement of the learning that occurs during a test as a result of the repeated use of the operations involved in the items. The advantages of using a Bayesian framework…
Descriptors: Bayesian Statistics, Computation, Learning, Testing
Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021
In recent years, interest in measuring growth in student ability in various subjects between grades has increased, so good precision in the estimated growth is important. This paper aims to compare estimation methods and test designs with respect to the precision and bias of the estimated growth of mean ability…
Descriptors: Scaling, Ability, Computation, Test Items
Thompson, W. Jake; Clark, Amy K.; Nash, Brooke – Applied Measurement in Education, 2019
As the use of diagnostic assessment systems transitions from research applications to large-scale assessments for accountability purposes, reliability methods that provide evidence at each level of reporting are needed. The purpose of this paper is to summarize one simulation-based method for estimating and reporting reliability for an…
Descriptors: Test Reliability, Diagnostic Tests, Classification, Computation
Zhang, Zhonghua – Applied Measurement in Education, 2020
The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…
Descriptors: Error of Measurement, Computation, Equated Scores, True Scores
Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2019
In applications of item response theory (IRT), fixed parameter calibration (FPC) has been used to estimate the item parameters of a new test form on the existing ability scale of an item pool. The present paper presents an application of FPC to multiple examinee groups test data that are linked to the item pool via anchor items, and investigates…
Descriptors: Item Response Theory, Item Banks, Test Items, Computation
Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020
Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…
Descriptors: Growth Models, Reliability, Scores, Error Patterns

