Publication Date
In 2025: 0
Since 2024: 3
Since 2021 (last 5 years): 7
Since 2016 (last 10 years): 12
Since 2006 (last 20 years): 16
Descriptor
Scores: 30
Item Response Theory: 25
Test Items: 17
Responses: 8
Test Construction: 8
Multiple Choice Tests: 6
Test Results: 6
Comparative Analysis: 5
Reliability: 5
Simulation: 5
Error of Measurement: 4
Source
Applied Measurement in…: 30
Author
Meijer, Rob R.: 3
Lane, Suzanne: 2
Wise, Steven L.: 2
Abu-Ghazalah, Rashid M.: 1
Andrich, David: 1
Candell, Gregory L.: 1
Carney, Michele: 1
Cavey, Laurie: 1
Cho, Sun-Joo: 1
Eckerly, Carol: 1
Schneider, Barbara: 1
Publication Type
Journal Articles: 30
Reports - Research: 16
Reports - Evaluative: 14
Speeches/Meeting Papers: 1
Tests/Questionnaires: 1
Education Level
Higher Education: 4
Postsecondary Education: 4
Grade 11: 2
Grade 8: 2
Secondary Education: 2
Elementary Education: 1
Grade 10: 1
Grade 12: 1
High Schools: 1
Junior High Schools: 1
Middle Schools: 1
Assessments and Surveys
Advanced Placement…: 1
National Assessment of…: 1
Sarah Alahmadi; Christine E. DeMars – Applied Measurement in Education, 2024
Large-scale educational assessments are sometimes considered low-stakes, increasing the possibility of confounding true performance level with low motivation. These concerns are amplified in remote testing conditions. To remove the effects of low effort levels in responses observed in remote low-stakes testing, several motivation filtering methods…
Descriptors: Multiple Choice Tests, Item Response Theory, College Students, Scores
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring of constructed-response items (i.e., rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
DeMars, Christine E. – Applied Measurement in Education, 2021
Estimation of parameters for the many-facets Rasch model requires that, conditional on the values of the facets, such as person ability, item difficulty, and rater severity, the observed responses within each facet are independent. This requirement has often been discussed for the Rasch, 2PL, and 3PL models, but it becomes more complex…
Descriptors: Item Response Theory, Test Items, Ability, Scores
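The many-facets Rasch model referenced in the entry above combines person ability, item difficulty, and rater severity on a single logit scale. A minimal sketch of the dichotomous case (the function name and parameterization here are illustrative, not taken from the article):

```python
import math

def mfrm_prob(theta, difficulty, severity):
    """Probability of a correct response under a dichotomous
    many-facets Rasch model: logit(P) = theta - difficulty - severity."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty - severity)))
```

When ability exactly offsets the combined item difficulty and rater severity, the probability is 0.5; a more severe rater lowers the probability just as a harder item does.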
Song, Yoon Ah; Lee, Won-Chan – Applied Measurement in Education, 2022
This article examines how item response theory (IRT) models perform when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of…
Descriptors: Item Response Theory, Item Analysis, Scores, Accuracy
Mo, Ya; Carney, Michele; Cavey, Laurie; Totorica, Tatia – Applied Measurement in Education, 2021
There is a need for assessment items that assess complex constructs but can also be efficiently scored for evaluation of teacher education programs. In an effort to measure the construct of teacher attentiveness in an efficient and scalable manner, we are using exemplar responses elicited by constructed-response item prompts to develop…
Descriptors: Protocol Analysis, Test Items, Responses, Mathematics Teachers
Yiling Cheng; I-Chien Chen; Barbara Schneider; Mark Reckase; Joseph Krajcik – Applied Measurement in Education, 2024
The current study expands on previous research on gender differences and similarities in science test scores. Using three different approaches -- differential item functioning, differential distractor functioning, and decision tree analysis -- we examine a high school science assessment administered to 3,849 10th-12th graders, of whom 2,021 are…
Descriptors: Gender Differences, Science Achievement, Responses, Testing
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
Kopp, Jason P.; Jones, Andrew T. – Applied Measurement in Education, 2020
Traditional psychometric guidelines suggest that at least several hundred respondents are needed to obtain accurate parameter estimates under the Rasch model. However, recent research indicates that Rasch equating results in accurate parameter estimates with sample sizes as small as 25. Item parameter drift under the Rasch model has been…
Descriptors: Item Response Theory, Psychometrics, Sample Size, Sampling
Guo, Hongwen; Rios, Joseph A.; Haberman, Shelby; Liu, Ou Lydia; Wang, Jing; Paek, Insu – Applied Measurement in Education, 2016
Unmotivated test takers using rapid guessing in item responses can affect validity studies and teacher and institution performance evaluation negatively, making it critical to identify these test takers. The authors propose a new nonparametric method for finding response-time thresholds for flagging item responses that result from rapid-guessing…
Descriptors: Guessing (Tests), Reaction Time, Nonparametric Statistics, Models
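The entry above describes flagging item responses by a response-time threshold. A toy sketch of the general idea (the fraction-of-median rule below is a hypothetical choice for illustration, not the authors' nonparametric method):

```python
from statistics import median

def flag_rapid_guesses(response_times, fraction=0.10):
    """Flag responses whose time falls below a fraction of the item's
    median response time (illustrative threshold rule)."""
    threshold = fraction * median(response_times)
    return [rt < threshold for rt in response_times]
```

For response times of [30, 28, 2, 35, 1] seconds, the median is 28, so the threshold is 2.8 and the two fastest responses are flagged as likely rapid guesses.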
Quesen, Sarah; Lane, Suzanne – Applied Measurement in Education, 2019
This study examined the effect of similar vs. dissimilar proficiency distributions on uniform DIF detection on a statewide eighth-grade mathematics assessment. Results from the similar- and dissimilar-ability reference groups with a students-with-disabilities (SWD) focal group were compared for four models: logistic regression, hierarchical generalized linear model (HGLM),…
Descriptors: Test Items, Mathematics Tests, Grade 8, Item Response Theory
Pan, Tianshu; Yin, Yue – Applied Measurement in Education, 2017
In this article, we propose using the Bayes factors (BF) to evaluate person fit in item response theory models under the framework of Bayesian evaluation of an informative diagnostic hypothesis. We first discuss the theoretical foundation for this application and how to analyze person fit using BF. To demonstrate the feasibility of this approach,…
Descriptors: Bayesian Statistics, Goodness of Fit, Item Response Theory, Monte Carlo Methods
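A Bayes factor compares how well two hypotheses account for an observed response pattern. As a toy illustration of the idea (not the authors' procedure), one can compare a Rasch-model likelihood against a constant-probability guessing model for a single examinee:

```python
import math

def rasch_likelihood(responses, theta, difficulties):
    """Likelihood of a 0/1 response pattern under the Rasch model."""
    lik = 1.0
    for x, b in zip(responses, difficulties):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        lik *= p if x == 1 else (1.0 - p)
    return lik

def bayes_factor(responses, theta, difficulties, guess_p=0.5):
    """BF of the Rasch model against a model where every response
    is correct with fixed probability guess_p (illustrative)."""
    guessing = guess_p ** len(responses)
    return rasch_likelihood(responses, theta, difficulties) / guessing
```

A high-ability examinee answering everything correctly yields a Bayes factor well above 1 (the Rasch model fits better than guessing), while the same examinee answering everything incorrectly yields a Bayes factor below 1, signaling misfit.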
Lee, Wooyeol; Cho, Sun-Joo – Applied Measurement in Education, 2017
Utilizing a longitudinal item response model, this study investigated the effect of item parameter drift (IPD) on item parameters and person scores via a Monte Carlo study. Item parameter recovery was investigated for various IPD patterns in terms of bias and root mean-square error (RMSE), and percentage of time the 95% confidence interval covered…
Descriptors: Item Response Theory, Test Items, Bias, Computation
Han, Kyung T.; Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2012
Item parameter drift (IPD) occurs when item parameter values change from their original value over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some…
Descriptors: Item Response Theory, Test Items, Scaling, Methods
Stone, Clement A.; Ye, Feifei; Zhu, Xiaowen; Lane, Suzanne – Applied Measurement in Education, 2010
Although the reliability of subscale scores may be suspect, subscale scores are the most common type of diagnostic information included in student score reports. This research compared methods for augmenting the reliability of subscale scores for an 8th-grade mathematics assessment. Yen's Objective Performance Index, Wainer et al.'s augmented scores,…
Descriptors: Item Response Theory, Case Studies, Reliability, Scores
Keng, Leslie; McClarty, Katie Larsen; Davis, Laurie Laughlin – Applied Measurement in Education, 2008
This article describes a comparative study conducted at the item level for paper and online administrations of a statewide high stakes assessment. The goal was to identify characteristics of items that may have contributed to mode effects. Item-level analyses compared two modes of the Texas Assessment of Knowledge and Skills (TAKS) for up to four…
Descriptors: Computer Assisted Testing, Geometric Concepts, Grade 8, Comparative Analysis