ERIC - Search Results

Showing all 10 results

A Validation Argument from Soup to Nuts: Assessing Progress on Learning Trajectories for Middle-School Mathematics

Peer reviewed

Confrey, Jere; Toutkoushian, Emily; Shah, Meetal – Applied Measurement in Education, 2019

Fully articulating validation arguments in the context of classroom assessment requires connecting evidence from multiple sources and addressing multiple types of validity in a coherent chain of reasoning. This type of validation argument is particularly complex for assessments that function in close proximity to instruction, address the fine…

Descriptors: Test Validity, Item Response Theory, Middle School Students, Mathematics Instruction

Performance Decline as an Indicator of Generalized Test-Taking Disengagement

Peer reviewed

Wise, Steven L.; Kingsbury, G. Gage – Applied Measurement in Education, 2022

In achievement testing we assume that students will demonstrate their maximum performance as they encounter test items. Sometimes, however, student performance can decline during a test event, which implies that the test score does not represent maximum performance. This study describes a method for identifying significant performance decline and…

Descriptors: Achievement Tests, Performance, Classification, Guessing (Tests)

Exploring the Robustness of a Unidimensional Item Response Theory Model with Empirically Multidimensional Data

Peer reviewed

Anderson, Daniel; Kahn, Joshua D.; Tindal, Gerald – Applied Measurement in Education, 2017

Unidimensionality and local independence are two common assumptions of item response theory. The former implies that all items measure a common latent trait, while the latter implies that responses are independent, conditional on respondents' location on the latent trait. Yet, few tests are truly unidimensional. Unmodeled dimensions may result in…

Descriptors: Robustness (Statistics), Item Response Theory, Mathematics Tests, Grade 6

Requiring a Consistent Unit of Scale between the Responses of Students and Judges in Standard Setting

Peer reviewed

Humphry, Stephen; Heldsinger, Sandra; Andrich, David – Applied Measurement in Education, 2014

One of the best-known methods for setting a benchmark standard on a test is that of Angoff and its modifications. When scored dichotomously, judges estimate the probability that a benchmark student has of answering each item correctly. As in most methods of standard setting, it is assumed implicitly that the unit of the latent scale of the…

Descriptors: Foreign Countries, Standard Setting (Scoring), Judges, Item Response Theory

A Comparison of Teacher Effectiveness Measures Calculated Using Three Multilevel Models for Raters Effects

Peer reviewed

Murphy, Daniel L.; Beretvas, S. Natasha – Applied Measurement in Education, 2015

This study examines the use of cross-classified random effects models (CCrem) and cross-classified multiple membership random effects models (CCMMrem) to model rater bias and estimate teacher effectiveness. Effect estimates are compared using CTT versus item response theory (IRT) scaling methods and three models (i.e., conventional multilevel…

Descriptors: Teacher Effectiveness, Comparative Analysis, Hierarchical Linear Modeling, Test Theory

Considering the Use of General and Modified Assessment Items in Computerized Adaptive Testing

Peer reviewed

Wyse, Adam E.; Albano, Anthony D. – Applied Measurement in Education, 2015

This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…

Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Testing Programs

Examining the Effectiveness of Test Accommodation Using DIF and a Mixture IRT Model

Peer reviewed

Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal – Applied Measurement in Education, 2012

This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…

Descriptors: Testing Accommodations, Test Bias, Item Response Theory, Validity

Stability of Rasch Scales over Time

Peer reviewed

Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2010

Item response theory (IRT) methods are generally used to create score scales for large-scale tests. Research has shown that IRT scales are stable across groups and over time. Most studies have focused on items that are dichotomously scored. Now Rasch and other IRT models are used to create scales for tests that include polytomously scored items.…

Descriptors: Measures (Individuals), Item Response Theory, Robustness (Statistics), Item Analysis

Comparisons of Methodologies and Results in Vertical Scaling for Educational Achievement Tests

Peer reviewed

Tong, Ye; Kolen, Michael J. – Applied Measurement in Education, 2007

A number of vertical scaling methodologies were examined in this article. Scaling variations included data collection design, scaling method, item response theory (IRT) scoring procedure, and proficiency estimation method. Vertical scales were developed for Grade 3 through Grade 8 for 4 content areas and 9 simulated datasets. A total of 11 scaling…

Descriptors: Achievement Tests, Scaling, Methods, Item Response Theory

Sex Differences in the Tendency to Omit Items on Multiple-Choice Tests: 1980-2000

Peer reviewed

von Schrader, Sarah; Ansley, Timothy – Applied Measurement in Education, 2006

Much has been written concerning the potential group differences in responding to multiple-choice achievement test items. This discussion has included references to possible disparities in tendency to omit such test items. When test scores are used for high-stakes decision making, even small differences in scores and rankings that arise from male…

Descriptors: Gender Differences, Multiple Choice Tests, Achievement Tests, Grade 3
