Publication Date
| Date Range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 4 |
| Since 2007 (last 20 years) | 10 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Error Patterns | 10 |
| Test Items | 7 |
| Scores | 6 |
| Evaluation Methods | 3 |
| Item Response Theory | 3 |
| Multiple Choice Tests | 3 |
| Computation | 2 |
| Correlation | 2 |
| Grade 5 | 2 |
| Graduate Students | 2 |
| High School Students | 2 |
Source
| Source | Count |
| --- | --- |
| Applied Measurement in… | 10 |
Author
| Author | Count |
| --- | --- |
| D'Agostino, Jerome V. | 2 |
| Wells, Craig S. | 2 |
| Abu-Ghazalah, Rashid M. | 1 |
| Bolt, Daniel M. | 1 |
| Bonner, Sarah M. | 1 |
| Chen, I-Chien | 1 |
| Cimetta, Adriana D. | 1 |
| Dubins, David N. | 1 |
| Falco, Lia D. | 1 |
| Krajcik, Joseph | 1 |
| Schneider, Barbara | 1 |
Publication Type
| Publication Type | Count |
| --- | --- |
| Journal Articles | 10 |
| Reports - Research | 9 |
| Reports - Evaluative | 1 |
| Tests/Questionnaires | 1 |
Education Level
| Education Level | Count |
| --- | --- |
| Elementary Education | 2 |
| Grade 5 | 2 |
| High Schools | 2 |
| Higher Education | 2 |
| Intermediate Grades | 2 |
| Middle Schools | 2 |
| Early Childhood Education | 1 |
| Grade 10 | 1 |
| Grade 11 | 1 |
| Grade 12 | 1 |
| Grade 3 | 1 |
Assessments and Surveys
| Assessment or Survey | Count |
| --- | --- |
| National Assessment of… | 1 |
Yiling Cheng; I-Chien Chen; Barbara Schneider; Mark Reckase; Joseph Krajcik – Applied Measurement in Education, 2024
The current study expands on previous research on gender differences and similarities in science test scores. Using three different approaches (differential item functioning, differential distractor functioning, and decision tree analysis), we examine a high school science assessment administered to 3,849 10th-12th graders, of whom 2,021 are…
Descriptors: Gender Differences, Science Achievement, Responses, Testing
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
Wells, Craig S.; Sireci, Stephen G. – Applied Measurement in Education, 2020
Student growth percentiles (SGPs) are currently used by several states and school districts to provide information about individual students as well as to evaluate teachers, schools, and school districts. For SGPs to be defensible for these purposes, they should be reliable. In this study, we examine the amount of systematic and random error in…
Descriptors: Growth Models, Reliability, Scores, Error Patterns
Schmidgall, Jonathan – Applied Measurement in Education, 2017
This study utilizes an argument-based approach to validation to examine the implications of reliability in order to further differentiate the concepts of score and decision consistency. In a methodological example, the framework of generalizability theory was used to estimate appropriate indices of score consistency and evaluations of the…
Descriptors: Scores, Reliability, Validity, Generalizability Theory
Keller, Lisa A.; Keller, Robert R. – Applied Measurement in Education, 2015
Equating test forms is an essential activity in standardized testing, with increased importance with the accountability systems in existence through the mandate of Adequate Yearly Progress. It is through equating that scores from different test forms become comparable, which allows for the tracking of changes in the performance of students from…
Descriptors: Item Response Theory, Rating Scales, Standardized Tests, Scoring Rubrics
Bonner, Sarah M.; D'Agostino, Jerome V. – Applied Measurement in Education, 2012
We investigated examinees' cognitive processes while they solved selected items from the Multistate Bar Exam (MBE), a high-stakes professional certification examination. We focused on ascertaining those mental processes most frequently used by examinees, and the most common types of errors in their thinking. We compared the relationships between…
Descriptors: Cognitive Processes, Test Items, Problem Solving, Thinking Skills
Noble, Tracy; Rosebery, Ann; Suarez, Catherine; Warren, Beth; O'Connor, Mary Catherine – Applied Measurement in Education, 2014
English language learners (ELLs) and their teachers, schools, and communities face increasingly high-stakes consequences due to test score gaps between ELLs and non-ELLs. It is essential that the field of educational assessment continue to investigate the meaning of these test score gaps. This article discusses the findings of an exploratory study…
Descriptors: English Language Learners, Evidence, Educational Assessment, Achievement Gap
Puhan, Gautam – Applied Measurement in Education, 2009
The purpose of this study is to determine the extent of scale drift on a test that employs cut scores. It was essential to examine scale drift for this testing program because new forms in this testing program are often put on scale through a series of intermediate equatings (known as equating chains). This process may cause equating error to…
Descriptors: Testing Programs, Testing, Measurement Techniques, Item Response Theory
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
D'Agostino, Jerome V.; Welsh, Megan E.; Cimetta, Adriana D.; Falco, Lia D.; Smith, Shannon; VanWinkle, Waverely Hester; Powers, Sonya J. – Applied Measurement in Education, 2008
Central to the standards-based assessment validation process is an examination of the alignment between state standards and test items. Several alignment analysis systems have emerged recently, but most rely on either traditional rating or matching techniques. Little, if any, analyses have been reported on the degree of consistency between the two…
Descriptors: Test Items, Student Evaluation, State Standards, Evaluation Methods