ERIC - Search Results

Publication Date

In 2025	0
Since 2024	0
Since 2021 (last 5 years)	5
Since 2016 (last 10 years)	14
Since 2006 (last 20 years)	21

Descriptor

Equated Scores	34
Item Response Theory	31
Test Items	12
Sample Size	11
Statistical Analysis	11
Sampling	10
Error of Measurement	7
Evaluation Methods	7
True Scores	7
Difficulty Level	6
Simulation	6
Accuracy	5
Computation	5
College Entrance Examinations	4
Evaluation Criteria	4
Test Format	4
High School Students	3
High Stakes Tests	3
Licensing Examinations…	3
Mathematics Tests	3
Multiple Choice Tests	3
Psychometrics	3
Research Design	3
Research Methodology	3
Classification	2
More ▼

Source

Applied Measurement in…

Publication Type

Journal Articles	34
Reports - Research	23
Reports - Evaluative	11
Information Analyses	1
Speeches/Meeting Papers	1

Education Level

Higher Education	2
Postsecondary Education	2
Secondary Education	2
Early Childhood Education	1
Grade 10	1
Grade 4	1
Grade 7	1
Grade 8	1
High Schools	1
Junior High Schools	1
Middle Schools	1
Preschool Education	1
More ▼

Audience

Practitioners

Location

California

Laws, Policies, & Programs

Assessments and Surveys

ACT Assessment	1
College Board Achievement…	1
Medical College Admission Test	1
Program for International…	1
SAT (College Admission Test)	1

What Works Clearinghouse Rating

Showing 1 to 15 of 34 results Save | Export

Detecting Item Parameter Drift in Small Sample Rasch Equating

Peer reviewed

Direct link

Daniel Jurich; Chunyan Liu – Applied Measurement in Education, 2023

Screening items for parameter drift helps protect against serious validity threats and ensure score comparability when equating forms. Although many high-stakes credentialing examinations operate with small sample sizes, few studies have investigated methods to detect drift in small sample equating. This study demonstrates that several newly…

Descriptors: High Stakes Tests, Sample Size, Item Response Theory, Equated Scores

Comparing Drift Detection Methods for Accurate Rasch Equating in Different Sample Sizes

Peer reviewed

Direct link

Alahmadi, Sarah; Jones, Andrew T.; Barry, Carol L.; Ibáñez, Beatriz – Applied Measurement in Education, 2023

Rasch common-item equating is often used in high-stakes testing to maintain equivalent passing standards across test administrations. If unaddressed, item parameter drift poses a major threat to the accuracy of Rasch common-item equating. We compared the performance of well-established and newly developed drift detection methods in small and large…

Descriptors: Equated Scores, Item Response Theory, Sample Size, Test Items

Detection of Outliers in Anchor Items Using Modified Rasch Fit Statistics

Peer reviewed

Direct link

Liu, Chunyan; Jurich, Daniel; Morrison, Carol; Grabovsky, Irina – Applied Measurement in Education, 2021

The existence of outliers in the anchor items can be detrimental to the estimation of examinee ability and undermine the validity of score interpretation across forms. However, in practice, anchor item performance can become distorted due to various reasons. This study compares the performance of modified "INFIT" and "OUTFIT"…

Descriptors: Equated Scores, Test Items, Item Response Theory, Difficulty Level

Some Methods and Evaluation for Linking and Equating with Small Samples

Peer reviewed

Direct link

Peabody, Michael R. – Applied Measurement in Education, 2020

The purpose of the current article is to introduce the equating and evaluation methods used in this special issue. Although a comprehensive review of all existing models and methodologies would be impractical given the format, a brief introduction to some of the more popular models will be provided. A brief discussion of the conditions required…

Descriptors: Evaluation Methods, Equated Scores, Sample Size, Item Response Theory

Efficient Estimation of Mean Ability Growth Using Vertical Scaling

Peer reviewed

Direct link

Bjermo, Jonas; Miller, Frank – Applied Measurement in Education, 2021

In recent years, the interest in measuring growth in student ability in various subjects between different grades in school has increased. Therefore, good precision in the estimated growth is of importance. This paper aims to compare estimation methods and test designs when it comes to precision and bias of the estimated growth of mean ability…

Descriptors: Scaling, Ability, Computation, Test Items

Effect of Sample Size on Common Item Equating Using the Dichotomous Rasch Model

Peer reviewed

Direct link

O'Neill, Thomas R.; Gregg, Justin L.; Peabody, Michael R. – Applied Measurement in Education, 2020

This study addresses equating issues with varying sample sizes using the Rasch model by examining how sample size affects the stability of item calibrations and person ability estimates. A resampling design was used to create 9 sample size conditions (200, 100, 50, 45, 40, 35, 30, 25, and 20), each replicated 10 times. Items were recalibrated…

Descriptors: Sample Size, Equated Scores, Item Response Theory, Raw Scores

Development and Use of Anchoring Vignettes: Psychometric Investigations and Recommendations for a Nonparametric Approach

Peer reviewed

Direct link

Lee, HyeSun; Smith, Weldon; Martinez, Angel; Ferris, Heather; Bova, Joe – Applied Measurement in Education, 2021

The aim of the current research was to provide recommendations to facilitate the development and use of anchoring vignettes (AVs) for cross-cultural comparisons in education. Study 1 identified six factors leading to order violations and ties in AV responses based on cognitive interviews with 15-year-old students. The factors were categorized into…

Descriptors: Vignettes, Test Items, Equated Scores, Nonparametric Statistics

Investigating the Classification Accuracy of Rasch and Nominal Weights Mean Equating with Very Small Samples

Peer reviewed

Direct link

Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020

Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…

Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores

Subscore Equating and Profile Reporting

Peer reviewed

Direct link

Lim, Euijin; Lee, Won-Chan – Applied Measurement in Education, 2020

The purpose of this study is to address the necessity of subscore equating and to evaluate the performance of various equating methods for subtests. Assuming the random groups design and number-correct scoring, this paper analyzed real data and simulated data with four study factors including test dimensionality, subtest length, form difference in…

Descriptors: Equated Scores, Test Length, Test Format, Difficulty Level

Asymptotic Standard Errors of Equating Coefficients Using the Characteristic Curve Methods for the Graded Response Model

Peer reviewed

Direct link

Zhang, Zhonghua – Applied Measurement in Education, 2020

The characteristic curve methods have been applied to estimate the equating coefficients in test equating under the graded response model (GRM). However, the approaches for obtaining the standard errors for the estimates of these coefficients have not been developed and examined. In this study, the delta method was applied to derive the…

Descriptors: Error of Measurement, Computation, Equated Scores, True Scores

Investigating Repeater Effects on Small Sample Equating: Include or Exclude?

Peer reviewed

Direct link

Diao, Hongyu; Keller, Lisa – Applied Measurement in Education, 2020

Examinees who attempt the same test multiple times are often referred to as "repeaters." Previous studies suggested that repeaters should be excluded from the total sample before equating because repeater groups are distinguishable from non-repeater groups. In addition, repeaters might memorize anchor items, causing item drift under a…

Descriptors: Licensing Examinations (Professions), College Entrance Examinations, Repetition, Testing Problems

Impact of Accumulated Error on Item Response Theory Pre-Equating with Mixed Format Tests

Peer reviewed

Direct link

Keller, Lisa A.; Keller, Robert; Cook, Robert J.; Colvin, Kimberly F. – Applied Measurement in Education, 2016

The equating of tests is an essential process in high-stakes, large-scale testing conducted over multiple forms or administrations. By adjusting for differences in difficulty and placing scores from different administrations of a test on a common scale, equating allows scores from these different forms and administrations to be directly compared…

Descriptors: Item Response Theory, Equated Scores, Test Format, Testing Programs

An Extension of IRT-Based Equating to the Dichotomous Testlet Response Theory Model

Peer reviewed

Direct link

Tao, Wei; Cao, Yi – Applied Measurement in Education, 2016

Current procedures for equating number-correct scores using traditional item response theory (IRT) methods assume local independence. However, when tests are constructed using testlets, one concern is the violation of the local item independence assumption. The testlet response theory (TRT) model is one way to accommodate local item dependence.…

Descriptors: Item Response Theory, Equated Scores, Test Format, Models

An Examination of Two Procedures for Identifying Consequential Item Parameter Drift

Peer reviewed

Direct link

Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu – Applied Measurement in Education, 2014

The purpose of the present study was to develop and evaluate two procedures flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD using a critical value that was defined to represent barely tolerable IPD. The second procedure…

Descriptors: Test Items, Test Bias, Equated Scores, Item Response Theory

Impact of Design Effects in Large-Scale District and State Assessments

Peer reviewed

Direct link

Phillips, Gary W. – Applied Measurement in Education, 2015

This article proposes that sampling design effects have potentially huge unrecognized impacts on the results reported by large-scale district and state assessments in the United States. When design effects are unrecognized and unaccounted for they lead to underestimating the sampling error in item and test statistics. Underestimating the sampling…

Descriptors: State Programs, Sampling, Research Design, Error of Measurement

Previous Page | Next Page »

Pages: 1 | 2 | 3

Kolen, Michael J.	3
Lee, Won-Chan	3
Dorans, Neil J.	2
Peabody, Michael R.	2
Alahmadi, Sarah	1
Antal, Judit	1
Ban, Jae-Chun	1
Barry, Carol L.	1
Bjermo, Jonas	1
Bolt, Daniel M.	1
Bova, Joe	1
Cao, Yi	1
Chunyan Liu	1
Colvin, Kimberly F.	1
Cook, Robert J.	1
Crouse, Jill D.	1
Daniel Jurich	1
Diao, Hongyu	1
Dwyer, Andrew C.	1
Eignor, Daniel R.	1
Ferris, Heather	1
Fitzpatrick, Anne R.	1
Forsyth, Robert A.	1
Furter, Robert T.	1
More ▼