ERIC - Search Results

Showing all 6 results

An Empirical Comparison of Methods for Equating with Randomly Equivalent Groups of 50 to 400 Test Takers. Research Report. ETS RR-10-05

Peer reviewed

Livingston, Samuel A.; Kim, Sooyeon – ETS Research Report Series, 2010

A series of resampling studies investigated the accuracy of equating by four different methods in a random groups equating design with samples of 400, 200, 100, and 50 test takers taking each form. Six pairs of forms were constructed. Each pair was constructed by assigning items from an existing test taken by 9,000 or more test takers. The…

Descriptors: Equated Scores, Accuracy, Sample Size, Sampling

The Impact of Item Position Change on Item Parameters and Common Equating Results under the 3PL Model

Meyers, Jason L.; Murphy, Stephen; Goodman, Joshua; Turhan, Ahmet – Pearson, 2012

Operational testing programs employing item response theory (IRT) applications benefit from the property of item parameter invariance whereby item parameter estimates obtained from one sample can be applied to other samples (when the underlying assumptions are satisfied). In theory, this feature allows for applications such as computer-adaptive…

Descriptors: Equated Scores, Test Items, Test Format, Item Response Theory

An Investigation of the Likelihood Ratio Test, the Mantel Test, and the Generalized Mantel-Haenszel Test of DIF.

Kim, Seock-Ho – 2000

This paper is concerned with statistical issues in differential item functioning (DIF). Four subsets of large-scale performance assessment data from the Georgia Kindergarten Assessment Program-Revised (N=105,731; N=10,000; N=1,000; and N=100) were analyzed using three DIF detection methods for polytomous items to examine the congruence among the…

Descriptors: Item Bias, Item Response Theory, Kindergarten, Performance Based Assessment

A Comparative Study of Observed Score Approaches and Purification Procedures for Detecting Differential Item Functioning.

Kwak, Nohoon; Davenport, Ernest C., Jr.; Davison, Mark L. – 1998

The purposes of this study were to introduce the iterative purification procedure and to compare this with the two-step purification procedure, to compare false positive error rates and the power of five observed score approaches, and to identify factors affecting power and false positive rates in each method. This study used 2,400 data sets that…

Descriptors: Ability, Comparative Analysis, Error of Measurement, Estimation (Mathematics)

Determining Test Length. Passing Scores and Test Lengths for Objective-Based Tests.

Millman, Jason – 1972

Two aspects of criterion referenced testing are discussed: cutting scores and test length. Several practices in determining passing scores are enumerated: (1) setting passing scores so that a predetermined percent of students pass; (2) inspecting each test item to determine how important it is that it be answered correctly; (3) determining the…

Descriptors: Achievement Tests, Criterion Referenced Tests, Cutting Scores, Educational Problems

Investigating Item Stability: An Empirical Investigation into the Variability of Item Statistics Under Conditions of Varying Sample Design and Sample Size. Occasional Paper No. 18.

Farish, Stephen J. – 1984

The stability of Rasch test item difficulty parameters was investigated under varying conditions. Data were taken from a mathematics achievement test administered to over 2,000 Australian students. The experiments included: (1) relative stability of the Rasch, traditional, and z-item difficulty parameters using different sample sizes and designs;…

Descriptors: Achievement Tests, Difficulty Level, Estimation (Mathematics), Foreign Countries