Showing all 14 results
Peer reviewed
Smith, Richard M. – Educational and Psychological Measurement, 1994
Rasch model total-fit statistics and between-item fit statistics were compared for their ability to detect measurement disturbances through the use of simulated data. Results indicate that the between-fit statistic appears more sensitive to systematic measurement disturbances and the total-fit statistic is more sensitive to random measurement…
Descriptors: Comparative Analysis, Goodness of Fit, Item Response Theory, Measurement Techniques
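The total-fit statistics compared in this study are mean-square residual fit statistics. A minimal sketch of the unweighted (outfit) and information-weighted (infit) mean squares for a single dichotomous Rasch item; the function names and single-item setup are illustrative assumptions, not code from the study:

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def fit_mean_squares(x, theta, b):
    """Unweighted (outfit) and weighted (infit) mean-square fit for one item.

    x: 0/1 responses, theta: person abilities, b: item difficulty.
    """
    p = rasch_prob(theta, b)
    info = p * (1.0 - p)                      # binomial variance of each response
    sq_resid = (x - p) ** 2
    outfit = float(np.mean(sq_resid / info))  # mean of squared standardized residuals
    infit = float(np.sum(sq_resid) / np.sum(info))  # information-weighted mean square
    return outfit, infit
```

Both statistics have expectation near 1 when the data fit the model; values well above 1 flag misfit.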
Peer reviewed
Cohen, Allan S.; And Others – Applied Psychological Measurement, 1993
Three measures of differential item functioning for the dichotomous response model are extended to include Samejima's graded response model. Two are based on area differences between item true score functions, and one is a chi-square statistic for comparing differences in item parameters. (SLD)
Descriptors: Chi Square, Comparative Analysis, Identification, Item Bias
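Two of the three measures rest on the area between the two groups' item true score functions. A minimal numerical sketch for the dichotomous (2PL) case, using trapezoidal integration and hypothetical parameter values; this is an illustration of the area idea, not the authors' code:

```python
import numpy as np

def icc_2pl(theta, a, b):
    """Two-parameter logistic item characteristic curve (scaling constant D = 1.7)."""
    return 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b)))

def unsigned_dif_area(a_ref, b_ref, a_foc, b_foc, lo=-4.0, hi=4.0, n=2001):
    """Unsigned area between reference- and focal-group ICCs over [lo, hi]."""
    theta = np.linspace(lo, hi, n)
    gap = np.abs(icc_2pl(theta, a_ref, b_ref) - icc_2pl(theta, a_foc, b_foc))
    # trapezoidal rule on a uniform grid
    return float(np.sum((gap[:-1] + gap[1:]) / 2.0) * (theta[1] - theta[0]))
```

When the discriminations are equal, the unsigned area reduces to the difficulty difference |b_ref - b_foc| (up to truncation of the theta range).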
Meisner, Richard; And Others – 1993
This paper presents a study on the generation of mathematics test items using algorithmic methods. The history of this approach is briefly reviewed and is followed by a survey of the research to date on the statistical parallelism of algorithmically generated mathematics items. Results are presented for 8 parallel test forms generated using 16…
Descriptors: Algorithms, Comparative Analysis, Computer Assisted Testing, Item Banks
Peer reviewed
Nandakumar, Ratna – Journal of Educational Measurement, 1994
Using simulated and real data, this study compares the performance of three methodologies for assessing unidimensionality: (1) DIMTEST; (2) the approach of Holland and Rosenbaum; and (3) nonlinear factor analysis. All three methods correctly confirm unidimensionality, but they differ in their ability to detect the lack of unidimensionality.…
Descriptors: Ability, Comparative Analysis, Evaluation Methods, Factor Analysis
Woodruff, David – 1993
Two analyses of variance (ANOVA) models for item scores are compared. The first is an items by subject random effect ANOVA. The second is a mixed effects ANOVA with items fixed and subjects random. Comparisons regarding reliability, Cronbach's alpha coefficient, psychometric inference, and inter-item covariance structure are made between the…
Descriptors: Analysis of Covariance, Analysis of Variance, Comparative Analysis, Factor Analysis
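One quantity compared across the two ANOVA models is Cronbach's alpha. A self-contained sketch of its computation from a subjects-by-items score matrix (illustrative code, not the paper's):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_subjects, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return float((k / (k - 1)) * (1.0 - item_vars.sum() / total_var))
```

Alpha reaches 1 for perfectly parallel items and falls toward 0 as inter-item covariances vanish.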
Schumacker, Randall E.; And Others – 1994
Rasch between and total weighted and unweighted fit statistics were compared using varying test lengths and sample sizes. Two test lengths (20 and 50 items) and three sample sizes (150, 500, and 1,000) were crossed. Each of the six combinations was replicated 100 times. In addition, power comparisons were made. Results indicated that there were no…
Descriptors: Comparative Analysis, Goodness of Fit, Item Response Theory, Power (Statistics)
Peer reviewed
Huynh, Huynh; Ferrara, Steven – Journal of Educational Measurement, 1994
Equal percentile (EP) and partial credit (PC) equatings for raw scores from performance-based assessments with free-response items are compared through the use of data from the Maryland School Performance Assessment Program. Results suggest that EP and PC methods do not give equivalent results when distributions are markedly skewed. (SLD)
Descriptors: Comparative Analysis, Equated Scores, Mathematics Tests, Performance Based Assessment
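The equal percentile (EP) method maps a raw score on one form to the score with the same percentile rank on the other form. A minimal sketch, assuming the common midpoint convention for percentile ranks and linear interpolation between observed scores; names and conventions are illustrative, not taken from the study:

```python
import numpy as np

def percentile_rank(scores, x):
    """Midpoint percentile rank of raw score x in a score distribution."""
    scores = np.asarray(scores, dtype=float)
    return 100.0 * (np.mean(scores < x) + 0.5 * np.mean(scores == x))

def ep_equate(x_scores, y_scores, x):
    """Form-Y score with the same percentile rank that x has on form X."""
    return float(np.percentile(y_scores, percentile_rank(x_scores, x)))
```

With markedly skewed distributions, this empirical mapping can diverge from a model-based (e.g., partial credit) equating, which is the comparison the study reports.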
Wang, Tianyou; Kolen, Michael J. – 1994
In this paper a quadratic curve equating method for different test forms under a random-group data-collection design is proposed. Procedures for implementing this method and related issues are described and discussed. The quadratic-curve method was evaluated with real test data (from two 30-item subtests for a professional licensure examination…
Descriptors: Comparative Analysis, Data Collection, Equated Scores, Goodness of Fit
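As a generic illustration of the quadratic-curve idea (not the authors' procedure), one can fit a quadratic through matched quantiles of the two forms' score distributions and use it as the equating function:

```python
import numpy as np

def quadratic_equate(x_scores, y_scores, x_new, ps=np.linspace(5, 95, 19)):
    """Fit y = c2*x^2 + c1*x + c0 through matched quantiles of two forms."""
    x_q = np.percentile(x_scores, ps)      # form-X quantiles
    y_q = np.percentile(y_scores, ps)      # matching form-Y quantiles
    coef = np.polyfit(x_q, y_q, deg=2)     # least-squares quadratic fit
    return float(np.polyval(coef, x_new))
```

The quantile levels `ps` and the least-squares fit are assumptions of this sketch; the appeal of a quadratic over a linear equating is that it can absorb mild curvature in the score correspondence.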
Mills, Craig N.; Melican, Gerald J. – 1987
The study compares three methods for establishing cut-off scores that effect a compromise between absolute cut-offs based on item difficulty and relative cut-offs based on expected passing rates. Each method coordinates these two types of information differently. The Beuk method obtains judges' estimates of an absolute cut-off and an expected…
Descriptors: Academic Standards, Certification, Comparative Analysis, Cutting Scores
Muraki, Eiji – 1984
The TESTFACT computer program and full-information factor analysis of test items were used in a computer simulation conducted to correct for the guessing effect. Full-information factor analysis also corrects for omitted items. The present version of TESTFACT handles up to five factors and 150 items. A preliminary smoothing of the tetrachoric…
Descriptors: Comparative Analysis, Computer Simulation, Computer Software, Correlation
Peer reviewed
Shepard, Lorrie; And Others – Journal of Educational Statistics, 1984
Item response theory bias detection procedures were applied to data from Black and White seniors on the High School and Beyond data files. Overall, the sums-of-squares statistics (weighted by the inverse of the variance errors) were the best indices for quantifying item characteristic curve differences between groups. (Author/BW)
Descriptors: Achievement Tests, Black Students, Comparative Analysis, Evaluation Methods
Hambleton, Ronald K.; And Others – 1987
The study compared two promising item response theory (IRT) item-selection methods, optimal and content-optimal, with two non-IRT item selection methods, random and classical, for use in fixed-length certification exams. The four methods were used to construct 20-item exams from a pool of approximately 250 items taken from a 1985 certification…
Descriptors: Comparative Analysis, Content Validity, Cutting Scores, Difficulty Level
Wang, Yu-Chung Lawrence; Hocevar, Dennis – 1994
The major goal of this study is to apply the essential unidimensionality statistic of W. Stout and the corresponding computer program (DIMTEST) to a hierarchical level mathematics achievement data set and to determine the extent to which the unidimensionality assumption can be accurately applied to mathematics achievement data. The study also…
Descriptors: Ability, Comparative Analysis, Elementary Education, Elementary School Students
Sarvela, Paul D. – 1986
Four discrimination indices were compared, using score distributions which were normal, bimodal, and negatively skewed. The score distributions were systematically varied to represent the common circumstances of a military training situation using criterion-referenced mastery tests. Three 20-item tests were administered to 110 simulated subjects.…
Descriptors: Comparative Analysis, Criterion Referenced Tests, Item Analysis, Mastery Tests
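One classical discrimination index of the kind compared here is the upper-lower D index: the proportion correct in the top-scoring group minus the proportion correct in the bottom-scoring group. A minimal sketch, with the 27% group fraction as a conventional default rather than a value taken from the study:

```python
import numpy as np

def upper_lower_discrimination(item, total, frac=0.27):
    """Classical D index: p(correct | upper group) - p(correct | lower group)."""
    item = np.asarray(item, dtype=float)
    order = np.argsort(total)                  # rank examinees by total score
    n = max(1, int(round(frac * len(total))))  # size of each extreme group
    return float(item[order[-n:]].mean() - item[order[:n]].mean())
```

D near 1 means the item sharply separates high and low scorers; values near 0 (or negative) flag weak or reversed discrimination, which matters for the mastery-test settings the study examines.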