Showing all 14 results
Peer reviewed
Dorans, Neil J.; Liu, Jinghua; Hammond, Shelby – Applied Psychological Measurement, 2008
This exploratory study was built on research spanning three decades. Petersen, Marco, and Stewart (1982) conducted a major empirical investigation of the efficacy of different equating methods. The studies reported in Dorans (1990) examined how different equating methods performed across samples selected in different ways. Recent population…
Descriptors: Test Format, Equated Scores, Sampling, Evaluation Methods
Peer reviewed
von Davier, Alina A.; Wilson, Christine – Applied Psychological Measurement, 2008
Dorans and Holland (2000) and von Davier, Holland, and Thayer (2003) introduced measures of the degree to which an observed-score equating function is sensitive to the population on which it is computed. This article extends the findings of Dorans and Holland and of von Davier et al. to item response theory (IRT) true-score equating methods that…
Descriptors: Advanced Placement, Advanced Placement Programs, Equated Scores, Calculus
Peer reviewed
Wang, Tianyou; Kolen, Michael J. – Applied Psychological Measurement, 1996
A quadratic curve test equating method is proposed for equating different test forms under a random-groups data collection design; it equates the first three central moments of the test forms. When applied to real test data, the method performs as well as other equating methods. Procedures for implementing the method are described. (SLD)
Descriptors: Data Collection, Equated Scores, Standardized Tests, Test Construction
Peer reviewed
Armstrong, Ronald D.; Jones, Douglas H.; Kunce, Charles S. – Applied Psychological Measurement, 1998
Investigated the use of mathematical programming techniques to generate parallel test forms with passages and items based on item-response theory (IRT) using the Fundamentals of Engineering Examination. Generated four parallel test forms from the item bank of almost 1,100 items. Comparison with human-generated forms supports the mathematical…
Descriptors: Engineering, Item Banks, Item Response Theory, Test Construction
Peer reviewed
Baker, Frank B. – Applied Psychological Measurement, 1996
Using the characteristic curve method for dichotomously scored test items, the sampling distributions of equating coefficients were examined. Simulations indicate that for the equating conditions studied, the sampling distributions of the equating coefficients appear to have acceptable characteristics, suggesting confidence in the values obtained…
Descriptors: Equated Scores, Item Response Theory, Sampling, Statistical Distributions
Peer reviewed
Hanson, Bradley A.; And Others – Applied Psychological Measurement, 1993
The delta method was used to derive standard errors (SEs) of the Levine observed score and Levine true score linear test equating methods using data from two test forms. SEs derived without the normality assumption and bootstrap SEs were very close. The situation with skewed score distributions is also discussed. (SLD)
Descriptors: Equated Scores, Equations (Mathematics), Error of Measurement, Sampling
Peer reviewed
Berger, Martijn P. F. – Applied Psychological Measurement, 1994
This paper focuses on similarities of optimal design of fixed-form tests, adaptive tests, and testlets within the framework of the general theory of optimal designs. A sequential design procedure is proposed that uses these similarities to obtain consistent estimates for the trait level distribution. (SLD)
Descriptors: Achievement Tests, Adaptive Testing, Algorithms, Estimation (Mathematics)
Peer reviewed
Wilson, Mark; Wang, Wen-chung – Applied Psychological Measurement, 1995
Data from the California Learning Assessment System mathematics assessment were used to examine issues that arise when scores from different assessment modes are combined. Multiple-choice, open-ended, and investigation items were combined in a test across three test forms. Results illustrate the difficulties faced in evaluating combined…
Descriptors: Educational Assessment, Equated Scores, Evaluation Methods, Item Response Theory
Peer reviewed
Yao, Lihua; Schwarz, Richard D. – Applied Psychological Measurement, 2006
Multidimensional item response theory (IRT) models have been proposed for better understanding the dimensional structure of data or to define diagnostic profiles of student learning. A compensatory multidimensional two-parameter partial credit model (M-2PPC) for constructed-response items is presented that is a generalization of those proposed to…
Descriptors: Models, Item Response Theory, Markov Processes, Monte Carlo Methods
Peer reviewed
van der Linden, Wim J. – Applied Psychological Measurement, 2006
Two local methods for observed-score equating are applied to the problem of equating an adaptive test to a linear test. In an empirical study, the methods were evaluated against a method based on the test characteristic function (TCF) of the linear test and traditional equipercentile equating applied to the ability estimates on the adaptive test…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Equated Scores
Peer reviewed
Woodruff, David J.; Sawyer, Richard L. – Applied Psychological Measurement, 1989
Two methods (non-distributional and normal) are derived for estimating measures of pass-fail reliability. Both are based on the Spearman-Brown formula and require only a single test administration. Results from a simulation (n=20,000 examinees) and a licensure examination (n=4,828 examinees) illustrate these methods. (SLD)
Descriptors: Equations (Mathematics), Estimation (Mathematics), Licensing Examinations (Professions), Measures (Individuals)
Peer reviewed
Norcini, John; And Others – Applied Psychological Measurement, 1991
Effects of numbers of experts (NOEs) and common items (CIs) on the scaling of cutting scores from expert judgments were studied for 11,917 physicians taking 2 forms of a medical specialty examination. Increasing NOEs and CIs reduced error; beyond 5 experts and 25 CIs, error differences were small. (SLD)
Descriptors: Comparative Testing, Cutting Scores, Equated Scores, Estimation (Mathematics)
Peer reviewed
Birenbaum, Menucha; And Others – Applied Psychological Measurement, 1992
The effect of multiple-choice (MC) or open-ended (OE) response format on diagnostic assessment of algebra test performance was investigated with 231 eighth and ninth graders in Tel Aviv (Israel) using bug or rule space analysis. Both analyses indicated closer similarity between parallel OE subsets than between stem-equivalent OE and MC subsets.…
Descriptors: Algebra, Comparative Testing, Educational Assessment, Educational Diagnosis
Peer reviewed
Budescu, David V. – Applied Psychological Measurement, 1988
A multiple matching test--a 24-item Hebrew vocabulary test--was examined, in which distractors from several items are pooled into one list at the test's end. Construction of such tests was feasible. Reliability, validity, and reduction of random guessing were satisfactory when applied to data from 717 applicants to Israeli universities. (SLD)
Descriptors: College Applicants, Feasibility Studies, Foreign Countries, Guessing (Tests)