Evaluating 1-, 2- and 3- Parameter Logistic Models Using Model-Based and Empirically-Based Simulations under Homogeneous and Heterogeneous Set Conditions.

Rizavi, Saba; Way, Walter D.; Lu, Ying; Pitoniak, Mary; Steffen, Manfred

ERIC Number: ED490464

Record Type: Non-Journal

Publication Date: 2004-Jan

Pages: 35

Abstractor: Author

ISBN: N/A

ISSN: N/A

EISSN: N/A

Available Date: N/A

Online Submission, Paper presented at the Annual Meeting of National Council on Measurement in Education (Chicago, IL, Apr 10, 2003)

The purpose of this study was to use realistically simulated data to evaluate various CAT designs for use with the verbal reasoning measure of the Medical College Admission Test (MCAT). Factors such as item pool depth, content constraints, and item formats often cause repeated adaptive administrations of an item at ability levels that are not matched to its difficulty. In that case, any model-data misfit might contribute to bias in examinees' final ability estimates, since data generated from the model might not represent the real response patterns of examinees. By incorporating model-data misfit into adaptive testing simulations, this study introduced a simulation methodology in a real-world situation to address this important issue. The CAT simulations that were carried out suggested that measurement precision equivalent to the current paper-and-pencil (P&P) MCAT Verbal Reasoning test could be achieved with a 32-item adaptive test based on the 2-PL or 3-PL models. Although the 2-PL and 3-PL simulations made slightly less uniform use of the item pools, the differences between these models and the 1-PL model were surprisingly small. The results showed that when there was a considerable amount of model-data misfit, the model-based simulations gave smaller CSEMs at certain ability levels, which is misleading. The empirically-based simulations provided a more reliable way of evaluating a CAT design before its implementation. That a 2-PL CAT can be administered with reliability comparable to P&P reliability using almost half the P&P test length is a very positive finding of the study. The use of a 2-PL instead of a 3-PL model is recommended because of the simplicity of the 2-PL model. (Contains 37 tables.) [This paper was also presented at the Graduate Student Research Panel of the Association of American Medical Colleges (Washington, DC, October 11, 2002). Research finalized in 2004.]
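
For readers unfamiliar with the models the abstract compares, the following is a minimal sketch of the standard 1-, 2- and 3-parameter logistic item response functions and of how a conditional standard error of measurement (CSEM) follows from item information. It is not the authors' simulation code. The function names and the item parameters in the usage note are hypothetical, and the sketch assumes the logistic metric with no 1.7 scaling constant.

```python
import math

def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3-PL model.

    theta: examinee ability; b: item difficulty;
    a: discrimination; c: lower asymptote (pseudo-guessing).
    With c = 0 this is the 2-PL model; with a = 1 and c = 0, the 1-PL model.
    Assumption: no D = 1.7 scaling constant is applied.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, b, a=1.0, c=0.0):
    """Fisher information of one item at ability theta (3-PL form)."""
    p = p_correct(theta, b, a, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def csem(theta, items):
    """Model-based CSEM at theta: 1 / sqrt(total test information).

    items: iterable of (b, a, c) tuples for the administered items.
    """
    info = sum(item_information(theta, b, a, c) for (b, a, c) in items)
    return 1.0 / math.sqrt(info)
```

As a usage example, `csem(0.0, [(0.0, 1.0, 0.0)] * 4)` evaluates the CSEM at ability 0 for four identical hypothetical items. A CSEM computed this way assumes the model fits the data, which is the assumption the abstract reports can make model-based simulations look more precise than empirically-based ones.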

Descriptors: Test Items, Test Bias, Item Banks, College Admission, Adaptive Testing, Computer Simulation, Medical Schools, Computer Assisted Testing, Evaluation Methods, Measurement Techniques, Goodness of Fit, Models, Verbal Tests, Thinking Skills, Ability Identification

Publication Type: Numerical/Quantitative Data; Reports - Research; Speeches/Meeting Papers

Education Level: Higher Education

Audience: N/A

Language: English

Sponsor: Association of American Medical Colleges, Washington, DC.; Educational Testing Service, Princeton, NJ.

Authoring Institution: N/A

Identifiers - Assessments and Surveys: Medical College Admission Test; California Achievement Tests

Grant or Contract Numbers: N/A

Author Affiliations: N/A