ERIC Number: ED605430
Record Type: Non-Journal
Publication Date: 2018
Pages: 28
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Evaluating the 'Similar Items Method' for Standard Maintaining. Conference Paper
Bramley, Tom
Cambridge Assessment, Paper presented at the Annual Conference of the Association for Educational Assessment in Europe (19th, Arnhem-Nijmegen, The Netherlands, Nov 2018)
The aim of the research reported here was to get some idea of the accuracy of grade boundaries (cut-scores) obtained by applying the 'similar items method' described in Bramley & Wilson (2016). In this method, experts identify items on the current version of a test that are sufficiently similar to items on previous versions for them to be treated as pseudo-anchor items. The method could be useful in any international testing context that uses item types similar to those of GCSEs and A levels in England and operates under similar test development constraints (no pre-testing and no item re-use). Study 1 aimed to discover: i) the extent to which the equated grade boundary depends on "which" items are identified as similar; and ii) the extent to which it depends on "how many" items are identified as similar. Study 2 attempted a direct comparison with established methods for equating tests taken by non-equivalent groups. This was achieved by constructing a scenario in which "all" the similar items came from the "same" previous version (which is not the case in the intended application of the method). In this scenario the method can be directly compared with methods where common items form an internal anchor test. Study 1 found that, in the ideal case where the 'similar' items were in fact identical, roughly 20% of items or marks were enough to give a cut-score within 1 score point of the average (across different combinations of a fixed number of similar items). As expected, the fewer the similar items, the greater the variability. There was a small amount of bias in the method: some arose inherently from using integer cut-scores on different versions of the test, and some from the equating method used to define equivalent cut-scores. Study 2 found that, when the similar items method was applied to the scenario where all items came from the same previous test (a standard common-item equating scenario), it gave very similar outcomes to IRT true-score equating. However, when applied in the 'one item at a time' way intended for real scenarios where similar items might come from different tests, it was vulnerable to distortions created by outlying items. This problem can be diagnosed by inspecting empirical item characteristic curves and the equating functions implied by individual items. Increasing the smoothing of the empirical item characteristic curves improved the accuracy of the equating from the similar items method.
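[Editor's note] To make the pseudo-anchor idea concrete, the following is a minimal sketch of how a similar-items equating step of the kind described above could be implemented from item-level response data. It is not the paper's exact procedure: the function names, the moving-average smoothing of the empirical item characteristic curves, and the simple averaging rule for combining items are illustrative assumptions.

```python
import numpy as np

def empirical_icc(item_scores, total_scores, smooth_window=0):
    """Empirical item characteristic curve: mean item score at each
    total-score point. smooth_window > 0 applies a simple moving-average
    smoothing across neighbouring total scores (a stand-in for the
    smoothing discussed in the paper; the actual method may differ)."""
    max_total = int(total_scores.max())
    icc = np.full(max_total + 1, np.nan)
    for t in range(max_total + 1):
        mask = total_scores == t
        if mask.any():
            icc[t] = item_scores[mask].mean()
    if smooth_window > 0:
        kernel = np.ones(2 * smooth_window + 1)
        valid = ~np.isnan(icc)
        num = np.convolve(np.where(valid, icc, 0.0), kernel, mode="same")
        den = np.convolve(valid.astype(float), kernel, mode="same")
        icc = np.divide(num, den, out=np.full_like(num, np.nan), where=den > 0)
    return icc

def equated_cut_score(similar_pairs, old_cut):
    """For each (old_icc, new_icc) pair of smoothed ICCs for a pair of
    'similar' items, find the new-test total score at which expected
    performance on the new item matches expected performance on the old
    item at the old cut-score, then combine across pairs.
    Averaging the implied scores is an illustrative combination rule."""
    implied = []
    for old_icc, new_icc in similar_pairs:
        target = old_icc[old_cut]                      # expected item score at old boundary
        implied.append(int(np.nanargmin(np.abs(new_icc - target))))
    return int(round(np.mean(implied)))
```

In the intended application, the 'similar' items come from different previous versions, so the new-test score implied by each item would be inspected individually (the abstract notes that outlying items can distort the result) rather than combined without scrutiny.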
Descriptors: Accuracy, Cutting Scores, Test Items, Item Analysis, International Assessment, Test Construction, Grades (Scholastic), Equated Scores, Comparative Analysis, Scores, Item Response Theory, Foreign Countries, Achievement Tests, Secondary School Students, Standards
University of Cambridge Local Examinations Syndicate (Cambridge Assessment). The Triangle Building, Shaftesbury Road, Cambridge, CB2 8EA, UK. Tel: +44-1223-55331; Fax: +44-1223-460278; e-mail: info@cambridgeassessment.org.uk; Web site: https://www.cambridgeassessment.org.uk/
Publication Type: Speeches/Meeting Papers; Reports - Research; Numerical/Quantitative Data
Education Level: Secondary Education
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Cambridge Assessment (United Kingdom)
Identifiers - Location: United Kingdom (England)
Grant or Contract Numbers: N/A
Author Affiliations: N/A