Showing 1 to 15 of 19 results
Peer reviewed
Jones, Paul; Tong, Ye; Liu, Jinghua; Borglum, Joshua; Primoli, Vince – Journal of Educational Measurement, 2022
This article studied two methods to detect mode effects in two credentialing exams. In Study 1, we used a "modal scale comparison approach," where the same pool of items was calibrated separately, without transformation, within two TC cohorts (TC1 and TC2) and one OP cohort (OP1) matched on their pool-based scale score distributions. The…
Descriptors: Scores, Credentials, Licensing Examinations (Professions), Computer Assisted Testing
Peer reviewed
Baldwin, Peter; Clauser, Brian E. – Journal of Educational Measurement, 2022
While score comparability across test forms typically relies on common (or randomly equivalent) examinees or items, innovations in item formats, test delivery, and efforts to extend the range of score interpretation may require a special data collection before examinees or items can be used in this way--or may be incompatible with common examinee…
Descriptors: Scoring, Testing, Test Items, Test Format
Peer reviewed
Li, Jie; van der Linden, Wim J. – Journal of Educational Measurement, 2018
The final step of the typical process of developing educational and psychological tests is to place the selected test items in a formatted test form. This step involves grouping and ordering the items to meet a variety of formatting constraints. As this activity tends to be time-intensive, the use of mixed-integer programming (MIP) has been…
Descriptors: Programming, Automation, Test Items, Test Format
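The MIP approach this abstract describes reduces to binary selection variables under linear constraints. As a minimal sketch (not the authors' formulation), the example below assembles a short form with the open-source PuLP solver; the item pool, content quotas, and difficulty target are all hypothetical.

```python
# Minimal sketch of MIP-based test assembly (hypothetical data).
import pulp

# Hypothetical item pool: (item id, content area, difficulty)
pool = [("i1", "algebra", 0.30), ("i2", "algebra", 0.60),
        ("i3", "geometry", 0.40), ("i4", "geometry", 0.70),
        ("i5", "algebra", 0.50), ("i6", "geometry", 0.20)]

prob = pulp.LpProblem("test_assembly", pulp.LpMinimize)
x = {i: pulp.LpVariable(f"x_{i}", cat=pulp.LpBinary) for i, _, _ in pool}
dev = pulp.LpVariable("dev", lowBound=0)  # |total difficulty - target|
prob += dev  # objective: stay close to the difficulty target

# Formatting constraints: exactly 4 items, 2 per content area.
prob += pulp.lpSum(x.values()) == 4
for area in ("algebra", "geometry"):
    prob += pulp.lpSum(x[i] for i, a, _ in pool if a == area) == 2

# Linearized absolute deviation from a target total difficulty of 1.8.
total = pulp.lpSum(b * x[i] for i, _, b in pool)
prob += total - 1.8 <= dev
prob += 1.8 - total <= dev

prob.solve()
print([i for i in x if x[i].value() == 1])
```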
Peer reviewed
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
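For context on the quantities being evaluated, a minimal classical-test-theory sketch of a weighted multiple-choice-plus-free-response composite follows; the section weights, standard deviations, reliabilities, and between-section correlation are invented, and section measurement errors are assumed uncorrelated.

```python
import math

# Hypothetical section statistics: (weight, SD, reliability)
mc = (1.0, 8.0, 0.90)   # multiple-choice section
fr = (2.0, 4.0, 0.80)   # free-response section
corr = 0.6              # observed-score correlation between sections

(w1, s1, r1), (w2, s2, r2) = mc, fr

# Error variances add when section errors are uncorrelated.
err_var = w1**2 * s1**2 * (1 - r1) + w2**2 * s2**2 * (1 - r2)
comp_var = w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * corr * s1 * s2

sem = math.sqrt(err_var)              # overall SEM of the composite
reliability = 1 - err_var / comp_var  # composite score reliability
print(f"composite SEM = {sem:.2f}, reliability = {reliability:.3f}")
```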
Peer reviewed
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Peer reviewed
Liu, Shuchang; Cai, Yan; Tu, Dongbo – Journal of Educational Measurement, 2018
This study applied on-the-fly assembled multistage adaptive testing to cognitive diagnosis (CD-OMST). Several module assembly methods for CD-OMST were proposed and compared in terms of measurement precision, test security, and constraint management. The module assembly methods in the study included the maximum priority index…
Descriptors: Adaptive Testing, Monte Carlo Methods, Computer Security, Clinical Diagnosis
Peer reviewed
Debeer, Dries; Ali, Usama S.; van Rijn, Peter W. – Journal of Educational Measurement, 2017
Test assembly is the process of selecting items from an item pool to form one or more new test forms. Often new test forms are constructed to be parallel with an existing (or an ideal) test. Within the context of item response theory, the test information function (TIF) or the test characteristic curve (TCC) are commonly used as statistical…
Descriptors: Test Format, Test Construction, Statistical Analysis, Comparative Analysis
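As background on the statistical targets named here: under IRT, the test information function (TIF) is the sum of the item information functions, so assembling a parallel form amounts to matching curves over the ability scale. A minimal sketch with hypothetical 2PL item parameters:

```python
import numpy as np

def item_info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability level theta."""
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a**2 * p * (1 - p)

# Hypothetical (a, b) parameters for an assembled form.
items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7), (1.0, 1.2)]
theta = np.linspace(-3, 3, 61)

tif = sum(item_info_2pl(theta, a, b) for a, b in items)
# A parallel form would be assembled so that its TIF (or TCC)
# tracks the reference form's curve across this theta grid.
print(np.round(tif[::10], 2))
```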
Peer reviewed
Liu, Chunyan; Kolen, Michael J. – Journal of Educational Measurement, 2018
Smoothing techniques are designed to improve the accuracy of equating functions. The main purpose of this study is to compare seven model selection strategies for choosing the smoothing parameter (C) for polynomial loglinear presmoothing and one procedure for model selection in cubic spline postsmoothing for mixed-format pseudo tests under the…
Descriptors: Comparative Analysis, Accuracy, Models, Sample Size
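For readers unfamiliar with the method being tuned: polynomial loglinear presmoothing fits the log of the expected score frequencies with a degree-C polynomial, and the model-selection strategies compared in the study choose C. The sketch below uses Poisson regression on invented frequencies, with AIC as one plausible criterion (the excerpt does not say which seven strategies were compared):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical raw-score frequencies for a 0..20 score scale.
scores = np.arange(21)
freqs = np.array([1, 2, 4, 7, 11, 15, 20, 24, 27, 28,
                  27, 24, 20, 16, 12, 8, 5, 3, 2, 1, 1])
z = (scores - scores.mean()) / scores.std()  # standardized for stability

def presmooth(C):
    """Fit log m_x = b0 + b1*x + ... + bC*x^C by Poisson regression."""
    X = np.column_stack([z**i for i in range(C + 1)])
    return sm.GLM(freqs, X, family=sm.families.Poisson()).fit()

# One possible selection strategy: choose the degree minimizing AIC.
best_C = min(range(1, 7), key=lambda C: presmooth(C).aic)
print("selected degree:", best_C)
print(np.round(presmooth(best_C).fittedvalues, 1))
```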
Peer reviewed
van der Ark, L. Andries; Emons, Wilco H. M.; Sijtsma, Klaas – Journal of Educational Measurement, 2008
Two types of answer-copying statistics for detecting copiers in small-scale examinations are proposed. One statistic identifies the "copier-source" pair; the other additionally suggests who is the copier and who is the source. Both types of statistics can be used when the examination has alternate test forms. A simulation study shows that the…
Descriptors: Cheating, Statistics, Test Format, Measures (Individuals)
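A crude illustration of the general idea (deliberately simpler than the authors' proposed statistics): count identical responses in a suspected copier-source pair and compare the count with a binomial chance baseline. All data and the chance-match probability here are invented.

```python
from math import comb

def match_pvalue(copier, source, p_chance=0.25):
    """Upper-tail binomial probability of at least the observed number
    of identical responses occurring by chance. A crude baseline; the
    article's statistics model match probabilities item by item."""
    n = len(copier)
    k = sum(c == s for c, s in zip(copier, source))
    return sum(comb(n, j) * p_chance**j * (1 - p_chance)**(n - j)
               for j in range(k, n + 1))

copier = "ABDCADBACDABCDAACBDB"
source = "ABDCADBACDABCDAACBDD"
print(f"{match_pvalue(copier, source):.2e}")
```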
Peer reviewed
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Journal of Educational Measurement, 2008
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically,…
Descriptors: Equated Scores, Sample Size, Test Reliability, Comparative Analysis
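The synthetic function is, as the abstract states, a weighted average of the identity function and the chained linear equating function. A minimal sketch, with all moments and the weight w invented:

```python
def chained_linear(x, mx, sx, mv1, sv1, mv2, sv2, my, sy):
    """Chain X -> anchor V (population 1) with V -> Y (population 2),
    each link a linear (mean/SD) transformation."""
    v = mv1 + (sv1 / sx) * (x - mx)     # X to V scale in population 1
    return my + (sy / sv2) * (v - mv2)  # V to Y scale in population 2

def synthetic(x, w, **moments):
    """Weighted average of the identity function and the chained
    linear equating function; w is the weight on the identity."""
    return w * x + (1 - w) * chained_linear(x, **moments)

# Hypothetical means/SDs of X, V (in each population), and Y.
m = dict(mx=50, sx=10, mv1=20, sv1=5, mv2=21, sv2=5.5, my=52, sy=9)
print(synthetic(55, w=0.5, **m))
```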
Peer reviewed
Mills, Craig N. – Journal of Educational Measurement, 1983
This study compares the results obtained using the Angoff, borderline group, and contrasting groups methods of determining performance standards. Congruent results were obtained from the Angoff and contrasting groups methods for several test forms. Borderline group standards were not similar to standards obtained with other methods. (Author/PN)
Descriptors: Comparative Analysis, Criterion Referenced Tests, Cutting Scores, Standard Setting (Scoring)
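The Angoff method the study compares reduces to simple arithmetic: each judge estimates, for each item, the probability that a minimally competent examinee answers correctly, and the cut score is the sum of the item-level mean estimates. A small sketch with hypothetical ratings:

```python
# Hypothetical Angoff ratings: rows = judges, columns = items
# (each value is the judged probability that a borderline examinee
# answers the item correctly).
ratings = [
    [0.6, 0.8, 0.5, 0.9, 0.7],   # judge 1
    [0.5, 0.7, 0.6, 0.8, 0.6],   # judge 2
    [0.7, 0.9, 0.4, 0.9, 0.8],   # judge 3
]

n_judges = len(ratings)
item_means = [sum(col) / n_judges for col in zip(*ratings)]
cut_score = sum(item_means)  # expected raw score of a borderline examinee
print(f"Angoff cut score: {cut_score:.1f} of {len(item_means)} points")
```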
Peer reviewed
Wainer, Howard; And Others – Journal of Educational Measurement, 1994
The comparability of scores on test forms that are constructed through examinee item choice is examined in an item response theory framework. The approach is illustrated with data from the College Board's Advanced Placement Test in Chemistry taken by over 18,000 examinees. (SLD)
Descriptors: Advanced Placement, Chemistry, Comparative Analysis, Constructed Response
Peer reviewed
Jaeger, Richard M.; Wolf, Marian B. – Journal of Educational Measurement, 1982
The effectiveness of a Likert-scale format and three paired-choice presentation formats in discriminating among parents' preferences for curriculum elements was compared. Paired-choice formats yielded more reliable discriminations, which increased with stimulus specificity. Similarities and differences in preference orderings are discussed. (Author/CM)
Descriptors: Comparative Analysis, Elementary Education, Parent Attitudes, Parent School Relationship
Peer reviewed
Williams, Valerie S. L.; Pommerich, Mary; Thissen, David – Journal of Educational Measurement, 1998
Created a developmental scale for the North Carolina End-of-Grade Mathematics Tests using a subset of identical test forms administered to adjacent grade levels with Thurstone scaling and Item Response Theory methods. Discusses differences in patterns produced. (Author/SLD)
Descriptors: Achievement Tests, Child Development, Comparative Analysis, Elementary Secondary Education
Peer reviewed
Frary, Robert B. – Journal of Educational Measurement, 1985
Responses to a sample test were simulated for examinees under free-response and multiple-choice formats. Test score sets were correlated with randomly generated sets of unit-normal measures. The extent of superiority of free response tests was sufficiently small so that other considerations might justifiably dictate format choice. (Author/DWH)
Descriptors: Comparative Analysis, Computer Simulation, Essay Tests, Guessing (Tests)