Showing all 13 results
Peer reviewed
Ting Sun; Stella Yun Kim – Educational and Psychological Measurement, 2024
Equating is a statistical procedure used to adjust for the difference in form difficulty such that scores on those forms can be used and interpreted comparably. In practice, however, equating methods are often implemented without considering the extent to which two forms differ in difficulty. The study aims to examine the effect of the magnitude…
Descriptors: Difficulty Level, Data Interpretation, Equated Scores, High School Students
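The abstract above concerns score equating across forms of unequal difficulty. As a generic illustration (linear equating is one common method; the study itself may examine others), a minimal sketch with hypothetical score data:

```python
# Linear equating: map a raw score x earned on form X onto the scale of
# form Y by matching the means and standard deviations of the two score
# distributions. Generic illustration, not the specific method studied.
from statistics import mean, pstdev

def linear_equate(x, form_x_scores, form_y_scores):
    """Return the form-Y equivalent of raw score x from form X."""
    mu_x, sd_x = mean(form_x_scores), pstdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), pstdev(form_y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical data: form X is harder (lower mean), so X scores are
# adjusted upward onto the Y scale.
form_x = [10, 12, 14, 16, 18]
form_y = [12, 14, 16, 18, 20]
print(linear_equate(14, form_x, form_y))  # X mean maps to Y mean: 16.0
```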
Peer reviewed
Menold, Natalja; Raykov, Tenko – Educational and Psychological Measurement, 2022
The possible dependency of criterion validity on item formulation in a multicomponent measuring instrument is examined. The discussion is concerned with evaluation of the differences in criterion validity between two or more groups (populations/subpopulations) that have been administered instruments with items having differently formulated item…
Descriptors: Test Items, Measures (Individuals), Test Validity, Difficulty Level
Peer reviewed
Betts, Joe; Muntean, William; Kim, Doyoung; Kao, Shu-chuan – Educational and Psychological Measurement, 2022
The multiple response structure can underlie several different technology-enhanced item types. With the increased use of computer-based testing, multiple response items are becoming more common. This response type holds the potential for being scored polytomously for partial credit. However, there are several possible methods for computing raw…
Descriptors: Scoring, Test Items, Test Format, Raw Scores
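The abstract above notes that multiple-response items can be scored several ways. Two common raw-score rules, sketched with a made-up item (the article compares such methods; these two are illustrative, not its specific list):

```python
# Two ways to score a multiple-response item: all-or-nothing versus
# polytomous partial credit. Generic illustration with hypothetical data.

def score_all_or_nothing(selected, keyed):
    """One point only if the selected options exactly match the key."""
    return 1 if set(selected) == set(keyed) else 0

def score_partial_credit(selected, keyed, options):
    """One point per option classified correctly, i.e. selected-and-keyed
    or unselected-and-unkeyed (polytomous scoring)."""
    selected, keyed = set(selected), set(keyed)
    return sum(1 for opt in options if (opt in selected) == (opt in keyed))

options = ["A", "B", "C", "D", "E"]
key = ["A", "C"]
response = ["A", "C", "D"]  # both keyed options plus one wrong pick
print(score_all_or_nothing(response, key))           # 0
print(score_partial_credit(response, key, options))  # 4: only D misclassified
```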
Peer reviewed
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Peer reviewed
Hohensinn, Christine; Kubinger, Klaus D. – Educational and Psychological Measurement, 2011
In aptitude and achievement tests, different response formats are usually used. A fundamental distinction must be made between the class of multiple-choice formats and the constructed response formats. Previous studies have examined the impact of different response formats applying traditional statistical approaches, but these influences can also…
Descriptors: Item Response Theory, Multiple Choice Tests, Responses, Test Format
Peer reviewed
Kubinger, Klaus D. – Educational and Psychological Measurement, 2009
The linear logistic test model (LLTM) breaks down the item parameter of the Rasch model as a linear combination of some hypothesized elementary parameters. Although the original purpose of applying the LLTM was primarily to generate test items with specified item difficulty, there are still many other potential applications, which may be of use…
Descriptors: Models, Test Items, Psychometrics, Item Response Theory
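The LLTM decomposition described above writes each Rasch item difficulty as a weighted sum of elementary parameters, sigma_i = sum_j q_ij * eta_j. A minimal sketch; the Q-matrix weights and eta values below are invented for illustration:

```python
# LLTM: compose item difficulties from a Q-matrix (items x cognitive
# operations) and elementary parameters eta. All numbers are hypothetical.

def lltm_item_difficulties(Q, eta):
    """Return sigma_i = sum_j q_ij * eta_j for each item row in Q."""
    return [sum(q_ij * eta_j for q_ij, eta_j in zip(row, eta)) for row in Q]

# Three items, two hypothesized elementary operations.
Q = [[1, 0],   # item 1 requires operation 1 once
     [0, 2],   # item 2 requires operation 2 twice
     [1, 1]]   # item 3 requires both operations
eta = [0.5, -0.25]  # difficulty contributed by each operation
print(lltm_item_difficulties(Q, eta))  # [0.5, -0.5, 0.25]
```

In estimation the direction is reversed: the eta parameters are fitted so that these composed difficulties reproduce the observed Rasch item parameters.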
Peer reviewed
Tollefson, Nona – Educational and Psychological Measurement, 1987
This study compared the item difficulty, item discrimination, and test reliability of three forms of multiple-choice items: (1) one correct answer; (2) "none of the above" as a foil; and (3) "none of the above" as the correct answer. Twelve items in the three formats were administered in a college statistics examination. (BS)
Descriptors: Difficulty Level, Higher Education, Item Analysis, Multiple Choice Tests
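The three statistics compared in the study above have standard classical-test-theory formulas. A sketch with a tiny hypothetical 0/1 score matrix (generic formulas, not the study's data):

```python
# Classical item analysis for dichotomous items: difficulty = proportion
# correct; discrimination = corrected point-biserial (item vs. rest-score);
# reliability = KR-20. Data below are hypothetical.
from statistics import mean, pstdev, pvariance

def item_difficulty(item_scores):
    """p-value: proportion of examinees answering the item correctly."""
    return mean(item_scores)

def item_discrimination(item_scores, score_matrix, item_index):
    """Correlation of the item with the total score excluding that item."""
    rest = [sum(row) - row[item_index] for row in score_matrix]
    m_i, m_r = mean(item_scores), mean(rest)
    cov = mean((i - m_i) * (r - m_r) for i, r in zip(item_scores, rest))
    return cov / (pstdev(item_scores) * pstdev(rest))

def kr20(score_matrix):
    """KR-20 reliability for a test of 0/1-scored items."""
    k = len(score_matrix[0])
    var_total = pvariance([sum(row) for row in score_matrix])
    pq = sum(p * (1 - p) for p in (mean(col) for col in zip(*score_matrix)))
    return (k / (k - 1)) * (1 - pq / var_total)

data = [            # rows: examinees; columns: three items
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(item_difficulty([1, 1, 1, 0]))                   # 0.75
print(round(item_discrimination([0, 0, 1, 0], data, 2), 3))  # 0.577
print(round(kr20(data), 3))                            # 0.273
```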
Peer reviewed
Knowles, Susan L.; Welch, Cynthia A. – Educational and Psychological Measurement, 1992
A meta-analysis of the difficulty and discrimination of the "none-of-the-above" (NOTA) test option was conducted with 12 articles (20 effect sizes) for difficulty and 7 studies (11 effect sizes) for discrimination. Findings indicate that using the NOTA option does not result in items of lesser quality. (SLD)
Descriptors: Difficulty Level, Effect Size, Meta Analysis, Multiple Choice Tests
Peer reviewed
Straton, Ralph G.; Catts, Ralph M. – Educational and Psychological Measurement, 1980
Multiple-choice tests composed entirely of two-, three-, or four-choice items were investigated. Results indicated that the number of alternatives per item was inversely related to item difficulty but directly related to item discrimination. The reliability and standard error of measurement of three-choice item tests were equivalent or superior.…
Descriptors: Difficulty Level, Error of Measurement, Foreign Countries, Higher Education
Peer reviewed
Cizek, Gregory J. – Educational and Psychological Measurement, 1994
The performance of a common set of test items was examined on an examination in which the order of options for one test form was experimentally manipulated. Results for 759 medical specialty board examinees indicate that reordering item options has significant but unpredictable effects on item difficulty. (SLD)
Descriptors: Change, Difficulty Level, Equated Scores, Licensing Examinations (Professions)
Peer reviewed
Crehan, Kevin D.; And Others – Educational and Psychological Measurement, 1993
Studies with 220 college students found that multiple-choice test items with three options are more difficult than those with four options, and that items with a none-of-these option are more difficult than those without it. Neither format manipulation affected item discrimination. Implications for test construction are discussed. (SLD)
Descriptors: College Students, Comparative Testing, Difficulty Level, Distractors (Tests)
Peer reviewed
Styles, Irene; Andrich, David – Educational and Psychological Measurement, 1993
This paper describes the use of the Rasch model to help implement computerized administration of the standard and advanced forms of Raven's Progressive Matrices (RPM), to compare relative item difficulties, and to convert scores between the standard and advanced forms. The sample consisted of 95 girls and 95 boys in Australia. (SLD)
Descriptors: Adaptive Testing, Computer Assisted Testing, Difficulty Level, Elementary Education
Peer reviewed
Aiken, Lewis R. – Educational and Psychological Measurement, 1989
Two alternatives to traditional item analysis and reliability estimation procedures are considered for determining the difficulty, discrimination, and reliability of optional items on essay and other tests. A computer program to compute these measures is described, and illustrations are given. (SLD)
Descriptors: College Entrance Examinations, Computer Software, Difficulty Level, Essay Tests