NotesFAQContact Us
Collection
Advanced
Search Tips
Source
Applied Measurement in…54
Audience
Laws, Policies, & Programs
What Works Clearinghouse Rating
Showing 16 to 30 of 54 results Save | Export
Peer reviewed Peer reviewed
Direct linkDirect link
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N. – Applied Measurement in Education, 2013
Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…
Descriptors: Test Format, Test Items, Item Analysis, Goodness of Fit
Peer reviewed Peer reviewed
Direct linkDirect link
Edwards, Michael C.; Flora, David B.; Thissen, David – Applied Measurement in Education, 2012
This article describes a computerized adaptive test (CAT) based on the uniform item exposure multi-form structure (uMFS). The uMFS is a specialization of the multi-form structure (MFS) idea described by Armstrong, Jones, Berliner, and Pashley (1998). In an MFS CAT, the examinee first responds to a small fixed block of items. The items comprising…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Format, Test Items
Peer reviewed Peer reviewed
Direct linkDirect link
Keller, Lisa A.; Keller, Robert R. – Applied Measurement in Education, 2015
Equating test forms is an essential activity in standardized testing, with increased importance with the accountability systems in existence through the mandate of Adequate Yearly Progress. It is through equating that scores from different test forms become comparable, which allows for the tracking of changes in the performance of students from…
Descriptors: Item Response Theory, Rating Scales, Standardized Tests, Scoring Rubrics
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Sooyeon; Walker, Michael – Applied Measurement in Education, 2012
This study examined the appropriateness of the anchor composition in a mixed-format test, which includes both multiple-choice (MC) and constructed-response (CR) items, using subpopulation invariance indices. Linking functions were derived in the nonequivalent groups with anchor test (NEAT) design using two types of anchor sets: (a) MC only and (b)…
Descriptors: Multiple Choice Tests, Test Format, Test Items, Equated Scores
Peer reviewed Peer reviewed
Direct linkDirect link
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2012
This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did…
Descriptors: Test Bias, Gender Differences, Reading Tests, Mathematics Tests
Peer reviewed Peer reviewed
Direct linkDirect link
Wan, Lei; Henly, George A. – Applied Measurement in Education, 2012
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats--the figural response (FR) and constructed response (CR) formats used in a K-12 computerized…
Descriptors: Test Items, Test Format, Computer Assisted Testing, Measurement
Peer reviewed Peer reviewed
Direct linkDirect link
Liu, Ou Lydia; Wilson, Mark – Applied Measurement in Education, 2009
Many efforts have been made to determine and explain differential gender performance on large-scale mathematics assessments. A well-agreed-on conclusion is that gender differences are contextualized and vary across math domains. This study investigated the pattern of gender differences by item domain (e.g., Space and Shape, Quantity) and item type…
Descriptors: Gender Differences, Mathematics Tests, Measurement, Test Format
Peer reviewed Peer reviewed
Direct linkDirect link
Cho, Hyun-Jeong; Lee, Jaehoon; Kingston, Neal – Applied Measurement in Education, 2012
This study examined the validity of test accommodation in third-eighth graders using differential item functioning (DIF) and mixture IRT models. Two data sets were used for these analyses. With the first data set (N = 51,591) we examined whether item type (i.e., story, explanation, straightforward) or item features were associated with item…
Descriptors: Testing Accommodations, Test Bias, Item Response Theory, Validity
Peer reviewed Peer reviewed
Direct linkDirect link
Keng, Leslie; McClarty, Katie Larsen; Davis, Laurie Laughlin – Applied Measurement in Education, 2008
This article describes a comparative study conducted at the item level for paper and online administrations of a statewide high stakes assessment. The goal was to identify characteristics of items that may have contributed to mode effects. Item-level analyses compared two modes of the Texas Assessment of Knowledge and Skills (TAKS) for up to four…
Descriptors: Computer Assisted Testing, Geometric Concepts, Grade 8, Comparative Analysis
Peer reviewed Peer reviewed
Direct linkDirect link
Sun, Koun-Tem; Chen, Yu-Jen; Tsai, Shu-Yen; Cheng, Chien-Fen – Applied Measurement in Education, 2008
In educational measurement, the construction of parallel test forms is often a combinatorial optimization problem that involves the time-consuming selection of items to construct tests having approximately the same test information functions (TIFs) and constraints. This article proposes a novel method, genetic algorithm (GA), to construct parallel…
Descriptors: Test Format, Measurement Techniques, Equations (Mathematics), Item Response Theory
Peer reviewed Peer reviewed
Direct linkDirect link
Ascalon, M. Evelina; Meyers, Lawrence S.; Davis, Bruce W.; Smits, Niels – Applied Measurement in Education, 2007
This article examined two item-writing guidelines: the format of the item stem and homogeneity of the answer set. Answering the call of Haladyna, Downing, and Rodriguez (2002) for empirical tests of item writing guidelines and extending the work of Smith and Smith (1988) on differential use of item characteristics, a mock multiple-choice driver's…
Descriptors: Guidelines, Difficulty Level, Standard Setting, Driver Education
Peer reviewed Peer reviewed
Hanson, Bradley A. – Applied Measurement in Education, 1996
Determining whether score distributions differ on two or more test forms administered to samples of examinees from a single population is explored using three statistical tests using loglinear models. Examples are presented of applying tests of distribution differences to decide if equating is needed for alternative forms of a test. (SLD)
Descriptors: Equated Scores, Scoring, Statistical Distributions, Test Format
Peer reviewed Peer reviewed
Feldt, Leonard S. – Applied Measurement in Education, 2002
Considers the degree of bias in testlet-based alpha (internal consistency reliability) through hypothetical examples and real test data from four tests of the Iowa Tests of Basic Skills. Presents a simple formula for computing a testlet-based congeneric coefficient. (SLD)
Descriptors: Estimation (Mathematics), Reliability, Statistical Bias, Test Format
Peer reviewed Peer reviewed
Direct linkDirect link
Kim, Seonghoon; Kolen, Michael J. – Applied Measurement in Education, 2006
Four item response theory linking methods (2 moment methods and 2 characteristic curve methods) were compared to concurrent (CO) calibration with the focus on the degree of robustness to format effects (FEs) when applying the methods to multidimensional data that reflected the FEs associated with mixed-format tests. Based on the quantification of…
Descriptors: Item Response Theory, Robustness (Statistics), Test Format, Comparative Analysis
Pages: 1  |  2  |  3  |  4