Publication Date
In 2025 | 0 |
Since 2024 | 2 |
Since 2021 (last 5 years) | 6 |
Since 2016 (last 10 years) | 9 |
Since 2006 (last 20 years) | 17 |
Descriptor
Item Analysis | 24 |
Test Items | 24 |
Item Response Theory | 8 |
Foreign Countries | 7 |
Mathematics Tests | 7 |
Multiple Choice Tests | 6 |
Test Bias | 6 |
Test Construction | 6 |
Test Format | 6 |
Comparative Analysis | 5 |
Difficulty Level | 5 |
More ▼ |
Source
Applied Measurement in… | 24 |
Author
Publication Type
Journal Articles | 24 |
Reports - Research | 18 |
Reports - Evaluative | 6 |
Education Level
Elementary Secondary Education | 5 |
Secondary Education | 5 |
Grade 7 | 3 |
Higher Education | 3 |
Junior High Schools | 3 |
Middle Schools | 3 |
Postsecondary Education | 3 |
Elementary Education | 2 |
Grade 3 | 2 |
Grade 4 | 2 |
Grade 5 | 2 |
More ▼ |
Audience
Laws, Policies, & Programs
Assessments and Surveys
Program for International… | 2 |
Iowa Tests of Basic Skills | 1 |
TerraNova Multiple Assessments | 1 |
What Works Clearinghouse Rating
Shaojie Wang; Won-Chan Lee; Minqiang Zhang; Lixin Yuan – Applied Measurement in Education, 2024
To reduce the impact of parameter estimation errors on IRT linking results, recent work introduced two information-weighted characteristic curve methods for dichotomous items. These two methods showed outstanding performance in both simulation and pseudo-form pseudo-group analysis. The current study expands upon the concept of information…
Descriptors: Item Response Theory, Test Format, Test Length, Error of Measurement
Traditional vs Intersectional DIF Analysis: Considerations and a Comparison Using State Testing Data
Tony Albano; Brian F. French; Thao Thu Vo – Applied Measurement in Education, 2024
Recent research has demonstrated an intersectional approach to the study of differential item functioning (DIF). This approach expands DIF to account for the interactions between what have traditionally been treated as separate grouping variables. In this paper, we compare traditional and intersectional DIF analyses using data from a state testing…
Descriptors: Test Items, Item Analysis, Data Use, Standardized Tests
Abu-Ghazalah, Rashid M.; Dubins, David N.; Poon, Gregory M. K. – Applied Measurement in Education, 2023
Multiple choice results are inherently probabilistic outcomes, as correct responses reflect a combination of knowledge and guessing, while incorrect responses additionally reflect blunder, a confidently committed mistake. To objectively resolve knowledge from responses in an MC test structure, we evaluated probabilistic models that explicitly…
Descriptors: Guessing (Tests), Multiple Choice Tests, Probability, Models
Ferrara, Steve; Steedle, Jeffrey T.; Frantz, Roger S. – Applied Measurement in Education, 2022
Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13…
Descriptors: Reading Comprehension, Reading Tests, Test Items, Item Response Theory
Pham, Duy N.; Wells, Craig S.; Bauer, Malcolm I.; Wylie, E. Caroline; Monroe, Scott – Applied Measurement in Education, 2021
Assessments built on a theory of learning progressions are promising formative tools to support learning and teaching. The quality and usefulness of those assessments depend, in large part, on the validity of the theory-informed inferences about student learning made from the assessment results. In this study, we introduced an approach to address…
Descriptors: Formative Evaluation, Mathematics Instruction, Mathematics Achievement, Middle School Students
Abulela, Mohammed A. A.; Rios, Joseph A. – Applied Measurement in Education, 2022
When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the…
Descriptors: Comparative Analysis, Robustness (Statistics), Nonparametric Statistics, Item Analysis
Papenberg, Martin; Musch, Jochen – Applied Measurement in Education, 2017
In multiple-choice tests, the quality of distractors may be more important than their number. We therefore examined the joint influence of distractor quality and quantity on test functioning by providing a sample of 5,793 participants with five parallel test sets consisting of items that differed in the number and quality of distractors.…
Descriptors: Multiple Choice Tests, Test Items, Test Validity, Test Reliability
Wyse, Adam E. – Applied Measurement in Education, 2018
This article discusses regression effects that are commonly observed in Angoff ratings where panelists tend to think that hard items are easier than they are and easy items are more difficult than they are in comparison to estimated item difficulties. Analyses of data from two credentialing exams illustrate these regression effects and the…
Descriptors: Regression (Statistics), Test Items, Difficulty Level, Licensing Examinations (Professions)
Gierl, Mark J.; Lai, Hollis; Pugh, Debra; Touchie, Claire; Boulais, André-Philippe; De Champlain, André – Applied Measurement in Education, 2016
Item development is a time- and resource-intensive process. Automatic item generation integrates cognitive modeling with computer technology to systematically generate test items. To date, however, items generated using cognitive modeling procedures have received limited use in operational testing situations. As a result, the psychometric…
Descriptors: Psychometrics, Multiple Choice Tests, Test Items, Item Analysis
Chon, Kyong Hee; Lee, Won-Chan; Ansley, Timothy N. – Applied Measurement in Education, 2013
Empirical information regarding performance of model-fit procedures has been a persistent need in measurement practice. Statistical procedures for evaluating item fit were applied to real test examples that consist of both dichotomously and polytomously scored items. The item fit statistics used in this study included the PARSCALE's G[squared],…
Descriptors: Test Format, Test Items, Item Analysis, Goodness of Fit
Keller, Lisa A.; Keller, Robert R. – Applied Measurement in Education, 2015
Equating test forms is an essential activity in standardized testing, with increased importance with the accountability systems in existence through the mandate of Adequate Yearly Progress. It is through equating that scores from different test forms become comparable, which allows for the tracking of changes in the performance of students from…
Descriptors: Item Response Theory, Rating Scales, Standardized Tests, Scoring Rubrics
Wyse, Adam E.; Albano, Anthony D. – Applied Measurement in Education, 2015
This article used several data sets from a large-scale state testing program to examine the feasibility of combining general and modified assessment items in computerized adaptive testing (CAT) for different groups of students. Results suggested that several of the assumptions made when employing this type of mixed-item CAT may not be met for…
Descriptors: Adaptive Testing, Computer Assisted Testing, Test Items, Testing Programs
Oliveri, Maria E.; Ercikan, Kadriye – Applied Measurement in Education, 2011
In this study, we examine the degree of construct comparability and possible sources of incomparability of the English and French versions of the Programme for International Student Assessment (PISA) 2003 problem-solving measure administered in Canada. Several approaches were used to examine construct comparability at the test- (examination of…
Descriptors: Foreign Countries, English, French, Tests
Taylor, Catherine S.; Lee, Yoonsun – Applied Measurement in Education, 2012
This was a study of differential item functioning (DIF) for grades 4, 7, and 10 reading and mathematics items from state criterion-referenced tests. The tests were composed of multiple-choice and constructed-response items. Gender DIF was investigated using POLYSIBTEST and a Rasch procedure. The Rasch procedure flagged more items for DIF than did…
Descriptors: Test Bias, Gender Differences, Reading Tests, Mathematics Tests
Banks, Kathleen – Applied Measurement in Education, 2012
The purpose of this article is to illustrate a seven-step process for determining whether inferential reading items were more susceptible to cultural bias than literal reading items. The seven-step process was demonstrated using multiple-choice data from the reading portion of a reading/language arts test for fifth and seventh grade Hispanic,…
Descriptors: Reading Tests, Test Items, Standardized Tests, Test Bias
Previous Page | Next Page »
Pages: 1 | 2