Showing 1 to 15 of 25 results
Peer reviewed
PDF on ERIC (full text available)
Dogan, Nuri; Hambleton, Ronald K.; Yurtcu, Meltem; Yavuz, Sinan – Cypriot Journal of Educational Sciences, 2018
Validity is one of the psychometric properties of achievement tests. One way to examine it is through item bias studies, which are based on differential item functioning (DIF) analyses and field experts' opinions. In this study, field experts were asked to estimate the DIF levels of the items to compare the estimations…
Descriptors: Test Bias, Comparative Analysis, Predictor Variables, Statistical Analysis
Peer reviewed
Direct link
Wells, Craig S.; Hambleton, Ronald K.; Kirkpatrick, Robert; Meng, Yu – Applied Measurement in Education, 2014
The purpose of the present study was to develop and evaluate two procedures for flagging consequential item parameter drift (IPD) in an operational testing program. The first procedure was based on flagging items that exhibit a meaningful magnitude of IPD, using a critical value defined to represent barely tolerable IPD. The second procedure…
Descriptors: Test Items, Test Bias, Equated Scores, Item Response Theory
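The first procedure's flagging logic reduces to a threshold test on the change in an item's parameter estimates between administrations. The sketch below is illustrative only, not the authors' implementation: the difficulty estimates and the critical value are hypothetical (the study defined its threshold empirically to represent barely tolerable IPD).

```python
import numpy as np

# Hypothetical IRT difficulty (b) estimates for the same items
# calibrated in two administrations, placed on a common scale.
b_year1 = np.array([-1.20, 0.35, 0.80, -0.10, 1.55])
b_year2 = np.array([-1.15, 0.95, 0.78, -0.60, 1.60])

CRITICAL_DRIFT = 0.30  # assumed critical value, in logits

drift = b_year2 - b_year1
for i in np.flatnonzero(np.abs(drift) > CRITICAL_DRIFT):
    print(f"item {i}: drift = {drift[i]:+.2f} logits, flagged")
```

Flagged items would presumably then be reviewed for their impact on equated scores, which is the sense in which the drift is "consequential" rather than merely detectable.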
Peer reviewed
Direct link
Wells, Craig S.; Baldwin, Su; Hambleton, Ronald K.; Sireci, Stephen G.; Karatonis, Ana; Jirka, Stephen – Applied Measurement in Education, 2009
Score equity assessment is an important analysis to ensure inferences drawn from test scores are comparable across subgroups of examinees. The purpose of the present evaluation was to assess the extent to which the Grade 8 NAEP Math and Reading assessments for 2005 were equivalent across selected states. More specifically, the present study…
Descriptors: National Competency Tests, Test Bias, Equated Scores, Grade 8
Peer reviewed
Rogers, H. Jane; Hambleton, Ronald K. – Educational and Psychological Measurement, 1989
The validity of logistic test models and computer simulation methods for generating sampling distributions of item bias statistics was evaluated under the hypothesis of no item bias. Test data from 937 ninth-grade students were used to develop 7 steps for applying computer-simulated baseline statistics in test development. (SLD)
Descriptors: Computer Simulation, Educational Research, Evaluation Methods, Grade 9
Peer reviewed
Direct link
Monahan, Patrick O.; Stump, Timothy E.; Finch, Holmes; Hambleton, Ronald K. – Applied Psychological Measurement, 2007
DETECT is a nonparametric "full" dimensionality assessment procedure that clusters dichotomously scored items into dimensions and provides a DETECT index of magnitude of multidimensionality. Four factors (test length, sample size, item response theory [IRT] model, and DETECT index) were manipulated in a Monte Carlo study of bias, standard error,…
Descriptors: Test Length, Sample Size, Monte Carlo Methods, Geometric Concepts
Hambleton, Ronald K.; Rogers, H. Jane – 1988
The agreement between item response theory-based and Mantel-Haenszel (MH) methods in identifying biased items on tests was studied. Data came from item responses of four spaced samples of 1,000 examinees each--two samples of 1,000 Anglo-American and two samples of 1,000 Native American students taking the New Mexico High School Proficiency…
Descriptors: Comparative Analysis, High School Students, High Schools, Item Analysis
Rogers, H. Jane; Hambleton, Ronald K. – 1987
Although item bias statistics are widely recommended for use in test development and test analysis work, problems arise in their interpretation. The purpose of the present research was to evaluate the validity of logistic test models and computer simulation methods for providing a frame of reference for item bias statistic interpretations.…
Descriptors: Computer Simulation, Evaluation Methods, Item Analysis, Latent Trait Theory
Peer reviewed
Mazor, Kathleen M.; Hambleton, Ronald K.; Clauser, Brian E. – Applied Psychological Measurement, 1998
Used simulation to study whether matching on multiple test scores would reduce false-positive error rates compared with matching on a single number-correct score. False-positive error rates were reduced for most datasets. Findings suggest that assessing the dimensional structure of a test can be important in analysis of differential item functioning…
Descriptors: Error of Measurement, Item Bias, Scores, Test Items
Peer reviewed
Zenisky, April L.; Hambleton, Ronald K.; Robin, Frederic – Educational and Psychological Measurement, 2003
Studied a two-stage methodology for evaluating differential item functioning (DIF) in large-scale assessment data, using a sample of 60,000 students taking a large-scale assessment. Findings illustrate the merit of iterative approaches to DIF detection, since items identified at one stage were not necessarily the same as those identified at the…
Descriptors: Item Bias, Large Scale Assessment, Research Methodology, Test Items
Rogers, H. Jane; Hambleton, Ronald K. – 1987
Though item bias statistics are widely recommended for use in test development and analysis, problems arise in their interpretation. This research evaluates logistic test models and computer simulation methods for providing a frame of reference for interpreting item bias statistics. Specifically, the intent was to produce simulated sampling…
Descriptors: Computer Simulation, Cutting Scores, Grade 9, Latent Trait Theory
Peer reviewed
Robin, Frederic; Sireci, Stephen G.; Hambleton, Ronald K. – International Journal of Testing, 2003
Illustrates how multidimensional scaling (MDS) and differential item functioning (DIF) procedures can be used to evaluate the equivalence of different language versions of an examination. Presents examples of structural differences and DIF across languages. (SLD)
Descriptors: Item Bias, Licensing Examinations (Professions), Multidimensional Scaling, Multilingual Materials
Peer reviewed
Direct link
Zenisky, April L.; Hambleton, Ronald K.; Robin, Frederic – Educational Assessment, 2004
Differential item functioning (DIF) analyses are a routine part of the development of large-scale assessments. Less common are studies to understand the potential sources of DIF. The goals of this study were (a) to identify gender DIF in a large-scale science assessment and (b) to look for trends in the DIF and non-DIF items due to content,…
Descriptors: Program Effectiveness, Test Format, Science Tests, Test Items
Peer reviewed
Hambleton, Ronald K.; Rogers, H. Jane – Applied Measurement in Education, 1989
Item Response Theory and Mantel-Haenszel approaches for investigating differential item performance were compared to assess the level of agreement of the approaches in identifying potentially biased items. Subjects were 2,000 White and 2,000 Native American high school students. The Mantel-Haenszel method provides an acceptable approximation of…
Descriptors: American Indians, Comparative Testing, High School Students, High Schools
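The Mantel-Haenszel procedure compared in this entry is straightforward to compute from examinee data. A minimal sketch follows, assuming dichotomous item scores matched on total test score; the function name and the setup are hypothetical, not the authors' code.

```python
import numpy as np
from scipy.stats import chi2

def mantel_haenszel_dif(scores, group, correct):
    """MH common odds ratio, ETS delta, and chi-square p-value for one item.

    scores  : matching variable (e.g., total test score), one value per examinee
    group   : 0 = reference group, 1 = focal group
    correct : 0/1 responses to the studied item
    """
    num = den = dev = var = 0.0
    for s in np.unique(scores):
        m = scores == s
        a = np.sum(m & (group == 0) & (correct == 1))  # reference, correct
        b = np.sum(m & (group == 0) & (correct == 0))  # reference, incorrect
        c = np.sum(m & (group == 1) & (correct == 1))  # focal, correct
        d = np.sum(m & (group == 1) & (correct == 0))  # focal, incorrect
        n = a + b + c + d
        if n < 2 or (a + b) == 0 or (c + d) == 0:
            continue  # stratum carries no information
        num += a * d / n
        den += b * c / n
        dev += a - (a + b) * (a + c) / n               # observed minus expected
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    alpha = num / den                                  # common odds ratio
    delta = -2.35 * np.log(alpha)                      # ETS delta scale
    chi_sq = (abs(dev) - 0.5) ** 2 / var               # continuity-corrected
    return alpha, delta, chi2.sf(chi_sq, df=1)
```

An item with delta near zero shows little DIF; |delta| >= 1.5 is the conventional ETS threshold for large (category C) DIF, the kind of flagging rule agreement studies like this one compare against IRT-based criteria.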
Peer reviewed
Hambleton, Ronald K.; Jones, Russell W. – Educational Research Quarterly, 1994
A judgmental method for determining item bias was applied to test data from 2,000 Native American and 2,000 Anglo-American students for a statewide proficiency test. Results indicated some shortcomings of the judgmental method but supported the use of cross-validation in empirically identifying potential bias. (SLD)
Descriptors: American Indians, Anglo Americans, Comparative Analysis, Decision Making
Peer reviewed
Allalouf, Avi; Hambleton, Ronald K.; Sireci, Stephen G. – Journal of Educational Measurement, 1999
Focused on whether differential item functioning (DIF) is related to item type in translated test items, and on the causes of DIF, using data from an Israeli college entrance test in Hebrew and a Russian translation. Results from 24,304 college applicants indicate that 34% of the items functioned differently across languages. (SLD)
Descriptors: College Applicants, College Entrance Examinations, Foreign Countries, Hebrew