Publication Date

| Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 6 |
| Since 2007 (last 20 years) | 13 |
Descriptor

| Term | Records |
| --- | --- |
| Evaluation Methods | 19 |
| Sample Size | 19 |
| Test Items | 19 |
| Item Response Theory | 8 |
| Correlation | 6 |
| Item Analysis | 6 |
| Simulation | 6 |
| Error of Measurement | 5 |
| Difficulty Level | 4 |
| Error Patterns | 4 |
| Monte Carlo Methods | 4 |
Author

| Name | Records |
| --- | --- |
| DeMars, Christine E. | 2 |
| Socha, Alan | 2 |
| Ahn, Soyeon | 1 |
| Ankenmann, Robert D. | 1 |
| Avsec, Stanislav | 1 |
| Bolt, Daniel M. | 1 |
| Brewer, Neil | 1 |
| Carlson, Alfred B. | 1 |
| Cook, Linda L. | 1 |
| Dai, Shenghai | 1 |
| Dwyer, Andrew C. | 1 |
Publication Type

| Type | Records |
| --- | --- |
| Journal Articles | 15 |
| Reports - Research | 13 |
| Reports - Evaluative | 4 |
| Speeches/Meeting Papers | 3 |
| Reports - Descriptive | 2 |
Education Level

| Level | Records |
| --- | --- |
| Elementary Education | 1 |
| Secondary Education | 1 |
Audience

| Audience | Records |
| --- | --- |
| Researchers | 1 |
Assessments and Surveys

| Assessment | Records |
| --- | --- |
| Program for International… | 1 |
| SAT (College Admission Test) | 1 |
Novak, Josip; Rebernjak, Blaž – Measurement: Interdisciplinary Research and Perspectives, 2023
A Monte Carlo simulation study was conducted to examine the performance of the α, λ2, λ4, λ2, ω_T, GLB_MRFA, and GLB_Algebraic coefficients. Population reliability, distribution shape, sample size, test length, and number of response categories were varied…
Descriptors: Monte Carlo Methods, Evaluation Methods, Reliability, Simulation
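Two of the coefficients compared in this study are simple enough to illustrate with a small stdlib-only simulation. This is a minimal sketch, not the study's design: the one-factor data generator, the loading of 0.7, and the sample size are illustrative assumptions, and only Cronbach's α and Guttman's λ2 are computed (λ4, ω_T, and the GLB estimators require optimization routines beyond a short example):

```python
import random
import statistics

def simulate_scores(n_persons=500, n_items=10, loading=0.7, seed=1):
    """Generate continuous item scores from a one-factor model:
    item = loading * theta + noise, with unit total variance per item."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0, 1)
        row = [loading * theta + rng.gauss(0, (1 - loading ** 2) ** 0.5)
               for _ in range(n_items)]
        data.append(row)
    return data

def covariance(xs, ys):
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def alpha_and_lambda2(data):
    """Cronbach's alpha and Guttman's lambda-2 from raw item scores."""
    k = len(data[0])
    items = [[row[j] for row in data] for j in range(k)]
    var_total = statistics.variance([sum(row) for row in data])
    item_vars = [statistics.variance(col) for col in items]
    # off-diagonal covariances (each unordered pair counted twice)
    off = [covariance(items[i], items[j])
           for i in range(k) for j in range(k) if i != j]
    alpha = k / (k - 1) * (1 - sum(item_vars) / var_total)
    lam2 = (sum(off) + (k / (k - 1) * sum(c * c for c in off)) ** 0.5) / var_total
    return alpha, lam2

a, l2 = alpha_and_lambda2(simulate_scores())  # both near the population reliability
```

By construction λ2 is never smaller than α, which is one reason simulation studies like this one compare them under varied conditions.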
Lucas, Carmen A.; Brewer, Neil; Young, Robyn L. – Journal of Psychoeducational Assessment, 2022
Evaluations of early screening tests for autism commonly rely on receiver operating characteristic (ROC) analysis and comparisons of area under the curve (AUC). Whether AUC differs significantly from chance or between test items is not always assessed. Two recent and independent evaluations of the Brief Autism Detection in Early Childhood (BADEC)…
Descriptors: Autism, Pervasive Developmental Disorders, Screening Tests, Evaluation Methods
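The ROC/AUC machinery these evaluations rely on is compact enough to sketch. Below, AUC is computed in its Mann-Whitney form, and the test against chance uses the Hanley-McNeil (1982) standard error; this is a generic illustration, not the BADEC evaluations' code, and the score lists are hypothetical:

```python
import math

def auc(pos, neg):
    """AUC as the probability a random positive case outscores a random
    negative case (Mann-Whitney formulation); ties count one half."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_z_vs_chance(pos, neg):
    """Approximate z-test of AUC against 0.5 using the Hanley-McNeil
    standard error; a large |z| suggests better-than-chance screening."""
    a = auc(pos, neg)
    n1, n2 = len(pos), len(neg)
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    se = math.sqrt((a * (1 - a) + (n1 - 1) * (q1 - a * a)
                    + (n2 - 1) * (q2 - a * a)) / (n1 * n2))
    return (a - 0.5) / se
```

Comparing AUCs *between* items or instruments needs a correlated-ROC test (e.g., DeLong) rather than this one-sample version, which is part of the gap the abstract points to.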
Furter, Robert T.; Dwyer, Andrew C. – Applied Measurement in Education, 2020
Maintaining equivalent performance standards across forms is a psychometric challenge exacerbated by small samples. In this study, the accuracy of two equating methods (Rasch anchored calibration and nominal weights mean) and four anchor item selection methods were investigated in the context of very small samples (N = 10). Overall, nominal…
Descriptors: Classification, Accuracy, Item Response Theory, Equated Scores
Yesiltas, Gonca; Paek, Insu – Educational and Psychological Measurement, 2020
A log-linear model (LLM) is a well-known statistical method to examine the relationship among categorical variables. This study investigated the performance of LLM in detecting differential item functioning (DIF) for polytomously scored items via simulations where various sample sizes, ability mean differences (impact), and DIF types were…
Descriptors: Simulation, Sample Size, Item Analysis, Scores
Park, Sung Eun; Ahn, Soyeon; Zopluoglu, Cengiz – Educational and Psychological Measurement, 2021
This study presents a new approach to synthesizing differential item functioning (DIF) effect size: First, using correlation matrices from each study, we perform a multigroup confirmatory factor analysis (MGCFA) that examines measurement invariance of a test item between two subgroups (i.e., focal and reference groups). Then we synthesize, across…
Descriptors: Item Analysis, Effect Size, Difficulty Level, Monte Carlo Methods
Wang, Xiaolin; Svetina, Dubravka; Dai, Shenghai – Journal of Experimental Education, 2019
Recently, interest in test subscore reporting for diagnosis purposes has been growing rapidly. The two simulation studies here examined factors (sample size, number of subscales, correlation between subscales, and three factors affecting subscore reliability: number of items per subscale, item parameter distribution, and data generating model)…
Descriptors: Value Added Models, Scores, Sample Size, Correlation
Avsec, Stanislav; Jamšek, Janez – International Journal of Technology and Design Education, 2016
Technological literacy is identified as a vital achievement of technology- and engineering-intensive education. It guides the design of technology and technical components of educational systems and defines competitive employment in technological society. Existing methods for measuring technological literacy are incomplete or complicated,…
Descriptors: Technological Literacy, Elementary School Students, Secondary School Students, Evaluation Methods
Hitchcock, John H.; Johanson, George A. – Research in the Schools, 2015
Understanding the reason(s) for Differential Item Functioning (DIF) in the context of measurement is difficult. Although identifying potential DIF items is typically a statistical endeavor, understanding the reasons for DIF (and item repair or replacement) might require investigations that can be informed by qualitative work. Such work is…
Descriptors: Mixed Methods Research, Test Items, Item Analysis, Measurement
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
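The Mantel-Haenszel procedure named here reduces to a common odds ratio pooled over the total-score strata on which examinees are matched. A minimal sketch (the stratum counts below are hypothetical; real use first bins examinees by total score, thinly or thickly as the abstract discusses):

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio.
    strata: one (ref_correct, ref_wrong, foc_correct, foc_wrong) tuple
    per matching stratum (total-score level)."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    return num / den

def mh_delta(strata):
    """ETS delta metric: MH D-DIF = -2.35 * ln(alpha_MH).
    Negative values flag DIF against the focal group."""
    return -2.35 * math.log(mh_odds_ratio(strata))
```

When the two groups perform identically within every stratum the odds ratio is 1 and MH D-DIF is 0, which is the no-DIF baseline the procedure tests against.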
Socha, Alan; DeMars, Christine E. – Educational and Psychological Measurement, 2013
Modeling multidimensional test data with a unidimensional model can result in serious statistical errors, such as bias in item parameter estimates. Many methods exist for assessing the dimensionality of a test. The current study focused on DIMTEST. Using simulated data, the effects of sample size splitting for use with the ATFIND procedure for…
Descriptors: Sample Size, Test Length, Correlation, Test Format
Wyse, Adam E.; Mapuranga, Raymond – International Journal of Testing, 2009
Differential item functioning (DIF) analysis is a statistical technique used for ensuring the equity and fairness of educational assessments. This study formulates a new DIF analysis method using the information similarity index (ISI). ISI compares item information functions when data fits the Rasch model. Through simulations and an international…
Descriptors: Test Bias, Evaluation Methods, Test Items, Educational Assessment
Yoo, Jin Eun – Educational and Psychological Measurement, 2009
This Monte Carlo study investigates the beneficial effect of including auxiliary variables during estimation of confirmatory factor analysis models with multiple imputation. Specifically, it examines the influence of sample size, missing rates, missingness mechanism combinations, missingness types (linear or convex), and the absence or presence…
Descriptors: Monte Carlo Methods, Research Methodology, Test Validity, Factor Analysis
Wells, Craig S.; Bolt, Daniel M. – Applied Measurement in Education, 2008
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Descriptors: Test Length, Test Items, Monte Carlo Methods, Nonparametric Statistics
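The Douglas and Cohen approach rests on comparing a nonparametric, kernel-smoothed empirical item characteristic curve with the fitted parametric 2PL curve. A minimal sketch of those two ingredients only; the published method's bandwidth choice and its summary misfit statistic are not reproduced here:

```python
import math

def p_2pl(theta, a, b):
    """Item characteristic curve under the two-parameter logistic model:
    P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def kernel_icc(theta0, thetas, responses, bandwidth=0.5):
    """Nadaraya-Watson (Gaussian-kernel) estimate of P(correct | theta0):
    a nonparametric ICC built directly from examinee ability estimates
    and 0/1 item responses, to be compared against the 2PL curve."""
    weights = [math.exp(-0.5 * ((t - theta0) / bandwidth) ** 2)
               for t in thetas]
    return sum(w * r for w, r in zip(weights, responses)) / sum(weights)
```

Misfit is then some aggregate of the gap between `kernel_icc` and `p_2pl` over the ability range; establishing the sampling behavior of that gap is exactly what this study investigates.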
Smith, Richard M.; And Others – 1995
In the mid to late 1970s, considerable research was conducted on the properties of Rasch fit mean squares, resulting in transformations to convert the mean squares into approximate t-statistics. In the late 1980s and the early 1990s, the trend seems to have reversed, with numerous researchers using the untransformed fit mean squares as a means of…
Descriptors: Evaluation Methods, Goodness of Fit, Item Response Theory, Sample Size
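The transformation at issue is the cube-root (Wilson-Hilferty) conversion of a fit mean square to an approximate unit-normal statistic, commonly attributed to Wright and Masters. A sketch under the assumption that `q` denotes the model standard deviation of the mean square:

```python
def ms_to_zstd(mean_square, q):
    """Wilson-Hilferty cube-root transform of a Rasch fit mean square to an
    approximate unit-normal (t-like) statistic. A mean square of 1.0 maps
    close to zero; values above 1.0 (underfit) map to positive z."""
    return (mean_square ** (1.0 / 3.0) - 1.0) * (3.0 / q) + q / 3.0
```

The debate the abstract describes is whether to interpret the raw mean square directly or this standardized version, whose null distribution depends on sample size.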
Monahan, Patrick O.; Ankenmann, Robert D. – Journal of Educational Measurement, 2005
Empirical studies demonstrated Type-I error (TIE) inflation (especially for highly discriminating easy items) of the Mantel-Haenszel chi-square test for differential item functioning (DIF), when data conformed to item response theory (IRT) models more complex than Rasch, and when IRT proficiency distributions differed only in means. However, no…
Descriptors: Sample Size, Item Response Theory, Test Items, Test Bias
