Publication Date

| Range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 2 |
| Since 2017 (last 10 years) | 7 |
| Since 2007 (last 20 years) | 32 |
Descriptor

| Descriptor | Records |
| --- | --- |
| Educational Testing | 175 |
| Test Bias | 175 |
| Testing Problems | 74 |
| Test Validity | 67 |
| Standardized Tests | 55 |
| Elementary Secondary Education | 52 |
| Test Construction | 42 |
| Test Interpretation | 40 |
| Achievement Tests | 36 |
| Educational Assessment | 34 |
| Student Evaluation | 32 |
Education Level

| Education Level | Records |
| --- | --- |
| Elementary Secondary Education | 12 |
| Elementary Education | 6 |
| Higher Education | 6 |
| Postsecondary Education | 5 |
| Grade 5 | 2 |
| Grade 8 | 2 |
| High Schools | 2 |
| Grade 3 | 1 |
| Grade 4 | 1 |
| Grade 9 | 1 |
| Junior High Schools | 1 |
Location

| Location | Records |
| --- | --- |
| Canada | 4 |
| Massachusetts | 2 |
| South Africa | 2 |
| United Kingdom (Great Britain) | 2 |
| United States | 2 |
| Australia | 1 |
| China | 1 |
| Iowa | 1 |
| Iran | 1 |
| Minnesota | 1 |
| Netherlands | 1 |
ETS Research Institute, 2024
ETS experts are exploring and defining the standards for responsible AI use in assessments. A comprehensive framework and principles will be unveiled in the coming months. In the meantime, this document outlines the critical areas these standards will encompass, including the principles of: (1) Fairness and bias mitigation; (2) Privacy and…
Descriptors: Artificial Intelligence, Computer Assisted Testing, Educational Testing, Ethics
Gorney, Kylie – ProQuest LLC, 2023
Aberrant behavior refers to any type of unusual behavior that would not be expected under normal circumstances. In educational and psychological testing, such behaviors have the potential to severely bias the aberrant examinee's test score while also jeopardizing the test scores of countless others. It is therefore crucial that aberrant examinees…
Descriptors: Behavior Problems, Educational Testing, Psychological Testing, Test Bias
Salmani Nodoushan, Mohammad Ali – Online Submission, 2021
This paper follows a line of logical argumentation to claim that what Samuel Messick conceptualized about construct validation has probably been misunderstood by some educational policy makers, practicing educators, and classroom teachers. It argues that, while Messick's unified theory of test validation aimed at (a) warning educational…
Descriptors: Construct Validity, Test Theory, Test Use, Affordances
Nisbet, Isabel; Shaw, Stuart D. – Assessment in Education: Principles, Policy & Practice, 2019
Fairness in assessment is seen as increasingly important, but there is a need for greater clarity in the use of the term 'fair'. Fairness is also perceived through a range of 'lenses' reflecting different traditions of thought, and the lens used determines how fairness is seen and described. This article distinguishes different uses of 'fair' which have…
Descriptors: Test Bias, Measurement, Theories, Educational Assessment
Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2019
The Mantel-Haenszel delta difference (MH D-DIF) and the standardized proportion difference (STD P-DIF) are two observed-score methods that have been used to assess differential item functioning (DIF) at Educational Testing Service since the early 1990s. Latent-variable approaches to assessing measurement invariance at the item level have been…
Descriptors: Test Bias, Educational Testing, Statistical Analysis, Item Response Theory
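As context for the abstract above, here is a minimal sketch of the two observed-score statistics it names, computed from counts on a studied item stratified by the matching total score (function and variable names are illustrative assumptions, not ETS code):

```python
import numpy as np

def mh_d_dif(r_ref, w_ref, r_foc, w_foc):
    """Mantel-Haenszel D-DIF on the ETS delta scale.

    Each argument is an array over score strata k: counts of rights (r_*)
    and wrongs (w_*) on the studied item for the reference and focal
    groups. Negative values indicate an item that is harder for the
    focal group than for matched reference-group examinees.
    """
    r_ref, w_ref = np.asarray(r_ref, float), np.asarray(w_ref, float)
    r_foc, w_foc = np.asarray(r_foc, float), np.asarray(w_foc, float)
    n = r_ref + w_ref + r_foc + w_foc                  # stratum sizes
    alpha = (r_ref * w_foc / n).sum() / (r_foc * w_ref / n).sum()
    return -2.35 * np.log(alpha)                       # delta-metric rescaling

def std_p_dif(r_ref, w_ref, r_foc, w_foc):
    """Standardized proportion difference, weighted by focal-group counts."""
    r_ref, w_ref = np.asarray(r_ref, float), np.asarray(w_ref, float)
    r_foc, w_foc = np.asarray(r_foc, float), np.asarray(w_foc, float)
    n_foc = r_foc + w_foc
    diff = r_foc / n_foc - r_ref / (r_ref + w_ref)     # p_focal - p_reference
    return (n_foc * diff).sum() / n_foc.sum()
```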
Mousavi, Amin; Cui, Ying – Education Sciences, 2020
Important decisions regarding accountability and the placement of students in performance categories are often made on the basis of test scores; it is therefore important to evaluate the validity of the inferences derived from test results. One of the threats to the validity of such inferences is aberrant responding. Several…
Descriptors: Student Evaluation, Educational Testing, Psychological Testing, Item Response Theory
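Aberrant responding of the kind studied above is commonly screened with person-fit statistics. Below is a minimal sketch of the standardized log-likelihood statistic l_z, one common choice in this literature, assuming response probabilities from an IRT model fitted elsewhere (names and inputs are illustrative):

```python
import numpy as np

def lz_person_fit(responses, p_correct):
    """Standardized log-likelihood person-fit statistic l_z.

    responses: 0/1 vector for one examinee; p_correct: model-implied
    probabilities of a correct response on each item. Large negative
    values of l_z flag response patterns that fit the model poorly.
    """
    u = np.asarray(responses, float)
    p = np.asarray(p_correct, float)
    logit = np.log(p / (1.0 - p))
    l0 = np.sum(u * np.log(p) + (1.0 - u) * np.log(1.0 - p))
    e_l0 = np.sum(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))  # expectation
    v_l0 = np.sum(p * (1.0 - p) * logit**2)                     # variance
    return (l0 - e_l0) / np.sqrt(v_l0)
```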
Borsboom, Denny; Wijsen, Lisa D. – Assessment in Education: Principles, Policy & Practice, 2017
The central role of educational testing practices in contemporary societies can hardly be overstated. It is furthermore evident that psychometric models regulate, justify, and legitimize the processes through which educational testing practices are used. In this commentary, the authors offer some observations that may be relevant for the analyses…
Descriptors: Educational Assessment, Learning, Psychometrics, Power Structure
DeMars, Christine E.; Jurich, Daniel P. – Educational and Psychological Measurement, 2015
In educational testing, differential item functioning (DIF) statistics must be accurately estimated to ensure the appropriate items are flagged for inspection or removal. This study showed how using the Rasch model to estimate DIF may introduce considerable bias in the results when there are large group differences in ability (impact) and the data…
Descriptors: Test Bias, Guessing (Tests), Ability, Differences
Ravand, Hamdollah – Practical Assessment, Research & Evaluation, 2015
Multilevel models (MLMs) are flexible in that they can be employed to obtain item and person parameters, test for differential item functioning (DIF) and capture both local item and person dependence. Papers on the MLM analysis of item response data have focused mostly on theoretical issues where applications have been add-ons to simulation…
Descriptors: Item Response Theory, Hierarchical Linear Modeling, Educational Testing, Reading Comprehension
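In the explanatory-IRT framing this line of work draws on, a Rasch model with a uniform DIF term can be written as a two-level logistic regression (the notation below is an assumption for illustration, not taken from the paper):

$$\operatorname{logit}\Pr(y_{pi}=1)=\theta_p-b_i+\gamma_i G_p,\qquad \theta_p\sim N(0,\sigma^2),$$

where $G_p$ flags focal-group membership and a nonzero $\gamma_i$ signals uniform DIF on item $i$.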
Zwick, Rebecca; Ye, Lei; Isham, Steven – ETS Research Report Series, 2013
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. Although it is often assumed that refinement of the matching criterion always provides more accurate DIF results, the actual situation proves to be more complex. To explore the effectiveness of refinement, we…
Descriptors: Test Bias, Statistical Analysis, Simulation, Educational Testing
Socha, Alan; DeMars, Christine E.; Zilberberg, Anna; Phan, Ha – International Journal of Testing, 2015
The Mantel-Haenszel (MH) procedure is commonly used to detect items that function differentially for groups of examinees from various demographic and linguistic backgrounds--for example, in international assessments. As in some other DIF methods, the total score is used to match examinees on ability. In thin matching, each of the total score…
Descriptors: Test Items, Educational Testing, Evaluation Methods, Ability Grouping
Bennink, Margot; Croon, Marcel A.; Keuning, Jos; Vermunt, Jeroen K. – Journal of Educational and Behavioral Statistics, 2014
In educational measurement, students' responses to items are used not only to measure student ability but also to evaluate and compare the performance of schools. Analysis should ideally account for the multilevel structure of the data and for school-level processes not related to ability, such as working climate and administration…
Descriptors: Academic Ability, Educational Assessment, Educational Testing, Test Bias
Popham, W. James – Phi Delta Kappan, 2014
The tests we use to evaluate student achievement may well be sound measures of what students know, but they are at best faulty indicators of how well students have been taught. A remedy to this situation of judging teachers by the performance of their students on high-stakes tests may already be in hand. We should look to the methods successfully…
Descriptors: High Stakes Tests, Academic Achievement, Teacher Evaluation, Evaluation Methods
Zwick, Rebecca – ETS Research Report Series, 2012
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Descriptors: Test Bias, Sample Size, Bayesian Statistics, Evaluation Methods
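The flagging rules this report reviews are often summarized as the ETS A/B/C scheme. The sketch below encodes that commonly cited summary; the exact critical values used here (1.645 one-sided, 1.96 two-sided) are assumptions, and the report itself evaluates the precise criteria:

```python
def ets_dif_category(mh_d_dif, se):
    """Commonly cited A/B/C classification of an item's MH D-DIF value.

    C: large DIF (|D| >= 1.5 and significantly greater than 1.0);
    A: negligible DIF (|D| < 1.0 or not significantly nonzero);
    B: everything in between.
    """
    d = abs(mh_d_dif)
    if d >= 1.5 and (d - 1.0) / se > 1.645:   # one-sided test against 1.0
        return "C"
    if d < 1.0 or d / se < 1.96:              # two-sided test against 0
        return "A"
    return "B"
```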
Emenogu, Barnabas C.; Falenchuk, Olesya; Childs, Ruth A. – Alberta Journal of Educational Research, 2010
Most implementations of the Mantel-Haenszel differential item functioning procedure delete records with missing responses or replace missing responses with scores of 0. These treatments of missing data make strong assumptions about the causes of the missing data. Such assumptions may be particularly problematic when groups differ in their patterns…
Descriptors: Foreign Countries, Test Bias, Test Items, Educational Testing
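The two treatments described above are easy to make concrete. A minimal sketch follows (the array layout and names are assumptions); the docstring notes the assumption each treatment embeds about why responses are missing:

```python
import numpy as np

def treat_missing(responses, strategy):
    """responses: examinees x items matrix, with NaN marking omitted items.

    'zero' scores every omission as incorrect, which assumes omits
    reflect inability; 'listwise' drops any examinee with an omission,
    which assumes complete records are representative. Neither holds
    when groups differ systematically in their patterns of omission.
    """
    x = np.asarray(responses, float)
    if strategy == "zero":
        return np.nan_to_num(x, nan=0.0)       # missing -> scored wrong
    if strategy == "listwise":
        return x[~np.isnan(x).any(axis=1)]     # drop incomplete records
    raise ValueError(f"unknown strategy: {strategy}")
```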
