Publication Date
| In 2026 | 0 |
| Since 2025 | 1 |
| Since 2022 (last 5 years) | 5 |
| Since 2017 (last 10 years) | 7 |
| Since 2007 (last 20 years) | 18 |
Descriptor
| Item Analysis | 38 |
| Test Format | 38 |
| Test Validity | 38 |
| Test Items | 27 |
| Test Reliability | 22 |
| Test Construction | 16 |
| Multiple Choice Tests | 10 |
| Difficulty Level | 7 |
| Foreign Countries | 7 |
| Higher Education | 7 |
| Language Tests | 7 |
| More ▼ | |
Source
Author
| Benson, Jeri | 2 |
| Huntley, Renee M. | 2 |
| Abramzon, Andrea | 1 |
| Ali, Syed Haris | 1 |
| Allalouf, Avi | 1 |
| Alweis, Richard L. | 1 |
| Arce-Ferrer, Alvaro J. | 1 |
| Ault, Marilyn | 1 |
| Austin, Joe Dan | 1 |
| Beglar, David | 1 |
| Berberoglu, Giray | 1 |
| More ▼ | |
Publication Type
Education Level
Audience
| Practitioners | 5 |
| Researchers | 3 |
| Teachers | 3 |
| Administrators | 1 |
Location
| Canada | 1 |
| Georgia | 1 |
| Japan | 1 |
| Mexico | 1 |
| Nebraska | 1 |
| New York | 1 |
| New York (Albany) | 1 |
| New York (Buffalo) | 1 |
| New York (New York) | 1 |
| New York (Rochester) | 1 |
| New York (Syracuse) | 1 |
| More ▼ | |
Laws, Policies, & Programs
| Individuals with Disabilities… | 1 |
| No Child Left Behind Act 2001 | 1 |
Assessments and Surveys
What Works Clearinghouse Rating
Xueliang Chen; Vahid Aryadoust; Wenxin Zhang – Language Testing, 2025
The growing diversity among test takers in second or foreign language (L2) assessments makes the importance of fairness front and center. This systematic review aimed to examine how fairness in L2 assessments was evaluated through differential item functioning (DIF) analysis. A total of 83 articles from 27 journals were included in a systematic…
Descriptors: Second Language Learning, Language Tests, Test Items, Item Analysis
David Bell; Vikki O'Neill; Vivienne Crawford – Practitioner Research in Higher Education, 2023
We compared the influence of open-book extended duration versus closed book time-limited format on reliability and validity of written assessments of pharmacology learning outcomes within our medical and dental courses. Our dental cohort undertake a mid-year test (30xfree-response short answer to a question, SAQ) and end-of-year paper (4xSAQ,…
Descriptors: Undergraduate Students, Pharmacology, Pharmaceutical Education, Test Format
Shear, Benjamin R. – Journal of Educational Measurement, 2023
Large-scale standardized tests are regularly used to measure student achievement overall and for student subgroups. These uses assume tests provide comparable measures of outcomes across student subgroups, but prior research suggests score comparisons across gender groups may be complicated by the type of test items used. This paper presents…
Descriptors: Gender Bias, Item Analysis, Test Items, Achievement Tests
Nebraska Department of Education, 2024
The Nebraska Student-Centered Assessment System (NSCAS) is a statewide assessment system that embodies Nebraska's holistic view of students and helps them prepare for success in postsecondary education, career, and civic life. It uses multiple measures throughout the year to provide educators and decision-makers at all levels with the insights…
Descriptors: Student Evaluation, Evaluation Methods, Elementary School Students, Middle School Students
Arce-Ferrer, Alvaro J.; Bulut, Okan – Journal of Experimental Education, 2019
This study investigated the performance of four widely used data-collection designs in detecting test-mode effects (i.e., computer-based versus paper-based testing). The experimental conditions included four data-collection designs, two test-administration modes, and the availability of an anchor assessment. The test-level and item-level results…
Descriptors: Data Collection, Test Construction, Test Format, Computer Assisted Testing
Masrai, Ahmed – SAGE Open, 2022
Vocabulary size measures serve important functions, not only with respect to placing learners at appropriate levels on language courses but also with a view to examining the progress of learners. One of the widely reported formats suitable for these purposes is the Yes/No vocabulary test. The primary aim of this study was to introduce and provide…
Descriptors: Vocabulary Development, Language Tests, English (Second Language), Second Language Learning
Katrina C. Roohr; HyeSun Lee; Jun Xu; Ou Lydia Liu; Zhen Wang – Numeracy, 2017
Quantitative literacy has been identified as an important student learning outcome (SLO) by both the higher education and workforce communities. This paper aims to provide preliminary evidence of the psychometric quality of the pilot forms for "HEIghten" quantitative literacy, a next-generation SLO assessment for students in higher…
Descriptors: Psychometrics, Numeracy, Test Items, Item Analysis
Ali, Syed Haris; Carr, Patrick A.; Ruit, Kenneth G. – Journal of the Scholarship of Teaching and Learning, 2016
Plausible distractors are important for accurate measurement of knowledge via multiple-choice questions (MCQs). This study demonstrates the impact of higher distractor functioning on validity and reliability of scores obtained on MCQs. Freeresponse (FR) and MCQ versions of a neurohistology practice exam were given to four cohorts of Year 1 medical…
Descriptors: Scores, Multiple Choice Tests, Test Reliability, Test Validity
Öztürk-Gübes, Nese; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2016
The purpose of this study was to examine the impact of dimensionality, common-item set format, and different scale linking methods on preserving equity property with mixed-format test equating. Item response theory (IRT) true-score equating (TSE) and IRT observed-score equating (OSE) methods were used under common-item nonequivalent groups design.…
Descriptors: Test Format, Item Response Theory, True Scores, Equated Scores
Zhang, Xijuan; Savalei, Victoria – Educational and Psychological Measurement, 2016
Many psychological scales written in the Likert format include reverse worded (RW) items in order to control acquiescence bias. However, studies have shown that RW items often contaminate the factor structure of the scale by creating one or more method factors. The present study examines an alternative scale format, called the Expanded format,…
Descriptors: Factor Structure, Psychological Testing, Alternative Assessment, Test Items
Frey, Bruce B.; Ellis, James D.; Bulgreen, Janis A.; Hare, Jana Craig; Ault, Marilyn – Electronic Journal of Science Education, 2015
"Scientific argumentation," defined as the ability to develop and analyze scientific claims, support claims with evidence from investigations of the natural world, and explain and evaluate the reasoning that connects the evidence to the claim, is a critical component of current science standards and is consistent with "Common Core…
Descriptors: Test Construction, Science Tests, Persuasive Discourse, Science Process Skills
McLean, Stuart; Kramer, Brandon; Beglar, David – Language Teaching Research, 2015
An important gap in the field of second language vocabulary assessment concerns the lack of validated tests measuring aural vocabulary knowledge. The primary purpose of this study is to introduce and provide preliminary validity evidence for the Listening Vocabulary Levels Test (LVLT), which has been designed as a diagnostic tool to measure…
Descriptors: Test Construction, Test Validity, English (Second Language), Second Language Learning
Alweis, Richard L.; Fitzpatrick, Caroline; Donato, Anthony A. – Journal of Education and Training Studies, 2015
Introduction: The Multiple Mini-Interview (MMI) format appears to mitigate individual rater biases. However, the format itself may introduce structural systematic bias, favoring extroverted personality types. This study aimed to gain a better understanding of these biases from the perspective of the interviewer. Methods: A sample of MMI…
Descriptors: Interviews, Interrater Reliability, Qualitative Research, Semi Structured Interviews
New York State Education Department, 2015
This technical report provides an overview of the New York State Alternate Assessment (NYSAA), including a description of the purpose of the NYSAA, the processes utilized to develop and implement the NYSAA program, and Stakeholder involvement in those processes. By comparing the intent of the NYSAA with its process and design, the validity of the…
Descriptors: Alternative Assessment, Grade 3, Grade 4, Grade 5
Camilli, Gregory – Educational Research and Evaluation, 2013
In the attempt to identify or prevent unfair tests, both quantitative analyses and logical evaluation are often used. For the most part, fairness evaluation is a pragmatic attempt at determining whether procedural or substantive due process has been accorded to either a group of test takers or an individual. In both the individual and comparative…
Descriptors: Alternative Assessment, Test Bias, Test Content, Test Format

Peer reviewed
Direct link
