| Publication Date | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 3 |
| Since 2007 (last 20 years) | 11 |
| Descriptor | Results |
| --- | --- |
| Comparative Analysis | 17 |
| Test Bias | 17 |
| Test Format | 17 |
| Test Items | 10 |
| Foreign Countries | 5 |
| Item Response Theory | 5 |
| Computer Assisted Testing | 4 |
| Difficulty Level | 4 |
| Statistical Analysis | 4 |
| Adaptive Testing | 3 |
| English (Second Language) | 3 |
| Author | Results |
| --- | --- |
| Abedi, Jamal | 1 |
| Ali, Usama S. | 1 |
| Allen, Nancy | 1 |
| Babiar, Tasha Calvert | 1 |
| Baghaei, Purya | 1 |
| Batty, Aaron Olaf | 1 |
| Bauer, Daniel | 1 |
| Bennett, Randy Elliott | 1 |
| Chang, Hua-Hua | 1 |
| Clarke, Rufus | 1 |
| Craig, Pippa | 1 |
| Publication Type | Results |
| --- | --- |
| Reports - Research | 13 |
| Journal Articles | 12 |
| Reports - Evaluative | 4 |
| Speeches/Meeting Papers | 4 |
| Collected Works - Serials | 1 |
| Education Level | Results |
| --- | --- |
| Grade 8 | 3 |
| Higher Education | 3 |
| Postsecondary Education | 3 |
| Elementary Education | 2 |
| Elementary Secondary Education | 1 |
| Grade 3 | 1 |
| Grade 9 | 1 |
| Junior High Schools | 1 |
| Middle Schools | 1 |
| Secondary Education | 1 |
| Assessments and Surveys | Results |
| --- | --- |
| SAT (College Admission Test) | 1 |
| Stanford Achievement Tests | 1 |
| Trends in International… | 1 |
Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2021
In this study, whether item position effects lead to DIF when different test booklets are used was investigated. To do this, the methods of Lord's chi-square and Raju's unsigned area with the 3PL model, both with and without item purification, were used. When the performance of the methods was compared, it was revealed that…
Descriptors: Item Response Theory, Test Bias, Test Items, Comparative Analysis
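The 3PL (three-parameter logistic) model named in the abstract above gives the probability of a correct response as a function of examinee ability and the item's discrimination, difficulty, and pseudo-guessing parameters. A minimal sketch in Python (the parameter values here are illustrative, not from the study):

```python
import math

def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL IRT model:
    c + (1 - c) / (1 + exp(-a * (theta - b)))."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# A moderately discriminating item (a=1.2, b=0.5, c=0.2)
# answered by examinees of ability -1, 0, and 1.
for theta in (-1.0, 0.0, 1.0):
    print(round(p_3pl(theta, a=1.2, b=0.5, c=0.2), 3))
```

DIF methods such as Lord's chi-square compare the item parameter estimates obtained separately for the reference and focal groups; a large standardized difference flags the item as functioning differently.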
Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018
Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…
Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests
Baghaei, Purya; Ravand, Hamdollah – SAGE Open, 2019
In many reading comprehension tests, different test formats are employed. Two commonly used test formats to measure reading comprehension are sustained passages followed by some questions and cloze items. Individual differences in handling test format peculiarities could constitute a source of score variance. In this study, a bifactor Rasch model…
Descriptors: Cloze Procedure, Test Bias, Individual Differences, Difficulty Level
Kabasakal, Kübra Atalay; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2015
This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…
Descriptors: Test Bias, Equated Scores, Item Response Theory, Simulation
Ali, Usama S.; Chang, Hua-Hua – ETS Research Report Series, 2014
Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…
Descriptors: Adaptive Testing, Simulation, Pretests Posttests, Test Items
Batty, Aaron Olaf – Language Testing, 2015
The rise in the affordability of quality video production equipment has resulted in increased interest in video-mediated tests of foreign language listening comprehension. Although research on such tests has continued fairly steadily since the early 1980s, studies have relied on analyses of raw scores, despite the growing prevalence of item…
Descriptors: Listening Comprehension Tests, Comparative Analysis, Video Technology, Audio Equipment
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
Babiar, Tasha Calvert – Journal of Applied Measurement, 2011
Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth…
Descriptors: Test Bias, Science Achievement, Standardized Tests, Grade 8
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Journal of Educational Measurement, 2008
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically,…
Descriptors: Equated Scores, Sample Size, Test Reliability, Comparative Analysis
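The synthetic function described above is a weighted average of the identity function and a traditional equating function. A minimal sketch, assuming a single linear link for simplicity (the chained version in the study composes two such links through the anchor test; all parameter names here are hypothetical):

```python
def linear_equating(x, mu_x, sigma_x, mu_y, sigma_y):
    """Map a score x on form X to the form-Y scale by matching
    means and standard deviations (a single linear link)."""
    return mu_y + (sigma_y / sigma_x) * (x - mu_x)

def synthetic_function(x, w, mu_x, sigma_x, mu_y, sigma_y):
    """Weighted average of the linear equating function and the
    identity function; w=1 is pure equating, w=0 is no equating."""
    return w * linear_equating(x, mu_x, sigma_x, mu_y, sigma_y) + (1.0 - w) * x
```

The weight trades bias against sampling error: with very small samples the equating function is noisy, so shrinking it toward the identity can reduce total linking error.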
Abedi, Jamal; Leon, Seth; Kao, Jenny C. – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2008
This study examines performance differences between students with disabilities and students without disabilities using differential item functioning (DIF) analyses in a high-stakes reading assessment. Results indicated that for Grade 9, many items exhibited DIF. Items that exhibited DIF were more likely to be located in the second half…
Descriptors: Test Bias, Test Items, Student Evaluation, Disabilities
Lubin, Bernard; And Others – Journal of Clinical Psychology, 1981 (peer reviewed)
Studied potential checking and response bias on the Depression Adjective Check Lists (DACL) by comparing scores of college students on true-false, forced-choice, and standard formats. By demonstrating only a weak social desirability response bias and failing to reveal checking bias, results support using the standard format. (Author)
Descriptors: Comparative Analysis, Depression (Psychology), Psychometrics, Response Style (Tests)
Craig, Pippa; Gordon, Jill; Clarke, Rufus; Oldmeadow, Wendy – Assessment & Evaluation in Higher Education, 2009
This study aimed to provide evidence to guide decisions on the type and timing of assessments in a graduate medical programme, by identifying whether students from particular degree backgrounds face greater difficulty in satisfying the current assessment requirements. We examined the performance rank of students in three types of assessments and…
Descriptors: Student Evaluation, Medical Education, Student Characteristics, Correlation
Thomasson, Gary L. – 1997
Score comparability is important to those who take tests and those who use them. One important concept related to test score comparability is that of "equity," which is defined as existing when examinees are indifferent as to which of two alternate forms of a test they would prefer to take. By their nature, computerized adaptive tests…
Descriptors: Ability, Adaptive Testing, Comparative Analysis, Computer Assisted Testing
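A computerized adaptive test of the kind discussed here typically administers, at each step, the unadministered item that is most informative at the current ability estimate. A minimal sketch under the 2PL model, where item information is I(θ) = a²·P(θ)·(1 − P(θ)) (the item bank and greedy selection rule here are illustrative, not from the paper):

```python
import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def pick_next_item(theta_hat, item_bank, administered):
    """Greedy CAT step: return the index of the unadministered item
    with maximal information at the current ability estimate.
    item_bank is a list of (a, b) parameter tuples."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: info_2pl(theta_hat, *item_bank[i]))
```

Because information peaks where difficulty matches ability, each examinee sees a different, well-targeted item sequence — the source of the efficiency gain over linear testing, and also why score comparability across examinees requires care.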
Livingston, Samuel A.; And Others – 1989
Combinations of five methods of equating test forms and two methods of selecting samples of students for equating were compared for accuracy. The two sampling methods were representative sampling from the population and matching samples on the anchor test score. The equating methods were: (1) the Tucker method; (2) the Levine method; (3) the…
Descriptors: Comparative Analysis, Data Collection, Equated Scores, High School Students
Sireci, Stephen G.; Foster, David F.; Robin, Frederic; Olsen, James – 1997
Evaluating the comparability of a test administered in different languages is a difficult, if not impossible, task. Comparisons are problematic because observed differences in test performance between groups who take different language versions of a test could be due to a difference in difficulty between the tests, to cultural differences in test…
Descriptors: Adaptive Testing, Adults, Certification, Comparative Analysis
