ERIC - Search Results

Publication Date

In 2026	0
Since 2025	0
Since 2022 (last 5 years)	0
Since 2017 (last 10 years)	3
Since 2007 (last 20 years)	11

Descriptor

Comparative Analysis	17
Test Bias	17
Test Format	17
Test Items	10
Foreign Countries	5
Item Response Theory	5
Computer Assisted Testing	4
Difficulty Level	4
Statistical Analysis	4
Adaptive Testing	3
English (Second Language)	3
Equated Scores	3
Grade 8	3
Language Tests	3
Multiple Choice Tests	3
Simulation	3
Student Evaluation	3
Test Reliability	3
Ability	2
Comparative Testing	2
Correlation	2
Educational Assessment	2
Educational Testing	2
Evaluation Criteria	2
High Stakes Tests	2
More ▼

Source

ETS Research Report Series	2
Advances in Health Sciences…	1
Assessment & Evaluation in…	1
Educational Sciences: Theory…	1
International Journal of…	1
Journal of Applied Measurement	1
Journal of Clinical Psychology	1
Journal of Communication…	1
Journal of Educational…	1
Journal of Technology,…	1
Language Testing	1
National Center for Research…	1
SAGE Open	1
More ▼

Publication Type

Reports - Research	13
Journal Articles	12
Reports - Evaluative	4
Speeches/Meeting Papers	4
Collected Works - Serials	1

Education Level

Grade 8	3
Higher Education	3
Postsecondary Education	3
Elementary Education	2
Elementary Secondary Education	1
Grade 3	1
Grade 9	1
Junior High Schools	1
Middle Schools	1
Secondary Education	1

Audience

Location

Australia	1
Iran	1
Japan	1
Spain	1
Turkey	1
United States	1

Laws, Policies, & Programs

Assessments and Surveys

SAT (College Admission Test)	1
Stanford Achievement Tests	1
Trends in International…	1

What Works Clearinghouse Rating

Showing 1 to 15 of 17 results Save | Export

An Investigation of Item Position Effects by Means of IRT-Based Differential Item Functioning Methods

Peer reviewed
PDF on ERIC

Download full text

Soysal, Sumeyra; Yilmaz Kogar, Esin – International Journal of Assessment Tools in Education, 2021

In this study, whether item position effects lead to DIF in the condition where different test booklets are used was investigated. To do this the methods of Lord's chi-square and Raju's unsigned area with the 3PL model under with and without item purification were used. When the performance of the methods was compared, it was revealed that…

Descriptors: Item Response Theory, Test Bias, Test Items, Comparative Analysis

Multiple True-False Items: A Comparison of Scoring Algorithms

Peer reviewed

Direct link

Lahner, Felicitas-Maria; Lörwald, Andrea Carolin; Bauer, Daniel; Nouns, Zineb Miriam; Krebs, René; Guttormsen, Sissel; Fischer, Martin R.; Huwendiek, Sören – Advances in Health Sciences Education, 2018

Multiple true-false (MTF) items are a widely used supplement to the commonly used single-best answer (Type A) multiple choice format. However, an optimal scoring algorithm for MTF items has not yet been established, as existing studies yielded conflicting results. Therefore, this study analyzes two questions: What is the optimal scoring algorithm…

Descriptors: Scoring Formulas, Scoring Rubrics, Objective Tests, Multiple Choice Tests

Method Bias in Cloze Tests as Reading Comprehension Measures

Peer reviewed

Direct link

Baghaei, Purya; Ravand, Hamdollah – SAGE Open, 2019

In many reading comprehension tests, different test formats are employed. Two commonly used test formats to measure reading comprehension are sustained passages followed by some questions and cloze items. Individual differences in handling test format peculiarities could constitute a source of score variance. In this study, a bifactor Rasch model…

Descriptors: Cloze Procedure, Test Bias, Individual Differences, Difficulty Level

Effect of Differential Item Functioning on Test Equating

Peer reviewed
PDF on ERIC

Download full text

Kabasakal, Kübra Atalay; Kelecioglu, Hülya – Educational Sciences: Theory and Practice, 2015

This study examines the effect of differential item functioning (DIF) items on test equating through multilevel item response models (MIRMs) and traditional IRMs. The performances of three different equating models were investigated under 24 different simulation conditions, and the variables whose effects were examined included sample size, test…

Descriptors: Test Bias, Equated Scores, Item Response Theory, Simulation

An Item-Driven Adaptive Design for Calibrating Pretest Items. Research Report. ETS RR-14-38

Peer reviewed
PDF on ERIC

Download full text

Ali, Usama S.; Chang, Hua-Hua – ETS Research Report Series, 2014

Adaptive testing is advantageous in that it provides more efficient ability estimates with fewer items than linear testing does. Item-driven adaptive pretesting may also offer similar advantages, and verification of such a hypothesis about item calibration was the main objective of this study. A suitability index (SI) was introduced to adaptively…

Descriptors: Adaptive Testing, Simulation, Pretests Posttests, Test Items

A Comparison of Video- and Audio-Mediated Listening Tests with Many-Facet Rasch Modeling and Differential Distractor Functioning

Peer reviewed

Direct link

Batty, Aaron Olaf – Language Testing, 2015

The rise in the affordability of quality video production equipment has resulted in increased interest in video-mediated tests of foreign language listening comprehension. Although research on such tests has continued fairly steadily since the early 1980s, studies have relied on analyses of raw scores, despite the growing prevalence of item…

Descriptors: Listening Comprehension Tests, Comparative Analysis, Video Technology, Audio Equipment

The Effects of Rater Severity and Rater Distribution on Examinees' Ability Estimation for Constructed-Response Items. Research Report. ETS RR-13-23

Peer reviewed
PDF on ERIC

Download full text

Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013

The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…

Descriptors: Test Format, Test Items, Responses, Computation

Exploring Differential Item Functioning (DIF) with the Rasch Model: A Comparison of Gender Differences on Eighth Grade Science Items in the United States and Spain

Peer reviewed

Direct link

Babiar, Tasha Calvert – Journal of Applied Measurement, 2011

Traditionally, women and minorities have not been fully represented in science and engineering. Numerous studies have attributed these differences to gaps in science achievement as measured by various standardized tests. Rather than describe mean group differences in science achievement across multiple cultures, this study focused on an in-depth…

Descriptors: Test Bias, Science Achievement, Standardized Tests, Grade 8

Small-Sample Equating Using a Synthetic Linking Function

Peer reviewed

Direct link

Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – Journal of Educational Measurement, 2008

This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically,…

Descriptors: Equated Scores, Sample Size, Test Reliability, Comparative Analysis

Examining Differential Item Functioning in Reading Assessments for Students with Disabilities. CRESST Report 744

Download full text

Abedi, Jamal; Leon, Seth; Kao, Jenny C. – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2008

This study examines performance differences between students with disabilities and students without disabilities students using differential item functioning (DIF) analyses in a high-stakes reading assessment. Results indicated that for Grade 9, many items exhibited DIF. Items that exhibited DIF were more likely to be located in the second half…

Descriptors: Test Bias, Test Items, Student Evaluation, Disabilities

Comparison of Response Formats for the Depression Adjective Check Lists.

Peer reviewed

Lubin, Bernard; And Others – Journal of Clinical Psychology, 1981

Studied potential checking and response bias on the Depression Adjective Check Lists (DACL) by comparing scores of college students on true-false, forced-choice, and standard formats. By demonstrating only a weak social desirability response bias and failing to reveal checking bias, results support using the standard format. (Author)

Descriptors: Comparative Analysis, Depression (Psychology), Psychometrics, Response Style (Tests)

Prior Degree and Student Assessment Performance: How Can Evidence Guide Decisions on Assessment Policy?

Peer reviewed

Direct link

Craig, Pippa; Gordon, Jill; Clarke, Rufus; Oldmeadow, Wendy – Assessment & Evaluation in Higher Education, 2009

This study aimed to provide evidence to guide decisions on the type and timing of assessments in a graduate medical programme, by identifying whether students from particular degree backgrounds face greater difficulty in satisfying the current assessment requirements. We examined the performance rank of students in three types of assessments and…

Descriptors: Student Evaluation, Medical Education, Student Characteristics, Correlation

The Goal of Equity within and between Computerized Adaptive Tests and Paper and Pencil Forms.

Download full text

Thomasson, Gary L. – 1997

Score comparability is important to those who take tests and those who use them. One important concept related to test score comparability is that of "equity," which is defined as existing when examinees are indifferent as to which of two alternate forms of a test they would prefer to take. By their nature, computerized adaptive tests…

Descriptors: Ability, Adaptive Testing, Comparative Analysis, Computer Assisted Testing

What Combination of Sampling and Equating Methods Works Best? Revised.

Download full text

Livingston, Samuel A.; And Others – 1989

Combinations of five methods of equating test forms and two methods of selecting samples of students for equating were compared for accuracy. The two sampling methods were representative sampling from the population and matching samples on the anchor test score. The equating methods were: (1) the Tucker method; (2) the Levine method; (3) the…

Descriptors: Comparative Analysis, Data Collection, Equated Scores, High School Students

Comparing Dual-Language Versions of an International Computerized-Adaptive Certification Exam.

Download full text

Sireci, Stephen G.; Foster, David F.; Robin, Frederic; Olsen, James – 1997

Evaluating the comparability of a test administered in different languages is a difficult, if not impossible, task. Comparisons are problematic because observed differences in test performance between groups who take different language versions of a test could be due to a difference in difficulty between the tests, to cultural differences in test…

Descriptors: Adaptive Testing, Adults, Certification, Comparative Analysis

Previous Page | Next Page »

Pages: 1 | 2

Abedi, Jamal	1
Ali, Usama S.	1
Allen, Nancy	1
Babiar, Tasha Calvert	1
Baghaei, Purya	1
Batty, Aaron Olaf	1
Bauer, Daniel	1
Bennett, Randy Elliott	1
Chang, Hua-Hua	1
Clarke, Rufus	1
Craig, Pippa	1
Fischer, Martin R.	1
Foster, David F.	1
Gordon, Jill	1
Guttormsen, Sissel	1
Haberman, Shelby	1
Horkay, Nancy	1
Huwendiek, Sören	1
Kabasakal, Kübra Atalay	1
Kao, Jenny C.	1
Kaplan, Bruce	1
Kelecioglu, Hülya	1
Kim, Sooyeon	1
Krebs, René	1
More ▼