Publication Date
| Period | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 8 |
Descriptor
| Descriptor | Results |
| --- | --- |
| Comparative Analysis | 9 |
| Test Reliability | 9 |
| Item Response Theory | 5 |
| Scores | 5 |
| Models | 4 |
| Statistical Analysis | 4 |
| Test Items | 4 |
| Computer Assisted Testing | 3 |
| Correlation | 3 |
| Regression (Statistics) | 3 |
| Test Format | 3 |
Source
| Source | Results |
| --- | --- |
| ETS Research Report Series | 9 |
Author
| Author | Results |
| --- | --- |
| Lee, Yi-Hsuan | 2 |
| Attali, Yigal | 1 |
| Brenneman, Meghan | 1 |
| Castellano, Karen | 1 |
| Guo, Hongwen | 1 |
| Haberman, Shelby | 1 |
| Haberman, Shelby J. | 1 |
| Kim, Sooyeon | 1 |
| Kyllonen, Patrick | 1 |
| Lin, Peng | 1 |
| Ling, Guangming | 1 |
Publication Type
| Publication Type | Results |
| --- | --- |
| Journal Articles | 9 |
| Reports - Research | 9 |
Education Level
| Education Level | Results |
| --- | --- |
| Higher Education | 2 |
| Postsecondary Education | 2 |
Assessments and Surveys
| Assessment | Results |
| --- | --- |
| Major Field Achievement Test… | 1 |
| Praxis Series | 1 |
| Test of English as a Foreign… | 1 |
Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019
Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and an NR model for selections of incorrect…
Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests
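The nominal response model named in this abstract assigns every response option, key and distractors alike, its own slope and intercept, with selection probabilities given by a softmax over the options. A minimal sketch of the category probabilities (the parameter values below are illustrative, not taken from the report):

```python
import numpy as np

def nr_probabilities(theta, slopes, intercepts):
    """Nominal response model: P(option k | theta) is a softmax
    over the per-option linear predictors a_k * theta + c_k."""
    z = np.asarray(slopes) * theta + np.asarray(intercepts)
    z = z - z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Four options (one key, three distractors) with illustrative parameters;
# higher theta pushes probability mass toward the key (first option).
print(nr_probabilities(theta=1.0,
                       slopes=[1.2, -0.3, -0.5, -0.4],
                       intercepts=[0.5, 0.2, -0.1, -0.6]))
```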
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
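Since the report evaluates scoring rules by their effect on test reliability, a standard internal-consistency check is Cronbach's alpha computed under each candidate rule. A hedged sketch with hypothetical keying rules (the report's actual rules and data are not reproduced here):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Apply two hypothetical keys to the same simulated SJT responses
# and compare the resulting reliabilities.
rng = np.random.default_rng(0)
raw = rng.integers(0, 5, size=(200, 20))     # 200 examinees, 20 items
rule_a = (raw >= 3).astype(float)            # dichotomous keying
rule_b = raw / 4.0                           # partial-credit keying
print(cronbach_alpha(rule_a), cronbach_alpha(rule_b))
```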
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
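Rater-effect IRT models of the kind simulated here typically shift the item parameter by a per-rater severity term. One generic dichotomous form (a sketch for orientation, not necessarily Yao's exact specification, which handles polytomous ratings) is

```latex
P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + \exp\{-a\,(\theta_i - b - r_j)\}},
```

where $\theta_i$ is the ability of examinee $i$, $b$ is item difficulty, and $r_j$ is the severity of rater $j$; the simulation conditions in the report vary the distribution of $r_j$.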
Steinberg, Jonathan; Brenneman, Meghan; Castellano, Karen; Lin, Peng; Miller, Susanne – ETS Research Report Series, 2014
Test providers are increasingly moving toward exclusively administering assessments by computer. Computerized testing is becoming more desirable for test takers because of increased opportunities to test, faster turnaround of individual scores, or perhaps other factors, offering potential benefits for those who may be struggling to pass licensure…
Descriptors: Comparative Analysis, Achievement Gap, Academic Achievement, Test Format
Ling, Guangming – ETS Research Report Series, 2012
To assess the value of individual students' subscores on the Major Field Test in Business (MFT Business), I examined the test's internal structure with factor analysis and structural equation modeling methods, and analyzed the subscore reliabilities using the augmented scores method. Analyses of the internal structure suggested that the MFT Business…
Descriptors: Factor Analysis, Construct Validity, Structural Equation Models, Correlation
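The augmented-scores method cited here builds on regressed true-score estimates: Kelley's formula is the single-predictor building block, and augmentation brings in the total score as a second predictor. A hedged sketch of the building block only (not the full augmented estimator):

```python
import numpy as np

def kelley_estimate(sub_scores, reliability):
    """Kelley's regressed estimate of the true subscore: shrink each
    observed subscore toward the group mean in proportion to the
    subscore's unreliability. Augmented-subscore methods extend this
    by adding the total score as a second predictor."""
    sub_scores = np.asarray(sub_scores, dtype=float)
    return reliability * sub_scores + (1 - reliability) * sub_scores.mean()

print(kelley_estimate([12, 18, 25], reliability=0.6))
```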
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Descriptors: Test Bias, Item Response Theory, Test Items, Scores
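A hedged sketch of the kind of DIF simulation the abstract describes: responses are generated under a two-parameter logistic model, with a uniform difficulty shift on a subset of items for the focal group, and the impact on total scores can then be inspected (all settings below are illustrative, not the report's design):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, n_dif, shift = 2000, 40, 4, 0.5          # illustrative settings

theta = rng.normal(size=n)                     # abilities
group = rng.integers(0, 2, size=n)             # 0 = reference, 1 = focal
a = rng.uniform(0.8, 1.6, size=k)              # discriminations
b = rng.normal(size=k)                         # difficulties

# Uniform DIF: the first n_dif items are harder for the focal group.
dif_mask = np.r_[np.ones(n_dif), np.zeros(k - n_dif)]
b_eff = b + shift * np.outer(group, dif_mask)

p = 1 / (1 + np.exp(-a * (theta[:, None] - b_eff)))
x = rng.binomial(1, p)                         # simulated item responses

totals = x.sum(axis=1)
print(totals[group == 0].mean(), totals[group == 1].mean())
```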
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
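The adaptive methods compared here choose items on the fly rather than in a fixed order; the core step of a simple CAT is selecting the unused item with maximum Fisher information at the current ability estimate. A minimal sketch under a two-parameter logistic model (item parameters are illustrative):

```python
import numpy as np

def next_item(theta_hat, a, b, administered):
    """Return the index of the unadministered item with maximum
    Fisher information a^2 * p * (1 - p) at theta_hat (2PL model)."""
    p = 1 / (1 + np.exp(-a * (theta_hat - b)))
    info = a**2 * p * (1 - p)
    info[list(administered)] = -np.inf         # mask items already given
    return int(np.argmax(info))

a = np.array([1.0, 1.5, 0.7, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.2])
print(next_item(0.3, a, b, administered={1}))
```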
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – ETS Research Report Series, 2006
This study addresses the sampling error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be…
Descriptors: Equated Scores, Sample Size, Test Items, Statistical Bias
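As the abstract states, the synthetic function is a weighted average of the identity function and the estimated equating function. In generic notation (the symbols here are ours, not the report's):

```latex
e_{\mathrm{syn}}(x) \;=\; w\,\hat{e}(x) + (1 - w)\,x, \qquad 0 \le w \le 1,
```

where $\hat{e}$ is the equating function estimated from the small sample, $x$ is the identity (no-equating) component, and $w$ controls how far the link moves away from treating the two forms as interchangeable.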
Attali, Yigal – ETS Research Report Series, 2007
This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…
Descriptors: Construct Validity, Computer Assisted Testing, Scoring, English (Second Language)
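With a repeater sample, construct-validity evidence of this kind typically comes from the pattern of correlations among machine and human scores across the two administrations. A hedged sketch of the basic computation on simulated placeholder scores (not the study's data):

```python
import numpy as np

# Placeholder scores for examinees who took the test twice.
rng = np.random.default_rng(2)
skill = rng.normal(size=500)                        # latent writing skill
erater_t1 = skill + rng.normal(scale=0.5, size=500)
erater_t2 = skill + rng.normal(scale=0.5, size=500)
human_t1  = skill + rng.normal(scale=0.7, size=500)

# Cross-occasion, cross-method correlation matrix.
print(np.corrcoef(np.column_stack([erater_t1, erater_t2, human_t1]),
                  rowvar=False).round(2))
```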
