Publication Date
| Period | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 8 |
Descriptor
| Descriptor | Results |
| --- | --- |
| Comparative Analysis | 9 |
| Test Reliability | 9 |
| Item Response Theory | 5 |
| Scores | 5 |
| Models | 4 |
| Statistical Analysis | 4 |
| Test Items | 4 |
| Computer Assisted Testing | 3 |
| Correlation | 3 |
| Regression (Statistics) | 3 |
| Test Format | 3 |
Source
| Source | Results |
| --- | --- |
| ETS Research Report Series | 9 |
Author
| Author | Results |
| --- | --- |
| Lee, Yi-Hsuan | 2 |
| Attali, Yigal | 1 |
| Brenneman, Meghan | 1 |
| Castellano, Karen | 1 |
| Guo, Hongwen | 1 |
| Haberman, Shelby | 1 |
| Haberman, Shelby J. | 1 |
| Kim, Sooyeon | 1 |
| Kyllonen, Patrick | 1 |
| Lin, Peng | 1 |
| Ling, Guangming | 1 |
Publication Type
| Publication Type | Results |
| --- | --- |
| Journal Articles | 9 |
| Reports - Research | 9 |
Education Level
| Education Level | Results |
| --- | --- |
| Higher Education | 2 |
| Postsecondary Education | 2 |
Assessments and Surveys
| Assessment | Results |
| --- | --- |
| Major Field Achievement Test… | 1 |
| Praxis Series | 1 |
| Test of English as a Foreign… | 1 |
Haberman, Shelby J.; Liu, Yang; Lee, Yi-Hsuan – ETS Research Report Series, 2019
Distractor analyses are routinely conducted in educational assessments with multiple-choice items. In this research report, we focus on three item response models for distractors: (a) the traditional nominal response (NR) model, (b) a combination of a two-parameter logistic model for item scores and an NR model for selections of incorrect…
Descriptors: Multiple Choice Tests, Scores, Test Reliability, High Stakes Tests
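The nominal response model named in this abstract assigns every response option, key and distractors alike, its own slope and intercept, with selection probabilities given by a softmax over the options. A minimal sketch of the category probabilities (the parameter values below are illustrative, not taken from the report):

```python
import numpy as np

def nr_probabilities(theta, slopes, intercepts):
    """Nominal response model: P(option k | theta) is a softmax
    over the per-option linear predictors a_k * theta + c_k."""
    z = np.asarray(slopes) * theta + np.asarray(intercepts)
    z = z - z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Four options (one key, three distractors) with illustrative parameters;
# higher theta pushes probability mass toward the key (first option).
print(nr_probabilities(theta=1.0,
                       slopes=[1.2, -0.3, -0.5, -0.4],
                       intercepts=[0.5, 0.2, -0.1, -0.6]))
```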
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
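Since the report evaluates scoring rules by their effect on test reliability, a standard internal-consistency check is Cronbach's alpha computed under each candidate rule. A hedged sketch with hypothetical keying rules (the report's actual rules and data are not reproduced here):

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (examinees x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Apply two hypothetical keys to the same simulated SJT responses
# and compare the resulting reliabilities.
rng = np.random.default_rng(0)
raw = rng.integers(0, 5, size=(200, 20))     # 200 examinees, 20 items
rule_a = (raw >= 3).astype(float)            # dichotomous keying
rule_b = raw / 4.0                           # partial-credit keying
print(cronbach_alpha(rule_a), cronbach_alpha(rule_b))
```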
Wang, Zhen; Yao, Lihua – ETS Research Report Series, 2013
The current study used simulated data to investigate the properties of a newly proposed method (Yao's rater model) for modeling rater severity and its distribution under different conditions. Our study examined the effects of rater severity, distributions of rater severity, the difference between item response theory (IRT) models with rater effect…
Descriptors: Test Format, Test Items, Responses, Computation
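Rater-effect IRT models of the kind simulated here typically shift the item parameter by a per-rater severity term. One generic dichotomous form (a sketch for orientation, not necessarily Yao's exact specification, which handles polytomous ratings) is

```latex
P(X_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + \exp\{-a\,(\theta_i - b - r_j)\}},
```

where $\theta_i$ is the ability of examinee $i$, $b$ is item difficulty, and $r_j$ is the severity of rater $j$; the simulation conditions in the report vary the distribution of $r_j$.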
Steinberg, Jonathan; Brenneman, Meghan; Castellano, Karen; Lin, Peng; Miller, Susanne – ETS Research Report Series, 2014
Test providers are increasingly moving toward exclusively administering assessments by computer. Computerized testing is becoming more desirable for test takers because of increased opportunities to test, faster turnaround of individual scores, or perhaps other factors, offering potential benefits for those who may be struggling to pass licensure…
Descriptors: Comparative Analysis, Achievement Gap, Academic Achievement, Test Format
Ling, Guangming – ETS Research Report Series, 2012
To assess the value of individual students' subscores on the Major Field Test in Business (MFT Business), I examined the test's internal structure with factor analysis and structural equation modeling methods, and analyzed the subscore reliabilities using the augmented scores method. Analyses of the internal structure suggested that the MFT Business…
Descriptors: Factor Analysis, Construct Validity, Structural Equation Models, Correlation
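The augmented-scores method cited here builds on regressed true-score estimates: Kelley's formula is the single-predictor building block, and augmentation brings in the total score as a second predictor. A hedged sketch of the building block only (not the full augmented estimator):

```python
import numpy as np

def kelley_estimate(sub_scores, reliability):
    """Kelley's regressed estimate of the true subscore: shrink each
    observed subscore toward the group mean in proportion to the
    subscore's unreliability. Augmented-subscore methods extend this
    by adding the total score as a second predictor."""
    sub_scores = np.asarray(sub_scores, dtype=float)
    return reliability * sub_scores + (1 - reliability) * sub_scores.mean()

print(kelley_estimate([12, 18, 25], reliability=0.6))
```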
Lee, Yi-Hsuan; Zhang, Jinming – ETS Research Report Series, 2010
This report examines the consequences of differential item functioning (DIF) using simulated data. Its impact on total score, item response theory (IRT) ability estimate, and test reliability was evaluated in various testing scenarios created by manipulating the following four factors: test length, percentage of DIF items per form, sample sizes of…
Descriptors: Test Bias, Item Response Theory, Test Items, Scores
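A hedged sketch of the kind of DIF simulation the abstract describes: responses are generated under a two-parameter logistic model, with a uniform difficulty shift on a subset of items for the focal group, and the impact on total scores can then be inspected (all settings below are illustrative, not the report's design):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, n_dif, shift = 2000, 40, 4, 0.5          # illustrative settings

theta = rng.normal(size=n)                     # abilities
group = rng.integers(0, 2, size=n)             # 0 = reference, 1 = focal
a = rng.uniform(0.8, 1.6, size=k)              # discriminations
b = rng.normal(size=k)                         # difficulties

# Uniform DIF: the first n_dif items are harder for the focal group.
dif_mask = np.r_[np.ones(n_dif), np.zeros(k - n_dif)]
b_eff = b + shift * np.outer(group, dif_mask)

p = 1 / (1 + np.exp(-a * (theta[:, None] - b_eff)))
x = rng.binomial(1, p)                         # simulated item responses

totals = x.sum(axis=1)
print(totals[group == 0].mean(), totals[group == 1].mean())
```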
Rotou, Ourania; Patsula, Liane; Steffen, Manfred; Rizavi, Saba – ETS Research Report Series, 2007
Traditionally, the fixed-length linear paper-and-pencil (P&P) mode of administration has been the standard method of test delivery. With the advancement of technology, however, the popularity of administering tests using adaptive methods like computerized adaptive testing (CAT) and multistage testing (MST) has grown in the field of measurement…
Descriptors: Comparative Analysis, Test Format, Computer Assisted Testing, Models
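The adaptive methods compared here choose items on the fly rather than in a fixed order; the core step of a simple CAT is selecting the unused item with maximum Fisher information at the current ability estimate. A minimal sketch under a two-parameter logistic model (item parameters are illustrative):

```python
import numpy as np

def next_item(theta_hat, a, b, administered):
    """Return the index of the unadministered item with maximum
    Fisher information a^2 * p * (1 - p) at theta_hat (2PL model)."""
    p = 1 / (1 + np.exp(-a * (theta_hat - b)))
    info = a**2 * p * (1 - p)
    info[list(administered)] = -np.inf         # mask items already given
    return int(np.argmax(info))

a = np.array([1.0, 1.5, 0.7, 2.0])
b = np.array([-1.0, 0.0, 0.5, 1.2])
print(next_item(0.3, a, b, administered={1}))
```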
Kim, Sooyeon; von Davier, Alina A.; Haberman, Shelby – ETS Research Report Series, 2006
This study addresses the sampling error and linking bias that occur with small and unrepresentative samples in a non-equivalent groups anchor test (NEAT) design. We propose a linking method called the "synthetic function," which is a weighted average of the identity function (the trivial equating function for forms that are known to be…
Descriptors: Equated Scores, Sample Size, Test Items, Statistical Bias
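As the abstract states, the synthetic function is a weighted average of the identity function and the estimated equating function. In generic notation (the symbols here are ours, not the report's):

```latex
e_{\mathrm{syn}}(x) \;=\; w\,\hat{e}(x) + (1 - w)\,x, \qquad 0 \le w \le 1,
```

where $\hat{e}$ is the equating function estimated from the small sample, $x$ is the identity (no-equating) component, and $w$ controls how far the link moves away from treating the two forms as interchangeable.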
Attali, Yigal – ETS Research Report Series, 2007
This study examined the construct validity of the "e-rater"® automated essay scoring engine as an alternative to human scoring in the context of TOEFL® essay writing. Analyses were based on a sample of students who repeated the TOEFL within a short time period. Two "e-rater" scores were investigated in this study, the first…
Descriptors: Construct Validity, Computer Assisted Testing, Scoring, English (Second Language)
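With a repeater sample, construct-validity evidence of this kind typically comes from the pattern of correlations among machine and human scores across the two administrations. A hedged sketch of the basic computation on simulated placeholder scores (not the study's data):

```python
import numpy as np

# Placeholder scores for examinees who took the test twice.
rng = np.random.default_rng(2)
skill = rng.normal(size=500)                        # latent writing skill
erater_t1 = skill + rng.normal(scale=0.5, size=500)
erater_t2 = skill + rng.normal(scale=0.5, size=500)
human_t1  = skill + rng.normal(scale=0.7, size=500)

# Cross-occasion, cross-method correlation matrix.
print(np.corrcoef(np.column_stack([erater_t1, erater_t2, human_t1]),
                  rowvar=False).round(2))
```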
