Guo, Hongwen; Johnson, Matthew S.; McCaffrey, Daniel F.; Gu, Lixiong – ETS Research Report Series, 2024
The multistage testing (MST) design has been gaining attention and popularity in educational assessments. For testing programs that have small test-taker samples, it is challenging to calibrate new items to replenish the item pool. In the current research, we used the item pools from an operational MST program to illustrate how research studies…
Descriptors: Test Items, Test Construction, Sample Size, Scaling
Guo, Hongwen; Lu, Ru; Johnson, Matthew S.; McCaffrey, Dan F. – ETS Research Report Series, 2022
It is desirable for an educational assessment to be constructed of items that can differentiate different performance levels of test takers, and thus it is important to estimate accurately the item discrimination parameters in either classical test theory or item response theory. It is particularly challenging to do so when the sample sizes are…
Descriptors: Test Items, Item Response Theory, Item Analysis, Educational Assessment
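On the classical-test-theory side, the item discrimination estimation described in this abstract is commonly operationalized as a corrected item-total correlation. The sketch below is a minimal illustration on a hypothetical 0/1 response matrix, not the authors' actual procedure:

```python
import numpy as np

def corrected_item_total(responses):
    """Classical item discrimination: correlate each item with the
    total score computed over the remaining items (corrected item-total)."""
    x = np.asarray(responses, dtype=float)
    total = x.sum(axis=1)
    discrim = []
    for j in range(x.shape[1]):
        rest = total - x[:, j]          # total score excluding item j
        discrim.append(np.corrcoef(x[:, j], rest)[0, 1])
    return np.array(discrim)

# Toy 0/1 response matrix: 6 examinees x 3 items (hypothetical data).
resp = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 1],
        [0, 1, 0],
        [0, 0, 1],
        [0, 0, 0]]
print(corrected_item_total(resp))
```

With small samples such as those discussed above, these correlations are noisy, which is part of what makes discrimination estimation challenging.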
Guo, Hongwen; Ling, Guangming; Frankel, Lois – ETS Research Report Series, 2020
With advances in technology, researchers and test developers are developing new item types to measure complex skills like problem solving and critical thinking. Analyzing such items is often challenging because of their complicated response patterns, and thus it is important to develop psychometric methods for practitioners and researchers to…
Descriptors: Test Construction, Test Items, Item Analysis, Psychometrics
Lu, Ru; Guo, Hongwen; Dorans, Neil J. – ETS Research Report Series, 2021
Two families of analysis methods can be used for differential item functioning (DIF) analysis. One family is DIF analysis based on observed scores, such as the Mantel-Haenszel (MH) and the standardized proportion-correct metric for DIF procedures; the other is analysis based on latent ability, in which the statistic is a measure of departure from…
Descriptors: Robustness (Statistics), Weighted Scores, Test Items, Item Analysis
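The observed-score family named in this abstract can be illustrated with the Mantel-Haenszel procedure: pool 2×2 (group × correct) tables across total-score strata into a common odds ratio, then convert it to the ETS delta scale via MH D-DIF = -2.35 ln(α). The function name and data below are hypothetical, for illustration only:

```python
import numpy as np
from math import log

def mantel_haenszel_dif(correct, group, matching_score):
    """Mantel-Haenszel common odds ratio across score strata, plus
    MH D-DIF on the ETS delta scale (-2.35 * ln(alpha))."""
    correct = np.asarray(correct)
    group = np.asarray(group)
    score = np.asarray(matching_score)
    num = den = 0.0
    for s in np.unique(score):
        m = score == s
        a = np.sum((group[m] == "ref") & (correct[m] == 1))    # reference right
        b = np.sum((group[m] == "ref") & (correct[m] == 0))    # reference wrong
        c = np.sum((group[m] == "focal") & (correct[m] == 1))  # focal right
        d = np.sum((group[m] == "focal") & (correct[m] == 0))  # focal wrong
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    alpha = num / den
    return alpha, -2.35 * log(alpha)

# Hypothetical data: two score strata; in each, the reference group
# answers the studied item correctly more often than the focal group.
correct = [1, 1, 1, 0, 1, 1, 0, 0] * 2
group = (["ref"] * 4 + ["focal"] * 4) * 2
score = [1] * 8 + [2] * 8
alpha, d_dif = mantel_haenszel_dif(correct, group, score)
print(alpha, d_dif)
```

A negative MH D-DIF, as here, indicates the item is relatively harder for the focal group after matching on total score.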
Gu, Lixiong; Ling, Guangming; Qu, Yanxuan – ETS Research Report Series, 2019
Research has found that the "a"-stratified item selection strategy (STR) for computerized adaptive tests (CATs) may lead to insufficient use of high a items at later stages of the tests and thus to reduced measurement precision. A refined approach, unequal item selection across strata (USTR), effectively improves test precision over the…
Descriptors: Computer Assisted Testing, Adaptive Testing, Test Use, Test Items
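The core of the a-stratified strategy (STR) mentioned above can be sketched simply: sort the pool by the discrimination parameter a and split it into strata, so early CAT stages draw from low-a strata and later stages from high-a strata. The toy pool and function name below are hypothetical; the USTR refinement would additionally allot unequal numbers of selections per stratum:

```python
import numpy as np

def a_stratify(a_params, n_strata):
    """Partition item indices into strata of ascending discrimination (a).
    Early test stages select from strata[0] (low a); later stages use
    the high-a strata, conserving the most informative items for the end."""
    order = np.argsort(a_params)
    return np.array_split(order, n_strata)

# Hypothetical pool of six items with known a-parameters.
a_params = [0.5, 2.0, 1.0, 1.5, 0.8, 1.2]
strata = a_stratify(a_params, 3)
print([s.tolist() for s in strata])
```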
Lopez, Alexis A.; Guzman-Orth, Danielle; Zapata-Rivera, Diego; Forsyth, Carolyn M.; Luce, Christine – ETS Research Report Series, 2021
Substantial progress has been made toward applying technology enhanced conversation-based assessments (CBAs) to measure the English-language proficiency of English learners (ELs). CBAs are conversation-based systems that use conversations among computer-animated agents and a test taker. We expanded the design and capability of prior…
Descriptors: Accuracy, English Language Learners, Language Proficiency, Language Tests
Kim, Sooyeon; Robin, Frederic – ETS Research Report Series, 2017
In this study, we examined the potential impact of item misfit on the reported scores of an admission test from the subpopulation invariance perspective. The target population of the test consisted of 3 major subgroups with different geographic regions. We used the logistic regression function to estimate item parameters of the operational items…
Descriptors: Scores, Test Items, Test Bias, International Assessment
Guo, Hongwen; Zu, Jiyun; Kyllonen, Patrick; Schmitt, Neal – ETS Research Report Series, 2016
In this report, systematic applications of statistical and psychometric methods are used to develop and evaluate scoring rules in terms of test reliability. Data collected from a situational judgment test are used to facilitate the comparison. For a well-developed item with appropriate keys (i.e., the correct answers), agreement among various…
Descriptors: Scoring, Test Reliability, Statistical Analysis, Psychometrics
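Evaluating scoring rules "in terms of test reliability," as above, typically means computing a reliability coefficient under each candidate rule and comparing. A minimal sketch using Cronbach's alpha on hypothetical item scores (not the report's actual analysis):

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k/(k-1)) * (1 - sum of item variances / variance of total)."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical scored responses (rows = examinees, columns = items).
parallel = [[1, 1], [0, 0], [1, 1], [0, 0]]    # items agree perfectly -> alpha = 1
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]   # items uncorrelated    -> alpha = 0
print(cronbach_alpha(parallel), cronbach_alpha(unrelated))
```

A scoring key that makes item scores more internally consistent raises alpha, which is one way "agreement among various" rules can be compared.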
Fife, James H.; James, Kofi; Peters, Stephanie – ETS Research Report Series, 2020
The concept of variability is central to statistics. In this research report, we review mathematics education research on variability and, based on that review and on feedback from an expert panel, propose a learning progression (LP) for variability. The structure of the proposed LP consists of 5 levels of sophistication in understanding…
Descriptors: Mathematics Education, Statistics Education, Feedback (Response), Research Reports
Attali, Yigal – ETS Research Report Series, 2014
Previous research on calculator use in standardized assessments of quantitative ability focused on the effect of calculator availability on item difficulty and on whether test developers can predict these effects. With the introduction of an on-screen calculator on the Quantitative Reasoning measure of the "GRE"® revised General Test, it…
Descriptors: College Entrance Examinations, Graduate Study, Calculators, Test Items
Zwick, Rebecca – ETS Research Report Series, 2012
Differential item functioning (DIF) analysis is a key component in the evaluation of the fairness and validity of educational tests. The goal of this project was to review the status of ETS DIF analysis procedures, focusing on three aspects: (a) the nature and stringency of the statistical rules used to flag items, (b) the minimum sample size…
Descriptors: Test Bias, Sample Size, Bayesian Statistics, Evaluation Methods
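The statistical flagging rules under review classify items into ETS categories A, B, and C using both the size of MH D-DIF and significance tests. The sketch below implements only the conventional magnitude thresholds (|D| < 1 → A, |D| ≥ 1.5 → C) and deliberately omits the significance-test conditions, so it is a simplification rather than the operational rule:

```python
def dif_category(mh_d_dif):
    """Simplified ETS DIF flagging by |MH D-DIF| magnitude alone.
    Operational rules also test whether the statistic differs
    significantly from 0 (for A) or from 1 (for C)."""
    size = abs(mh_d_dif)
    if size < 1.0:
        return "A"  # negligible DIF
    if size >= 1.5:
        return "C"  # large DIF: item is reviewed or removed
    return "B"      # intermediate DIF

print(dif_category(0.4), dif_category(1.2), dif_category(-2.0))
```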
Qian, Xiaoyu; Nandakumar, Ratna; Glutting, Joseph; Ford, Danielle; Fifield, Steve – ETS Research Report Series, 2017
In this study, we investigated gender and minority achievement gaps on 8th-grade science items employing a multilevel item response methodology. Both gaps were wider on physics and earth science items than on biology and chemistry items. Larger gender gaps were found on items with specific topics favoring male students than other items, for…
Descriptors: Item Analysis, Gender Differences, Achievement Gap, Grade 8
Fu, Jianbin; Wise, Maxwell – ETS Research Report Series, 2012
In the Cognitively Based Assessment of, for, and as Learning ("CBAL"™) research initiative, innovative K-12 prototype tests based on cognitive competency models are developed. This report presents the statistical results of the 2 CBAL Grade 8 writing tests and 2 Grade 7 reading tests administered to students in 20 states in spring 2011.…
Descriptors: Cognitive Ability, Grade 8, Writing Tests, Grade 7
Deane, Paul; Lawless, René R.; Li, Chen; Sabatini, John; Bejar, Isaac I.; O'Reilly, Tenaha – ETS Research Report Series, 2014
We expect that word knowledge accumulates gradually. This article draws on earlier approaches to assessing depth, but focuses on one dimension: richness of semantic knowledge. We present results from a study in which three distinct item types were developed at three levels of depth: knowledge of common usage patterns, knowledge of broad topical…
Descriptors: Vocabulary, Test Items, Language Tests, Semantics
Young, John W.; King, Teresa C.; Hauck, Maurice Cogan; Ginsburgh, Mitchell; Kotloff, Lauren; Cabrera, Julio; Cavalie, Carlos – ETS Research Report Series, 2014
This article describes two research studies conducted on the linguistic modification of test items from K-12 content assessments. In the first study, 120 linguistically modified test items in mathematics and science taken by fourth and sixth graders were found to have a wide range of outcomes for English language learners (ELLs) and non-ELLs, with…
Descriptors: English Language Learners, Test Items, Mathematics Tests, Science Tests