Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 1 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Test Reliability | 11 |
| Scores | 5 |
| Latent Trait Theory | 4 |
| Test Items | 4 |
| Item Response Theory | 3 |
| College Entrance Examinations | 2 |
| Correlation | 2 |
| Cutting Scores | 2 |
| Error of Measurement | 2 |
| Item Analysis | 2 |
| Item Bias | 2 |
Source
| Source | Records |
| --- | --- |
| Applied Measurement in… | 2 |
| Education Policy Analysis… | 1 |
| Educational and Psychological… | 1 |
| Journal of College Admissions | 1 |
| Journal of Educational… | 1 |
| Journal of Educational… | 1 |
| Journal of Educational and… | 1 |
Author
| Author | Records |
| --- | --- |
| Wainer, Howard | 11 |
| Grabovsky, Irina | 1 |
| Holland, Paul W. | 1 |
| Lukhele, Robert | 1 |
| Morgan, Anne | 1 |
| Thissen, David | 1 |
Publication Type
| Type | Records |
| --- | --- |
| Journal Articles | 8 |
| Reports - Research | 5 |
| Reports - Evaluative | 3 |
| Book/Product Reviews | 1 |
| Information Analyses | 1 |
| Opinion Papers | 1 |
| Speeches/Meeting Papers | 1 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 1 |
Location
| Location | Records |
| --- | --- |
| Massachusetts | 1 |
Assessments and Surveys
| Assessment | Records |
| --- | --- |
| SAT (College Admission Test) | 2 |
| Test of English as a Foreign… | 1 |
Grabovsky, Irina; Wainer, Howard – Journal of Educational and Behavioral Statistics, 2017
In this article, we extend the methodology of the Cut-Score Operating Function that we introduced previously and apply it to a testing scenario with multiple independent components and different testing policies. We derive analytically the overall classification error rate for a test battery under the policy when several retakes are allowed for…
Descriptors: Cutting Scores, Weighted Scores, Classification, Testing
Peer reviewed: Wainer, Howard – Journal of Educational Measurement, 1986
An example demonstrates and explains that summary statistics commonly used to measure test quality can be seriously misleading and that summary statistics for the whole test are not sufficient for judging the quality of the test. (Author/LMO)
Descriptors: Correlation, Item Analysis, Statistical Bias, Statistical Studies
Wainer, Howard – 1982
This paper is the transcript of a talk given to those who use test information but have little technical background in test theory. The concepts of modern test theory are compared with those of traditional test theory and with a probable future test theory. The explanations given are couched within an extended metaphor that allows a full description…
Descriptors: Difficulty Level, Latent Trait Theory, Metaphors, Test Items
Peer reviewed: Wainer, Howard; Lukhele, Robert – Educational and Psychological Measurement, 1997
The reliability of scores from four forms of the Test of English as a Foreign Language (TOEFL) was estimated using a hybrid item response theory model. It was found that there was very little difference between overall reliability when the testlet items were assumed to be independent and when their dependence was modeled. (Author/SLD)
Descriptors: English (Second Language), Item Response Theory, Scores, Second Language Learning
Wainer, Howard – 1985
Techniques derived from item response theory are useful for estimating the reliability of test classification above and below the cutting score. Test developers can construct a test whose information is peaked in the region of the cutting score; users can select a test which provides the most information in this region. The Cut-Score…
Descriptors: Cutting Scores, Item Analysis, Latent Trait Theory, Mastery Tests
Wainer, Howard; And Others – 1991
It is sometimes sensible to think of the fundamental unit of test construction as being larger than an individual item. This unit, dubbed the testlet, must pass muster in the same way that items do. One criterion of a good item is the absence of differential item functioning (DIF). The item must function in the same way as all important…
Descriptors: Definitions, Identification, Item Bias, Item Response Theory
Peer reviewed: Morgan, Anne; Wainer, Howard – Journal of Educational Statistics, 1980
Two estimation procedures for the Rasch Model of test analysis are reviewed in detail, particularly with respect to new developments that make the more statistically rigorous conditional maximum likelihood estimation practical for use with longish tests. (Author/JKS)
Descriptors: Error of Measurement, Latent Trait Theory, Maximum Likelihood Statistics, Psychometrics
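The conditional approach reviewed here rests on the fact that, under the Rasch model, the raw score is sufficient for ability, so ability drops out of the likelihood conditioned on the score. A minimal sketch of that conditional probability, using the standard elementary-symmetric-function recursion (the difficulties below are hypothetical; this is not the authors' implementation):

```python
import math

def elem_symmetric(eps):
    """Elementary symmetric functions gamma_r of eps_i = exp(-b_i),
    computed with the usual in-place summation recursion."""
    gamma = [1.0] + [0.0] * len(eps)
    for e in eps:
        for r in range(len(gamma) - 1, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def conditional_prob(pattern, difficulties):
    """Conditional probability of a 0/1 response pattern given its raw
    score; the person's ability cancels, which is what makes conditional
    maximum likelihood estimation possible."""
    eps = [math.exp(-b) for b in difficulties]
    r = sum(pattern)
    num = math.prod(e for e, x in zip(eps, pattern) if x == 1)
    return num / elem_symmetric(eps)[r]

# Two equally difficult items: each pattern with raw score 1 is equally likely.
print(conditional_prob([1, 0], [0.0, 0.0]))  # 0.5
```

CML maximizes the product of these conditional probabilities over persons; the recursion above is what makes that practical for longer tests.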
Peer reviewed: Wainer, Howard – Education Policy Analysis Archives, 1999
The critique of the Massachusetts Teacher Tests by W. Haney and others points out some flaws in the tests but ignores the fact that the tests provide some useful information to guide teacher selection decisions. Calls for additional study of these teacher evaluation instruments. (SLD)
Descriptors: Beginning Teachers, Elementary Secondary Education, State Programs, Teacher Evaluation
Peer reviewed: Wainer, Howard; Thissen, David – Applied Measurement in Education, 1993
Because assessment instruments of the future may well be composed of a combination of types of questions, a way to combine those scores effectively is discussed. Two new graphic tools are presented that show that it may not be practical to equalize the reliability of different components. (SLD)
Descriptors: Constructed Response, Educational Assessment, Graphs, Item Response Theory
Peer reviewed: Holland, Paul W.; Wainer, Howard – Applied Measurement in Education, 1990
Two attempts to adjust state mean Scholastic Aptitude Test (SAT) scores for differential participation rates are examined. Both attempts are rejected, and five rules for performing adjustments are outlined to foster follow-up checks on untested assumptions. National Assessment of Educational Progress state data are determined to be more accurate.…
Descriptors: College Applicants, College Entrance Examinations, Estimation (Mathematics), Item Bias
Peer reviewed: Wainer, Howard – Journal of College Admissions, 1983
Discusses changes in testing as a result of the availability of extensive inexpensive computing and some recent developments in statistical test theory. Describes the role of the Computerized Adaptive Test (CAT) and modern Item Response Theory (IRT) in ability testing tailored to each student's knowledge and ability. (JAC)
Descriptors: Cognitive Ability, College Entrance Examinations, Computer Assisted Testing, Higher Education
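The tailoring that CAT performs can be sketched in a few lines: at each step, administer the unused item that carries the most Fisher information at the current ability estimate. The item bank below is hypothetical, and a real CAT would also update the ability estimate after each response:

```python
import math

def rasch_info(theta, b):
    """Fisher information of a Rasch item at ability theta."""
    p = 1.0 / (1.0 + math.exp(-(theta - b)))
    return p * (1.0 - p)

def next_item(theta_hat, item_bank, administered):
    """Maximum-information item selection: pick the unused item whose
    difficulty is most informative at the current ability estimate."""
    candidates = [i for i in range(len(item_bank)) if i not in administered]
    return max(candidates, key=lambda i: rasch_info(theta_hat, item_bank[i]))

bank = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical item difficulties
print(next_item(0.2, bank, {2}))     # item 2 (b=0.0) used, so item 3 (b=1.0) is next
```

Because information is maximized when difficulty is near ability, the selector keeps choosing items matched to the examinee, which is what lets an adaptive test reach a given precision with far fewer items than a fixed form.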
