Publication Date
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 1 |
| Since 2007 (last 20 years) | 1 |
Descriptor
| Item Analysis | 19 |
| Scoring Formulas | 19 |
| Test Items | 19 |
| Difficulty Level | 9 |
| Test Reliability | 8 |
| Guessing (Tests) | 6 |
| Higher Education | 6 |
| Multiple Choice Tests | 6 |
| Test Construction | 6 |
| Test Validity | 5 |
| Latent Trait Theory | 4 |
| More ▼ | |
Source
| Applied Psychological… | 3 |
| Educational and Psychological… | 1 |
| Evaluation in Education:… | 1 |
| Journal of Educational… | 1 |
| Language Testing | 1 |
| Review of Educational Research | 1 |
Author
| Brennan, Robert L. | 1 |
| Brinzer, Raymond J. | 1 |
| Bulut, Okan | 1 |
| Claudy, John G. | 1 |
| Dorans, Neil J. | 1 |
| Downey, Ronald G. | 1 |
| Gierl, Mark J. | 1 |
| Gilmer, Jerry S. | 1 |
| Guo, Qi | 1 |
| Jaeger, Richard M. | 1 |
| Kane, Michael | 1 |
| More ▼ | |
Publication Type
| Reports - Research | 14 |
| Speeches/Meeting Papers | 7 |
| Journal Articles | 4 |
| Information Analyses | 1 |
Education Level
Audience
| Researchers | 2 |
Location
Laws, Policies, & Programs
Assessments and Surveys
| Graduate Record Examinations | 1 |
| Matching Familiar Figures Test | 1 |
| SAT (College Admission Test) | 1 |
| Test of English as a Foreign… | 1 |
What Works Clearinghouse Rating
Gierl, Mark J.; Bulut, Okan; Guo, Qi; Zhang, Xinxin – Review of Educational Research, 2017
Multiple-choice testing is considered one of the most effective and enduring forms of educational assessment that remains in practice today. This study presents a comprehensive review of the literature on multiple-choice testing in education focused, specifically, on the development, analysis, and use of the incorrect options, which are also…
Descriptors: Multiple Choice Tests, Difficulty Level, Accuracy, Error Patterns
Peer reviewedClaudy, John G. – Applied Psychological Measurement, 1978
Option weighting is an alternative to increasing test length as a means of improving the reliability of a test. The effects on test reliability of option weighting procedures were compared in two empirical studies using four independent sets of items. Biserial weights were found to be superior. (Author/CTM)
Descriptors: Higher Education, Item Analysis, Scoring Formulas, Test Items
Peer reviewedOltman, Phillip K.; Stricker, Lawrence J. – Language Testing, 1990
A recent multidimensional scaling analysis of the Test of English-as-a-Foreign-Language (TOEFL) item response data identified clusters of items in the test sections that, being more homogeneous than their parent sections, might be better for diagnostic use. The analysis was repeated using different scoring techniques. Results diverged only for…
Descriptors: English (Second Language), Item Analysis, Language Tests, Scaling
Brinzer, Raymond J. – 1979
The problem engendered by the Matching Familiar Figures (MFF) Test is one of instrument integrity (II). II is delimited by validity, reliability, and utility of MFF as a measure of the reflective-impulsive construct. Validity, reliability and utility of construct assessment may be improved by utilizing: (1) a prototypic scoring model that will…
Descriptors: Conceptual Tempo, Difficulty Level, Item Analysis, Research Methodology
Peer reviewedDorans, Neil J. – Journal of Educational Measurement, 1986
The analytical decomposition demonstrates how the effects of item characteristics, test properties, individual examinee responses, and rounding rules combine to produce the item deletion effect on the equating/scaling function and candidate scores. The empirical portion of the report illustrates the effects of item deletion on reported score…
Descriptors: Difficulty Level, Equated Scores, Item Analysis, Latent Trait Theory
Peer reviewedKane, Michael; Moloney, James – Applied Psychological Measurement, 1978
The answer-until-correct (AUC) procedure requires that examinees respond to a multi-choice item until they answer it correctly. Using a modified version of Horst's model for examinee behavior, this paper compares the effect of guessing on item reliability for the AUC procedure and the zero-one scoring procedure. (Author/CTM)
Descriptors: Guessing (Tests), Item Analysis, Mathematical Models, Multiple Choice Tests
Jaeger, Richard M. – 1980
Five statistical indices are developed and described which may be used for determining (1) when linear equating of two approximately parallel tests is adequate, and (2) whan a more complex method such as equipercentile equating must be used. The indices were based on: (1) similarity of cumulative score distributions; (2) shape of the raw-score to…
Descriptors: College Entrance Examinations, Difficulty Level, Equated Scores, Higher Education
Peer reviewedLord, Frederic M. – Educational and Psychological Measurement, 1971
Descriptors: Ability, Adaptive Testing, Computer Oriented Programs, Difficulty Level
Lowry, Stephen R. – 1979
A specially designed answer format was used for three tests in a college level agriculture class of 19 students to record responses to three things about each item: (1) the student's choice of the best answer; (2) the degree of certainty with which the answer was chosen; and (3) all the answer choices which the student was certain were incorrect.…
Descriptors: Achievement Tests, Confidence Testing, Guessing (Tests), Higher Education
Wood, Robert – Evaluation in Education: International Progress, 1977
The author surveys literature and practice, primarily in Great Britain and the United States, about multiple-choice testing, comments on criticisms, and defends the state of the art. Varous item types, item writing, test instructions and scoring formulas, item analysis, and test construction are discussed. An extensive bibliography is appended.…
Descriptors: Achievement Tests, Item Analysis, Multiple Choice Tests, Scoring Formulas
Lenel, Julia C.; Gilmer, Jerry S. – 1986
In some testing programs an early item analysis is performed before final scoring in order to validate the intended keys. As a result, some items which are flawed and do not discriminate well may be keyed so as to give credit to examinees no matter which answer was chosen. This is referred to as allkeying. This research examined how varying the…
Descriptors: Equated Scores, Item Analysis, Latent Trait Theory, Licensing Examinations (Professions)
Legg, Sue M. – 1982
A case study of the Florida Teacher Certification Examination (FTCE) program was described to assist others launching the development of large scale item banks. FTCE has four subtests: Mathematics, Reading, Writing, and Professional Education. Rasch calibrated item banks have been developed for all subtests except Writing. The methods used to…
Descriptors: Cutting Scores, Difficulty Level, Field Tests, Item Analysis
Sullivan, Arthur P. – 1978
Sullivan's Ethical Reasoning Scale contains three dilemmas with response pairs representing Kohlberg's stages of moral development. In Kohlberg's first three stages, goodness is equated with lack of punishment, usefulness, and approval, respectively. Good is seen as conformity to rule and ruler in stage four, and stage five comprises…
Descriptors: Adolescents, Adults, Attitude Measures, Conflict Resolution
Peer reviewedDowney, Ronald G. – Applied Psychological Measurement, 1979
This research attempted to interrelate several methods of producing option weights (i.e., Guttman internal and external weights and judges' weights) and examined their effects on reliability and on concurrent, predictive, and face validity. It was concluded that option weighting offered limited, if any, improvement over unit weighting. (Author/CTM)
Descriptors: Achievement Tests, Answer Keys, Comparative Testing, High Schools
Brennan, Robert L. – 1974
The first four chapters of this report primarily provide an extensive, critical review of the literature with regard to selected aspects of the criterion-referenced and mastery testing fields. Major topics treated include: (a) definitions, distinctions, and background, (b) the relevance of classical test theory, (c) validity and procedures for…
Descriptors: Computer Programs, Confidence Testing, Criterion Referenced Tests, Error of Measurement
Previous Page | Next Page ยป
Pages: 1 | 2
Direct link
