Publication Date
| In 2026 | 0 |
| Since 2025 | 53 |
| Since 2022 (last 5 years) | 195 |
| Since 2017 (last 10 years) | 495 |
| Since 2007 (last 20 years) | 743 |
Descriptor
| Test Items | 1187 |
| Test Reliability | 1187 |
| Test Validity | 685 |
| Test Construction | 566 |
| Foreign Countries | 349 |
| Difficulty Level | 280 |
| Item Analysis | 253 |
| Psychometrics | 234 |
| Item Response Theory | 219 |
| Factor Analysis | 183 |
| Multiple Choice Tests | 173 |
Author
| Schoen, Robert C. | 12 |
| LaVenia, Mark | 5 |
| Liu, Ou Lydia | 5 |
| Anderson, Daniel | 4 |
| Bauduin, Charity | 4 |
| DiLuzio, Geneva J. | 4 |
| Farina, Kristy | 4 |
| Haladyna, Thomas M. | 4 |
| Huck, Schuyler W. | 4 |
| Petscher, Yaacov | 4 |
| Stansfield, Charles W. | 4 |
Audience
| Practitioners | 39 |
| Researchers | 30 |
| Teachers | 24 |
| Administrators | 13 |
| Support Staff | 3 |
| Counselors | 2 |
| Students | 2 |
| Community | 1 |
| Parents | 1 |
| Policymakers | 1 |
Location
| Turkey | 69 |
| Indonesia | 37 |
| Germany | 20 |
| Canada | 17 |
| Florida | 17 |
| China | 16 |
| Australia | 15 |
| California | 12 |
| Iran | 11 |
| India | 10 |
| New York | 9 |
What Works Clearinghouse Rating
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
A Zero-One Programming Approach to Gulliksen's Matched Random Subtests Method. Research Report 86-4.
van der Linden, Wim J.; Boekkooi-Timminga, Ellen – 1986
To estimate the classical coefficient of test reliability, parallel measurements are needed. H. Gulliksen's matched random subtests method, a graphical method for splitting a test into parallel halves, has practical relevance because it maximizes coefficient alpha, a lower bound on classical test reliability…
Descriptors: Algorithms, Computer Assisted Testing, Computer Software, Difficulty Level
Zimmerman, Irla L.; Woo-Sam, James M. – 1982
Two kinds of WISC-R short forms, item reduction and subtest reduction, are reviewed in terms of their ability to meet these criteria of adequacy: a significant correlation between the full scale IQ and the short form IQ, a non-significant difference between the full and short form mean IQ, a low percentage of IQ classification changes resulting…
Descriptors: Intelligence Tests, Test Interpretation, Test Items, Test Reliability
Brinzer, Raymond J. – 1979
The problem engendered by the Matching Familiar Figures (MFF) Test is one of instrument integrity (II). II is delimited by validity, reliability, and utility of MFF as a measure of the reflective-impulsive construct. Validity, reliability and utility of construct assessment may be improved by utilizing: (1) a prototypic scoring model that will…
Descriptors: Conceptual Tempo, Difficulty Level, Item Analysis, Research Methodology
Berk, Ronald A. – 1979
As alternatives to the objectives-based approach to specifying content domains for test construction purposes, six strategies are proposed: (1) amplified objectives; (2) Instructional Objectives Exchange (IOX) test specifications; (3) item transformations; (4) item forms; (5) algorithms; and (6) mapping sentences. Their effectiveness is assessed…
Descriptors: Behavioral Objectives, Comparative Analysis, Criterion Referenced Tests, Evaluation Criteria
[Peer reviewed] Duncan, George T.; Milton, E. O. – Psychometrika, 1978
A multiple-answer multiple-choice test offers several alternative choices for each stem, any number of which may be correct. In this article, a class of scoring procedures called the binary class is discussed. (Author/JKS)
Descriptors: Answer Keys, Measurement Techniques, Multiple Choice Tests, Scoring Formulas
[Peer reviewed] Weber, Margaret B. – Educational and Psychological Measurement, 1977
Bilevel dimensionality of probability was examined via factor analysis, Rasch latent trait analysis, and classical item analysis. Results suggest that when nonstandardized measures are the criteria for achievement, relying solely on estimates of content validity may lead to erroneous interpretation of test score data. (JKS)
Descriptors: Achievement, Achievement Tests, Factor Analysis, Item Analysis
[Peer reviewed] Chambers, David W. – Journal of Dental Education, 1988
A discussion of good test criteria reviews the basic concepts of test theory, examines four types of validity, outlines the concept of reliability and its coefficients and limitations, makes suggestions for gauging test quality, and demonstrates use of the standard error of measurement for estimating the likelihood of misgrading. (MSE)
Descriptors: Dental Schools, Higher Education, Professional Education, Statistical Analysis
[Peer reviewed] Garg, Rashmi; And Others – Journal of Educational Measurement, 1986
For the purpose of obtaining data to use in test development, multiple matrix sampling plans were compared to examinee sampling plans. Data were simulated for examinees, sampled from a population with a normal distribution of ability, responding to items selected from an item universe. (Author/LMO)
Descriptors: Difficulty Level, Monte Carlo Methods, Sampling, Statistical Studies
[Peer reviewed] Greer, Darryl – Review of Higher Education, 1984
The legislative history of laws concerning disclosure of standardized college admissions test items is reviewed, the effects of existing laws in California and New York are outlined, and the public policy and legal questions leading to and resulting from the legislation are discussed. (MSE)
Descriptors: College Entrance Examinations, Disclosure, Higher Education, Legal Problems
Fishman, Judith – Writing Program Administration, 1984
Examines the CUNY-WAT program and questions many aspects of it, especially the choice and phrasing of topics. (FL)
Descriptors: Essay Tests, Higher Education, Test Format, Test Items
[Peer reviewed] Mislevy, Robert J.; Bock, R. Darrell – Educational and Psychological Measurement, 1982
An alternative biweight estimator based on Tukey's biweight is examined in which (1) test disturbances are not assumed to be the same for all subjects, (2) each response is weighted in proportion to its value, and (3) the biweight and maximum likelihood estimates agree when no disturbances are present. Smaller mean-squared errors are shown. (Author/CM)
Descriptors: Error of Measurement, Estimation (Mathematics), Guessing (Tests), Latent Trait Theory
[Peer reviewed] Sharpley, C. F.; Cross, D. G. – Journal of Marriage and the Family, 1982
Examined one instrument devised to classify respondents for research purposes into high and low marital or dyadic adjustment groups. Data indicated that, while the overall scale performs the task reliably, the majority of its 32 items are unnecessary. Factor analysis revealed that there was one underlying "adjustment" dimension. (Author)
Descriptors: Adjustment (to Environment), Factor Analysis, Foreign Countries, Marriage
[Peer reviewed] Gross, Linda C.; Bevil, Catherine W. – Nursing Outlook, 1981
Describes the steps taken by City College School of Nursing (New York) in the development of nursing student placement tests. These steps include determining test items, use of multiple-choice questions, test revision, clinical performance tests, estimating test reliability, establishing standards, and using the tests. (CT)
Descriptors: Curriculum Development, Equivalency Tests, Higher Education, Nursing Education
[Peer reviewed] Karnes, Frances A.; Brown, K. Eliot – Psychology in the Schools, 1981
A study to develop a short form of the Wechsler Intelligence Scale for Children-Revised (WISC-R) for the intellectually gifted showed that Vocabulary and Block Design make up the best two-subtest short form. The tetrad of Similarities, Vocabulary, Block Design, and Object Assembly could be the most useful in terms of time and reliability. (Author)
Descriptors: Academically Gifted, Elementary Secondary Education, Intelligence Tests, Screening Tests
[Peer reviewed] Rusch, Reuben; Steiner, Judith – Journal of Experimental Education, 1979
The Selected Marker Tests were examined for scoring problems and internal consistency and were administered orally to sixth and seventh graders. Scoring problems were discovered and changes were suggested. The problem was found to be item reliability rather than interrater reliability. (Author/MH)
Descriptors: Cognitive Tests, Elementary Education, Item Analysis, Problem Solving


