Publication Date
| Date range | Results |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 52 |
| Since 2022 (last 5 years) | 194 |
| Since 2017 (last 10 years) | 494 |
| Since 2007 (last 20 years) | 742 |
Descriptor
| Descriptor | Results |
| --- | --- |
| Test Items | 1186 |
| Test Reliability | 1186 |
| Test Validity | 684 |
| Test Construction | 565 |
| Foreign Countries | 348 |
| Difficulty Level | 279 |
| Item Analysis | 252 |
| Psychometrics | 233 |
| Item Response Theory | 219 |
| Factor Analysis | 183 |
| Multiple Choice Tests | 172 |
Author
| Author | Results |
| --- | --- |
| Schoen, Robert C. | 12 |
| LaVenia, Mark | 5 |
| Liu, Ou Lydia | 5 |
| Anderson, Daniel | 4 |
| Bauduin, Charity | 4 |
| DiLuzio, Geneva J. | 4 |
| Farina, Kristy | 4 |
| Haladyna, Thomas M. | 4 |
| Huck, Schuyler W. | 4 |
| Petscher, Yaacov | 4 |
| Stansfield, Charles W. | 4 |
Audience
| Audience | Results |
| --- | --- |
| Practitioners | 39 |
| Researchers | 30 |
| Teachers | 24 |
| Administrators | 13 |
| Support Staff | 3 |
| Counselors | 2 |
| Students | 2 |
| Community | 1 |
| Parents | 1 |
| Policymakers | 1 |
Location
| Location | Results |
| --- | --- |
| Turkey | 68 |
| Indonesia | 37 |
| Germany | 20 |
| Canada | 17 |
| Florida | 17 |
| China | 16 |
| Australia | 15 |
| California | 12 |
| Iran | 11 |
| India | 10 |
| New York | 9 |
What Works Clearinghouse Rating
| Rating | Results |
| --- | --- |
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
Divgi, D. R. – 1978
One aim of criterion-referenced testing is to classify an examinee without reference to a norm group; therefore, any statements about the dependability of such classification ought to be group-independent also. A population-independent index is proposed in terms of the probability of incorrect classification near the cutoff true score. The…
Descriptors: Criterion Referenced Tests, Cutting Scores, Difficulty Level, Error of Measurement
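A minimal sketch of the idea, assuming a simple binomial error model (an assumption for illustration; it is not necessarily the index Divgi proposes): the chance that an examinee whose true score sits near the cutoff is classified on the wrong side of it.

```python
from scipy.stats import binom

def misclassification_prob(true_prop, n_items, cut_score):
    """Probability of landing on the wrong side of the cutoff when the observed
    score is modeled as Binomial(n_items, true_prop) (illustrative model only)."""
    if true_prop * n_items < cut_score:
        # True non-master: misclassified if the observed score reaches the cutoff.
        return 1 - binom.cdf(cut_score - 1, n_items, true_prop)
    # True master: misclassified if the observed score falls below the cutoff.
    return binom.cdf(cut_score - 1, n_items, true_prop)

# Hypothetical 40-item test with a cutoff of 30 correct; true score just below it.
print(round(misclassification_prob(0.72, 40, 30), 3))
```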
Forster, Fred; Karr, Chad – 1980
A method is described that used the Rasch model to combine information across administrations of the same items in different tests given to different student populations. The method was used in conjunction with the composite point-biserial and mean-square fit to provide more valid information about item quality. Over 700 items were administered to…
Descriptors: Achievement Tests, Elementary Secondary Education, Item Analysis, Item Banks
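For reference, the point-biserial statistic mentioned above, computed on hypothetical 0/1 response data (the data and function name are illustrative, not taken from the authors' item bank):

```python
import numpy as np

def point_biserial(item, total):
    """Point-biserial correlation between a dichotomously scored item (0/1)
    and examinees' total test scores."""
    item, total = np.asarray(item, float), np.asarray(total, float)
    p = item.mean()                              # proportion answering correctly
    mean_right = total[item == 1].mean()
    mean_wrong = total[item == 0].mean()
    return (mean_right - mean_wrong) / total.std() * np.sqrt(p * (1 - p))

# Hypothetical responses to one item and the corresponding total scores.
item  = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
total = [38, 41, 22, 35, 25, 40, 19, 28, 33, 37]
print(round(point_biserial(item, total), 3))
```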
Hadar, N.; Henkin, L. – Educational Studies in Mathematics, 1978 (peer reviewed)
This is the second in a sequence of three papers describing an investigation of fifth graders' ability to learn to distinguish between valid and fallacious inferences from simple conditional premises. The measuring instrument used is discussed along with several problems and difficulties encountered in the development of reliable tests in…
Descriptors: Deduction, Educational Research, Elementary Education, Grade 5
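To make the valid/fallacious distinction concrete, a small illustrative check (not the authors' instrument) of the four classic inference patterns from a conditional premise "if p then q", using brute-force truth tables: modus ponens and modus tollens come out valid, while affirming the consequent and denying the antecedent come out fallacious.

```python
from itertools import product

def is_valid(premises, conclusion):
    """An argument is valid iff the conclusion holds in every truth-table row
    where all premises hold."""
    return all(conclusion(p, q)
               for p, q in product([True, False], repeat=2)
               if all(prem(p, q) for prem in premises))

if_then = lambda p, q: (not p) or q   # the conditional premise "if p then q"

patterns = {
    "modus ponens (p, therefore q)": ([if_then, lambda p, q: p], lambda p, q: q),
    "modus tollens (not q, therefore not p)": ([if_then, lambda p, q: not q], lambda p, q: not p),
    "affirming the consequent (q, therefore p)": ([if_then, lambda p, q: q], lambda p, q: p),
    "denying the antecedent (not p, therefore not q)": ([if_then, lambda p, q: not p], lambda p, q: not q),
}
for name, (premises, conclusion) in patterns.items():
    print(name, "->", "valid" if is_valid(premises, conclusion) else "fallacious")
```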
Hanna, Gerald S.; Bennett, Judith A. – Educational and Psychological Measurement, 1984 (peer reviewed)
The presently viewed role and utility of measures of instructional sensitivity are summarized. A case is made that the rationale for the assessment of instructional sensitivity can be applied to all achievement tests and should not be restricted to criterion-referenced mastery tests. (Author/BW)
Descriptors: Achievement Tests, Context Effect, Criterion Referenced Tests, Mastery Tests
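One common instructional-sensitivity statistic is the pre-to-post difference index: the proportion answering an item correctly after instruction minus the proportion before. The sketch below uses hypothetical data and illustrates the general idea rather than the specific measures the authors review.

```python
def pre_post_difference_index(pre_correct, post_correct):
    """Pre-to-post difference index for one item:
    p(correct after instruction) - p(correct before instruction).
    Values near +1 suggest high instructional sensitivity; near 0, low."""
    p_pre = sum(pre_correct) / len(pre_correct)
    p_post = sum(post_correct) / len(post_correct)
    return p_post - p_pre

# Hypothetical 0/1 responses for one item before and after instruction.
pre  = [0, 0, 1, 0, 0, 1, 0, 0]
post = [1, 1, 1, 0, 1, 1, 1, 0]
print(pre_post_difference_index(pre, post))   # 0.5
```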
Bardo, John W.; Yeager, Samuel J. – Perceptual and Motor Skills, 1982 (peer reviewed)
Responses to various fixed test-response formats were examined for "reliability" due to systematic error; Cronbach's alphas up to .67 were obtained. Of the formats tested, four-point Likert scales were least affected, while line and face formats were most problematic. A possible modification in alpha to account for systematic bias is…
Descriptors: Higher Education, Measures (Individuals), Psychometrics, Response Style (Tests)
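For context, coefficient alpha as it is usually computed, shown on hypothetical rating data (the response formats and data from the study are not reproduced here):

```python
import numpy as np

def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 respondents x 4 Likert-type items.
ratings = [[4, 3, 4, 3],
           [2, 2, 3, 2],
           [5, 4, 4, 5],
           [3, 3, 2, 3],
           [4, 4, 5, 4],
           [1, 2, 1, 2]]
print(round(cronbach_alpha(ratings), 2))
```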
Gross, Leon J. – Journal of Optometric Education, 1982 (peer reviewed)
A critique of a variety of formats used in combined-response test items (those in which the respondent must choose the correct combination of options: a and b, all of the above, etc.) illustrates why this kind of testing is inherently flawed and should not be used in optometry examinations. (MSE)
Descriptors: Higher Education, Multiple Choice Tests, Optometry, Standardized Tests
Aiken, Lewis R. – Educational and Psychological Measurement, 1980 (peer reviewed)
Procedures for computing content validity and consistency reliability coefficients and determining the statistical significance of these coefficients are described. Procedures employing the multinomial probability distribution for small samples and normal curve probability estimates for large samples can be used where judgments are made on…
Descriptors: Computer Programs, Measurement Techniques, Probability, Questionnaires
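The coefficient most often associated with this approach is Aiken's V; treating it as the one intended here is an assumption. A minimal sketch for a panel of judges rating one item on a c-point relevance scale:

```python
def aikens_v(ratings, lowest, categories):
    """Aiken's V content-validity coefficient for one item:
    V = sum(rating - lowest) / (n_judges * (categories - 1)).
    Ranges from 0 (all judges give the lowest rating) to 1 (all give the highest)."""
    n = len(ratings)
    s = sum(r - lowest for r in ratings)
    return s / (n * (categories - 1))

# Hypothetical: 5 judges rate an item's relevance on a 1-4 scale.
print(round(aikens_v([4, 3, 4, 4, 3], lowest=1, categories=4), 3))   # 0.867
```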
Irvin, Larry K.; And Others – Journal of Educational Measurement, 1980 (peer reviewed)
The relative efficacy of content-appropriate, orally administered true/false and multiple-choice testing was examined with retarded adolescents. Both approaches demonstrated utility and psychometric adequacy. Implications regarding test development for retarded students are briefly discussed. (Author)
Descriptors: High Schools, Mild Mental Retardation, Multiple Choice Tests, Objective Tests
Burton, Nancy W. – Journal of Educational Measurement, 1980 (peer reviewed)
Analysis of variance methods were used to investigate the reliability of scores on open-ended items in the National Assessment of Educational Progress. The study was designed to determine their stability over seven different scorers and over time of scoring during a three-month interval. (Author/CTM)
Aspect of National Assessment (NAEP) dealt with in…
Descriptors: Career Development, Educational Assessment, Elementary Secondary Education, Item Analysis
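ANOVA-based scorer reliability is commonly summarized as an intraclass correlation. The sketch below computes ICC(2,1) from a responses-by-scorers matrix; it illustrates the general method, not the exact design of the NAEP scoring study.

```python
import numpy as np

def icc_2_1(scores):
    """Two-way random-effects intraclass correlation, ICC(2,1), from a
    responses-by-scorers matrix, using the standard ANOVA mean squares."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape                          # n responses, k scorers
    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-responses mean square
    msc = ss_cols / (k - 1)                 # between-scorers mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical scores: 5 open-ended responses each rated by 3 scorers.
scores = [[4, 4, 3],
          [2, 2, 2],
          [5, 4, 5],
          [3, 3, 2],
          [1, 2, 1]]
print(round(icc_2_1(scores), 3))
```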
Straton, Ralph G.; Catts, Ralph M. – Educational and Psychological Measurement, 1980 (peer reviewed)
Multiple-choice tests composed entirely of two-, three-, or four-choice items were investigated. Results indicated that the number of alternatives per item was inversely related to item difficulty, but directly related to item discrimination. The reliability and standard error of measurement of three-choice item tests were equivalent or superior.…
Descriptors: Difficulty Level, Error of Measurement, Foreign Countries, Higher Education
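The standard error of measurement compared across formats is normally the classical SEM = SD * sqrt(1 - reliability); a quick sketch with hypothetical values:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classical standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical: two tests with equal score SDs but different reliabilities.
print(round(standard_error_of_measurement(sd=6.0, reliability=0.85), 2))  # 2.32
print(round(standard_error_of_measurement(sd=6.0, reliability=0.75), 2))  # 3.0
```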
Green, Kathy – Journal of Experimental Education, 1979 (peer reviewed)
Reliabilities and concurrent validities of teacher-made multiple-choice and true-false tests were compared. No significant differences were found even when multiple-choice reliability was adjusted to equate testing time. (Author/MH)
Descriptors: Comparative Testing, Higher Education, Multiple Choice Tests, Test Format
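Adjusting reliability to equate testing time is typically done with the Spearman-Brown prophecy formula (assumed here; the article's exact procedure may differ), which projects reliability when test length changes by a factor n:

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor:
    r' = n * r / (1 + (n - 1) * r)."""
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)

# Hypothetical: a multiple-choice test with reliability .70, shortened to 60%
# of its length so it takes about the same time as a comparable true-false test.
print(round(spearman_brown(0.70, 0.6), 2))   # 0.58
```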
Stalder, Daniel R. – Teaching of Psychology, 2001 (peer reviewed)
Evaluates the use of discrimination indexes (or item-total correlation) for examining the reliability of examinations. States this technique has drawbacks and may cause examination validity to be lower. Discusses the idea of discrimination power and why poor students may answer an item correctly. (CMK)
Descriptors: Academic Failure, Educational Research, Higher Education, Psychology
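The discrimination index in question is usually the corrected item-total correlation: the correlation between an item and the total score with that item removed. A minimal sketch on hypothetical 0/1 data:

```python
import numpy as np

def corrected_item_total(responses, item_index):
    """Correlation between one item and the total score computed without that
    item (the usual corrected item-total discrimination index)."""
    x = np.asarray(responses, dtype=float)       # persons x items, 0/1 scored
    item = x[:, item_index]
    rest_total = x.sum(axis=1) - item
    return np.corrcoef(item, rest_total)[0, 1]

# Hypothetical 0/1 responses: 6 examinees x 5 items.
responses = [[1, 1, 1, 0, 1],
             [1, 0, 1, 0, 0],
             [0, 0, 1, 0, 0],
             [1, 1, 1, 1, 1],
             [0, 1, 0, 0, 1],
             [1, 1, 1, 1, 0]]
print(round(corrected_item_total(responses, item_index=0), 2))
```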
May, Kim; Nicewander, W. Alan – Journal of Educational Measurement, 1994 (peer reviewed)
Reliabilities and information functions for percentile ranks and number-right scores were compared using item response theory, modeling standardized achievement tests. Results demonstrate that situations exist in which the percentage of items known by examinees can be accurately estimated, but the percentage of persons falling below a given score…
Descriptors: Achievement Tests, Difficulty Level, Equations (Mathematics), Estimation (Mathematics)
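The information functions referred to here are typically item and test information under an IRT model; for the two-parameter logistic model, item information is a_i^2 * P_i(theta) * (1 - P_i(theta)), summed over items for the test. A sketch with hypothetical item parameters (not the tests modeled in the study):

```python
import numpy as np

def test_information(theta, a, b):
    """Test information under the 2PL model:
    I(theta) = sum_i a_i^2 * P_i(theta) * (1 - P_i(theta)),
    with P_i(theta) = 1 / (1 + exp(-a_i * (theta - b_i)))."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return float((a ** 2 * p * (1 - p)).sum())

# Hypothetical discrimination (a) and difficulty (b) parameters for 5 items.
a = [1.2, 0.8, 1.5, 1.0, 0.9]
b = [-1.0, -0.5, 0.0, 0.5, 1.0]
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(test_information(theta, a, b), 3))
```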
Stumpf, Steven H. – Evaluation and the Health Professions, 1994 (peer reviewed)
A five-year curriculum evaluation project is described that treated students' course ratings, examination reliability coefficients, and item-discrimination data as a battery of data points for determining annual revision efforts. Histograms were constructed to make valid demonstrations of successful efforts immediately comprehensible to faculty.…
Descriptors: College Faculty, Comprehension, Curriculum Evaluation, Longitudinal Studies
Truman, William L. – AMATYC Review, 1992 (peer reviewed)
Describes the development of a placement test specifically designed for students entering Pembroke State University. Includes question construction and suitable test standards in discussing the features of a good placement test. Concludes that the test provides a reliable measure of students' potential success. (MDH)
Descriptors: Content Validity, Higher Education, Mathematics Education, Mathematics Tests


