Publication Date
| Date range | Count |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 55 |
| Since 2022 (last 5 years) | 197 |
| Since 2017 (last 10 years) | 497 |
| Since 2007 (last 20 years) | 745 |
Descriptor
| Descriptor | Count |
| --- | --- |
| Test Items | 1189 |
| Test Reliability | 1189 |
| Test Validity | 687 |
| Test Construction | 567 |
| Foreign Countries | 349 |
| Difficulty Level | 280 |
| Item Analysis | 253 |
| Psychometrics | 236 |
| Item Response Theory | 219 |
| Factor Analysis | 184 |
| Multiple Choice Tests | 173 |
Author
| Author | Count |
| --- | --- |
| Schoen, Robert C. | 12 |
| LaVenia, Mark | 5 |
| Liu, Ou Lydia | 5 |
| Anderson, Daniel | 4 |
| Bauduin, Charity | 4 |
| DiLuzio, Geneva J. | 4 |
| Farina, Kristy | 4 |
| Haladyna, Thomas M. | 4 |
| Huck, Schuyler W. | 4 |
| Petscher, Yaacov | 4 |
| Stansfield, Charles W. | 4 |
Audience
| Audience | Count |
| --- | --- |
| Practitioners | 39 |
| Researchers | 30 |
| Teachers | 24 |
| Administrators | 13 |
| Support Staff | 3 |
| Counselors | 2 |
| Students | 2 |
| Community | 1 |
| Parents | 1 |
| Policymakers | 1 |
Location
| Location | Count |
| --- | --- |
| Turkey | 69 |
| Indonesia | 37 |
| Germany | 20 |
| Canada | 17 |
| Florida | 17 |
| China | 16 |
| Australia | 15 |
| California | 12 |
| Iran | 11 |
| India | 10 |
| New York | 9 |
What Works Clearinghouse Rating
| Rating | Count |
| --- | --- |
| Meets WWC Standards without Reservations | 1 |
| Meets WWC Standards with or without Reservations | 1 |
Peer reviewed: Shaffer, Phyllis; And Others – Educational and Psychological Measurement, 1978
Responses to 20 statements on the Evaluation of Counselors Scale were factor-analyzed for a sample of 270 students who, on their own initiative, sought counseling on academic, career, vocational, and personal matters at a university counseling and testing center. Tables of results are presented and discussed. (Author/JKS)
Descriptors: Correlation, Counselor Evaluation, Factor Analysis, Higher Education
Peer reviewed: Poizner, Sharon B.; And Others – Applied Psychological Measurement, 1978
Binary, probability, and ordinal scoring procedures for multiple-choice items were examined. In two situations, it was found that both the probability and ordinal scoring systems were more reliable than the binary scoring method. (Author/CTM)
Descriptors: Confidence Testing, Guessing (Tests), Higher Education, Multiple Choice Tests
Peer reviewed: Huck, Schuyler W. – Journal of Educational Measurement, 1978
Providing examinees with advance knowledge of the difficulty of an item led to an increase in test performance with no loss of reliability. This finding was consistent across several test formats. (Author/JKS)
Descriptors: Difficulty Level, Feedback, Higher Education, Item Analysis
Peer reviewed: Shively, Michael Jay – Journal of Veterinary Medical Education, 1978
Some of the merits and pitfalls of multiple choice examinations are outlined and ways of increasing reliability and feedback information are summarized. Included are discussions of basic format, examples of poor design, examples of augmentation, and feedback from computerized grading. (LBH)
Descriptors: Feedback, Grading, Higher Education, Instructional Improvement
Peer reviewed: Albanese, Mark A.; Sabers, Darrell L. – Journal of Educational Measurement, 1988
Intercorrelations among multiple true-false items were examined to determine to what extent each choice can be treated as independent. Results from 157 health science and 170 medical students indicated that correlations between options from the same stem were larger than those from different stems. Methods for computing reliability estimates were…
Descriptors: College Students, Estimation (Mathematics), Health Personnel, Item Analysis
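The truncated sentence above concerns reliability estimation for item sets like these. The standard internal-consistency estimate for such data is Cronbach's alpha (equivalent to KR-20 for 0/1 items); a minimal sketch is given below as illustration only — the abstract does not state which estimator the authors compared.

```python
# Cronbach's alpha for a persons-by-items matrix of 0/1 scores:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
# Illustrative only; not necessarily the estimator used in the study.
def cronbach_alpha(scores: list[list[int]]) -> float:
    k = len(scores[0])  # number of items

    def var(xs):  # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in scores]) for j in range(k)]
    total_var = var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Two perfectly consistent items across four examinees yield alpha = 1.0
alpha = cronbach_alpha([[1, 1], [1, 1], [0, 0], [0, 0]])
```

Options sharing a stem, as in the study, violate the independence this estimator assumes — which is precisely why the correlational pattern the authors report matters for reliability estimation.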
Peer reviewed: Bracken, Bruce A. – Journal of School Psychology, 1988
Notes that significantly different results frequently exist between tests that purport to measure the same skill when the same child is tested on both instruments. Considers discrepancies related to examinee, examiner, examinee-examiner interactions, environment, and psychometric characteristics of the tests employed. Cites 10 major psychometric…
Descriptors: Educational Diagnosis, Individual Differences, Psychological Evaluation, Psychological Testing
Peer reviewed: Weiten, Wayne – Journal of Experimental Education, 1984
The effects of violating four item construction principles were examined to assess the validity of the principles and the importance of students' test wiseness. While flawed items were significantly less difficult than sound items, differences in item discrimination, test reliability, and concurrent validity were not observed. (Author/BW)
Descriptors: Difficulty Level, Higher Education, Item Analysis, Multiple Choice Tests
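Item discrimination, one of the indices compared in the study above, is conventionally reported as the upper-lower index D: the proportion answering the item correctly in a high-scoring group minus the proportion in a low-scoring group. A minimal sketch of that conventional index follows (the abstract does not name the exact statistic used).

```python
# Upper-lower discrimination index: D = p_upper - p_lower, where p_upper
# and p_lower are the proportions answering the item correctly in the
# high- and low-scoring groups. Illustrative of "item discrimination"
# generally; not necessarily the index computed in the study.
def discrimination_index(upper_correct: int, upper_n: int,
                         lower_correct: int, lower_n: int) -> float:
    return upper_correct / upper_n - lower_correct / lower_n

# Hypothetical item: 18 of 20 high scorers correct, 6 of 20 low scorers
d = discrimination_index(18, 20, 6, 20)
```

Values near 0 indicate an item that fails to separate strong from weak examinees — the kind of degradation the study looked for (and did not find) in flawed items.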
Peer reviewed: Haymes, Michael; Green, Logan – Journal of Research in Personality, 1982
Reports progress in the development of the Needsort, a research tool, for the assessment of the three developmentally earliest, within Maslow's framework, conative needs (physiological, safety, belongingness). Discusses item analyses, item selection methods, reliability studies, and validation studies across a broad range of populations. (Author)
Descriptors: Child Development, Childhood Needs, Individual Needs, Measures (Individuals)
Peer reviewed: Piper, Ann – British Journal of Language Teaching, 1983
In comparison with the C-test, the cloze procedure is recommended for inclusion in a student placement test battery as the more flexible item type, applicable to a wider range of placement test situations. The features considered were validity, reliability, scorability, economy, administrability, practicality, and discrimination.…
Descriptors: Cloze Procedure, Comparative Analysis, Evaluation Criteria, Second Language Instruction
Peer reviewed: Askegaard, Lewis D.; Umila, Benwardo V. – Journal of Educational Measurement, 1982
Multiple matrix sampling of items and examinees was applied to an 18-item rank-order instrument administered to a randomly assigned group and compared to the ordering and ranking of all items by control subjects. High correlations between the two sets of ranks suggest the methodology is a viable way to reduce respondent effort on long rank-ordering tasks. (Author/CM)
Descriptors: Evaluation Methods, Item Sampling, Junior High Schools, Student Reaction
Peer reviewed: Forsyth, Robert A.; Spratt, Kevin F. – Journal of Educational Measurement, 1980
The effects of two item formats on item difficulty and item discrimination indices for mathematics problem solving multiple-choice tests were investigated. One format required identifying the proper "set-up" for the item; the other format required complete solving of the item. (Author/JKS)
Descriptors: Difficulty Level, Junior High Schools, Multiple Choice Tests, Problem Solving
Peer reviewed: Germann, Paul J. – Journal of Research in Science Teaching, 1989
Describes a paper-and-pencil test for high school biology students measuring science process skills, such as developing hypotheses; making predictions; identifying assumptions; analyzing data; and formulating conclusions. Reports some data on reliability and validity of the test. Provides all 35 items of the test. (YP)
Descriptors: Biology, Science Materials, Science Tests, Secondary Education
Peer reviewed: Fishkin, Anne S.; And Others – Roeper Review, 1996
This study investigated patterns of Wechsler Intelligence Scale for Children (WISC) Third Edition subtest scores for 42 gifted children in grades 4-8. Variability from subtest means was highest on Similarities, Comprehension, Coding, and Symbol Search subtests. Significant weaknesses were found on the Block Design subtest, seen as a peak subtest…
Descriptors: Ability Identification, Cluster Analysis, Elementary Secondary Education, Gifted
Peer reviewed: Feldt, Leonard S. – Applied Measurement in Education, 1993
The recommendation that the reliability of multiple-choice tests will be enhanced if the distribution of item difficulties is concentrated at approximately 0.50 is reinforced and extended in this article by viewing the 0/1 item scoring as a dichotomization of an underlying normally distributed ability score. (SLD)
Descriptors: Ability, Difficulty Level, Guessing (Tests), Mathematical Models
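Feldt's recommendation above has a simple classical-test-theory illustration: a 0/1-scored item of difficulty p contributes score variance p(1 − p), which is maximized at p = 0.50, so mid-difficulty items carry the most score variance per item. A minimal numerical sketch (illustrative only; not Feldt's normal-ogive derivation):

```python
# Variance of a 0/1-scored (Bernoulli) item with proportion-correct
# difficulty p is p*(1-p). Check that it peaks at p = 0.50 over a grid.
# Illustrative only; not Feldt's underlying-normal-ability derivation.
def item_variance(p: float) -> float:
    return p * (1 - p)

grid = [i / 100 for i in range(1, 100)]          # difficulties 0.01..0.99
best_p = max(grid, key=item_variance)            # difficulty maximizing variance
peak = item_variance(best_p)
```

Feldt's contribution is to show that this intuition survives when the 0/1 score is viewed as a dichotomization of an underlying normally distributed ability, not merely as a Bernoulli trial.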
Burton, Richard F. – Assessment and Evaluation in Higher Education, 2005
Examiners seeking guidance on multiple-choice and true/false tests are likely to encounter various faulty or questionable ideas. Twelve of these are discussed in detail, having to do mainly with the effects on test reliability of test length, guessing and scoring method (i.e. number-right scoring or negative marking). Some misunderstandings could…
Descriptors: Guessing (Tests), Multiple Choice Tests, Objective Tests, Test Reliability
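Of the scoring methods Burton discusses, negative marking is conventionally implemented as the classical correction-for-guessing formula S = R − W/(k − 1) for k-option items, while number-right scoring is simply S = R. A minimal sketch of that conventional formula (an illustration of the standard method, not taken from the article itself):

```python
# Classical correction-for-guessing ("negative marking") for a k-option
# multiple-choice test: S = R - W/(k - 1), where R is the number right
# and W the number wrong; omitted items are not penalized.
# Illustrative only; Burton's article discusses this family of methods.
def formula_score(right: int, wrong: int, options: int) -> float:
    return right - wrong / (options - 1)

# Hypothetical examinee: 40 right, 12 wrong on a 4-option test
s = formula_score(40, 12, 4)
```

Under pure random guessing the expected penalty exactly cancels the expected chance gain, which is the rationale the formula's proponents cite and one of the points Burton scrutinizes.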