Tsujimoto, Richard N.; Berger, Dale E. – Child Abuse and Neglect: The International Journal, 1988
Two criteria are discussed for determining cutting scores on a predictor variable for identifying cases of likely child abuse--utility maximizing and error minimizing. Utility maximizing is the preferable criterion, as it optimizes the balance between the costs of incorrect decisions and the benefits of correct decisions. (Author/JDD)
Descriptors: Child Abuse, Cost Effectiveness, Cutting Scores, Error of Measurement
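The utility-maximizing criterion described in this abstract can be sketched in code. The scores, labels, and utility values below are invented for illustration and do not come from the study; the sketch only shows the general idea of picking the cutoff that maximizes expected utility rather than raw accuracy.

```python
# Hypothetical sketch: choosing a cutting score on a predictor by
# utility maximization. All data and utility values are illustrative.

def expected_utility(scores, labels, cutoff, u_tp, u_fp, u_fn, u_tn):
    """Average utility of classifying score >= cutoff as a positive case."""
    total = 0.0
    for s, y in zip(scores, labels):
        pred = s >= cutoff
        if pred and y:
            total += u_tp      # correct identification (benefit)
        elif pred and not y:
            total += u_fp      # false alarm (typically a negative cost)
        elif not pred and y:
            total += u_fn      # missed case (typically a negative cost)
        else:
            total += u_tn      # correct rejection
    return total / len(scores)

def best_cutoff(scores, labels, **utils):
    """Candidate cutoff (among observed scores) with highest expected utility."""
    return max(sorted(set(scores)),
               key=lambda c: expected_utility(scores, labels, c, **utils))
```

Error minimizing falls out as the special case where correct decisions are worth 1 and errors worth 0, so maximizing expected utility then just maximizes accuracy; the two criteria diverge once misses and false alarms carry different costs.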
Peer reviewed
Stocker, Leonard P. – Reading Improvement, 1971
Suggests approximately 200 words of religious denotation that should be added to the Dale-Chall List of 3,000 Familiar Words when assessing the readability of materials that contain a Catholic vocabulary. (VJ)
Descriptors: Catholic Schools, Readability, Reading Level, Reading Materials
Peer reviewed
Birenbaum, Menucha; Tatsuoka, Kikumi K. – Journal of Educational Measurement, 1983
The outcomes of two scoring methods (one based on an error analysis and the second on a conventional method) on free-response tests, compared in terms of reliability and dimensionality, indicate the conventional method is inferior in both aspects. (Author/PN)
Descriptors: Achievement Tests, Algorithms, Data, Junior High Schools
Borman, Walter C.; Rosse, Rodney L. – 1980
As an alternative or adjunct to paper-and-pencil tests for predicting personnel performance, the United States Air Force studied the use of peer ratings as an evaluative tool. The purpose of this study was to evaluate the psychometric characteristics of peer ratings among Air Force basic trainees. Peer ratings were obtained from more than 27,000…
Descriptors: Military Personnel, Peer Evaluation, Personnel Evaluation, Personnel Selection
Hambleton, Ronald K.; Novick, Melvin R. – 1972
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity…
Descriptors: Criterion Referenced Tests, Measurement Instruments, Measurement Techniques, Scaling
Koehler, Roger A. – 1974
A potentially valuable measure of overconfidence on probabilistic multiple-choice tests was evaluated. The measure of overconfidence was based on probabilistic responses to nonsense items embedded in a vocabulary test. The test was administered under both confidence response and conventional choice response directions to 208 undergraduate…
Descriptors: Confidence Testing, Guessing (Tests), Measurement Techniques, Multiple Choice Tests
Peer reviewed
Frederiksen, Norman; Ward, William C. – Applied Psychological Measurement, 1978
A set of Tests of Scientific Thinking was developed for possible use as criterion measures in research on creativity. Scores on the tests describe both quality and quantity of ideas produced in formulating hypotheses, evaluating proposals, solving methodological problems, and devising methods for measuring constructs. (Author/CTM)
Descriptors: Creativity Tests, Higher Education, Item Sampling, Predictive Validity
Peer reviewed
Hsu, Tse-Chi; And Others – Journal of Experimental Education, 1984
The indices of item difficulty and discrimination, the coefficients of effective length, and the average item information for both single- and multiple-answer items using six different scoring formulas were computed and compared. These formulas vary in terms of the assignment of partial credit and the correction for guessing. (Author/BW)
Descriptors: College Entrance Examinations, Comparative Analysis, Difficulty Level, Guessing (Tests)
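Among scoring formulas that apply a correction for guessing, the classical one is rights minus a fraction of wrongs. A minimal sketch, stated here only as the textbook formula rather than any particular variant compared in the study above:

```python
# Classical correction-for-guessing formula: corrected score =
# rights - wrongs / (k - 1), where k is the number of options per
# item. Omitted items are neither rewarded nor penalized.

def formula_score(rights, wrongs, options_per_item):
    """Rights-minus-a-fraction-of-wrongs formula score."""
    return rights - wrongs / (options_per_item - 1)
```

For a 40-item, five-option test with 30 right, 8 wrong, and 2 omitted, the formula score is 30 − 8/4 = 28, whereas number-right scoring would report 30.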
Peer reviewed
Traub, Ross E.; Hambleton, Ronald K. – Educational and Psychological Measurement, 1972
Findings of this study suggest that it is preferable to attempt to control guessing through the use of the reward instruction rather than to attempt to control it using the penalty instruction or to encourage it using the instruction to guess. (Authors/MB)
Descriptors: Grade 8, Guessing (Tests), Multiple Choice Tests, Pacing
Peer reviewed
Zughoul, Muhammad R.; Kambal, M. Osman – International Review of Applied Linguistics in Language Teaching, 1983
Based on the responses of 50 ESL instructors to a composition-scoring exercise, a detailed method of scoring compositions was developed that divides the writing into basic components (structure, content, vocabulary, organization, and mechanics) and provides a scoring mechanism for each component for each of three competency levels. (MSE)
Descriptors: English (Second Language), Evaluation Criteria, Evaluation Methods, Measurement Techniques
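A component-based composition score of the kind described above can be sketched as a weighted sum of per-component ratings. The weights and rating scale below are invented for illustration; the study's actual scoring mechanism and competency-level breakdown are not reproduced here.

```python
# Hypothetical component-based composition score: weighted sum of
# 0-100 ratings on the components named in the abstract above.
# The weights are illustrative only, not the study's.

COMPONENT_WEIGHTS = {
    "content": 0.30,
    "organization": 0.20,
    "vocabulary": 0.20,
    "structure": 0.25,
    "mechanics": 0.05,
}

def composition_score(ratings):
    """Weighted sum of per-component ratings (each 0-100)."""
    return sum(COMPONENT_WEIGHTS[c] * r for c, r in ratings.items())
```

Because the weights sum to 1.0, a composition rated 80 on every component scores 80 overall; raising the weight on, say, content relative to mechanics shifts the overall score toward the components the raters consider most important.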
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
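The standard error of measurement referred to above has a standard textbook form, SEM = SD·√(1 − reliability), which yields approximate confidence limits around an observed score. A minimal sketch with illustrative numbers (not taken from the article):

```python
# Standard error of measurement and approximate confidence limits
# for an observed test score. Values used below are illustrative.

import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence limits around an observed score."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e
```

For a test with SD = 10 and reliability 0.84, SEM = 10·√0.16 = 4.0, so an observed score of 70 carries approximate 95% limits of 70 ± 7.84. This also illustrates Burton's point: the reliability coefficient itself depends on the spread of examinee attainment (the SD), so it does not by itself allow comparison across tests of different format.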
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Bruno, James E. – Journal of Computer-Based Instruction, 1987
Reports preliminary findings of a study which used a modified Admissible Probability Measurement (APM) test scoring system in the design of computer based instructional management systems. The use of APM for curriculum analysis is discussed, as well as its value in enhancing individualized learning. (Author/LRW)
Descriptors: Computer Assisted Testing, Computer Managed Instruction, Curriculum Evaluation, Design
Peer reviewed
Angoff, William H.; Schrader, William B. – Journal of Educational Measurement, 1984
The reported data provide a basis for evaluating the formula-scoring versus rights-scoring issue and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.…
Descriptors: College Entrance Examinations, Guessing (Tests), Higher Education, Hypothesis Testing
Foegen, Anne – Diagnostique, 2000
A study involving 105 sixth-graders examined three aspects of technical adequacy with respect to two general outcome measures in mathematics: the effects of aggregating scores and correcting for random guessing on reliability and validity and the extent to which the measures were sensitive to changes in performance. (Contains references.)…
Descriptors: Curriculum Based Assessment, Disabilities, Grade 6, Mathematics