Tsujimoto, Richard N.; Berger, Dale E. – Child Abuse and Neglect: The International Journal, 1988
Two criteria are discussed for determining cutting scores on a predictor variable for identifying cases of likely child abuse--utility maximizing and error minimizing. Utility maximizing is the preferable criterion, as it optimizes the balance between the costs of incorrect decisions and the benefits of correct decisions. (Author/JDD)
Descriptors: Child Abuse, Cost Effectiveness, Cutting Scores, Error of Measurement
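The utility-maximizing criterion described in this abstract can be sketched in code. The scores, labels, and utility values below are invented for illustration and do not come from the study; the sketch only shows the general idea of picking the cutoff that maximizes expected utility rather than raw accuracy.

```python
# Hypothetical sketch: choosing a cutting score on a predictor by
# utility maximization. All data and utility values are illustrative.

def expected_utility(scores, labels, cutoff, u_tp, u_fp, u_fn, u_tn):
    """Average utility of classifying score >= cutoff as a positive case."""
    total = 0.0
    for s, y in zip(scores, labels):
        pred = s >= cutoff
        if pred and y:
            total += u_tp      # correct identification (benefit)
        elif pred and not y:
            total += u_fp      # false alarm (typically a negative cost)
        elif not pred and y:
            total += u_fn      # missed case (typically a negative cost)
        else:
            total += u_tn      # correct rejection
    return total / len(scores)

def best_cutoff(scores, labels, **utils):
    """Candidate cutoff (among observed scores) with highest expected utility."""
    return max(sorted(set(scores)),
               key=lambda c: expected_utility(scores, labels, c, **utils))
```

Error minimizing falls out as the special case where correct decisions are worth 1 and errors worth 0, so maximizing expected utility then just maximizes accuracy; the two criteria diverge once misses and false alarms carry different costs.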
Peer reviewed
Stocker, Leonard P. – Reading Improvement, 1971
Suggests approximately 200 words of religious denotation that should be added to the Dale-Chall List of 3,000 Familiar Words when assessing the readability of materials that contain a Catholic vocabulary. (VJ)
Descriptors: Catholic Schools, Readability, Reading Level, Reading Materials
Peer reviewed
Birenbaum, Menucha; Tatsuoka, Kikumi K. – Journal of Educational Measurement, 1983
The outcomes of two scoring methods (one based on an error analysis and the second on a conventional method) on free-response tests, compared in terms of reliability and dimensionality, indicate the conventional method is inferior in both aspects. (Author/PN)
Descriptors: Achievement Tests, Algorithms, Data, Junior High Schools
Borman, Walter C.; Rosse, Rodney L. – 1980
As an alternative or adjunct to paper-and-pencil tests for predicting personnel performance, the United States Air Force studied the use of peer ratings as an evaluative tool. The purpose of this study was to evaluate the psychometric characteristics of peer ratings among Air Force basic trainees. Peer ratings were obtained from more than 27,000…
Descriptors: Military Personnel, Peer Evaluation, Personnel Evaluation, Personnel Selection
Hambleton, Ronald K.; Novick, Melvin R. – 1972
In this paper, an attempt has been made to synthesize some of the current thinking in the area of criterion-referenced testing as well as to provide the beginning of an integration of theory and method for such testing. Since criterion-referenced testing is viewed from a decision-theoretic point of view, approaches to reliability and validity…
Descriptors: Criterion Referenced Tests, Measurement Instruments, Measurement Techniques, Scaling
Koehler, Roger A. – 1974
A potentially valuable measure of overconfidence on probabilistic multiple-choice tests was evaluated. The measure of overconfidence was based on probabilistic responses to nonsense items embedded in a vocabulary test. The test was administered under both confidence response and conventional choice response directions to 208 undergraduate…
Descriptors: Confidence Testing, Guessing (Tests), Measurement Techniques, Multiple Choice Tests
Peer reviewed
Frederiksen, Norman; Ward, William C. – Applied Psychological Measurement, 1978
A set of Tests of Scientific Thinking was developed for possible use as criterion measures in research on creativity. Scores on the tests describe both quality and quantity of ideas produced in formulating hypotheses, evaluating proposals, solving methodological problems, and devising methods for measuring constructs. (Author/CTM)
Descriptors: Creativity Tests, Higher Education, Item Sampling, Predictive Validity
Peer reviewed
Hsu, Tse-Chi; And Others – Journal of Experimental Education, 1984
The indices of item difficulty and discrimination, the coefficients of effective length, and the average item information for both single- and multiple-answer items using six different scoring formulas were computed and compared. These formulas vary in terms of the assignment of partial credit and the correction for guessing. (Author/BW)
Descriptors: College Entrance Examinations, Comparative Analysis, Difficulty Level, Guessing (Tests)
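Among scoring formulas that apply a correction for guessing, the classical one is rights minus a fraction of wrongs. A minimal sketch, stated here only as the textbook formula rather than any particular variant compared in the study above:

```python
# Classical correction-for-guessing formula: corrected score =
# rights - wrongs / (k - 1), where k is the number of options per
# item. Omitted items are neither rewarded nor penalized.

def formula_score(rights, wrongs, options_per_item):
    """Rights-minus-a-fraction-of-wrongs formula score."""
    return rights - wrongs / (options_per_item - 1)
```

For a 40-item, five-option test with 30 right, 8 wrong, and 2 omitted, the formula score is 30 − 8/4 = 28, whereas number-right scoring would report 30.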
Peer reviewed
Traub, Ross E.; Hambleton, Ronald K. – Educational and Psychological Measurement, 1972
Findings of this study suggest that it is preferable to attempt to control guessing through the use of the reward instruction rather than to attempt to control it using the penalty instruction or to encourage it using the instruction to guess. (Authors/MB)
Descriptors: Grade 8, Guessing (Tests), Multiple Choice Tests, Pacing
Peer reviewed
Zughoul, Muhammad R.; Kambal, M. Osman – International Review of Applied Linguistics in Language Teaching, 1983
Based on the responses of 50 ESL instructors to a composition-scoring exercise, a detailed method of scoring compositions was developed that divides the writing into basic components (structure, content, vocabulary, organization, and mechanics) and provides a scoring mechanism for each component for each of three competency levels. (MSE)
Descriptors: English (Second Language), Evaluation Criteria, Evaluation Methods, Measurement Techniques
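A component-based composition score of the kind described above can be sketched as a weighted sum of per-component ratings. The weights and rating scale below are invented for illustration; the study's actual scoring mechanism and competency-level breakdown are not reproduced here.

```python
# Hypothetical component-based composition score: weighted sum of
# 0-100 ratings on the components named in the abstract above.
# The weights are illustrative only, not the study's.

COMPONENT_WEIGHTS = {
    "content": 0.30,
    "organization": 0.20,
    "vocabulary": 0.20,
    "structure": 0.25,
    "mechanics": 0.05,
}

def composition_score(ratings):
    """Weighted sum of per-component ratings (each 0-100)."""
    return sum(COMPONENT_WEIGHTS[c] * r for c, r in ratings.items())
```

Because the weights sum to 1.0, a composition rated 80 on every component scores 80 overall; raising the weight on, say, content relative to mechanics shifts the overall score toward the components the raters consider most important.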
Multiple Choice and True/False Tests: Reliability Measures and Some Implications of Negative Marking
Burton, Richard F. – Assessment & Evaluation in Higher Education, 2004
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this…
Descriptors: Multiple Choice Tests, Error of Measurement, Test Reliability, Test Items
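The standard error of measurement referred to above has a standard textbook form, SEM = SD·√(1 − reliability), which yields approximate confidence limits around an observed score. A minimal sketch with illustrative numbers (not taken from the article):

```python
# Standard error of measurement and approximate confidence limits
# for an observed test score. Values used below are illustrative.

import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_interval(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence limits around an observed score."""
    e = sem(sd, reliability)
    return observed - z * e, observed + z * e
```

For a test with SD = 10 and reliability 0.84, SEM = 10·√0.16 = 4.0, so an observed score of 70 carries approximate 95% limits of 70 ± 7.84. This also illustrates Burton's point: the reliability coefficient itself depends on the spread of examinee attainment (the SD), so it does not by itself allow comparison across tests of different format.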
Saunders, Joseph C.; Huynh, Huynh – 1980
In most reliability studies, the precision of a reliability estimate varies inversely with the number of examinees (sample size). Thus, to achieve a given level of accuracy, some minimum sample size is required. An approximation for this minimum size may be made if some reasonable assumptions regarding the mean and standard deviation of the test…
Descriptors: Cutting Scores, Difficulty Level, Error of Measurement, Mastery Tests
Bruno, James E. – Journal of Computer-Based Instruction, 1987
Reports preliminary findings of a study which used a modified Admissible Probability Measurement (APM) test scoring system in the design of computer based instructional management systems. The use of APM for curriculum analysis is discussed, as well as its value in enhancing individualized learning. (Author/LRW)
Descriptors: Computer Assisted Testing, Computer Managed Instruction, Curriculum Evaluation, Design
Peer reviewed
Angoff, William H.; Schrader, William B. – Journal of Educational Measurement, 1984
The reported data provide a basis for evaluating the formula-scoring versus rights-scoring issue and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.…
Descriptors: College Entrance Examinations, Guessing (Tests), Higher Education, Hypothesis Testing
Foegen, Anne – Diagnostique, 2000
A study involving 105 sixth-graders examined three aspects of technical adequacy with respect to two general outcome measures in mathematics: the effects of aggregating scores and correcting for random guessing on reliability and validity and the extent to which the measures were sensitive to changes in performance. (Contains references.)…
Descriptors: Curriculum Based Assessment, Disabilities, Grade 6, Mathematics