Publication Date
| Date range | Records |
| --- | --- |
| In 2026 | 0 |
| Since 2025 | 0 |
| Since 2022 (last 5 years) | 0 |
| Since 2017 (last 10 years) | 0 |
| Since 2007 (last 20 years) | 6 |
Descriptor
| Descriptor | Records |
| --- | --- |
| Statistical Analysis | 32 |
| Test Reliability | 32 |
| Test Results | 32 |
| Test Validity | 15 |
| Test Interpretation | 12 |
| Test Construction | 11 |
| Evaluation Methods | 7 |
| Standardized Tests | 7 |
| Criterion Referenced Tests | 6 |
| Scores | 6 |
| Difficulty Level | 5 |
Author
| Author | Records |
| --- | --- |
| Aaronson, May | 1 |
| Alonzo, Julie | 1 |
| Bayuk, Robert J. | 1 |
| Benson, Jeri | 1 |
| Berk, Ronald A. | 1 |
| Besel, Ronald | 1 |
| Blatchford, Charles H. | 1 |
| Caselli, M. Cristina | 1 |
| Charters, Moire C. | 1 |
| Crocker, Linda | 1 |
| Devescovi, Antonella | 1 |
Publication Type
| Type | Records |
| --- | --- |
| Reports - Research | 18 |
| Journal Articles | 6 |
| Guides - General | 2 |
| Numerical/Quantitative Data | 2 |
| Reports - Evaluative | 2 |
| Speeches/Meeting Papers | 2 |
| Guides - Non-Classroom | 1 |
| Information Analyses | 1 |
| Reports - Descriptive | 1 |
Education Level
| Level | Records |
| --- | --- |
| Secondary Education | 3 |
| High Schools | 2 |
| Elementary Education | 1 |
| Elementary Secondary Education | 1 |
| Grade 2 | 1 |
| Grade 9 | 1 |
| Higher Education | 1 |
| Junior High Schools | 1 |
| Middle Schools | 1 |
| Postsecondary Education | 1 |
Audience
| Audience | Records |
| --- | --- |
| Researchers | 1 |
Location
| Location | Records |
| --- | --- |
| Australia | 1 |
| California | 1 |
| Indonesia | 1 |
| Maryland | 1 |
Assessments and Surveys
| Instrument | Records |
| --- | --- |
| ACT Assessment | 1 |
| Adjective Check List | 1 |
| Armed Services Vocational… | 1 |
| Mean Length of Utterance | 1 |
| Miller Analogies Test | 1 |
| National Assessment of… | 1 |
| Program for International… | 1 |
Powers, Sonya; Li, Dongmei; Suh, Hongwook; Harris, Deborah J. – ACT, Inc., 2016
ACT reporting categories and ACT Readiness Ranges are new features added to the ACT score reports starting in fall 2016. For each reporting category, the number correct score, the maximum points possible, the percent correct, and the ACT Readiness Range, along with an indicator of whether the reporting category score falls within the Readiness…
Descriptors: Scores, Classification, College Entrance Examinations, Error of Measurement
Rusilowati, Ani; Kurniawati, Lina; Nugroho, Sunyoto E.; Widiyatmoko, Arif – International Journal of Environmental and Science Education, 2016
The purpose of this study is to develop a scientific literacy evaluation instrument, testing its validity, reliability, and characteristics for measuring students' scientific literacy across four categories: science as a body of knowledge (category A), science as a way of thinking (category B), science as a…
Descriptors: Foreign Countries, Junior High School Students, Grade 9, Test Construction
Lai, Cheng-Fei; Irvin, P. Shawn; Alonzo, Julie; Park, Bitnara Jasmine; Tindal, Gerald – Behavioral Research and Teaching, 2012
In this technical report, we present the results of a reliability study of the second-grade multiple choice reading comprehension measures available on the easyCBM learning system conducted in the spring of 2011. Analyses include split-half reliability, alternate form reliability, person and item reliability as derived from Rasch analysis,…
Descriptors: Reading Comprehension, Testing Programs, Statistical Analysis, Elementary School Students
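The split-half coefficient mentioned in the Lai et al. report can be sketched in a few lines. This is a generic illustration (odd/even item split with the Spearman-Brown correction to full test length), not the easyCBM analysis code:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(item_scores):
    """item_scores: one list of per-item scores per examinee.
    Correlate odd-item and even-item half-test totals, then apply
    the Spearman-Brown correction: r_full = 2r / (1 + r)."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r = pearson(odd, even)
    return 2 * r / (1 + r)

# Four examinees, four dichotomously scored items
scores = [[1, 1, 1, 1], [1, 1, 0, 0], [0, 0, 0, 0], [1, 0, 1, 0]]
print(split_half_reliability(scores))  # → 0.625
```

The odd/even split is only one of many possible halvings; the report's Rasch-based person and item reliability estimates require a full IRT fit and are not shown here.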
O'Toole, J. M.; King, R. A. R. – Language Testing, 2011
The "cloze" test is one possible investigative instrument for predicting text comprehensibility. Conceptual coding of student replacement of deleted words has been considered to be more valid than exact coding, partly because conceptual coding seemed fairer to poorer readers. This paper reports a quantitative study of 447 Australian…
Descriptors: Cloze Procedure, Test Results, Language Tests, Reading Comprehension
Lissitz, Robert W.; Hou, Xiaodong; Slater, Sharon Cadman – Journal of Applied Testing Technology, 2012
This article investigates several questions regarding the impact of different item formats on measurement characteristics. Constructed response (CR) items and multiple choice (MC) items obviously differ in their formats and in the resources needed to score them. As such, they have been the subject of considerable discussion regarding the impact of…
Descriptors: Computer Assisted Testing, Scoring, Evaluation Problems, Psychometrics
Peer reviewed: Doppelt, Jerome E. – Educational and Psychological Measurement, 1971
Descriptors: Aptitude Tests, Scores, Statistical Analysis, Test Reliability
Stocking, Martha; And Others – 1973
For two tests measuring the same trait, the BIV20 program equates the scores using the two true-score distributions estimated by the univariate Method 20 program (see Wingersky, Lees, Lennon, and Lord, 1969) and, with these equated true scores and their distributions, estimates the bivariate distribution of scores and the relative efficiency of the…
Descriptors: Computer Programs, Equated Scores, Statistical Analysis, Test Reliability
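BIV20's true-score equating requires the Method 20 distribution estimates. As a much simpler illustration of the same goal (putting Form X scores on the Form Y scale), linear equating matches the two forms' means and standard deviations; this sketch is a stand-in, not the report's procedure:

```python
from statistics import mean, pstdev

def linear_equate(x_scores, y_scores):
    """Return a function mapping Form X scores onto the Form Y scale
    by matching means and standard deviations (linear equating):
    y(x) = mu_Y + (sigma_Y / sigma_X) * (x - mu_X)."""
    mx, sx = mean(x_scores), pstdev(x_scores)
    my, sy = mean(y_scores), pstdev(y_scores)
    return lambda x: my + (sy / sx) * (x - mx)

# Toy samples from two forms of the same test
equate = linear_equate([10, 20, 30], [15, 25, 35])
print(equate(20))  # → 25.0 (Form X mean maps to Form Y mean)
```

Linear equating assumes the two score distributions differ only in location and scale; true-score methods like the one BIV20 implements relax that assumption at the cost of estimating full distributions.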
Whaley, Donald L. – 1973
An introductory textbook on psychological tests and measurements is presented in paperback booklet form. The style is informal and humorous, and the book is intended to appeal to the contemporary student. Ten chapters constitute the text: (1) On Measurement and Existence; (2) A Brief, Imprecise History of Psychological Testing; (3) The Creation…
Descriptors: Measurement, Psychological Testing, Sampling, Statistical Analysis
Peer reviewed: Berk, Ronald A. – Journal of Educational Measurement, 1980
A dozen different approaches that yield 13 reliability indices for criterion-referenced tests were identified and grouped into three categories: threshold loss function, squared-error loss function, and domain score estimation. Indices were evaluated within each category. (Author/RL)
Descriptors: Classification, Criterion Referenced Tests, Cutting Scores, Evaluation Methods
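The threshold-loss family Berk surveys can be sketched minimally: raw decision consistency (the proportion of examinees classified the same way on two administrations) and Cohen's kappa, which corrects that agreement for chance. Names and cutoff handling here are illustrative, not Berk's notation:

```python
def decision_consistency(form1, form2, cutoff):
    """Proportion of examinees given the same master/nonmaster
    classification on two test administrations (p0)."""
    same = sum((a >= cutoff) == (b >= cutoff) for a, b in zip(form1, form2))
    return same / len(form1)

def kappa(form1, form2, cutoff):
    """Cohen's kappa: agreement corrected for the agreement
    expected by chance given the marginal mastery rates."""
    n = len(form1)
    p0 = decision_consistency(form1, form2, cutoff)
    p1 = sum(a >= cutoff for a in form1) / n
    p2 = sum(b >= cutoff for b in form2) / n
    pc = p1 * p2 + (1 - p1) * (1 - p2)
    return (p0 - pc) / (1 - pc)

# Four examinees, two forms, mastery cutoff of 6
print(decision_consistency([5, 8, 3, 9], [6, 7, 2, 10], 6))  # → 0.75
print(kappa([5, 8, 3, 9], [6, 7, 2, 10], 6))                 # → 0.5
```

p0 belongs to the threshold-loss category in Berk's grouping; the squared-error loss and domain score estimation indices he also reviews weigh how far scores fall from the cutoff rather than classification agreement alone.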
Devescovi, Antonella; Caselli, M. Cristina – International Journal of Language & Communication Disorders, 2007
The mean length of utterance in the Sentence Repetition Task grew from approximately two to three words, and the number of omissions of articles, prepositions and modifiers significantly decreased. After 3;0 years old, omissions of free function words practically disappeared. The results of Study 2 showed that mean length of utterance, omission of…
Descriptors: Test Reliability, Test Results, Statistical Analysis, Memory
Roudabush, Glenn E.; Green, Donald Ross – 1972
In determining how reliable is reliable enough and how much error can be tolerated in criterion-referenced testing, the following relationships hold: (1) the more specific an objective is, the fewer the items required to reliably measure it; (2) the more specific the objectives are, the more objectives required to cover a given span of the…
Descriptors: Behavioral Objectives, Criterion Referenced Tests, Diagnostic Tests, Statistical Analysis
Valentine, Lonnie D., Jr.; Massey, Iris H. – 1976
Male and female enlistees were compared on the basis of their performance on the Armed Services Vocational Aptitude Battery. Mean Aptitude Index scores were compared for male and female enlistees on the original testing and on retest. Males scored higher on mechanical and electronics, and females scored higher on administrative and general. Both…
Descriptors: Aptitude Tests, Attitude Measures, Enlisted Personnel, Item Analysis
Primoff, Ernest S. – 1971
This report shows how Beta weights for the J-Coefficient may be easily developed without a formal validity study, and indicates how indications of ability other than tests can be used to measure the same abilities that are measured by tests. See also TM 001 163-64,166 for further information on job elements (J-Scale) procedures. (Author/DLG)
Descriptors: Achievement Rating, Correlation, Evaluation Criteria, Occupational Tests
Bayuk, Robert J. – 1973
An investigation was conducted to determine the effects of response-category weighting and item weighting on reliability and predictive validity. Response-category weighting refers to scoring in which, for each category (including omit and "not read"), a weight is assigned that is proportional to the mean criterion score of examinees selecting…
Descriptors: Aptitude Tests, Correlation, Predictive Validity, Research Reports
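The response-category weighting Bayuk describes assigns each category a weight proportional to the mean criterion score of the examinees who selected it. A minimal sketch, assuming per-examinee response categories and criterion scores are available (all names are illustrative):

```python
from collections import defaultdict

def category_weights(responses, criterion):
    """responses: chosen category per examinee for one item
    (e.g. 'A', 'B', 'omit', 'not read'); criterion: matching
    criterion scores. Weight each category by the mean criterion
    score of the examinees who chose it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, score in zip(responses, criterion):
        sums[cat] += score
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

# Four examinees answering one item
print(category_weights(['A', 'B', 'A', 'omit'], [4, 2, 6, 1]))
# → {'A': 5.0, 'B': 2.0, 'omit': 1.0}
```

Scoring an examinee then means summing the weights of their chosen categories across items, in place of the usual 0/1 correct-answer scoring whose effects on reliability and predictive validity the study investigates.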
Peer reviewed: Mellenbergh, Gideon J.; van der Linden, Wim J. – Applied Psychological Measurement, 1979
For six tests, coefficient delta as an index for internal optimality is computed. Internal optimality is defined as the magnitude of risk of the decision procedure with respect to the true score. Results are compared with an alternative index (coefficient kappa) for assessing the consistency of decisions. (Author/JKS)
Descriptors: Classification, Comparative Analysis, Decision Making, Error of Measurement
