Publication Date
| In 2026 | 0 |
| Since 2025 | 8 |
| Since 2022 (last 5 years) | 16 |
| Since 2017 (last 10 years) | 31 |
| Since 2007 (last 20 years) | 48 |
Descriptor
| Test Length | 113 |
| Test Validity | 113 |
| Test Reliability | 63 |
| Test Construction | 47 |
| Test Items | 32 |
| Test Format | 23 |
| Foreign Countries | 20 |
| Computer Assisted Testing | 18 |
| Testing Problems | 17 |
| Psychometrics | 15 |
| Factor Structure | 14 |
Audience
| Researchers | 5 |
| Practitioners | 2 |
| Community | 1 |
| Support Staff | 1 |
Location
| Turkey | 5 |
| China | 3 |
| United Kingdom | 3 |
| Japan | 2 |
| California | 1 |
| Canada | 1 |
| Germany | 1 |
| Italy | 1 |
| Kenya | 1 |
| Michigan | 1 |
| New Jersey | 1 |
Laws, Policies, & Programs
| Job Training Partnership Act… | 1 |
Freedman, Sarah Warshauer – 1991
Writing teachers and educators can add to information from large-scale testing and teachers can strengthen classroom assessment by creating a tight fit between large-scale testing and classroom assessment. Across the years, large-scale testing programs have struggled with a difficult problem: how to evaluate student writing reliably and…
Descriptors: Elementary Secondary Education, Foreign Countries, Informal Assessment, Portfolios (Background Materials)
Robertson, David W.; And Others – 1977
A comparative study of item analysis was conducted on the basis of race to determine whether alternative test construction or processing might increase the proportion of black enlisted personnel among those passing various military technical knowledge examinations. The study used data from six specialists at four grade levels and investigated item…
Descriptors: Difficulty Level, Enlisted Personnel, Item Analysis, Occupational Tests
Woelfel, John C.; And Others – 1976
To measure the sex role attitudes of Army personnel, an initial set of 174 items was developed. These items were administered to 721 soldiers at three Army installations; the sample consisted of 540 men and 181 women, of whom 401 were officers and 320 were enlisted personnel. Factor analysis of these 174 items indicated one strong…
Descriptors: Adults, Attitude Measures, Factor Structure, Females
Steele, Joe M. – 1979
The College Outcome Measures Project/American College Testing Program (COMP/ACT) Writing Assessment is described, and issues of validity and reliability in the assessment of writing samples using qualitative rating scales are explored. COMP/ACT is composed of three role-playing tasks in the social sciences, natural sciences, and arts, which are…
Descriptors: Adults, Essay Tests, Evaluators, Higher Education
Hambleton, Ronald K. – 1986
The problem of determining optimal test lengths with fixed total testing time has proved to be a difficult one for criterion-referenced test developers. An algorithm is needed which can be used by test developers to allocate available testing time to maximize the validity of their total criterion-referenced tests or testing programs. To be…
Descriptors: Algorithms, Criterion Referenced Tests, Elementary Secondary Education, Psychometrics
Kingsbury, G. Gage; Weiss, David J. – 1981
Conventional mastery tests designed to make optimal mastery classifications were compared with fixed-length and variable-length adaptive mastery tests. Comparisons between the testing procedures were made across five content areas in an introductory biology course from tests administered to volunteers. The criterion was the student's standing in…
Descriptors: Achievement Tests, Adaptive Testing, Biology, Comparative Analysis
Peer reviewed: Linn, Robert L.; Hambleton, Ronald K. – Applied Measurement in Education, 1991
Four main approaches to customized testing are described, and their resulting scores' valid uses and interpretations are discussed. Customized testing can yield valid normative and curriculum-specific information, although cautious application is needed to avoid misleading inferences about student achievement. (SLD)
Descriptors: Academic Achievement, Accountability, Criterion Referenced Tests, Curriculum
Embretson, Susan E. – Measurement: Interdisciplinary Research and Perspectives, 2004
The last century was marked by dazzling changes in many areas, such as technology and communications. Predictions into the second century of testing are seemingly difficult in such a context. Yet, looking back to the turn of the last century, Kirkpatrick (1900), in his American Psychological Association presidential address, presented fundamental…
Descriptors: Ability, Testing, Futures (of Society), Psychometrics
Tollefson, Nona; Tracy, D. B. – 1979
The validity and reliability of essay scores were examined by comparing the mean scores assigned to good and poor quality essay responses of different lengths written by high school sophomores. In-service and pre-service social studies teachers graded essay responses to a test question requiring knowledge of the Constitutional provisions for…
Descriptors: Essay Tests, Essays, Evaluation Criteria, High Schools
Peer reviewed: Michael, William B.; And Others – Educational and Psychological Measurement, 1985
A shortened Study Attitudes and Methods Survey (SAMS) was administered to 181 community college students. Four original factors remained in the new version: academic interest--love of learning; study anxiety; manipulation; and alienation toward authority. Academic drive--conformity and study methods were dropped, while facilitative study behaviors…
Descriptors: Attitude Measures, Factor Structure, Item Analysis, School Attitudes
Boyd, Thomas A.; Tramontana, Michael G. – 1984
To examine the validity of short forms of the Wechsler Intelligence Scale for Children-Revised (WISC-R), the WISC-R was first administered to 106 hospitalized psychiatric patients, aged 8-16. No subjects had a primary diagnosis of mental retardation or learning disability, and one-third were receiving psychotropic medication. WISC-R IQ scores…
Descriptors: Adolescents, Children, Correlation, Elementary Secondary Education
Hisama, Kay K.; And Others – 1977
The optimal test length, using predictive validity as a criterion, depends on two major conditions: the appropriate item-difficulty rather than the total number of items, and the method used in scoring the test. These conclusions were reached when responses to a 100-item multi-level test of reading comprehension from 136 non-native speakers of…
Descriptors: College Students, Difficulty Level, English (Second Language), Foreign Students
de Jong, John H. A. L. – Toegepaste taalwetenschap in artikelen 20, 1984
A study investigated the validity of an English listening skills test by comparing the results of native speakers of American and British English with those of Dutch students of English as a second language. A hypothesis suggested that two-thirds of the items would test listening skills and the remaining third would test other knowledge. Test results…
Descriptors: Age Differences, Comparative Analysis, Correlation, Educational Background
Bergstrom, Betty A.; Lunz, Mary E. – 1991
The level of confidence in pass/fail decisions obtained with computer adaptive tests (CATs) was compared to decisions based on paper-and-pencil tests. Subjects included 645 medical technology students from 238 educational programs across the country. The tests used in this study constituted part of the subjects' review for the certification…
Descriptors: Adaptive Testing, Certification, Comparative Testing, Computer Assisted Testing
New York State Div. for Youth, Albany. – 1985
This guide is designed to serve as a reference to assist providers of Job Training Partnership Act-funded programs in selecting appropriate interest, aptitude, and pre-employment and job readiness tests. Descriptions of 53 interest tests, 38 aptitude tests, and 37 pre-employment and job readiness tests are provided. Each description contains…
Descriptors: Aptitude Tests, Employment Potential, Evaluation Criteria, Guidelines

