Showing 976 to 990 of 1,186 results
Shorey, Leonard – 1991
Tests in social studies and integrated science given in Saint Vincent, Saint Lucia, Grenada, and Dominica were analyzed by the Organization for Co-operation in Overseas Development (OCOD) Comprehensive Teacher Training Program (CTTP) for discrimination, difficulty, and reliability, as well as other characteristics. There were 767 examinees for the…
Descriptors: Difficulty Level, Elementary Secondary Education, Evaluation Methods, Foreign Countries
Ackerman, Terry A.; Evans, John A. – 1992
The relationship between levels of reliability and the power of two bias and differential item functioning (DIF) detection methods is examined. Both methods, the Mantel-Haenszel (MH) procedure of P. W. Holland and D. T. Thayer (1988) and the Simultaneous Item Bias (SIB) procedure of R. Shealy and W. Stout (1991), use examinees' raw scores as a…
Descriptors: Comparative Analysis, Equations (Mathematics), Error of Measurement, Item Bias
Stansfield, Charles W.; And Others – 1992
This report describes the development, construction, and validation of the Preliminary Chinese Proficiency Test (Pre-CPT), a standardized, nationally-normed test of listening and reading comprehension for beginning-level native English-speaking learners of Chinese as a second language. The Pre-CPT was designed as a lower-level version of the…
Descriptors: Chinese, Higher Education, Language Proficiency, Language Tests
Stone, Gregory Ethan; Lunz, Mary E. – 1994
This paper explores the comparability of item calibrations for three types of items: (1) text only; (2) text with photographs; and (3) text plus graphics when items are presented on written tests and computerized adaptive tests. Data are from five different medical technology certification examinations administered nationwide in 1993. The Rasch…
Descriptors: Adaptive Testing, Comparative Analysis, Computer Assisted Testing, Diagrams
Powell, Jack L.; Brand, Alice G. – 1986
The Brand Emotions Scale for Writers (BSEW) is a 20-item scale designed to measure the emotions of writers: (1) immediately before writing (state-before), (2) immediately after writing (state-after), and (3) when writing in general (trait). This paper describes the development of BSEW and the factor structure of these three different forms. Common…
Descriptors: Affective Measures, Authors, Behavior Rating Scales, Factor Structure
O'Brien, Mary Utne; Ingels, Steven J. – 1984
Intended to provide information about the development of the Economics Values Inventory (EVI), the report describes considerations that directed development of test items and provides indicators of the reliability and validity of the proposed test instrument. The EVI is recommended as an effective measuring instrument in experimental evaluations…
Descriptors: Affective Measures, Attitude Change, Attitude Measures, Economics
Byars, Alvin Gregg – 1980
The objectives of this investigation are to develop, describe, assess, and demonstrate procedures for constructing mastery tests to minimize errors of classification and to maximize decision reliability. The guidelines are based on conditions where item exchangeability is a reasonable assumption and the test constructor can control the number of…
Descriptors: Cutting Scores, Difficulty Level, Grade 4, Intermediate Grades
Gifford, Janice A.; Hambleton, Ronald K. – 1980
Technical considerations associated with item selection and reliability assessment are considered in relation to criterion-referenced tests constructed to provide group information. The purpose is to emphasize test building and the evaluation of test scores in program evaluation studies. It is stressed that an evaluator employ a performance or…
Descriptors: Criterion Referenced Tests, Group Testing, Item Sampling, Models
Bejar, Issac I. – 1976
The concept of testing for partial knowledge is considered with the concept of tailored testing. Following the special usage of latent trait theory, the word validity is used to mean the correlation of a test with the construct the test measures. The concept of a method factor in the test is also considered as a part of the validity. The possible…
Descriptors: Achievement Tests, Adaptive Testing, Computer Assisted Testing, Confidence Testing
Catts, Ralph – 1978
The reliability of multiple choice tests--containing different numbers of response options--was investigated for 260 students enrolled in technical college economics courses. Four test forms, constructed from previously used four-option items, were administered, consisting of (1) 60 two-option items--two distractors randomly discarded; (2) 40…
Descriptors: Answer Sheets, Difficulty Level, Foreign Countries, Higher Education
Roid, Gale; Finn, Patrick – 1978
The feasibility of generating multiple-choice test questions by transforming sentences from prose instructional materials was examined. A computer-based algorithm was used to analyze prose subject matter and to identify high-information words. Sentences containing selected words were then transformed into multiple-choice items by four writers who…
Descriptors: Algorithms, Criterion Referenced Tests, Difficulty Level, Form Classes (Languages)
Benson, Jeri; And Others – 1978
The precision and efficiency of a cognitive test constructed by three different methods of item analysis were compared, using the verbal aptitude subtest of the Florida Twelfth Grade Test. Classical item analysis, factor analysis and the Rasch logistic model were used in the construction of 15- and 30-item subtests and replicated for samples of 250,…
Descriptors: Cognitive Tests, Comparative Analysis, Efficiency, Factor Analysis
Peer reviewed
Grosse, Martin E.; Wright, Benjamin D. – Evaluation and the Health Professions, 1986
Based on the standard setting procedures of the American Board of Preventive Medicine for their Core Test, this article describes how Rasch measurement can facilitate using test content judgments in setting a standard. Rasch measurement can then be used to evaluate and improve the precision of the standard and to hold it constant across time.…
Descriptors: Certification, Criterion Referenced Tests, Difficulty Level, Health Personnel
Peer reviewed
Altepeter, Tom – School Psychology Review, 1983
A critical review of the Expressive One-Word Picture Vocabulary Test (Gardner) is offered. The reviewer feels that the instrument cannot be recommended in its present form. Further research concerning the manual and theoretical issues (particularly test-retest stability) is strongly recommended. (Author/PN)
Descriptors: Error of Measurement, Intelligence Tests, Item Analysis, Pictorial Stimuli
Peer reviewed
Black, Paul – Studies in Educational Evaluation, 1995
The role of assessment in science education is explored, focusing on summative assessment in British public certificate examinations. Examples of test items are presented to illustrate difficulties in making valid and reliable assessments, and issues with implications for formative assessment are discussed. (SLD)
Descriptors: Educational Assessment, Feedback, Foreign Countries, Formative Evaluation