Showing all 14 results
Peer reviewed
Lee, Won-Chan; Kim, Stella Y.; Choi, Jiwon; Kang, Yujin – Journal of Educational Measurement, 2020
This article considers psychometric properties of composite raw scores and transformed scale scores on mixed-format tests that consist of a mixture of multiple-choice and free-response items. Test scores on several mixed-format tests are evaluated with respect to conditional and overall standard errors of measurement, score reliability, and…
Descriptors: Raw Scores, Item Response Theory, Test Format, Multiple Choice Tests
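For context (standard background, not a detail drawn from this article): the overall standard error of measurement that such conditional analyses generalize is conventionally estimated in classical test theory as

SEM = \sigma_X \sqrt{1 - \rho_{XX'}}

where \sigma_X is the observed-score standard deviation and \rho_{XX'} the score reliability; a conditional SEM evaluates this error at each score level rather than averaging over the full score distribution.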
Peer reviewed
Neiro, Jakke; Johansson, Niko – LUMAT: International Journal on Math, Science and Technology Education, 2020
The history and evolution of science assessment remain poorly known, especially with regard to the content of exam questions. Here we analyze the Finnish matriculation examination in biology from the 1920s to the 1960s to understand how the exam has evolved in both its knowledge content and educational form. Each question was classified according to…
Descriptors: Foreign Countries, Biology, Test Content, Test Format
Peer reviewed
Schriesheim, Chester A. – Educational and Psychological Measurement, 1981
This study provides support for the hypothesized effect of leniency on the discriminant validity of grouped questionnaire items. It was found that controlling for leniency resulted in a slight decrement in convergent validity but that discriminant validity was substantially improved. Implications for questionnaire validity and further research are…
Descriptors: Classification, Correlation, Questionnaires, Research Problems
Peer reviewed
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
Results of 96 theoretical/empirical studies were reviewed to see if they support a taxonomy of 43 rules for writing multiple-choice test items. The taxonomy is the result of an analysis of 46 textbooks dealing with multiple-choice item writing. For nearly half of the rules, no research was found. (SLD)
Descriptors: Classification, Literature Reviews, Multiple Choice Tests, Test Construction
Schulz, E. Matthew; Wang, Lin – 2001
In this study, items were drawn from a full-length test of 30 items to construct shorter tests for making accurate pass/fail classifications at a specific criterion point on the latent ability metric. A three-parameter item response theory (IRT) framework was used. The criterion point on the latent ability…
Descriptors: Ability, Classification, Item Response Theory, Pass Fail Grading
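For reference (the standard textbook form, not a result reported in this abstract): the three-parameter logistic IRT model named here gives the probability of a correct response to item i as

P_i(\theta) = c_i + (1 - c_i) \frac{1}{1 + e^{-D a_i (\theta - b_i)}}

where a_i, b_i, and c_i are the item discrimination, difficulty, and pseudo-guessing parameters, \theta is the latent ability, and D \approx 1.7 is a scaling constant.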
Peer reviewed
Bruton, Anthony – Canadian Modern Language Review, 2007
This analysis evaluates receptive tests of targeted lexical knowledge in the written medium, of the kind typically used in empirical research on lexical acquisition from reading foreign/second-language texts. Apart from the types of second-language cues or prompts, and the language of the responses, the main issues revolve around: (a) the…
Descriptors: Knowledge Level, Form Classes (Languages), Second Language Learning, Vocabulary Development
Peer reviewed
Haladyna, Thomas M.; Downing, Steven M. – Applied Measurement in Education, 1989
A taxonomy of 43 rules for writing multiple-choice test items is presented, based on a consensus of 46 textbooks. These guidelines are presented as complete and authoritative, with solid consensus apparent for 33 of the rules. Four rules lack consensus, and five rules were cited fewer than 10 times. (SLD)
Descriptors: Classification, Interrater Reliability, Multiple Choice Tests, Objective Tests
Nissan, Susan; And Others – 1996
One of the item types in the Listening Comprehension section of the Test of English as a Foreign Language (TOEFL) test is the dialogue. Because the dialogue item pool needs to have an appropriate balance of items at a range of difficulty levels, test developers have examined items at various difficulty levels in an attempt to identify their…
Descriptors: Classification, Dialogs (Language), Difficulty Level, English (Second Language)
Sykes, Robert C.; And Others – 1992
A part-form methodology was used to study the effect of varying degrees of multidimensionality on the consistency of pass/fail classification decisions obtained from simulated unidimensional item response theory (IRT)-based licensure examinations. A control on the degree of form multidimensionality permitted an assessment throughout the range of…
Descriptors: Classification, Comparative Testing, Computer Simulation, Decision Making
Wise, Lauress – 1993
As high-stakes use of tests increases, it becomes vital that test developers and test users communicate clearly about the accuracy and limitations of the scores generated by a test after it is assembled and used. A procedure is described for portraying the accuracy of test scores. It can be used in setting accuracy targets during form construction…
Descriptors: Classification, High Stakes Tests, Item Response Theory, Military Personnel
Hanson, Bradley A.; Bay, Luz; Loomis, Susan Cooper – 1998
Research studies using booklet classification were conducted by the American College Testing Program to investigate the linkage between the National Assessment of Educational Progress (NAEP) Achievement Levels Descriptions and the cutpoints set to represent student performance with respect to the achievement levels. This paper describes the…
Descriptors: Academic Achievement, Classification, Cutting Scores, Discriminant Analysis
Stansfield, Charles W. – 1990
The IDEA Oral Language Proficiency Test (IPT II), an individually administered measure of speaking and listening proficiency in English as a Second Language designed for secondary school students, is described and discussed. The test consists of 91 items and requires 5 to 25 minutes to administer. Raw scores are converted to one of seven proficiency…
Descriptors: Classification, English (Second Language), Language Proficiency, Language Tests
Finch, F. L.; Dost, Marcia A. – 1992
Many state and local entities are developing and using performance assessment programs. Because these initiatives are so diverse, it is difficult to understand what they are doing or to compare them in any meaningful way. Multiple-choice tests are contrasted with performance assessments, and preliminary classifications are suggested to…
Descriptors: Alternative Assessment, Classification, Comparative Analysis, Constructed Response
Read, John; Nation, Paul – 1986
A review of the literature on a variety of issues related to testing vocabulary knowledge in a second language addresses these topics: problems in estimating vocabulary size, including the related questions of what constitutes a word, how a sample should be selected, and what the criteria are for knowing a word; sampling the basic and specialized…
Descriptors: Achievement Tests, Check Lists, Classification, Comparative Analysis