Showing 1 to 15 of 31 results
Peer reviewed
Direct link
Jianbin Fu; Xuan Tan; Patrick C. Kyllonen – Journal of Educational Measurement, 2024
This paper presents the item and test information functions of the Rank two-parameter logistic models (Rank-2PLM) for items with two (pair) and three (triplet) statements in forced-choice questionnaires. The Rank-2PLM model for pairs is the MUPP-2PLM (Multi-Unidimensional Pairwise Preference) and, for triplets, is the Triplet-2PLM. Fisher's…
Descriptors: Questionnaires, Test Items, Item Response Theory, Models
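The Rank-2PLM models named in the abstract build on the two-parameter logistic (2PL) item response model. As a minimal sketch of the underlying machinery (the standard dichotomous 2PL and its Fisher item information, not the paper's Rank-2PLM itself; all parameter values illustrative):

```python
import math

def p_2pl(theta, a, b):
    """Two-parameter logistic (2PL) probability of a correct response,
    with ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def info_2pl(theta, a, b):
    """Fisher item information for the 2PL: I(theta) = a^2 * P * (1 - P)."""
    p = p_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

# Information peaks where ability equals item difficulty (theta = b).
print(info_2pl(0.0, 1.5, 0.0))  # → 0.5625, i.e. 1.5^2 * 0.5 * 0.5
```

Test information is then the sum of item information functions across the items on the form.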
Peer reviewed
Direct link
Glazer, Nancy; Wolfe, Edward W. – Applied Measurement in Education, 2020
This introductory article describes how constructed response scoring is carried out, particularly the rater monitoring processes and illustrates three potential designs for conducting rater monitoring in an operational scoring project. The introduction also presents a framework for interpreting research conducted by those who study the constructed…
Descriptors: Scoring, Test Format, Responses, Predictor Variables
Peer reviewed
PDF on ERIC | Download full text
Lynch, Sarah – Practical Assessment, Research & Evaluation, 2022
In today's digital age, tests are increasingly being delivered on computers. Many of these computer-based tests (CBTs) have been adapted from paper-based tests (PBTs). However, this change in mode of test administration has the potential to introduce construct-irrelevant variance, affecting the validity of score interpretations. Because of this,…
Descriptors: Computer Assisted Testing, Tests, Scores, Scoring
Peer reviewed
PDF on ERIC | Download full text
Sharakhimov, Shoaziz; Nurmukhamedov, Ulugbek – English Teaching Forum, 2021
Vocabulary learning is an incremental process. Vocabulary knowledge, especially for second-language learners, may develop across a lifetime. Teachers with experience in providing feedback on their students' vocabulary use in writing or speech might have noticed that it is sometimes difficult to pinpoint one aspect of word knowledge. The reason is…
Descriptors: Vocabulary Development, Second Language Learning, Second Language Instruction, English (Second Language)
Peer reviewed
PDF on ERIC | Download full text
Crowther, Gregory J.; Wiggins, Benjamin L.; Jenkins, Lekelia D. – HAPS Educator, 2020
Many undergraduate biology instructors incorporate active learning exercises into their lessons while continuing to assess students with traditional exams. To better align practice and exams, we present an approach to question-asking that emphasizes templates instead of specific questions. Students and instructors can use these Test Question…
Descriptors: Science Tests, Active Learning, Biology, Undergraduate Students
Peer reviewed
Direct link
Kamber, David N. – Journal of Chemical Education, 2021
In-person interactions between faculty and students personalize the learning experience and are the hallmark of primarily undergraduate institutions. These invaluable student-faculty interactions were disrupted during the 2019 SARS-CoV-2 global pandemic and led to a rapid, unprecedented shift to distance learning. Within the space of virtual…
Descriptors: Biochemistry, Science Instruction, Teaching Methods, Undergraduate Students
Mullis, Ina V. S., Ed.; Martin, Michael O., Ed.; von Davier, Matthias, Ed. – International Association for the Evaluation of Educational Achievement, 2021
TIMSS (Trends in International Mathematics and Science Study) is a long-standing international assessment of mathematics and science at the fourth and eighth grades that has been collecting trend data every four years since 1995. About 70 countries use TIMSS trend data for monitoring the effectiveness of their education systems in a global…
Descriptors: Achievement Tests, International Assessment, Science Achievement, Mathematics Achievement
Peer reviewed
Direct link
Boone, William J. – CBE - Life Sciences Education, 2016
This essay describes Rasch analysis psychometric techniques and how such techniques can be used by life sciences education researchers to guide the development and use of surveys and tests. Specifically, Rasch techniques can be used to document and evaluate the measurement functioning of such instruments. Rasch techniques also allow researchers to…
Descriptors: Item Response Theory, Psychometrics, Science Education, Educational Research
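For readers unfamiliar with the model underlying Rasch analysis, a minimal sketch of the dichotomous Rasch model (a one-parameter logistic: response probability depends only on the gap between person ability and item difficulty; values illustrative):

```python
import math

def rasch_prob(theta, b):
    """Dichotomous Rasch model: probability of a correct response,
    P(X=1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
    with person ability theta and item difficulty b on a common logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly 0.5.
print(rasch_prob(0.0, 0.0))  # → 0.5
```

Placing persons and items on this common scale is what lets Rasch techniques document how well a survey or test instrument is functioning as a measure.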
Peer reviewed
Direct link
Kubinger, Klaus D. – Educational and Psychological Measurement, 2009
The linear logistic test model (LLTM) breaks down the item parameter of the Rasch model as a linear combination of some hypothesized elementary parameters. Although the original purpose of applying the LLTM was primarily to generate test items with specified item difficulty, there are still many other potential applications, which may be of use…
Descriptors: Models, Test Items, Psychometrics, Item Response Theory
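The decomposition described in the abstract, item difficulty as a linear combination of hypothesized elementary parameters, can be sketched as follows (the Q-matrix row and eta values here are hypothetical, purely for illustration):

```python
def lltm_difficulty(q_row, eta):
    """LLTM decomposition: item difficulty b_i = sum_k q_ik * eta_k,
    a linear combination of elementary-operation parameters eta,
    weighted by the item's row of the design (Q) matrix."""
    return sum(q * e for q, e in zip(q_row, eta))

# Hypothetical design: three elementary operations with assumed parameters;
# this item involves operations 1 and 3.
eta = [0.5, -0.2, 1.1]   # basic parameters for the elementary operations
q_item = [1, 0, 1]       # Q-matrix row: which operations the item requires
print(lltm_difficulty(q_item, eta))  # → 1.6
```

Generating a new item with a chosen difficulty then amounts to picking a Q-matrix row whose weighted sum hits the target value.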
Peer reviewed
Direct link
Holme, Thomas; Murphy, Kristen – Journal of Chemical Education, 2011
In 2005, the ACS Examinations Institute released an exam for first-term general chemistry in which items are intentionally paired with one conceptual and one traditional item. A second-term, paired-questions exam was released in 2007. This paper presents an empirical study of student performances on these two exams based on national samples of…
Descriptors: Chemistry, Science Tests, College Science, Undergraduate Students
Peer reviewed
Direct link
Carr, Nathan T.; Xi, Xiaoming – Language Assessment Quarterly, 2010
This article examines how the use of automated scoring procedures for short-answer reading tasks can affect the constructs being assessed. In particular, it highlights ways in which the development of scoring algorithms intended to apply the criteria used by human raters can lead test developers to reexamine and even refine the constructs they…
Descriptors: Scoring, Automation, Reading Tests, Test Format
Peer reviewed
Direct link
Solheim, Oddny Judith; Skaftun, Atle – Assessment in Education: Principles, Policy & Practice, 2009
During the last three decades the constructed response format has gradually gained entry in large-scale assessments of reading comprehension. In their 1991 Reading Literacy Study, the International Association for the Evaluation of Educational Achievement (IEA) included constructed response items on an exploratory basis. Ten years later, in…
Descriptors: Reading Comprehension, Literacy, Reading Tests, Responses
Peer reviewed
Direct link
van der Linden, Wim J. – Measurement: Interdisciplinary Research and Perspectives, 2010
The traditional way of equating the scores on a new test form X to those on an old form Y is equipercentile equating for a population of examinees. Because the population is likely to change between the two administrations, a popular approach is to equate for a "synthetic population." The authors of the articles in this issue of the…
Descriptors: Test Format, Equated Scores, Population Distribution, Population Trends
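A minimal sketch of the equipercentile idea described above: a score on new form X is mapped to the form-Y score holding the same percentile rank. This toy version uses empirical score distributions and a simple linear-interpolation quantile, not the synthetic-population machinery the issue discusses:

```python
import bisect

def equipercentile_equate(scores_x, scores_y, x):
    """Map a score x on form X to the form-Y score with the same
    percentile rank, using the two forms' observed score samples."""
    sx = sorted(scores_x)
    sy = sorted(scores_y)
    # Percentile rank of x within the form-X sample.
    p = bisect.bisect_right(sx, x) / len(sx)
    # Quantile at rank p on form Y, interpolating between order statistics.
    pos = p * (len(sy) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sy) - 1)
    frac = pos - lo
    return sy[lo] + frac * (sy[hi] - sy[lo])
```

If form Y is uniformly 5 points harder-scored than form X (every Y score shifted up by 5), the equated score comes out close to x + 5, as expected.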
Peer reviewed
Direct link
Schumacker, Randall E.; Smith, Everett V., Jr. – Educational and Psychological Measurement, 2007
Measurement error is a common theme in classical measurement models used in testing and assessment. In classical measurement models, the definition of measurement error and the subsequent reliability coefficients differ on the basis of the test administration design. Internal consistency reliability specifies error due primarily to poor item…
Descriptors: Measurement Techniques, Error of Measurement, Item Sampling, Item Response Theory
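As one concrete instance of the internal-consistency reliability mentioned above, Cronbach's alpha can be sketched as follows (the data layout, one score list per item, is an assumption of this sketch):

```python
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha, an internal-consistency reliability coefficient:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores),
    where item_scores is a list of k per-item score lists of equal length."""
    k = len(item_scores)
    item_vars = sum(statistics.pvariance(item) for item in item_scores)
    totals = [sum(vals) for vals in zip(*item_scores)]
    total_var = statistics.pvariance(totals)
    return k / (k - 1) * (1 - item_vars / total_var)

# Three perfectly parallel items yield an alpha of 1.0.
items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
print(cronbach_alpha(items))  # → 1.0
```

Other reliability coefficients in classical test theory differ by which source of error (items, occasions, raters) the administration design lets them isolate.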
Peer reviewed
Direct link
Yi, Hyun Sook; Kim, Seonghoon; Brennan, Robert L. – Applied Psychological Measurement, 2007
Large-scale testing programs involving classification decisions typically have multiple forms available and conduct equating to ensure cut-score comparability across forms. A test developer might be interested in the extent to which an examinee who happens to take a particular form would have a consistent classification decision if he or she had…
Descriptors: Classification, Reliability, Indexes, Computation