Showing 6,271 to 6,285 of 9,533 results
Peer reviewed
Eggen, T. J. H. M.; Straetmans, G. J. J. M. – Educational and Psychological Measurement, 2000
Studied the use of adaptive testing when examinees are classified into three categories. Established testing algorithms with two different statistical computation procedures and evaluated them through simulation using an operational item bank from Dutch basic adult education. Results suggest a reduction of at least 22% in the mean number of items…
Descriptors: Adaptive Testing, Adult Education, Algorithms, Classification
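Classification-oriented adaptive testing of the kind studied above is commonly driven by a sequential probability ratio test (SPRT): items are administered until the response pattern is decisively more likely at one side of a cut score than the other. The sketch below is a minimal two-category illustration under the Rasch model, not the authors' three-category implementation; the function names, error rates, and ability points are assumptions.

```python
import math

def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_classify(responses, difficulties, theta_low, theta_high,
                  alpha=0.05, beta=0.05):
    """Sequential probability ratio test for a two-category decision.

    Accumulates the log-likelihood ratio of the responses at two
    ability points bracketing the cut score, and stops as soon as one
    hypothesis can be accepted at the requested error rates.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    log_lr = 0.0
    for x, b in zip(responses, difficulties):
        p_hi = rasch_p(theta_high, b)
        p_lo = rasch_p(theta_low, b)
        log_lr += math.log(p_hi / p_lo) if x else math.log((1 - p_hi) / (1 - p_lo))
        if log_lr >= upper:
            return "above cut"
        if log_lr <= lower:
            return "below cut"
    return "undecided"
```

With symmetric 5% error rates, the test stops as soon as the accumulated log-likelihood ratio crosses log(19) ≈ 2.94 in either direction, which is why such designs can shorten tests markedly relative to fixed-length forms.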
Peer reviewed
Bramley, Tom – Evaluation & Research in Education, 2001
Analyzed data from a session of the General Certificate of Secondary Education (GCSE) mathematics examination to identify items displaying a bi-modal expected score distribution, try to explain the bi-modality, rescore the items to remove under-used middle categories, and determine the effect on test reliability of rescoring the data. Discusses…
Descriptors: Foreign Countries, Mathematics Tests, Reliability, Scores
Peer reviewed
Ginther, April – Language Testing, 2002
A nested cross-over design was used to examine the effects of visual condition, type of stimuli, and language proficiency on listening comprehension items of the Test of English as a Foreign Language. Three two-way interactions were significant: proficiency by type of stimuli, type of stimuli by visual condition, and type of stimuli by time.…
Descriptors: English (Second Language), Language Proficiency, Language Tests, Listening Comprehension
Peer reviewed
Scialfa, Charles; Legare, Connie; Wenger, Larry; Dingley, Louis – Teaching of Psychology, 2001
Analyzes multiple-choice questions provided in test banks for introductory psychology textbooks. Study 1 offered a consistent picture of the objective difficulty of multiple-choice tests for introductory psychology students, while both studies 1 and 2 indicated that test items taken from commercial test banks have poor psychometric properties.…
Descriptors: Difficulty Level, Educational Research, Higher Education, Introductory Courses
Peer reviewed
Mareschal, Denis; Powell, Daisy; Westermann, Gert; Volein, Agnes – Infant and Child Development, 2005
Young infants are very sensitive to feature distribution information in the environment. However, existing work suggests that they do not make use of correlation information to form certain perceptual categories until at least 7 months of age. We suggest that the failure to use correlation information is a by-product of familiarization procedures…
Descriptors: Infants, Classification, Correlation, Familiarity
Peer reviewed
Pomplun, Mark; Custer, Michael – Applied Measurement in Education, 2005
In this study, we investigated possible context effects when students chose to defer items and answer those items later during a computerized test. In 4 primary school reading tests, 126 items were studied. Logistic regression analyses identified 4 items across 4 grade levels as statistically significant. However, follow-up analyses indicated that…
Descriptors: Psychometrics, Reading Tests, Effect Size, Test Items
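A logistic-regression check of the kind used above regresses item correctness on a matching ability measure plus an indicator for the condition of interest (here, whether the examinee deferred the item); a non-negligible weight on the indicator would signal a context effect beyond ability. The plain-Python fit below is an illustrative sketch only; the variable layout, learning rate, and epoch count are assumptions, not the study's analysis.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Plain gradient-ascent logistic regression.

    X is a list of feature rows, y a list of 0/1 outcomes.
    Returns [intercept, w_1, ..., w_k].
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))
            err = yi - p  # gradient of the log-likelihood
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w
```

For the design above, each row of `X` might hold an examinee's total score and a 0/1 deferred-item flag; the fitted weight on the flag estimates the context effect after controlling for ability.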
Peer reviewed
Venkateswaran, Uma – History Teacher, 2004
Over the past two decades, remarkable strides have been made in examining, documenting, and incorporating race and gender issues in history courses, but it is time to take a look at the ways in which these curricular and pedagogical changes have impacted the Advanced Placement United States History Examination. This paper focuses on three…
Descriptors: United States History, Advanced Placement, Standardized Tests, Test Bias
Peer reviewed
Rupp, Andre A. – International Journal of Testing, 2003
Item response theory (IRT) has become one of the most popular scoring frameworks for measurement data. IRT models are used frequently in computerized adaptive testing, cognitively diagnostic assessment, and test equating. This article reviews two of the most popular software packages for IRT model estimation, BILOG-MG (Zimowski, Muraki, Mislevy, &…
Descriptors: Test Items, Adaptive Testing, Item Response Theory, Computer Software
Peer reviewed
Schaeffer, Gary A.; Henderson-Montero, Diane; Julian, Marc; Bene, Nancy H. – Educational Assessment, 2002
A number of methods for scoring tests with selected-response (SR) and constructed-response (CR) items are available. The selection of a method depends on the requirements of the program, the particular psychometric model and assumptions employed in the analysis of item and score data, and how scores are to be used. This article compares 3 methods:…
Descriptors: Scoring, Responses, Test Items, Raw Scores
Peer reviewed
Su, Ya-Hui; Wang, Wen-Chung – Applied Measurement in Education, 2005
Simulations were conducted to investigate factors that influence the Mantel, generalized Mantel-Haenszel (GMH), and logistic discriminant function analysis (LDFA) methods in assessing differential item functioning (DIF) for polytomous items. The results show that the magnitude of DIF contamination in the matching score, as measured by the average…
Descriptors: Discriminant Analysis, Test Bias, Research Methodology, Test Items
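For dichotomous items, the Mantel-Haenszel statistic at the heart of these methods pools a 2×2 table per matching-score stratum into a common odds ratio; a value near 1 indicates no uniform DIF on the item. The sketch below covers only that dichotomous base case (the tuple layout is an assumption); the generalized Mantel-Haenszel and LDFA variants studied above extend the idea to polytomous items.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel common odds ratio across matching-score strata.

    Each stratum is a tuple (ref_correct, ref_wrong, focal_correct,
    focal_wrong) of counts at one level of the matching score.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty strata
        num += a * d / n
        den += b * c / n
    return num / den
```

Because the matching score itself is built from the studied items, DIF contamination in that score, the factor examined above, can distort these stratified counts and hence the statistic.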
Peer reviewed
Lewis, Kelly M.; Lambert, Michael C. – Assessment, 2006
Studies addressing Black adolescents' social change strategies are nonexistent, a gap that may reflect the absence of social change measures for Black adolescents. In an effort to begin addressing this concern, the 30-item Measure of Social Change for Adolescents (MOSC-A) was designed to measure Black adolescents' first- (i.e., within the…
Descriptors: African Americans, Adolescents, Social Change, Change Strategies
Peer reviewed
Beretvas, S. Natasha; Williams, Natasha J. – Journal of Educational Measurement, 2004
To assess item dimensionality, the following two approaches are described and compared: hierarchical generalized linear model (HGLM) and multidimensional item response theory (MIRT) model. Two generating models are used to simulate dichotomous responses to a 17-item test: the unidimensional and compensatory two-dimensional (C2D) models. For C2D…
Descriptors: Item Response Theory, Test Items, Mathematics Tests, Reading Ability
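The compensatory two-dimensional (C2D) generating model referred to above combines the two abilities additively before the logistic link, so strength on one dimension can offset weakness on the other. A one-function sketch, with parameter names assumed:

```python
import math

def c2d_prob(theta1, theta2, a1, a2, d):
    """Compensatory 2D IRT probability of a correct response.

    a1, a2 are discriminations on the two dimensions and d is the
    item intercept; the abilities enter as a weighted sum, which is
    what makes the model compensatory.
    """
    return 1.0 / (1.0 + math.exp(-(a1 * theta1 + a2 * theta2 + d)))
```

With equal discriminations, an examinee at (2, -2) has the same success probability as one at (0, 0), which is exactly the compensation property that distinguishes this model from a unidimensional one.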
Peer reviewed
Lin, Jie – Alberta Journal of Educational Research, 2006
The Bookmark standard-setting procedure was developed to address the perceived problems with the most popular method for setting cut-scores: the Angoff procedure (Angoff, 1971). The purposes of this article are to review the Bookmark procedure and evaluate it in terms of Berk's (1986) criteria for evaluating cut-score setting methods. The…
Descriptors: Standard Setting (Scoring), Cutting Scores, Evaluation Criteria, Evaluation Research
Peer reviewed
Eggen, Theo J. H. M.; Verschoor, Angela J. – Applied Psychological Measurement, 2006
Computerized adaptive tests (CATs) are individualized tests that, from a measurement point of view, are optimal for each individual, possibly under some practical conditions. In the present study, it is shown that maximum information item selection in CATs using an item bank that is calibrated with the one- or the two-parameter logistic model…
Descriptors: Adaptive Testing, Difficulty Level, Test Items, Item Response Theory
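Maximum information item selection, the rule examined above, administers at each step the unused item with the greatest Fisher information at the current ability estimate; under the two-parameter logistic model that information is a²p(1 − p), which peaks when item difficulty matches ability. A minimal sketch (function names and the bank layout are assumptions):

```python
import math

def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p2pl(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta, bank, administered):
    """Return the index of the most informative unused item.

    bank is a list of (a, b) parameter pairs; administered is a set
    of indices of items already given.
    """
    best, best_info = None, -1.0
    for i, (a, b) in enumerate(bank):
        if i in administered:
            continue
        info = item_information(theta, a, b)
        if info > best_info:
            best, best_info = i, info
    return best
```

Because the rule always chases the information peak, it tends to match item difficulty to the provisional ability estimate, which is the behavior whose consequences the study analyzes.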
Peer reviewed
Nicholson, Joanne; Konstantinidi, Eva; Furniss, Frederick – Research in Developmental Disabilities: A Multidisciplinary Journal, 2006
This study examined certain psychometric properties of the questions about behavioral function (QABF) scale. The QABF was completed on 118 problem behaviours presented by 40 young people with severe intellectual disabilities, and measures of inter-rater reliability, internal consistency, and construct validity were calculated overall and for…
Descriptors: Psychometrics, Factor Analysis, Interrater Reliability, Functional Behavioral Assessment