Showing 106 to 120 of 3,310 results
Peer reviewed
Penaloza, Roberto V.; Berends, Mark – Sociological Methods & Research, 2022
To measure "treatment" effects, social science researchers typically rely on nonexperimental data. In education, school and teacher effects on students are often measured through value-added models (VAMs) that are not fully understood. We propose a framework that relates to the education production function in its most flexible form and…
Descriptors: Data, Value Added Models, Error of Measurement, Correlation
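For readers new to value-added models, the core two-step logic that such frameworks generalize can be sketched with simulated data. This is a minimal illustration in Python, not the authors' proposed framework: regress current scores on prior scores, then summarize teacher-level residuals.

    # Minimal value-added sketch: a teacher's "effect" is the mean residual
    # from a prior-score regression. Simulated data; real VAMs add controls.
    import numpy as np

    rng = np.random.default_rng(0)
    n_teachers, n_students = 20, 50
    teacher = np.repeat(np.arange(n_teachers), n_students)
    true_effect = rng.normal(0, 0.2, n_teachers)
    prior = rng.normal(0, 1, teacher.size)
    current = 0.7 * prior + true_effect[teacher] + rng.normal(0, 0.5, teacher.size)

    # Step 1: OLS of current on prior scores (with intercept).
    X = np.column_stack([np.ones_like(prior), prior])
    beta, *_ = np.linalg.lstsq(X, current, rcond=None)
    resid = current - X @ beta

    # Step 2: value-added estimate = mean residual per teacher.
    vam = np.array([resid[teacher == t].mean() for t in range(n_teachers)])
    print(np.corrcoef(vam, true_effect)[0, 1])  # how well true effects are recovered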
Peer reviewed
Levin, Joel R.; Ferron, John M.; Gafurov, Boris S. – Journal of Education for Students Placed at Risk, 2022
The present simulation study examined the statistical properties (namely, Type I error and statistical power) of various novel randomized single-case multiple-baseline designs and associated randomized-test analyses for comparing the A- to B-phase immediate abrupt outcome changes in two independent intervention conditions. It was found that with…
Descriptors: Statistical Analysis, Error of Measurement, Intervention, Program Effectiveness
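The randomization-test logic behind such analyses can be shown in a few lines. This is a hedged sketch for a single hypothetical series; the study's multiple-baseline designs combine several series and randomize across intervention conditions:

    # Single-case randomization test: the intervention start point was chosen
    # at random among permissible points, so the p-value is the rank of the
    # observed A-B mean difference within the distribution over all start points.
    import numpy as np

    y = np.array([2., 3, 2, 3, 2, 6, 7, 6, 7, 6, 7, 6])  # hypothetical outcome series
    actual_start = 5                                      # B phase actually began here
    candidates = range(3, 10)                             # permissible start points

    def ab_diff(start):
        return y[start:].mean() - y[:start].mean()

    observed = ab_diff(actual_start)
    dist = np.array([ab_diff(s) for s in candidates])
    print((dist >= observed).mean())  # one-sided randomization p-value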
Peer reviewed
Metsämuuronen, Jari – Practical Assessment, Research & Evaluation, 2022
The reliability of a test score is usually underestimated, and the deflation may be profound: 0.40-0.60 units of reliability, or 46-71%. Eight root sources of the deflation are discussed and quantified by a simulation with 1,440 real-world datasets: (1) errors in the measurement modelling, (2) inefficiency in the estimator of reliability within…
Descriptors: Test Reliability, Scores, Test Items, Correlation
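One of the root sources the abstract enumerates, inefficiency in the reliability estimator, is easy to demonstrate: with unequal item loadings, coefficient alpha is only a lower bound on true reliability. A small simulation sketch under an assumed congeneric model with hypothetical loadings:

    # Alpha as a deflated estimate: for congeneric items (unequal loadings),
    # coefficient alpha underestimates the true reliability of the sum score.
    import numpy as np

    rng = np.random.default_rng(1)
    n, loadings = 5000, np.array([0.9, 0.7, 0.5, 0.4, 0.3])
    theta = rng.normal(size=n)
    X = theta[:, None] * loadings + rng.normal(size=(n, 5)) * np.sqrt(1 - loadings**2)

    k = loadings.size
    cov = np.cov(X, rowvar=False)
    alpha = k / (k - 1) * (1 - cov.trace() / cov.sum())

    # True reliability of the sum score under this model.
    true_rel = loadings.sum()**2 / (loadings.sum()**2 + (1 - loadings**2).sum())
    print(round(alpha, 3), round(true_rel, 3))  # alpha comes out lower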
Ryan Derickson – ProQuest LLC, 2022
Item Response Theory (IRT) models are a popular analytic method for self-report data. We show how traditional IRT models can be vulnerable to specific kinds of asymmetric measurement error (AME) in self-report data, because the models spread the error to all estimates -- even those of items that do not contribute error. We quantify the impact of…
Descriptors: Item Response Theory, Measurement Techniques, Error of Measurement, Models
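For context, models of this kind build on the standard item response function; below is a minimal two-parameter logistic (2PL) curve, a generic textbook formula rather than the dissertation's AME model:

    # 2PL item response function: probability of endorsing an item given trait
    # level theta, discrimination a, and difficulty b.
    import numpy as np

    def p_2pl(theta, a, b):
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    theta = np.linspace(-3, 3, 7)
    print(p_2pl(theta, a=1.5, b=0.0))  # item characteristic curve values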
Peer reviewed
John Jerrim; Luis Alejandro Lopez-Agudo; Oscar David Marcenaro-Gutierrez – British Journal of Educational Studies, 2024
International large-scale assessments have gained much attention since the beginning of the twenty-first century, influencing education legislation in many countries. This includes Spain, where they have been used by successive governments to justify education policy change. Unfortunately, there was a problem with the PISA 2018 reading scores for…
Descriptors: Foreign Countries, Achievement Tests, International Assessment, Secondary School Students
Peer reviewed
Stella Y. Kim; Carl Westine; Tong Wu; Derek Maher – Journal of College Student Retention: Research, Theory & Practice, 2024
The primary purpose of this study is to validate a student engagement measure for its use in evaluation of a learning assistant (LA) program. A series of psychometric evaluations were made for both the original scale of Higher Education Student Engagement Scale (HESES) and its adapted version designed to be used in gauging the effectiveness of…
Descriptors: Learner Engagement, Teaching Assistants, Test Validity, Test Reliability
Peer reviewed
Tülin Otbiçer Acar – Measurement: Interdisciplinary Research and Perspectives, 2024
The aim of this study is to compare reliability estimates based on the correlation coefficient with those obtained through the Bland-Altman plot technique. The scale was first divided into two halves using three different approaches. A linear, strong relationship was found between the scale scores obtained from the halved forms…
Descriptors: High School Students, Measurement Techniques, Psychometrics, Comparative Testing
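The two quantities being compared can be written down directly: the Spearman-Brown-corrected split-half correlation, and the Bland-Altman mean difference with 95% limits of agreement between the half scores. A sketch with hypothetical half-test scores:

    # Split-half reliability (Spearman-Brown) vs. Bland-Altman agreement
    # between two half-test scores. Hypothetical data.
    import numpy as np

    rng = np.random.default_rng(2)
    true = rng.normal(50, 10, 300)
    half1 = true + rng.normal(0, 4, 300)
    half2 = true + rng.normal(0, 4, 300)

    r = np.corrcoef(half1, half2)[0, 1]
    sb = 2 * r / (1 + r)                     # Spearman-Brown full-length reliability

    diff = half1 - half2
    bias = diff.mean()                       # Bland-Altman mean difference
    loa = (bias - 1.96 * diff.std(ddof=1), bias + 1.96 * diff.std(ddof=1))
    print(round(sb, 3), round(bias, 2), [round(x, 2) for x in loa])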
Peer reviewed
Nicolas Pichot; Boris Forthmann; Eric Bonetto; Thomas Arciszewski; Nathalie Bonnardel; Sara Jaubert; Jean B. Pavani – Journal of Creative Behavior, 2024
The term "creative" is commonly used in everyday language and in academic discourse to discuss the nature of artistic and innovative productions. This usage inherently implies the existence of a variable of creativity that allows different creative works to be compared. The standard definition of creativity asserts that a production must…
Descriptors: Creativity, Test Construction, Test Validity, Productive Thinking
Peer reviewed
John R. Donoghue; Carol Eckerly – Applied Measurement in Education, 2024
Trend scoring constructed response items (i.e. rescoring Time A responses at Time B) gives rise to two-way data that follow a product multinomial distribution rather than the multinomial distribution that is usually assumed. Recent work has shown that the difference in sampling model can have profound negative effects on statistics usually used to…
Descriptors: Scoring, Error of Measurement, Reliability, Scoring Rubrics
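Statistics usually used for such two-way rescoring tables include percent agreement and Cohen's kappa; the paper's point is that their standard inference assumes one multinomial sample, while the rescoring design fixes the Time A margins. A minimal kappa computation on a hypothetical table:

    # Cohen's kappa from a Time A x Time B rescoring table. Because Time A
    # margins are fixed by design, the table is product-multinomial, which is
    # what undermines the usual inference for these statistics.
    import numpy as np

    table = np.array([[40.,  8,  2],   # rows: Time A scores 0-2
                      [ 6., 50,  9],   # cols: Time B scores 0-2
                      [ 1.,  7, 27]])
    n = table.sum()
    po = np.trace(table) / n                     # observed agreement
    pe = (table.sum(1) @ table.sum(0)) / n**2    # chance agreement from margins
    print(round(po, 3), round((po - pe) / (1 - pe), 3))  # agreement, kappa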
Peer reviewed
Daniel McNeish; Melissa G. Wolf – Structural Equation Modeling: A Multidisciplinary Journal, 2024
Despite the popularity of traditional fit index cutoffs like RMSEA ≤ 0.06 and CFI ≥ 0.95, several studies have noted issues with overgeneralizing traditional cutoffs. Computational methods have been proposed to avoid overgeneralization by deriving cutoffs specifically tailored to the characteristics…
Descriptors: Structural Equation Models, Cutting Scores, Generalizability Theory, Error of Measurement
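For reference, the traditional indices those cutoffs attach to are simple functions of the model and baseline chi-square statistics. The sketch below uses the textbook formulas with hypothetical fit values; the paper's contribution is tailoring the cutoffs, not changing these formulas:

    # Textbook RMSEA and CFI computed from model and baseline chi-squares.
    import math

    def rmsea(chi2, df, n):
        return math.sqrt(max(0.0, (chi2 - df) / (df * (n - 1))))

    def cfi(chi2, df, chi2_base, df_base):
        d, d_base = max(chi2 - df, 0.0), max(chi2_base - df_base, 0.0)
        return 1.0 - d / max(d, d_base)

    # Hypothetical values: model chi2(40) = 85.2, baseline chi2(55) = 1450, N = 500.
    print(round(rmsea(85.2, 40, 500), 3), round(cfi(85.2, 40, 1450.0, 55), 3))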
Peer reviewed
Hyunjung Lee; Heining Cham – Educational and Psychological Measurement, 2024
Determining the number of factors in exploratory factor analysis (EFA) is crucial because it affects the rest of the analysis and the conclusions of the study. Researchers have developed various methods for deciding the number of factors to retain, but this remains one of the most difficult decisions in EFA. The purpose of this study is…
Descriptors: Factor Structure, Factor Analysis, Monte Carlo Methods, Goodness of Fit
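One canonical retention method in this literature is Horn's parallel analysis: retain factors whose observed eigenvalues exceed those of random data of the same dimensions. A sketch of the PCA-eigenvalue variant on simulated two-factor data (whether this exact variant is among those compared here is an assumption):

    # Horn's parallel analysis (PCA-eigenvalue variant): retain factors whose
    # observed eigenvalues exceed the 95th percentile of eigenvalues from
    # random normal data of the same n x p shape.
    import numpy as np

    def parallel_analysis(X, reps=200, pct=95, seed=0):
        rng = np.random.default_rng(seed)
        n, p = X.shape
        obs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
        sim = np.empty((reps, p))
        for r in range(reps):
            R = rng.normal(size=(n, p))
            sim[r] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]
        return int(np.sum(obs > np.percentile(sim, pct, axis=0)))

    rng = np.random.default_rng(3)
    F = rng.normal(size=(400, 2))                 # two true factors
    X = F @ rng.uniform(0.5, 0.8, (2, 8)) + rng.normal(0, 0.6, (400, 8))
    print(parallel_analysis(X))                   # expect 2 in this setup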
Peer reviewed
Suyoung Kim; Sooyong Lee; Jiwon Kim; Tiffany A. Whittaker – Structural Equation Modeling: A Multidisciplinary Journal, 2024
This study aims to address a gap in the social and behavioral sciences literature concerning interaction effects between latent factors in multiple-group analysis. By comparing two approaches for estimating latent interactions within multiple-group analysis frameworks using simulation studies and empirical data, we assess their relative merits.…
Descriptors: Social Science Research, Behavioral Sciences, Structural Equation Models, Statistical Analysis
Peer reviewed
Rebekka Kupffer; Susanne Frick; Eunike Wetzel – Educational and Psychological Measurement, 2024
The multidimensional forced-choice (MFC) format is an alternative to rating scales in which participants rank items according to how well the items describe them. Currently, little is known about how to detect careless responding in MFC data. The aim of this study was to adapt a number of indices used for rating scales to the MFC format and…
Descriptors: Measurement Techniques, Alternative Assessment, Rating Scales, Questionnaires
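To make the adaptation concrete: one plausible MFC analog of longstring analysis is a pattern-repetition index, the proportion of consecutive blocks answered with an identical rank order. The index below is purely illustrative and not necessarily one the authors propose:

    # Hypothetical pattern-repetition index for MFC triplet blocks: the share
    # of consecutive blocks with identical rank patterns. High values may flag
    # careless responding; the index name and any cutoff are illustrative only.
    import numpy as np

    responses = np.array([
        [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]],   # suspicious repeater
        [[2, 1, 3], [3, 1, 2], [1, 3, 2], [2, 3, 1]],   # varied responder
    ])

    def pattern_repetition(blocks):
        return np.mean([np.array_equal(blocks[b], blocks[b + 1])
                        for b in range(len(blocks) - 1)])

    for person in responses:
        print(pattern_repetition(person))  # 1.0 vs 0.0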
Peer reviewed
Sean Joo; Montserrat Valdivia; Dubravka Svetina Valdivia; Leslie Rutkowski – Journal of Educational and Behavioral Statistics, 2024
Evaluating scale comparability in international large-scale assessments depends on measurement invariance (MI). The root mean square deviation (RMSD) is a standard method for establishing MI in several programs, such as the Programme for International Student Assessment and the Programme for the International Assessment of Adult Competencies.…
Descriptors: International Assessment, Monte Carlo Methods, Statistical Studies, Error of Measurement
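The RMSD statistic itself is compact: the root of the density-weighted mean squared gap between a group's observed item characteristic curve and the model-implied (international) curve. A sketch assuming a 2PL model, with a shifted curve standing in for the group's empirical proportions:

    # RMSD for one item in one group: root mean squared deviation between the
    # group's observed ICC and the model-implied ICC, weighted by the assumed
    # N(0,1) ability density over quadrature points.
    import numpy as np

    theta = np.linspace(-4, 4, 21)             # quadrature points
    w = np.exp(-0.5 * theta**2); w /= w.sum()  # assumed ability density

    def icc(theta, a=1.2, b=0.3):              # model-implied 2PL curve
        return 1 / (1 + np.exp(-a * (theta - b)))

    observed = icc(theta, b=0.8)               # stand-in for empirical proportions
    print(round(np.sqrt(np.sum(w * (observed - icc(theta))**2)), 3))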
Peer reviewed
Kelly Edwards; James Soland – Educational Assessment, 2024
Classroom observational protocols, in which raters observe and score the quality of teachers' instructional practices, are often used to evaluate teachers for consequential purposes despite evidence that scores from such protocols are frequently driven by factors, such as rater and temporal effects, that have little to do with teacher quality. In…
Descriptors: Classroom Observation Techniques, Teacher Evaluation, Accuracy, Scores
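The rater effects the abstract describes can be framed as variance components. A crude sketch for a fully crossed teachers-by-raters design using two-way ANOVA mean squares (simulated data; operational protocols are rarely fully crossed, and temporal effects would add another facet):

    # Variance decomposition for a crossed teachers x raters design: how much
    # score variance reflects teachers (signal) versus rater severity (noise).
    import numpy as np

    rng = np.random.default_rng(4)
    n_t, n_r = 30, 6
    scores = (3 + rng.normal(0, 1.0, n_t)[:, None]        # teacher quality
                + rng.normal(0, 0.8, n_r)[None, :]        # rater severity
                + rng.normal(0, 0.5, (n_t, n_r)))         # residual

    ms_t = n_r * scores.mean(axis=1).var(ddof=1)          # teacher mean square
    ms_r = n_t * scores.mean(axis=0).var(ddof=1)          # rater mean square
    res = (scores - scores.mean(1, keepdims=True)
                  - scores.mean(0, keepdims=True) + scores.mean())
    ms_e = (res**2).sum() / ((n_t - 1) * (n_r - 1))       # residual mean square
    var_t = (ms_t - ms_e) / n_r                           # ANOVA estimators
    var_r = (ms_r - ms_e) / n_t
    print(round(var_t, 2), round(var_r, 2), round(ms_e, 2))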