Huang, Francis L. – Practical Assessment, Research & Evaluation, 2014
Clustered data (e.g., students within schools) are often analyzed in educational research where data are naturally nested. As a consequence, multilevel modeling (MLM) has commonly been used to study the contextual or group-level (e.g., school) effects on individual outcomes. The current study investigates the use of an alternative procedure to…
Descriptors: Hierarchical Linear Modeling, Regression (Statistics), Educational Research, Sampling
Min, Shangchao; He, Lianzhen – Language Testing, 2014
This study examined the relative effectiveness of the multidimensional bi-factor model and multidimensional testlet response theory (TRT) model in accommodating local dependence in testlet-based reading assessment with both dichotomously and polytomously scored items. The data used were 14,089 test-takers' item-level responses to the testlet-based…
Descriptors: Foreign Countries, Item Response Theory, Reading Tests, Test Items
Citkowicz, Martyna; Polanin, Joshua R. – Society for Research on Educational Effectiveness, 2014
Meta-analyses are syntheses of effect-size estimates obtained from a collection of studies to summarize a particular field or topic (Hedges, 1992; Lipsey & Wilson, 2001). These reviews are used to integrate knowledge that can inform both scientific inquiry and public policy; therefore, it is important to ensure that the estimates of the effect…
Descriptors: Meta Analysis, Accountability, Cluster Grouping, Effect Size
Davis-Stober, Clintin P. – Psychometrika, 2011
Many researchers have demonstrated that fixed, exogenously chosen weights can be useful alternatives to Ordinary Least Squares (OLS) estimation within the linear model (e.g., Dawes, Am. Psychol. 34:571-582, 1979; Einhorn & Hogarth, Org. Behav. Human Perform. 13:171-192, 1975; Wainer, Psychol. Bull. 83:213-217, 1976). Generalizing the approach of…
Descriptors: Least Squares Statistics, Error of Measurement, Geometry, Computation
Atilgan, Hakan – Eurasian Journal of Educational Research, 2013
Problem Statement: Reliability, which refers to the degree to which measurement results are free from measurement errors, as well as its estimation, is an important issue in psychometrics. Several methods for estimating reliability have been suggested by various theories in the field of psychometrics. One of these theories is the generalizability…
Descriptors: Sample Size, Generalizability Theory, Mathematical Formulas, Measurement Techniques
Magis, David; Facon, Bruno – Educational and Psychological Measurement, 2013
Item purification is an iterative process that is often advocated as improving the identification of items affected by differential item functioning (DIF). With test-score-based DIF detection methods, item purification iteratively removes the items currently flagged as DIF from the test scores to get purified sets of items, unaffected by DIF. The…
Descriptors: Test Bias, Test Items, Statistical Analysis, Error of Measurement
Woodruff, David; Traynor, Anne; Cui, Zhongmin; Fang, Yu – ACT, Inc., 2013
Professional standards for educational testing recommend that both the overall standard error of measurement and the conditional standard error of measurement (CSEM) be computed on the score scale used to report scores to examinees. Several methods have been developed to compute scale score CSEMs. This paper compares three methods, based on…
Descriptors: Comparative Analysis, Error of Measurement, Scores, Scaling
Schweig, Jonathan – National Center for Research on Evaluation, Standards, and Student Testing (CRESST), 2013
Measuring school and classroom environments has become central in a nation-wide effort to develop comprehensive programs that measure teacher quality and teacher effectiveness. Formulating successful programs necessitates accurate and reliable methods for measuring these environmental variables. This paper uses a generalizability theory framework…
Descriptors: Error of Measurement, Hierarchical Linear Modeling, Educational Environment, Classroom Environment
Pelanek, Radek – Journal of Educational Data Mining, 2015
Researchers use many different metrics for evaluation of performance of student models. The aim of this paper is to provide an overview of commonly used metrics, to discuss properties, advantages, and disadvantages of different metrics, to summarize current practice in educational data mining, and to provide guidance for evaluation of student…
Descriptors: Models, Data Analysis, Data Processing, Evaluation Criteria
Rosén, Monica; Gustafsson, Jan-Eric – Large-scale Assessments in Education, 2016
Research on effects of home computer use on children's development of cognitive abilities and skills has yielded conflicting results, with some studies showing positive effects, others no effects, and yet others negative effects. These studies have typically used non-experimental designs and one of the main reasons for the conflicting results is…
Descriptors: Measurement, International Assessment, Grade 4, Longitudinal Studies
Holster, Trevor A.; Lake, J. – Language Assessment Quarterly, 2016
Stewart questioned Beglar's use of Rasch analysis of the Vocabulary Size Test (VST) and advocated the use of 3-parameter logistic item response theory (3PLIRT) on the basis that it models a non-zero lower asymptote for items, often called a "guessing" parameter. In support of this theory, Stewart presented fit statistics derived from…
Descriptors: Guessing (Tests), Item Response Theory, Vocabulary, Language Tests
Long, Mark C. – Journal of Research on Educational Effectiveness, 2016
Using a "naïve" specification, this paper estimates the relationship between 36 high school characteristics and 24 student outcomes controlling for students' pre-high school characteristics. The goal of this exploration is not to generate causal estimates, but rather to: (a) compare the size of the relationships to determine which inputs…
Descriptors: Hypothesis Testing, Effect Size, High School Students, Student Characteristics
Aryadoust, Vahid – Educational Psychology, 2016
This study sought to examine the development of paragraph writing skills of 116 English as a second language university students over the course of 12 weeks and the relationship between the linguistic features of students' written texts as measured by Coh-Metrix--a computational system for estimating textual features such as cohesion and…
Descriptors: English (Second Language), Second Language Learning, Writing Skills, College Students
Raymond, Mark R.; Swygert, Kimberly A.; Kahraman, Nilufer – Journal of Educational Measurement, 2012
Although a few studies report sizable score gains for examinees who repeat performance-based assessments, research has not yet addressed the reliability and validity of inferences based on ratings of repeat examinees on such tests. This study analyzed scores for 8,457 single-take examinees and 4,030 repeat examinees who completed a 6-hour clinical…
Descriptors: Physicians, Licensing Examinations (Professions), Performance Based Assessment, Repetition
Zopluoglu, Cengiz; Davenport, Ernest C., Jr. – Educational and Psychological Measurement, 2012
The generalized binomial test (GBT) and ω indices are the most recent methods suggested in the literature to detect answer copying behavior on multiple-choice tests. The ω index is one of the most studied indices, but there has not yet been a systematic simulation study for the GBT index. In addition, the effect of the ability levels…
Descriptors: Statistical Analysis, Error of Measurement, Simulation, Multiple Choice Tests

