ERIC Number: ED663561
Record Type: Non-Journal
Publication Date: 2024-Sep-19
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Structural after Measurement Estimation with Missing Data in Structural Equation Models
Ben Kelcey; Fangxing Bai; Amota Ataneka; Yanli Xie; Kyle Cox
Society for Research on Educational Effectiveness
We develop a structural after measurement (SAM) method for structural equation models (SEMs) that accommodates missing data. The results show that the proposed SAM missing data estimator outperforms conventional full information (FI) estimators in terms of convergence, bias, and root-mean-square error in small-to-moderate samples, as well as in large samples with complex models or substantial missing data. The proposed estimator is implemented in R and illustrated through a simple mediation example. Background: Conventionally, measurement and structural parameters have been estimated concurrently using FI estimators (e.g., maximum likelihood [ML], Bayes). FI estimators produce consistent estimates of SEM parameters in large samples as long as data are missing at random (MAR). However, research has shown that in small-to-moderate samples (e.g., N < 200) and in studies with a high ratio of model complexity to sample size (e.g., fewer than 10 individuals per free parameter), such estimators often fail to converge and incur substantial finite-sample bias in both coefficients and standard errors. Research has further demonstrated that these problems are intensified by missing data, which reduce the effective sample size (e.g., Enders, 2010). When data are missing, simple minimum sample size guidelines quickly become insufficient. For example, prior SEM research suggested that when only about 20% of the data cells are missing, the minimum sample size for even simple SEMs rises from 200 to 320, an increase of 60% (Wolf et al., 2013). Yet many studies in education draw on sophisticated theories involving multiple latent variables with samples of fewer than 200 and/or incur missing data. Recent literature has developed SAM estimators that strategically separate the estimation of structural and measurement components in ways that better support convergence and deliver less biased parameter estimates. 
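To make the "individuals per free parameter" guideline concrete, the sketch below counts free parameters for a hypothetical version of a three-factor mediation SEM (X to M to Y, plus X to Y). The design choices are assumptions for illustration, not the study's specification: three continuous indicators per factor, the first loading of each factor fixed to 1 for scaling, latent means fixed to 0, and no correlated residuals.

```python
# Hypothetical parameter count for a three-factor mediation SEM.
# Assumptions (illustrative only): marker-loading scaling, latent means fixed
# to 0, uncorrelated indicator residuals, one exogenous factor (eta_X).

def count_free_parameters(n_factors: int, indicators_per_factor: int,
                          n_paths: int, n_exogenous: int) -> int:
    n_indicators = n_factors * indicators_per_factor
    free_loadings = n_indicators - n_factors          # one loading per factor fixed to 1
    intercepts = n_indicators                          # one intercept per indicator
    residual_variances = n_indicators                  # one residual variance per indicator
    exogenous_cov = n_exogenous * (n_exogenous + 1) // 2   # variances/covariances of exogenous factors
    disturbance_variances = n_factors - n_exogenous    # zeta_M and zeta_Y
    return (free_loadings + intercepts + residual_variances
            + exogenous_cov + disturbance_variances + n_paths)


q = count_free_parameters(n_factors=3, indicators_per_factor=3, n_paths=3, n_exogenous=1)
for n in (100, 200, 320):
    print(f"N = {n}: {q} free parameters, {n / q:.1f} individuals per parameter")
```

Under these assumptions the model has 30 free parameters, so even N = 200 gives fewer than 10 individuals per parameter, before any loss of information from missing data.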
For example, research has shown that SAM estimators perform well in a variety of SEM settings, including small-to-moderate samples, multilevel contexts with cross-classified and n-level structures, latent interactions, and non-normal variables. Despite their promise, their use has been constrained because they do not accommodate missing data. In this study, we develop a SAM estimator that can handle missing data. Methods: Consider a simple mediation study that examines the extent to which the relationship between teacher knowledge (X) and teacher instruction (Y) is mediated by self-efficacy (M) (our results extend to a much richer set of SEMs and causal frameworks). We specify a common factor model for each latent variable; e.g., for the outcome model we use [equation omitted] with $y_i$ as the observed indicators of teacher $i$ for the latent outcome, $\eta_{y_i}$ as the latent variable, $\lambda_y$ as the factor loadings, $\mu_y$ as the intercepts, and $\varepsilon^{y}_i$ as the error terms. The structural model for the mediator is [equation omitted] with $\eta_{M_i}$ and $\eta_{X_i}$ as the efficacy and knowledge of teacher $i$ and $a$ as the coefficient capturing the association between knowledge and efficacy. For the outcome path model, we use [equation omitted] with $b$ capturing the conditional association between latent efficacy and instruction and $c'$ as the direct effect of knowledge on instruction. Proposed Estimator: In derivations omitted here, we develop a SAM missing data estimator under the MAR mechanism. Conceptually, the standard SAM approach leverages measurement model information to correct factor score covariances for the otherwise ignored measurement uncertainty and thereby obtain consistent estimates of structural parameters. However, missing data prevent this approach because factor score prediction requires all indicator values to be observed. 
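For orientation, a standard way to write the single-factor measurement model and the mediation structural model implied by the definitions in the Methods is sketched below, with $\eta_{Y_i}$ denoting the latent outcome (written $\eta_{y_i}$ in the text). The disturbances $\zeta_{M_i}$ and $\zeta_{Y_i}$, the zero-mean latent variables, and the absence of structural intercepts are assumptions; the authors' omitted equations may differ in these details.

```latex
\begin{aligned}
\mathbf{y}_i &= \boldsymbol{\mu}_y + \boldsymbol{\lambda}_y\,\eta_{Y_i} + \boldsymbol{\varepsilon}^{y}_i
  && \text{(measurement model for the outcome)} \\
\eta_{M_i} &= a\,\eta_{X_i} + \zeta_{M_i}
  && \text{(mediator path model)} \\
\eta_{Y_i} &= b\,\eta_{M_i} + c'\,\eta_{X_i} + \zeta_{Y_i}
  && \text{(outcome path model)}
\end{aligned}
```

Under this form the indirect effect of knowledge on instruction through efficacy is $a b$, and the total effect is $a b + c'$.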
We derive an alternative approach that circumvents the use of factor scores. As a brief illustration, consider estimating the covariance between any two latent variables, such as the mediator ($\eta_M$) and the outcome ($\eta_Y$). In the standard SAM approach, we first estimate the measurement models for the mediator and the outcome. The resulting implied mediator-outcome covariance as a function of the implied factor scores ([symbols omitted]) is [equation omitted] where $\lambda$ is a factor loading matrix and $A$ is a factor score prediction matrix such that [equation omitted] with $C_M^{-1}$ as the inverse of the mediator indicator covariance matrix and $\sigma^2_{\eta_M}$ as the model-based variance of the mediator latent variable. The derivation suggests that the mediator-outcome covariance can be obtained by correcting the factor score covariance, $\mathrm{cov}(\hat{\eta}_M, \hat{\eta}_Y)$, as [equation omitted]. Missing data prevent factor score prediction because prediction multiplies the factor score prediction matrix by each indicator value, so every indicator must be observed. However, further manipulation of expression (6) reveals an alternative that circumvents factor score prediction. Specifically, we replace the factor score covariances with [equation omitted]. This result suggests an alternative correction that avoids factor score prediction altogether: [equation omitted]. That is, the alternative correction requires only the covariance between the observed indicators of the latent variables, $\mathrm{cov}(m, y)$. This is useful because FIML can estimate the covariances of indicators even when data are missing. 
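The logic of a correction that uses only $\mathrm{cov}(m, y)$ can be checked numerically. Assuming each block loads on a single factor and residuals are uncorrelated across blocks, $\mathrm{cov}(m, y) = \lambda_M\,\mathrm{cov}(\eta_M, \eta_Y)\,\lambda_Y^\top$, so with regression-type weights $A = \sigma^2_\eta \lambda^\top C^{-1}$ the factor score covariance equals $(A_M\lambda_M)(A_Y\lambda_Y)\,\mathrm{cov}(\eta_M, \eta_Y)$. Dividing $A_M\,\mathrm{cov}(m, y)\,A_Y^\top$ by $(A_M\lambda_M)(A_Y\lambda_Y)$ therefore recovers the latent covariance without predicting any factor scores. This is a reconstruction under those assumptions, not necessarily the authors' expression (8); all parameter values and helper names are hypothetical, and the population covariance matrix `sigma` stands in for what would, with missing data, be an FIML estimate of the indicator covariance matrix.

```python
import numpy as np

# Hypothetical population values (illustrative assumptions only).
lam_m = np.array([1.0, 0.8, 0.7])            # loadings of mediator indicators m1..m3
lam_y = np.array([1.0, 0.9, 0.6])            # loadings of outcome indicators y1..y3
phi = np.array([[1.0, 0.5],                  # latent covariance matrix of (eta_M, eta_Y)
                [0.5, 1.2]])
theta = np.diag([0.4, 0.5, 0.6, 0.3, 0.5, 0.7])  # uncorrelated residual variances

# Model-implied covariance matrix of the six indicators (m1..m3, y1..y3).
Lam = np.zeros((6, 2))
Lam[:3, 0] = lam_m
Lam[3:, 1] = lam_y
sigma = Lam @ phi @ Lam.T + theta


def prediction_weights(lam, c_block, latent_var):
    """Regression-type factor score weights A = sigma^2_eta * lambda' C^{-1}."""
    return latent_var * lam @ np.linalg.inv(c_block)


def corrected_latent_cov(cov_my, a_m, lam_m, a_y, lam_y):
    """cov(eta_M, eta_Y) computed from the indicator cross-covariance cov(m, y) alone."""
    return (a_m @ cov_my @ a_y) / ((a_m @ lam_m) * (a_y @ lam_y))


a_m = prediction_weights(lam_m, sigma[:3, :3], phi[0, 0])
a_y = prediction_weights(lam_y, sigma[3:, 3:], phi[1, 1])

cov_my = sigma[:3, 3:]                        # cov(m, y): the only cross-block input needed
naive = a_m @ cov_my @ a_y                    # factor score covariance (attenuated)
corrected = corrected_latent_cov(cov_my, a_m, lam_m, a_y, lam_y)
print(f"factor score covariance: {naive:.3f}, corrected: {corrected:.3f}")
```

The uncorrected factor score covariance is attenuated, while the corrected value equals the population covariance of 0.5 up to floating-point error. Because any rescaling of $A_M$ or $A_Y$ cancels between numerator and denominator, the correction does not depend on the particular constant $\sigma^2_\eta$ in the weights.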
Once the corrected covariances among the latent variables have been estimated using expression (8), we can obtain consistent estimates of the structural parameters using the usual SAM path analysis approach on sufficient statistics (covariances). Simulation: To provide an initial assessment of the proposed estimator, we generated data based on the model in Figure 1 and compared the proposed SAM estimator with three alternative estimators: (a) an uncorrected SAM estimator that simply used factor score regression (ignoring uncertainty in the measurement models); (b) a SAM approach that first obtained ML estimates of the covariances among observed indicators and then used these covariances to estimate all measurement and structural parameters concurrently with maximum likelihood; and (c) conventional FIML. The results are summarized in Table 1. Results underscored key advantages of the SAM estimator: it routinely outperformed conventional FI estimators in terms of convergence, bias, and root-mean-square error in small-to-moderate samples, even with missing data. Significance: The literature has recognized that small- and moderate-scale studies offer critical contributions when well executed. Yet many studies in education draw on sophisticated theories and SEMs while working with small or moderate samples that have missing data. This combination of small or moderate samples, missing data, and sophisticated theories and SEMs poses significant challenges for research because estimators typically demand large sample-to-parameter ratios. Our results, coupled with those of other studies, suggest that the SAM estimator serves as an important alternative.
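Once a corrected latent covariance matrix is available, the structural step amounts to path analysis on covariances. The sketch below assumes the mediation form with zero-mean latent variables and uncorrelated disturbances; the helper names and population values are hypothetical and chosen so the recovered paths can be checked. In practice, `phi` would be the corrected latent covariance matrix from expression (8), and standard errors would typically need adjustment for the two-step estimation, which this sketch does not attempt.

```python
import numpy as np


def implied_latent_cov(a, b, c_prime, var_x, var_zm, var_zy):
    """Covariance of (eta_X, eta_M, eta_Y) implied by the assumed mediation model."""
    # eta = B eta + zeta  =>  Cov(eta) = (I - B)^{-1} Psi (I - B)^{-T}
    B = np.array([[0.0, 0.0, 0.0],
                  [a, 0.0, 0.0],
                  [c_prime, b, 0.0]])
    psi = np.diag([var_x, var_zm, var_zy])
    inv = np.linalg.inv(np.eye(3) - B)
    return inv @ psi @ inv.T


def mediation_paths(phi):
    """Estimate a, b, c' by regressions computed from a latent covariance matrix."""
    a = phi[1, 0] / phi[0, 0]                          # regress eta_M on eta_X
    c_prime, b = np.linalg.solve(phi[:2, :2], phi[:2, 2])  # regress eta_Y on (eta_X, eta_M)
    return a, b, c_prime


phi = implied_latent_cov(a=0.4, b=0.5, c_prime=0.2, var_x=1.0, var_zm=0.8, var_zy=0.6)
a, b, c_prime = mediation_paths(phi)
print(f"a = {a:.3f}, b = {b:.3f}, c' = {c_prime:.3f}, indirect = {a * b:.3f}")
```

With the population matrix as input, the regressions return a = 0.400, b = 0.500, c' = 0.200 and an indirect effect of 0.200, which illustrates why consistent latent covariances are sufficient for consistent structural estimates.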
Descriptors: Structural Equation Models, Research Problems, Error of Measurement, Maximum Likelihood Statistics, Bayesian Statistics, Sample Size, Educational Research, Pedagogical Content Knowledge, Teaching Methods, Self Efficacy, Attribution Theory, Evaluation Methods, Path Analysis, Comparative Analysis
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A