Accounting for Intersectional Social Identities: Exploring the Statistical Constraints of Models.

Olivia Szendey

Background & Context: Intersectionality theory posits that each individual's unique intersection of identities is integral to understanding their lived experience. It also holds that social positions interact with oppression to shape that experience (Bowleg, 2012; Collins, 2007; Crenshaw, 1989). The theory has garnered increasing attention from researchers interested in understanding the many ways in which oppression impacts lived experiences. In any given present and evolving context, oppression leads to advantages for some social positions and disadvantages for others (Collins & Bilge, 2016; Crenshaw, 1989). Quantitative researchers have attempted to adapt statistical modeling methods to reflect intersectional identities as a proxy for oppression and advantage in their models (Bauer et al., 2021; Schudde, 2018). Researchers' understanding of how the choice of a quantitative modeling approach influences one's ability to account for intersectionality is still emerging. The choice is further complicated because education researchers often work with data that have complex demographic characteristics, such as uneven proportions within categories and varying amounts of within-group variance. This study simulated clustered educational datasets to examine how three methods of modeling intersectional identities perform under various demographic data scenarios.

Purpose and Research Questions: This research expanded on existing knowledge about the statistical limitations of three methods of modeling intersectional analyses on a continuous outcome variable in a clustered context: (1) Interaction, (2) Categorical, and (3) MAIDHA (multilevel analysis of individual heterogeneity and discriminatory accuracy).
The fundamental questions that guided this research were: (1) What are the statistical advantages and disadvantages of each model under different demographic data characteristics? (2) In what ways does each model perform differently from the others under each demographic data characteristic condition? (3) What is the theoretical fit of each model to intersectionality, and what are its appropriate uses? To answer these research questions, I simulated datasets to understand how four demographic data characteristics influence the technical quality of the estimates provided by each model. The characteristics of focus were (a) the number of demographic categories (and thus intersections); (b) the proportion of the sample represented by each demographic group; (c) the within-intersectional-group variance in the outcome variable of interest; and (d) the overall sample size. These characteristics were chosen to create realistic scenarios that education researchers encounter when working with demographic data.

Methods: A series of Monte Carlo simulations was conducted to generate datasets that were used to compare the performance of each method of modeling intersectional analyses under different conditions, each of which was designed to mimic the complexity of demographic data in education. The three methods examined were the interaction model, the categorical model, and the MAIDHA model (Figures 1-3). The four demographic data characteristics were combined to define 54 unique scenarios, described in Table 1. Across all repetitions, this study simulated a hierarchical data context where students were evenly clustered in groups designed to represent schools. For each of the 54 scenarios, 1,000 datasets were generated based on true analytic parameters for each intersectional group, yielding a total of 54,000 datasets. For each dataset, three analytic methods were applied.
This yielded a total of 162 combinations of methods and scenarios, and thus a total of 162,000 fitted models. For each of the 162 combinations of methods and scenarios, five simulation outcomes were estimated: bias, accuracy (mean square error), type 1 error, power, and coverage. Model fit, as measured by the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), was also recorded.

Results: The results of this study were complex, but there were some scenarios in which researchers with similar data structures may apply one or more of the methods of modeling intersectional analyses with fewer limitations. High-level results are presented in Tables 3, 4, and 5. In general, most models performed best with a small within-group standard deviation and a lower number of demographic categories. In practice, researchers will have different configurations of the number of demographic categories and the number of identity indicators within each demographic category. Therefore, it may be helpful to consider these results with respect to the overall number of intersectional groups. In my two-category scenarios there were 14 intersectional groups, and in my three-category scenarios there were 42 intersectional groups. The interaction model is feasible to use with small within-group standard deviations when there are up to 14 intersectional groups. However, researchers need to be wary of the risk of type 1 error. The categorical model is feasible for use with up to 42 intersectional groups with small within-group standard deviation. However, the categorical model also presented inflated type 1 error in those conditions. Finally, MAIDHA is recommended for use with up to 14 intersectional groups with small within-group standard deviation, but researchers need to be wary of bias.
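As an illustration of the five simulation outcomes named in the Methods, the following is a minimal sketch of how such outcomes are commonly computed for a single parameter across Monte Carlo replications. It is not the study's own code: the function name `summarize_replications`, the normal-approximation critical value of 1.96, and the toy inputs are all assumptions made for illustration.

```python
import numpy as np


def summarize_replications(estimates, std_errors, true_value, z_crit=1.96):
    """Summarize one parameter's estimates across replications.

    Hypothetical helper: assumes Wald-type intervals of the form
    estimate +/- z_crit * standard error.
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower = est - z_crit * se
    upper = est + z_crit * se
    # A replication "rejects" when its interval excludes zero.
    rejects = (lower > 0) | (upper < 0)
    covers = (lower <= true_value) & (true_value <= upper)
    return {
        "bias": float(np.mean(est - true_value)),
        "mse": float(np.mean((est - true_value) ** 2)),
        "coverage": float(np.mean(covers)),
        # Read as the type 1 error rate when true_value == 0,
        # and as power when true_value != 0.
        "rejection_rate": float(np.mean(rejects)),
    }


# Toy illustration (assumed values): 1,000 replications of an unbiased
# estimator of a null effect with a known standard error of 0.1.
rng = np.random.default_rng(0)
n_reps = 1000
draws = rng.normal(loc=0.0, scale=0.1, size=n_reps)
summary = summarize_replications(draws, np.full(n_reps, 0.1), true_value=0.0)
print(summary)
```

Under this sketch, the same rejection rate is interpreted as type 1 error when the true parameter is zero and as power otherwise, so each of the 162 method-by-scenario combinations would yield one such summary per parameter of interest.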
Significance: As the use of quantitative methods in education to examine relationships between intersectional identities and outcomes of interest expands, researchers are deepening their understanding of the socio-historical forces of racism and oppression in education (e.g., Covarrubias et al., 2018; Jang, 2019; López et al., 2018; Nissen et al., 2021). However, the statistical models typically used by educational researchers today were not designed with intersectionality in mind (Bowleg, 2008). To help advance social justice goals in education, researchers need to recognize the factors specific to intersectionality theory that affect the technical characteristics of results produced by various statistical modeling techniques. The findings of this study expand our current understanding of how demographic data characteristics influence the statistical properties of methods of modeling intersectional analyses. Until now, there has been no knowledge base regarding how the technical properties of estimated coefficients are impacted by and vary across methods of developing an intersectional model. This research study raises more questions than it answers, and it serves as a starting point for the continued study of intersectionality within quantitative methods.