Peer reviewed
ERIC Number: ED677765
Record Type: Non-Journal
Publication Date: 2025-Oct-10
Pages: N/A
Abstractor: As Provided
ISBN: N/A
ISSN: N/A
EISSN: N/A
Available Date: N/A
Optimal Design and Analysis for Equivalence Tests
Zuchao Shen
Society for Research on Educational Effectiveness
I. Purpose: There are instances where equivalences rather than differences are of interest (e.g., Lakens, 2017). For instance, researchers may be interested in testing whether two versions of an intervention are equivalent, or whether an alternative but more cost-efficient program can provide similar outcomes (Lakens et al., 2018; Schuirmann, 1987). The literature on design and analytical strategies for testing statistical equivalence has been limited. For instance, the prevalent equivalence test method has not been updated to include covariate adjustments, and the literature has not offered a correct statistical power formula for equivalence tests. The purpose of this study is to advance the study design and analysis for equivalence tests. First, this study extends the prevalent equivalence test method to designs with covariate adjustments so that higher statistical power can be achieved. Second, it develops and validates a closed-form statistical power formula for equivalence tests. Third, it presents an optimal design framework for efficient sampling. Fourth, the methods and frameworks have been implemented in the R package "anomo" to improve accessibility.

II. Relevant Literature: Equivalence Test: Suppose $d_1$ and $d_2$ are the average scores for groups 1 and 2. The null hypothesis for equivalence tests (Anderson & Maxwell, 2016; Bonett, 2021; Lakens et al., 2018; Schuirmann, 1987) is $H_0: |d| = |d_2 - d_1| \ge \delta$. Here, $\delta$ is a small positive number so that $[-\delta, \delta]$ can serve as the equivalence bounds. Schuirmann (1987) proposed the widely used two one-sided tests (TOST; Lakens et al., 2018), which test this null hypothesis through two one-sided hypotheses.
$H_{01}: d = d_2 - d_1 \le -\delta$ and $H_{02}: d = d_2 - d_1 \ge \delta$. The test statistics for the TOST method (Lakens, 2017; Schuirmann, 1987) are
$$t_L = \frac{(\hat{d}_2 - \hat{d}_1) + \delta}{\sqrt{\hat{\sigma}_{\hat{d}_1}^2 + \hat{\sigma}_{\hat{d}_2}^2}} \quad \text{and} \quad t_R = \frac{(\hat{d}_2 - \hat{d}_1) - \delta}{\sqrt{\hat{\sigma}_{\hat{d}_1}^2 + \hat{\sigma}_{\hat{d}_2}^2}}.$$
Here, $\hat{d}_1$ and $\hat{d}_2$ are the estimated group means, and $\hat{\sigma}_{\hat{d}_1}$ and $\hat{\sigma}_{\hat{d}_2}$ are their standard errors, with $\hat{\sigma}_{\hat{d}_j}^2 = \hat{\sigma}_j^2 / n_j$, where $\hat{\sigma}_j^2$ is the variance and $n_j$ is the sample size for group $j$ ($j = 1, 2$). $t_L$ and $t_R$ are the left- and right-side t-test statistics. We can reject the null hypothesis, and thus conclude equivalence, if both p values are smaller than the significance level (Lakens et al., 2018; Schuirmann, 1987).

Research Gaps: Covariate Adjustments. The TOST method (Schuirmann, 1987) was originally proposed to test the equivalence of two group means without any covariate adjustment. Thus far, the prevalent TOST method implemented in software (e.g., the R package TOSTER; Lakens, 2017) is the same as originally proposed (Schuirmann, 1987), and existing planning strategies based on statistical power analysis are for designs detecting equivalence without covariate adjustments (e.g., Anderson & Kelley, 2024; Guo et al., 2019). Yet covariate adjustment is one of the strategies offered by modern study design theory to improve statistical power in experimental settings and to alleviate confounding in non-experimental contexts (e.g., Hedges & Hedberg, 2007; Raudenbush, 1997; Imai et al., 2013; Shen et al., 2024).

Statistical Power and Study Design. The literature has provided general guidance on sample size planning based on formulas approximating power for equivalence tests (e.g., Guo et al., 2019; Shieh, 2016; Zhang, 2003).
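The TOST procedure can be sketched in a few lines. The function below is a minimal illustration, not the "anomo" or TOSTER implementation; for brevity it uses a large-sample standard-normal approximation in place of the t-distribution, and all names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def tost(m1, s1, n1, m2, s2, n2, delta, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two group means.

    Large-sample sketch: p values come from the standard normal rather
    than the t-distribution, so results are approximate for small n.
    """
    d_hat = m2 - m1                          # estimated mean difference
    se = sqrt(s1**2 / n1 + s2**2 / n2)       # SE of the difference
    t_left = (d_hat + delta) / se            # tests H01: d <= -delta
    t_right = (d_hat - delta) / se           # tests H02: d >=  delta
    z = NormalDist()
    p_left = 1 - z.cdf(t_left)               # upper-tail p for H01
    p_right = z.cdf(t_right)                 # lower-tail p for H02
    equivalent = max(p_left, p_right) < alpha
    return p_left, p_right, equivalent

# Example: a 0.1-point observed difference, equivalence bounds of +/-0.5
p_l, p_r, eq = tost(10.0, 1.0, 200, 10.1, 1.0, 200, delta=0.5)
```

Both one-sided p values must fall below alpha before equivalence is claimed, which is what makes TOST a joint test rather than a single comparison.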
Most of these formulas assume the estimated difference in means is zero, an assumption that may be false due to sampling error and that is more stringent than the null equivalence hypothesis. In addition, past power formula developments have not used the correct product function (see Section IV for more; Guo et al., 2019; Shieh, 2016; Zhang, 2003).

III. Statistical Models: Because of space limitations, statistical models are presented in the Appendix.

IV. Statistical Power and Optimal Design: Statistical Power: Both the left- and right-side tests in the TOST method must be significant to claim a significant result. The TOST method is thus a type of joint significance test (Hayes & Scharkow, 2013; Shen et al., 2024): a compound test is significant only if all individual tests are significant. As with other joint significance tests, the power for equivalence tests should be the product of the power for each test. Statistical power for equivalence tests is the probability of rejecting the null hypothesis when the alternative hypothesis is true (i.e., $|d| < \delta$). The power for the equivalence test ($P$) is the product of the power for the left-side test ($P_L$) and the power for the right-side test ($P_R$):
$$P = P_L P_R = \left[1 - H\left(t_{1-\alpha, df},\, df,\, \lambda_L\right)\right]\left[1 - H\left(t_{1-\alpha, df},\, df,\, \lambda_R\right)\right].$$
Here, $t_{1-\alpha, df}$ is the $100(1-\alpha)$% critical value of the central t-distribution with $df$ degrees of freedom, and $H(t, df, \lambda)$ is the cumulative distribution function of the noncentral t-distribution with $df$ degrees of freedom and noncentrality parameter $\lambda$. The degrees of freedom are $df = n - q - 2$, with $q$ as the number of predictors (other than the group status indicator) in the combined model. The noncentrality parameters are $\lambda_L = (d + \delta)\big/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ and $\lambda_R = (\delta - d)\big/\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.
Optimal Sample Allocation: Suppose the respective costs of enrolling one individual in the control and treated groups are $c_1$ and $c_1^T$. Maximizing statistical power under a fixed budget is approximately equivalent to minimizing the variance of the difference estimator. Minimizing $\sigma_1^2/n_1 + \sigma_2^2/n_2$ subject to the budget constraint $c_1 n_1 + c_1^T n_2 = B$ yields the optimal allocation $n_2/n_1 = (\sigma_2/\sigma_1)\sqrt{c_1/c_1^T}$. This is the same optimal allocation as that for randomized controlled trials detecting whether two group means are different (e.g., Liu, 2003; Nam, 1973; Shen & Kelcey, 2020; Shen et al., 2024).

V. Validation Results: Table 1 presents the parameter values used. We generated 1,000 data sets for each condition. The results show that the updated TOST method and the original TOST method perform identically for designs without covariates (Figure 1a), while the updated TOST method offers more power than the original TOST method, which cannot use covariates (Figure 1b). Figure 2 shows that the power formulas are accurate enough to help design adequately powered studies targeting equivalence tests.

VI. Illustration: Figure 3 shows that when a design departs from the optimal allocation (p = 0.24) in either direction, it suffers a reduction in statistical power under the same budget.

VII. Significance: Equivalence testing can formally test the equivalence of outcomes, which may be a key equity issue in policy implementation. In addition, equivalence tests can be used to identify more efficient interventions that provide comparable outcomes. The proposed framework and tools can significantly advance the study design and analysis for equivalence tests, supporting more efficient and equitable educational practice.
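The cost-constrained allocation can be sketched directly from the Lagrange solution, in which each group size is proportional to $\sigma_j/\sqrt{c_j}$. This is the textbook result consistent with the allocation the section cites, not the paper's own code; the budget and cost figures are made up for illustration.

```python
from math import sqrt

def optimal_allocation(budget, c1, c2, sigma1=1.0, sigma2=1.0):
    """Fractional sample sizes minimizing Var = s1^2/n1 + s2^2/n2
    subject to the budget constraint c1*n1 + c2*n2 = budget.

    Lagrange solution: n_j is proportional to sigma_j / sqrt(c_j).
    """
    denom = sigma1 * sqrt(c1) + sigma2 * sqrt(c2)
    n1 = budget * sigma1 / (sqrt(c1) * denom)
    n2 = budget * sigma2 / (sqrt(c2) * denom)
    return n1, n2

def diff_variance(n1, n2, sigma1=1.0, sigma2=1.0):
    """Variance of the difference-in-means estimator."""
    return sigma1**2 / n1 + sigma2**2 / n2

# Hypothetical costs: treated individuals cost 4x as much as controls.
n1, n2 = optimal_allocation(1000, c1=5, c2=20)
var_opt = diff_variance(n1, n2)
var_equal = diff_variance(40, 40)   # equal-n split of the same budget
```

With equal variances and a 4:1 cost ratio, the optimal design enrolls twice as many cheap controls as expensive treated individuals, and any departure from that split raises the variance (i.e., lowers power) at the same budget, mirroring the pattern in Figure 3.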
Society for Research on Educational Effectiveness. 2040 Sheridan Road, Evanston, IL 60208. Tel: 202-495-0920; e-mail: contact@sree.org; Web site: https://www.sree.org/
Publication Type: Reports - Research
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: Society for Research on Educational Effectiveness (SREE)
Grant or Contract Numbers: N/A
Author Affiliations: N/A