ERIC Number: ED644262
Record Type: Non-Journal
Publication Date: 2021
Pages: 232
Abstractor: As Provided
ISBN: 979-8-8193-7482-5
ISSN: N/A
EISSN: N/A
Available Date: N/A
Robustness, Generalization and Fairness in Learning: Analysis and Design
Zhun Deng
ProQuest LLC, Ph.D. Dissertation, Harvard University
Machine learning has achieved state-of-the-art performance in many areas, including image recognition and natural language processing. However, many challenges and open questions continue to attract researchers. This dissertation comprises a series of works on problems at the intersection of computer science theory, adversarial robustness, generalization theory, and social science. The first part seeks to understand adversarial robustness from two perspectives: the efficiency of popular defense mechanisms against adversarial attacks, and how adversarial training trades off prediction accuracy for robustness. For the first perspective, we investigate the popular defense mechanism that formulates adversarial training as robust optimization and trains with projected gradient descent. We study the non-concave landscape of the adversarial loss of a two-layer neural network. Our main result proves that projected gradient ascent finds a local maximum of this non-concave problem in a polynomial number of iterations with high probability. For the second perspective, we introduce the Adversarial Influence Function (AIF) as a tool to investigate the solution produced by robust optimization. The proposed AIF admits a closed form, can be computed efficiently, and is useful for quantifying the trade-off between accuracy and robustness. The second part is devoted to understanding generalization in modern learning. We investigate how a popular data augmentation scheme helps generalization, and we develop a new notion of stability tailored to modern machine learning. Specifically, for data augmentation, we provide theoretical analysis demonstrating how training with Mixup improves model robustness and generalization. For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound on the adversarial loss.
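The robust-optimization view of adversarial training described above solves an inner maximization by projected gradient ascent: take ascent steps on the adversarial loss, then project the perturbation back onto the allowed ball around the clean input. A minimal illustrative sketch (the function names, step sizes, and toy loss below are hypothetical, not the dissertation's code):

```python
import numpy as np

def pgd_perturbation(x, grad_fn, eps=0.1, step=0.02, iters=20):
    """Projected gradient ascent on the adversarial loss: repeatedly step
    in the sign of the gradient, then project the perturbation back onto
    the L-infinity ball of radius eps around the clean input x."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        delta += step * np.sign(grad_fn(x + delta))  # ascent step
        delta = np.clip(delta, -eps, eps)            # projection onto the eps-ball
    return delta

# Toy example: maximize the loss L(z) = ||z - t||^2 around a clean point x.
t = np.array([0.0, 0.0])
grad = lambda z: 2.0 * (z - t)   # gradient of the toy loss
x = np.array([0.5, -0.5])
d = pgd_perturbation(x, grad)
# The perturbation saturates at the ball's boundary, pushing x away from t.
```

In full adversarial training, this inner maximizer supplies the worst-case perturbation against which the model's weights are then updated (the outer minimization).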
This explains why models obtained by Mixup training exhibit robustness to several kinds of adversarial attacks, such as the Fast Gradient Sign Method (FGSM). For generalization, we prove that Mixup augmentation corresponds to a specific type of data-adaptive regularization that reduces overfitting. Our analysis provides new insights and a framework for understanding Mixup. For the new notion of stability, we propose "locally elastic stability," a weaker, distribution-dependent stability notion that still yields exponential generalization bounds. We further demonstrate that locally elastic stability implies tighter generalization bounds than those derived from uniform stability in many situations, by revisiting the examples of bounded support vector machines (SVM), regularized least squares regression, and stochastic gradient descent (SGD). The final part concerns fairness, where we initiate the study of "scaffolding sets": a small collection "C" of sets with the property that multi-calibration with respect to "C" ensures the predictor accurately recovers individual probabilities, not merely that it is calibrated. Our approach is inspired by the folk wisdom that the intermediate layers of a neural net learn a highly structured and useful data representation. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
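Mixup, the data augmentation scheme the abstract analyzes, trains on convex combinations of pairs of inputs and their labels, with mixing weights drawn from a Beta distribution. A minimal sketch of the augmentation step (names and the alpha value are illustrative, not taken from the dissertation):

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup augmentation: convex-combine a batch with a shuffled copy
    of itself, mixing both inputs and (soft) labels with a coefficient
    drawn from Beta(alpha, alpha)."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)           # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))         # random pairing of examples
    x_mix = lam * x + (1 - lam) * x[perm]  # mixed inputs
    y_mix = lam * y + (1 - lam) * y[perm]  # mixed labels
    return x_mix, y_mix

# Example: two inputs with one-hot labels; the mixed labels remain
# valid probability vectors (each row still sums to 1).
x = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([[1.0, 0.0], [0.0, 1.0]])
x_mix, y_mix = mixup_batch(x, y)
```

Training the model on (x_mix, y_mix) instead of (x, y) yields the data-adaptive regularization effect that, per the analysis above, both reduces overfitting and upper-bounds the adversarial loss.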
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A