ERIC Number: ED640497
Record Type: Non-Journal
Publication Date: 2023
Pages: 122
Abstractor: As Provided
ISBN: 979-8-3807-2591-0
ISSN: N/A
EISSN: N/A
Available Date: N/A
AI in Education: Effective Machine Learning Methods to Improve Data Scarcity and Knowledge Generalization
Jia Tracy Shen
ProQuest LLC, Ph.D. Dissertation, The Pennsylvania State University
In education, machine learning (ML), and especially deep learning (DL) in recent years, has been used extensively to improve both teaching and learning. Despite the rapid advancement of ML and its application in education, several challenges remain. This thesis focuses on two of them: (i) data scarcity and (ii) knowledge generalization. First, owing to student privacy concerns and differences in student behavior, missing data are common in the education domain, which complicates the application of ML methods. Second, because data distributions vary across education platforms and applications, ML methods often struggle to generalize to unseen data sets. This thesis therefore proposes effective "data augmentation" methods to address challenge (i) and investigates "transfer learning" techniques to address challenge (ii). More specifically, for challenge (i), we provide solutions ranging from simple to complex to augment data by: (a) optimizing statistical time-lag selection to reduce matrix sparsity, improving the original model's performance by 32% on classification tasks and 12% on regression tasks; and (b) developing deep generative models (i.e., LSTM-[L]VAEs) to generate the missing data, improving the original model's performance by 50%. For challenge (ii), we employ transfer learning to improve model generalization and enable knowledge transfer from other domains to the education domain through three approaches: (1) a comparison approach; (2) a TAPT (Task-Adapted Pre-train) approach; and (3) a DAPT (Domain-Adapted Pre-train) approach. Approach (1) first demonstrates the effectiveness of transfer learning and then compares transferability across model structures (i.e., LSTM vs. AdaRNN vs. Transformer) and transfer learning methods (i.e., feature-based vs. instance-based). It finds that the Transformer is 3-4 times more effective than the other model structures and that the feature-based method is up to 5 times more effective than the instance-based method. Furthermore, Approach (2) leverages the shared semantic and lexical representations of a pre-trained general language model and forms a TAPT BERT model adapted to the particular education tasks; it surpasses the original general language model by 2%. Finally, Approach (3) further trains the general language model on a large mathematical corpus to form a DAPT model, which is demonstrated to improve on prior state-of-the-art models and BASE BERT by up to 22% and 8%, respectively. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone (1-800-521-0600) or on the web: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
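To make the abstract's first augmentation idea concrete, the following is a minimal sketch of statistical time-lag selection, assuming pandas-style NumPy arrays and the statsmodels library: choosing the most informative lag (here via partial autocorrelation) lets a short, dense lag window replace a sparse full-history feature matrix. The function name, scoring rule, and synthetic example are illustrative, not the dissertation's exact procedure.

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def select_lag(series, max_lag=20):
    # Score each candidate lag by its partial autocorrelation and pick
    # the strongest one beyond lag 0; building features only up to this
    # lag keeps the resulting matrix small and dense.
    scores = pacf(series, nlags=max_lag)
    return int(np.argmax(np.abs(scores[1:])) + 1)

# Example: a synthetic signal with dependence at lag 3 should select a
# small optimal lag rather than the full history.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
for t in range(3, 300):
    x[t] += 0.6 * x[t - 3]
print(select_lag(x))  # expected: 3
```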
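For the second augmentation idea, here is a minimal sketch of an LSTM-VAE that imputes missing values in student activity sequences, assuming PyTorch. The architecture, loss weighting, and all names are illustrative; the dissertation's LSTM-[L]VAE variants may differ in detail.

```python
import torch
import torch.nn as nn

class LSTMVAE(nn.Module):
    def __init__(self, n_features, hidden_dim=64, latent_dim=16):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.from_latent = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_features)

    def forward(self, x):
        # x: (batch, seq_len, n_features), with missing entries pre-filled as 0
        _, (h, _) = self.encoder(x)               # final hidden state summarizes the sequence
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
        h0 = self.from_latent(z).unsqueeze(1).repeat(1, x.size(1), 1)
        dec_out, _ = self.decoder(h0)
        return self.out(dec_out), mu, logvar

def loss_fn(recon, x, mask, mu, logvar):
    # Reconstruct only observed entries (mask == 1); the KL term
    # regularizes the latent space so sampled sequences stay plausible.
    mse = ((recon - x) ** 2 * mask).sum() / mask.sum()
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return mse + kl
```

After training, the reconstructions at masked positions serve as the generated values that fill in the missing data.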
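The comparison in Approach (1) contrasts feature-based and instance-based transfer. Below is a minimal sketch of the feature-based side, assuming PyTorch: a source-trained encoder is reused as a frozen feature extractor and only a new prediction head is fit on the smaller education-domain data; the instance-based counterpart would instead re-weight source examples during training. All names here are hypothetical.

```python
import torch.nn as nn

def build_transfer_model(pretrained_encoder: nn.Module,
                         encoder_out_dim: int, n_classes: int) -> nn.Module:
    # Freeze the source-domain encoder so its learned features transfer unchanged.
    for p in pretrained_encoder.parameters():
        p.requires_grad = False
    head = nn.Linear(encoder_out_dim, n_classes)  # only the head is trained on target data
    return nn.Sequential(pretrained_encoder, head)
```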
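Finally, for Approaches (2) and (3), here is a minimal sketch of domain-adaptive pretraining in the DAPT style, assuming the Hugging Face transformers and datasets libraries: BERT's masked language modeling objective is continued on an in-domain corpus (e.g., mathematical text) before fine-tuning on the downstream education task; TAPT is analogous but uses the task's own unlabeled text. The corpus file, output path, and hyperparameters are illustrative.

```python
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Hypothetical in-domain corpus; for TAPT this would be the task's unlabeled text.
corpus = load_dataset("text", data_files={"train": "math_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-bert", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer,
                                                  mlm_probability=0.15),
)
trainer.train()  # the adapted checkpoint is then fine-tuned on the education task
```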
Descriptors: Artificial Intelligence, Electronic Learning, Data, Generalization, Technology Uses in Education, Research Problems, Classification, Regression (Statistics), Semantics
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: National Science Foundation (NSF)
Authoring Institution: N/A
Grant or Contract Numbers: 1940076; 1915801
Author Affiliations: N/A