ERIC Number: ED649769
Record Type: Non-Journal
Publication Date: 2022
Pages: 297
Abstractor: As Provided
ISBN: 979-8-3575-5235-8
ISSN: N/A
EISSN: N/A
Available Date: N/A
Artificial Neural Networks as Models of Human Language Acquisition
Alex Warstadt
ProQuest LLC, Ph.D. Dissertation, New York University
Data-driven learning uncontroversially plays a role in human language acquisition; how large a role is a matter of much debate. The success of artificial neural networks in NLP in recent years calls for a re-evaluation of our understanding of the possibilities for learning grammar from data alone. This dissertation makes the case for using artificial neural networks to test hypotheses about human language acquisition and presents progress towards this goal from multiple directions. Compared to experiments on human subjects, experiments on artificial learners based on neural networks have massive advantages in terms of ethics, expense, and expanded possibilities for experimental design. I provide a general recipe for building more convincing model learners and designing ablation experiments using them (Chapter 1). Subsequently, I introduce benchmarks, including the Corpus of Linguistic Acceptability (CoLA; Chapter 2) and the Benchmark of Linguistic Minimal Pairs for English (BLiMP; Chapter 3), that use acceptability judgments to probe grammatical knowledge in artificial neural networks. Although off-the-shelf neural language models popular in natural language processing today achieve human-level performance on these benchmarks, they are trained on orders of magnitude more linguistic input than children are exposed to, making them unsuitable for studying human language acquisition. Thus, I train language models from scratch in more human-like settings using limited data, and track the acquisition of language-specific inductive bias and grammatical features as a function of the volume of input (Chapters 4 and 5). Results show that there is a remarkably rich signal for grammar learning in raw text data, but current models require considerably more data than a child to learn from it. Chapter 6 is a synthesis of these approaches applied to a long-standing debate in language acquisition regarding the learnability of structure dependence in subject-auxiliary inversion. Through a controlled experimental manipulation of the learning environment of neural language models, I find evidence that, in the absence of an innate hierarchical bias, direct evidence against a linear rule, while helpful, is not necessary for data-driven learners to arrive at acceptability judgments consistent with a structural generalization. These results not only highlight the importance of considering indirect evidence in learnability debates, but also provide a proof of concept for the use of artificial learners in evaluating previously untestable hypotheses about human language acquisition. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com/en-US/products/dissertations/individuals.shtml.]
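As an illustration of the minimal-pair acceptability paradigm the abstract describes (BLiMP-style evaluation), the sketch below compares the total log-probabilities a pretrained causal language model assigns to the two sentences of a pair and counts the model as "correct" when it prefers the acceptable sentence. This is not taken from the dissertation itself; the choice of GPT-2 via the Hugging Face transformers library and the example sentence pair are assumptions for demonstration only.

# Illustrative sketch of minimal-pair evaluation: a causal language model
# "prefers" the acceptable sentence if it assigns it a higher total
# log-probability. Model choice (gpt2) and the example pair are assumptions,
# not drawn from this record.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to the sentence's tokens."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the returned loss is the mean negative
        # log-likelihood over the predicted (shifted) tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.shape[1] - 1)

# Hypothetical minimal pair (subject-verb agreement).
acceptable = "The keys to the cabinet are on the table."
unacceptable = "The keys to the cabinet is on the table."

correct = sentence_logprob(acceptable) > sentence_logprob(unacceptable)
print("Model prefers the acceptable sentence:", correct)

Aggregating this comparison over many such pairs, grouped by grammatical phenomenon, yields the kind of accuracy scores the benchmarks described above report.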
Descriptors: Language Acquisition, Artificial Intelligence, Computational Linguistics, Ethics, Models, English, Benchmarking, Grammar, Decision Making, Brain Hemisphere Functions, Natural Language Processing, Linguistic Input, Generalization, Learning Processes, Linguistic Theory
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A