ERIC Number: ED578135
Record Type: Non-Journal
Publication Date: 2017
Pages: 115
Abstractor: As Provided
ISBN: 978-0-3551-6209-7
ISSN: N/A
EISSN: N/A
Available Date: N/A
Statistical Models for Linguistic Variation in Online Media
Kulkarni, Vivek
ProQuest LLC, Ph.D. Dissertation, State University of New York at Stony Brook
Language on the Internet and social media varies with time, geography, and social factors. For example, consider an online chat forum where people from different regions of the world interact. In such scenarios, it is important to track and detect regional variation in language. A person from the UK who is in conversation with someone from the USA could say "he is stuck in the lift" to mean "he is stuck in an elevator", since the word "lift" means an elevator in the UK. Note that in the US, "lift" does not refer to an elevator. Modeling such variation allows applications to prompt or suggest the intended meaning to the other participants in the conversation. In this thesis, we conduct two related lines of inquiry focusing on (a) language itself and the variation it manifests and (b) the user and what we can infer about them based on their language use on social media. First, we develop computational methods to track and detect changes in word usage, including semantic and syntactic variation. We examine three modalities: time, geography, and domain. Specifically, we outline methods that use distributional word representations (word embeddings) to detect semantic variation in word usage. Our methods scale to large datasets, making them particularly suited for social media. Second, we turn our attention towards users. In particular, we model latent traits of users based on their everyday language use on social media. We develop latent factor models that explicitly seek to build representations of each user based on their inferred latent traits. These models capture latent traits that serve as useful covariates for a wide variety of tasks, such as predicting which topics users like on social media and the number of friends in their social circle.
This work has broad applications in several fields, such as information retrieval, semantic web applications, socio-variational linguistics, and computational social science (including digital health care and ad targeting). [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A