ERIC Number: ED578356
Record Type: Non-Journal
Publication Date: 2017
Pages: 139
Abstractor: As Provided
ISBN: 978-0-3551-5212-8
ISSN: N/A
EISSN: N/A
Available Date: N/A
Using Syntactic Patterns to Enhance Text Analytics
Meyer, Bradley B.
ProQuest LLC, Ph.D. Dissertation, North Carolina Agricultural and Technical State University
Large-scale product and service reviews proliferate across the web. The ability to harvest, digest, and analyze a large corpus of reviews from online websites is, however, still a difficult problem. This problem is referred to as "opinion mining." Opinion mining is an important area of research, as advances in the field enable consumers and businesses to make better-informed decisions from others' experiences. Much of the research in opinion mining relies upon the Bag-Of-Words (BOW) assumption, which yields computationally tractable methods. The BOW assumption disregards language constructs, considering each word to be independent. My research does not follow the often-used BOW model; rather, it diverges by examining recurring patterns found in written languages. This dissertation attempts to answer the question, "Can opinion mining tasks benefit by using syntactic patterns in text?" I answer this question by injecting information gained from decomposing and examining syntactic patterns into aspect extraction and sentence-level sentiment analysis methods and performing experiments accordingly. I propose a variant of the Latent Dirichlet Allocation (LDA) model referred to as the LDA-POS model. The LDA-POS model examines short-range syntactic dependencies by conditioning the word assignment to the topic on both the previous word and the previous word's part-of-speech (POS). I also experiment with an LDA-POS model which filters the word assignment if the previous word emotes low sentiment. Using these models and two comparative models, I perform aspect extraction experiments on a large corpus of hotel reviews. My results find that the models which include additional information from syntactic patterns or sentiment signals typically outperform models which do not include this information. Aspect extraction is naturally complemented with sentiment analysis, allowing the researcher to understand the reviewer's bias toward the aspect.
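The contrast drawn above between the BOW assumption and conditioning on the previous word and its part-of-speech can be illustrated with a minimal Python sketch. This is not the LDA-POS model from the dissertation; it only shows what information each representation retains. The toy POS lexicon, the function names, and the example sentences are all hypothetical, and a real pipeline would presumably use a trained tagger.

```python
from collections import Counter

# Hypothetical toy POS lexicon, for illustration only (assumption, not from the dissertation).
TOY_POS = {
    "the": "DET", "room": "NOUN", "was": "VERB",
    "clean": "ADJ", "dirty": "ADJ", "not": "ADV",
}

def bag_of_words(tokens):
    """BOW view: word counts only, with word order discarded."""
    return Counter(tokens)

def prev_word_pos_contexts(tokens):
    """For each word, record (previous word, previous word's POS, word)."""
    contexts = []
    prev_word, prev_pos = "<s>", "START"  # sentence-start placeholders
    for word in tokens:
        contexts.append((prev_word, prev_pos, word))
        prev_word, prev_pos = word, TOY_POS.get(word, "UNK")
    return contexts

sentence_a = "the room was clean not dirty".split()
sentence_b = "the room was dirty not clean".split()

# Under BOW the two sentences are indistinguishable.
same_bow = bag_of_words(sentence_a) == bag_of_words(sentence_b)

# With previous-word/POS contexts they differ.
same_contexts = (Counter(prev_word_pos_contexts(sentence_a))
                 == Counter(prev_word_pos_contexts(sentence_b)))
```

In this sketch the two sentences have opposite meanings yet share one bag of words, while their (previous word, previous POS, word) contexts differ. That short-range information is the kind a model conditioned on the previous word and its POS could exploit.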
The adjective-noun pair is a standard document-level sentiment analysis method. Sentiment analysis, however, can be examined at different levels of granularity, such as document, paragraph, or sentence. It is at the sentence level where this method can fail. I propose a machine learning sentence-level sentiment classification technique which uses features constructed from syntactic sentence patterns. My experiments on a hotel reviews dataset have shown the efficacy of these methods versus lexicon BOW methods. I also find that these methods work well across domains, a common weakness of lexicon BOW sentiment analysis methods. Lastly, I demonstrate how contemporary network algorithms, which focus solely on the topological structure of nodes and edges, can be extended to incorporate additional information, allowing the researcher to study node-centric problems. I illustrate this by presenting the "Hotel Reviewers Problem," which requires the fusing of aspects and sentiment. To solve this problem, I propose a novel graph clustering algorithm which efficiently identifies communities of hotel reviewers which hold similar sentiments toward hotel aspects. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml.]
Descriptors: Syntax, Language Research, Language Patterns, Web Sites, Computational Linguistics, Data Analysis, Opinions, Models, Comparative Analysis, Housing, Tourism, Evaluation
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A