ERIC Number: ED536808
Record Type: Non-Journal
Publication Date: 2011
Pages: 149
Abstractor: As Provided
ISBN: ISBN-978-1-2670-8222-0
ISSN: N/A
EISSN: N/A
Available Date: N/A
Understanding and Mitigating Forum Spam
Shin, Youngsang
ProQuest LLC, Ph.D. Dissertation, Indiana University
The Web is large and expanding, making it challenging to attract new visitors to websites. Website operators often use Search Engine Optimization (SEO) techniques to boost the search engine rankings of their sites, thereby maximizing the inflow of visitors. Malicious operators take SEO to the extreme through many unsavory techniques that are often categorized as web spamming. One popular such technique is forum spamming, where miscreants put links to their websites on forums frequented by Internet users. This exploits the fact that forums, including webboards, blogs, wikis and guestbooks, allow their visitors to contribute content, and that high numbers of links from legitimate websites improve most search engine rankings. Unfortunately, the countermeasures deployed by forum operators, such as account registration, email verification, and CAPTCHAs, are not very effective against forum spamming since they can be effortlessly defeated by forum spam automation tools. In this dissertation, we investigate the various aspects of forum spam towards the goal of understanding and effectively mitigating it. We first study email spam (the most prevalent form of spam) and botnets (miscreants' enabler in much of cyberfraud) as background work. Then we examine the prevalence of forum spam on the Web. We also investigate how forum spammers operate by analyzing forum spam automators. Based on the resulting empirical understanding, we collect samples of forum spam and scrutinize them in order to find distinctive features that can effectively identify it. Our approach consists of two mitigation methods. First, we analyze spammers' origin information and activity to find light-weight features for a classifier. Second, we classify URLs posted to forums as spam or legitimate by considering the link structure of the Web graph rooted at the posted URL.
We show that support vector machine (SVM) classifiers based on our two methods can achieve a pragmatically high performance for forum spam detection. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone at 1-800-521-0600. Web page: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml.]
Descriptors: Internet, Web Sites, Search Engines, Electronic Publishing, Electronic Mail, Discussion Groups, Incidence, Advertising, Classification
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A