NotesFAQContact Us
Collection
Advanced
Search Tips
Back to results
ERIC Number: ED667514
Record Type: Non-Journal
Publication Date: 2020
Pages: 106
Abstractor: As Provided
ISBN: 979-8-5229-7038-3
ISSN: N/A
EISSN: N/A
Available Date: 0000-00-00
Enhanced Random Walk Sampling over Online Networks Leveraging Topology Knowledge
Naian Yin
ProQuest LLC, Ph.D. Dissertation, The George Washington University
Online networks, as digital images reflecting every aspects of real world especially under the current context of information technology, have become the most massive and opulent data source these days, where vast amount of information keeps being produced and transmitted. Their rapid expansion in scale and data volume draws wide attention and interests from researchers, while challenges emerge along as well. The inherent growing demands of data fetching required by analysis conflict greatly with ON providers' efforts to protect their digital assets and considerations of user privacy. Restrictions of the web and other application interfaces of ONs prevent third party researchers from gathering sufficient data from the networks and the global image of these networks are also hidden. Under such circumstances, very few existing analytic tools but the random walk family techniques can be adopted to fulfill ON related data tasks. They are capable of sampling network data distributed over desired distributions without knowledge of the entire graphs. However, their possibly intolerable and hard-to-foresee long 'burn-in' (or mixing time) before convergences also bring cost dilemmas that need to be conquered. Reviewing existing sampling techniques being used over ONs, most of them are more generalized approaches that can be used on arbitrary networks, which inspired my work to find the possibility of speeding up random walks by leveraging some well established ON topology features. This dissertation includes my discussions of several speeded random walk like approaches standing on common characteristics of ONs. Firstly, online networks are believed to be scale-free and follow power law in their degree distributions. It leads to the presence of very high degree nodes or "hubs" that a relatively small number of them can access nearly the whole network. The first idea is to perform random walks in the smaller sub-networks consisting of hubs. This reduces the size of ON related problems and gives the walks quicker access to larger portion of non-hub nodes through hubs, seeking for faster mixing and exploration over networks. And the idea of walking over hub nodes also carries potential to solve basic graph problems in topology ambiguous online networks context, where I think of the single pair path problem. It is one of those most basic element in graph analysis used to answer or estimate some important graph properties, but with topology unavailable it's lacking efficient approach to meet the growing demand. I take advantage of the presence of hub nodes mentioned above to boost the path searching speed, either in pure greedy approach or optimal guaranteed A* manner, and thus try to avoid the expensive query cost using existing methods like breadth first search. Then I considered the very common community structure showed up in ONs (also known as clusters or modules). These are groups of nodes that share partially identical properties or serve similar roles within the networks, where existing random walks can easily get stuck in place because of low inter community connectivity. My idea is to leverage community information that pushes (by raising the probability of crossing edges) random walks to go across the "bottle neck" between different communities. All ideas aim to fast raise the probability of exploring nodes from entire network and avoid wandering locally for too long. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by Telephone (800) 1-800-521-0600. Web page: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml.]
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://www.proquest.com.bibliotheek.ehb.be/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A
Author Affiliations: N/A