Friday, 10:30 AM – 12:00 PM
Chair: Jennifer Neville

Sampling Community Structure

Arun Maiya, Tanya Berger-Wolf

We propose a novel method, based on concepts from expander graphs, to sample communities in networks. We show that our sampling method, unlike previous techniques, produces subgraphs representative of community structure in the original graph. Using samples produced by our method, we show that the problem of community detection may be recast into a case of statistical relational learning. We empirically evaluate our approach against several real-world datasets and demonstrate that our sampling method can effectively be used to infer and approximate community affiliation in the larger network. To the best of our knowledge, this is the first work using sampling to make non-trivial inferences of latent properties such as community structure in the larger graph.

Using a Model of Social Dynamics to Predict Popularity of News

Kristina Lerman, Tad Hogg

Popularity in social media is unequally distributed with few of the items, such as blog posts, stories on the social news portal Digg, images on the photo-sharing site Flickr, etc., grabbing a disproportionate number of votes or views. Predicting which items will become popular is critically important for both social media companies that host user-generated content and consumers of that content. Accurate and timely prediction would enable the companies to maximize revenue through differential pricing for access to content or ad placement. Prediction would give consumers an important tool for dealing with the ever-growing amount of content. Predicting popularity of content in social media, however, is a challenging problem. In a landmark study, Salganik et al. showed that social influence, or the ability to see the choices of others, causes popularity to be both unequally distributed and unpredictable. We argue that observability of user behavior on many social media sites allows us to construct a model of social dynamics of users on the site. The model can in turn be used to predict long-term popularity of content based on how its early popularity changes in time. We validate this claim on the social news portal Digg. We use the model of social voting developed in earlier works to predict how many votes stories will receive on Digg based on the users’ early reaction to them.

Measurement-calibrated Graph Models for Social Network Experiments

Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, Ben Zhao

Access to realistic, complex graph datasets is critical to research on social networking systems and applications. Simulations on graph data provide critical evaluation of new systems and applications ranging from community detection to spam filtering and social web search. Since gathering real graph datasets through direct measurements is costly in time and resources, researchers are anonymizing and distributing valuable datasets to the community. However, performing experiments using shared real datasets faces three significant disadvantages: concerns that graphs can be de-anonymized to reveal private information, increasing costs of distributing large graph datasets, and the fact that the small number of available social graphs limits the statistical confidence in the results. The use of trace-driven graph models is an attractive alternative to sharing datasets. Researchers can “fit” a graph model to a real social graph, extract a set of model parameters, and use them to generate multiple synthetic graphs statistically similar to the original graph. While numerous graph models have been proposed, it is unclear if they can produce synthetic graphs that accurately match the properties of the original graphs. In this paper, we explore the feasibility of trace-driven synthetic graphs using six popular graph models and real social graphs measured from the Facebook social network ranging from 30,000 to 3 million edges. We find that two models consistently produce synthetic graphs with common graph metric values similar to those of the original graphs. However, only one of these models produces high fidelity results in our application-level benchmarks. While this shows that graph models can produce realistic synthetic graphs, it also highlights the fact that current graph metrics remain incomplete, and some applications expose graph properties that do not map to existing metrics.


Back to full list of papers