Thursday, 1:30–3:00 PM
Chair: Mark Manasse

Stochastic Models for Tabbed Browsing

Flavio Chierichetti, Ravi Kumar, Andrew Tomkins

We present a model of tabbed browsing that is a hybrid of a Markov process capturing the graph of web pages and a branching process capturing the creation, splitting, and dying of tabs. We present a mathematical criterion to characterize whether the process has a steady state independent of initial conditions, and we show how to characterize the limiting behavior in both cases. We perform a series of experiments to compare our tabbed browsing model with PageRank, and show that tabbed browsing is able to explain 15-25% of the deviation between actual measured browsing behavior and the behavior predicted by the simple PageRank model. We find this to be a surprising result, as the tabbed browsing model does not make use of any notion of site popularity, but simply captures deviations in user likelihood to open and close tabs from a particular node in the graph.

A Characterization of Online Search Behavior

Ravi Kumar, Andrew Tomkins

In this paper, we undertake a large-scale study of online user behavior based on search and toolbar logs. We propose a new CCS taxonomy of pageviews consisting of Content (news, portals, games, verticals, multimedia), Communication (email, social networking, forums, blogs, chat), and Search (web search, item search, multimedia search). We show that roughly half of all pageviews online are content, 1/3 are communication, and the remaining 1/6 are search. We then study the extent to which pages of certain types are revisited by the same user over time, and the mechanisms by which users move from page to page, within and across hosts, and within and across page types. Finally, we characterize behavior within the three primary branches of our taxonomy.

Tracking the random surfer: Empirically measured teleportation parameters in PageRank

David Gleich, Paul Constantine, Abraham Flaxman, Asela Gunawardana

PageRank computes the importance of each page in a directed graph under a random surfer model governed by a teleportation parameter. Commonly denoted alpha, this parameter models the probability of following an edge inside the graph or, when the graph comes from a network of web pages and links, clicking a link on a web page. We empirically measure the teleportation parameter based on browser toolbar logs and a click trail analysis. For a particular user or machine, such analysis produces a value of alpha. We find that these values nicely fit a Beta distribution with mean edge-following probability between 0.3 and 0.7, depending on the site. Using these distributions, we compute PageRank scores in which the teleportation parameter is a distribution rather than a constant. These new metrics are evaluated on the graph of pages in Wikipedia.
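
As a rough illustration of what PageRank with a distributed teleportation parameter can look like, the sketch below averages ordinary PageRank vectors over values of alpha sampled from a Beta distribution. It is not the authors' method: the Monte Carlo averaging, the function names, the toy four-page graph, and the Beta(2, 2) shape parameters (mean 0.5, chosen only because it falls inside the 0.3 to 0.7 range quoted above) are all assumptions made for this example. The graph is assumed to list every page as a key, including pages with no outgoing links.

```python
import random


def pagerank(links, alpha, iters=100):
    """Power iteration; alpha is the probability of following a link."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # With probability 1 - alpha the surfer teleports uniformly.
        new = {v: (1.0 - alpha) / n for v in nodes}
        for v in nodes:
            out = links[v]
            if out:
                share = alpha * rank[v] / len(out)
                for w in out:
                    new[w] += share
            else:
                # Dangling page: spread its mass uniformly (an assumption).
                share = alpha * rank[v] / n
                for w in nodes:
                    new[w] += share
        rank = new
    return rank


def expected_pagerank(links, a, b, samples=200, seed=0):
    """Monte Carlo average of PageRank over alpha ~ Beta(a, b)."""
    rng = random.Random(seed)
    total = {v: 0.0 for v in links}
    for _ in range(samples):
        alpha = rng.betavariate(a, b)
        r = pagerank(links, alpha)
        for v in total:
            total[v] += r[v]
    return {v: s / samples for v, s in total.items()}


# Hypothetical toy graph: page -> list of pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = expected_pagerank(links, a=2.0, b=2.0)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 4))
```

Because each sampled PageRank vector sums to one, the averaged scores do as well, so they can be read as a ranking in the same way as constant-alpha PageRank.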

