Thursday, 10:30 AM – 12:00 PM
Chair: Vanja Josifovski

Actively Predicting Diverse Search Intent from User Browsing Behaviors

Zhicong Cheng, Bin Gao, Tie-Yan Liu

This paper is concerned with actively predicting search intent from user browsing behavior data. In recent years, great attention has been paid to predicting user search intent. However, such prediction has mostly been passive, because it is performed only after users submit their queries to search engines; it does not consider why users issued these queries or what triggered their information needs. According to our study, many of users' information needs are actually triggered by what they have browsed. That is, after reading a page, if a user finds something interesting or unclear, he or she may want further information and formulate a search query accordingly. Actively predicting such search intent can benefit both search engines and their users. In this paper, we propose a series of techniques to fulfill this task. First, we extract from user browsing behavior data all the queries that users issued after reading a given page. Second, we learn a model that ranks these queries according to their likelihood of having been triggered by the page. Third, since search intents can be quite diverse even when triggered by the same page, we propose an optimization algorithm to diversify the ranked list of queries obtained in the second step, and then suggest the list to users. We have tested our approach on large-scale user browsing behavior data obtained from a commercial search engine. The experimental results show that our approach can predict meaningful queries for a given page, and that search performance for these queries can be significantly improved by using the triggering page as contextual information.
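The diversification step (the third one above) can be illustrated with a greedy, maximal-marginal-relevance-style re-ranking. This is a minimal sketch under stated assumptions, not the authors' optimization algorithm: the trigger likelihoods are taken as given, and the query similarity (Jaccard overlap of terms), the names `jaccard` and `diversify`, and the trade-off parameter `lam` are all hypothetical.

```python
from typing import Dict, List


def jaccard(a: str, b: str) -> float:
    """Hypothetical query similarity: Jaccard overlap of lowercase term sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def diversify(scores: Dict[str, float], k: int, lam: float = 0.5) -> List[str]:
    """Greedy MMR-style re-ranking of candidate queries (an assumed stand-in).

    scores: assumed likelihood that each query was triggered by the page.
    lam: trade-off between likelihood (lam=1) and novelty (lam=0).
    """
    remaining = dict(scores)
    selected: List[str] = []
    while remaining and len(selected) < k:
        def mmr(q: str) -> float:
            # Penalize a query by its similarity to the most similar query already chosen.
            redundancy = max((jaccard(q, s) for s in selected), default=0.0)
            return lam * remaining[q] - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected
```

For example, with `{"jaguar price": 0.9, "jaguar car price": 0.85, "jaguar habitat": 0.6}` and `k=2`, the sketch returns `["jaguar price", "jaguar habitat"]`: the near-duplicate "jaguar car price" is passed over despite its higher score. Setting `lam=1.0` recovers the plain likelihood ranking.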

Exploiting Query Reformulations for Web Search Result Diversification

Rodrygo Santos, Craig Macdonald, Iadh Ounis

When a user’s underlying information need cannot be unambiguously determined from an initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for search result diversification, which explicitly accounts for the various aspects associated with a query. In particular, we estimate how well a given document satisfies each uncovered aspect and how well different aspects are satisfied by the result ranking as a whole, so as to achieve the desired objective of diversification effectively and efficiently. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest to the effectiveness of our framework when compared to state-of-the-art diversification approaches. Additionally, by simulating an upper-bound query suggestion mechanism from official TREC data, we draw insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
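The two estimates the abstract describes (how well a document satisfies each aspect, and how well each aspect is already satisfied by the ranking so far) can be combined greedily, as this sketch shows. The exact scoring here, a mix of relevance to the query and aspect-weighted relevance discounted by the probability that earlier documents already covered the aspect, is an assumption for illustration rather than the paper's stated formula; the probability inputs and the names `aspect_diversify` and `lam` are hypothetical, and the aspects could, for example, come from query reformulations.

```python
from typing import Dict, List


def aspect_diversify(
    rel: Dict[str, float],                    # assumed P(d | q): relevance to the initial query
    aspect_weight: Dict[str, float],          # assumed P(q_i | q): importance of each aspect
    aspect_rel: Dict[str, Dict[str, float]],  # aspect_rel[a][d]: assumed P(d | q_i)
    k: int,
    lam: float = 0.5,                         # hypothetical relevance/diversity trade-off
) -> List[str]:
    # Probability that each aspect is still unsatisfied by the documents ranked so far.
    not_covered = {a: 1.0 for a in aspect_weight}
    remaining = set(rel)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        def score(d: str) -> float:
            # Reward documents that satisfy important, not-yet-covered aspects.
            div = sum(
                aspect_weight[a] * aspect_rel[a].get(d, 0.0) * not_covered[a]
                for a in aspect_weight
            )
            return (1 - lam) * rel[d] + lam * div

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        ranking.append(best)
        remaining.remove(best)
        # Update how well each aspect is now satisfied by the ranking as a whole.
        for a in not_covered:
            not_covered[a] *= 1.0 - aspect_rel[a].get(best, 0.0)
    return ranking
```

With two equally weighted aspects ("car" and "animal") for a query such as "jaguar", two car pages `d1` and `d2` (relevance 0.9 and 0.8), and one animal page `d3` (relevance 0.5), the sketch ranks `["d1", "d3"]` at `lam=0.5`, whereas `lam=0` ranks purely by relevance and returns `["d1", "d2"]`.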

Diversifying Web Search Results

Davood Rafiei, Krishna Bharat, Anand Shukla

Result diversity is a topic of great importance as more facets of queries are discovered and users expect to find their desired facets on the first page of results. However, the underlying questions of how “diversity” interacts with “quality”, and when preference should be given to one or both, are not well understood. In this work, we model the problem as expectation maximization and study the challenges of estimating the model parameters and reaching an equilibrium. One model parameter, for example, is the correlation between pages, which we estimate using the textual content of pages and click data (when available). We conduct experiments on diversifying randomly selected queries from a query log and queries chosen from Wikipedia's disambiguation topics. Our algorithm improves upon Google in terms of the diversity of random queries, retrieving 14% to 38% more aspects of queries in the top 5, while maintaining precision very close to Google's. On a more selective set of queries that are expected to benefit from diversification, our algorithm improves upon Google in terms of both precision and diversity of the results, and significantly outperforms another baseline system for result diversification.
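The abstract does not specify how page correlations are estimated from text. A common stand-in, assumed here rather than taken from the paper, is the cosine similarity of term-frequency vectors; click data is not modeled. The function names `text_correlation` and `correlation_matrix` are hypothetical.

```python
import math
from collections import Counter
from typing import List


def text_correlation(page_a: str, page_b: str) -> float:
    """Assumed estimator: cosine similarity of term-frequency vectors of two pages."""
    va, vb = Counter(page_a.lower().split()), Counter(page_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def correlation_matrix(pages: List[str]) -> List[List[float]]:
    """Pairwise text correlations for a list of page texts."""
    return [[text_correlation(a, b) for b in pages] for a in pages]
```

For the pages "jaguar car dealer", "jaguar car review", and "jaguar big cat habitat", the two car pages have a correlation of 2/3, while the dealer page and the habitat page have about 0.29, so a diversifying ranker would treat the first pair as largely redundant.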
