Friday, 1:30–3:00 PM
Chair: Andreas Hotho

Do You Want to Take Notes? Identifying Research Missions in Yahoo! Search Pad

Debora Donato, Francesco Bonchi, Tom Chi, Yoelle Maarek

Addressing users’ information needs has been one of the main goals of Web search engines since the early days of the Web. Information needs are expressed as a free-text query that the search engine processes in order to return a list of candidate results. In some cases, users cannot get a definitive answer from these results, simply because their need is too complex and heterogeneous to be answered by a single Web page. This typically happens when users investigate a topic in domains such as education, travel, or health. We refer to this type of activity as “research missions”, and we demonstrate in this paper that such missions can be automatically identified on the fly, while the user is interacting with the search engine, through careful analysis of query flows and query sessions. We argue that changing the level of granularity of query modeling, from an isolated query to a list of queries pertaining to the same task, so as to better reflect a certain type of information need, can be beneficial in a number of Web search applications. We substantiate our claim by showing how research missions brought value to a novel Yahoo! application, called Search Pad, that was launched this year on Yahoo! Search.

How Useful Are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings

Stefan Siersdorfer, Jose San Pedro, Sergiu Chelaru, Wolfgang Nejdl

An analysis of the leading social video sharing platform YouTube reveals a large amount of community feedback through comments on published videos as well as through meta ratings for these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments from about 67,000 YouTube videos, for which we analyzed various dependencies between comments, views, comment ratings, and topic categories. We also study the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we build different classifiers for the prediction of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments.

Stop Thinking, Start Tagging: Tag Semantics Arise from Collaborative Verbosity

Christian Körner, Dominik Benz, Markus Strohmaier, Andreas Hotho, Gerd Stumme

Recent research provides evidence for the presence of emergent semantics in collaborative tagging systems. While several methods have been proposed, little is known about the factors that influence the evolution of semantic structures in these systems. A natural hypothesis is that the quality of the emergent semantics depends on the pragmatics of tagging: users with certain usage patterns might contribute more to the resulting semantics than others. In this work, we propose several measures which enable a pragmatic differentiation of taggers by their degree of contribution to emerging semantic structures. We distinguish between categorizers, who typically use a small set of tags as a replacement for hierarchical classification schemes, and describers, who annotate resources with a wealth of freely associated, descriptive keywords. To study our hypothesis, we apply semantic similarity measures to 64 different partitions of a real-world, large-scale folksonomy containing different ratios of categorizers and describers. Our results not only show that ‘verbose’ taggers are most useful for the emergence of tag semantics, but also that a subset containing only 40% of the most ‘verbose’ taggers can produce results that match and even outperform the semantic precision obtained from the whole dataset. Moreover, the results suggest that there exists a causal link between the pragmatics of tagging and resulting emergent semantics. This work is relevant for designers and analysts of tagging systems interested (i) in fostering the semantic development of their platforms, (ii) in identifying users introducing “semantic noise”, and (iii) in learning ontologies.
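
The categorizer/describer distinction in the abstract above can be illustrated with a small sketch. The abstract does not spell out the proposed measures, so the score used here (distinct tags divided by distinct resources per user) is only an assumed stand-in, and the function names `tag_resource_ratio` and `most_verbose` are hypothetical. Users with a low score reuse a small vocabulary (categorizer-like); users with a high score attach many different tags (describer-like).

```python
from collections import defaultdict

def tag_resource_ratio(assignments):
    """Score each user by distinct tags / distinct resources.

    `assignments` is an iterable of (user, tag, resource) triples.
    This ratio is an illustrative assumption, not necessarily one of
    the measures proposed in the paper.
    """
    tags = defaultdict(set)
    resources = defaultdict(set)
    for user, tag, resource in assignments:
        tags[user].add(tag)
        resources[user].add(resource)
    return {user: len(tags[user]) / len(resources[user]) for user in tags}

def most_verbose(ratios, fraction):
    """Return the top `fraction` of users, ranked by descending score."""
    ranked = sorted(ratios, key=ratios.get, reverse=True)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]

# Toy data: "ann" reuses one tag everywhere, "bob" tags freely.
assignments = [
    ("ann", "python", "r1"), ("ann", "python", "r2"), ("ann", "python", "r3"),
    ("bob", "web", "r1"), ("bob", "search", "r1"),
    ("bob", "howto", "r1"), ("bob", "notes", "r2"),
]
ratios = tag_resource_ratio(assignments)
verbose_users = most_verbose(ratios, 0.5)
```

Partitioning a folksonomy by such a score, and then measuring tag semantics on each partition, is the kind of experiment the abstract describes; the 0.5 fraction here is arbitrary.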

