Indexing without spam
Zuccon, Guido and Leelanupab, Teerapong and Nguyen, Anthony and Azzopardi, Leif (2011) Indexing without spam. In: ADCS 2011 - Proceedings of the Sixteenth Australasian Document Computing Symposium. RMIT University, Melbourne, Vic., pp. 6-13. ISBN 9781921426926
Text (Zuccon-etal-ADCS2011-Indexing-without-spam): Zuccon_etal_ADCS2011_Indexing_without_spam.pdf, Accepted Author Manuscript (362kB)
Abstract
The presence of spam in a document ranking is a major issue for Web search engines. Common approaches to coping with spam remove pages that are likely to contain spam from the document rankings. These approaches are implemented as post-retrieval processes that filter out spam pages only after documents have been retrieved in response to a user's query. In this paper we propose removing spam pages at indexing time, thereby obtaining a pruned index that is virtually "spam-free". We investigate the benefits of this approach from three points of view: indexing time, index size, and retrieval performance. Not surprisingly, we found that the strategy decreases both the time required by the indexing process and the space required for storing the index. More surprisingly, we found no difference in retrieval performance between a spam-pruned version of a collection's index and traditional post-retrieval spam filtering approaches.
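The contrast the abstract draws between post-retrieval filtering and index-time pruning can be illustrated with a minimal sketch. This is not the authors' implementation: the toy collection, the per-document spam scores, the `SPAM_THRESHOLD` cut-off, and all function names are assumptions made for illustration, and retrieval is reduced to a boolean OR match rather than a ranking function.

```python
from collections import defaultdict

# Hypothetical toy collection: doc_id -> (text, spam_score).
# Scores are invented; in this sketch a lower score means "more likely spam".
DOCS = {
    "d1": ("cheap pills buy cheap pills now", 5),
    "d2": ("index pruning for web search", 80),
    "d3": ("web search engines and spam filtering", 65),
    "d4": ("buy web search traffic cheap", 20),
}
SPAM_THRESHOLD = 50  # assumed cut-off, not taken from the paper


def build_index(docs, min_score=None):
    """Build a term -> set(doc_id) inverted index.

    If min_score is given, documents scoring below it are skipped,
    so they never enter the index (index-time pruning).
    """
    index = defaultdict(set)
    for doc_id, (text, score) in docs.items():
        if min_score is not None and score < min_score:
            continue
        for term in text.split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing any query term."""
    hits = set()
    for term in query.split():
        hits |= index.get(term, set())
    return sorted(hits)


def search_post_filter(index, query, docs, min_score):
    """Retrieve from the full index, then drop likely-spam documents."""
    return [d for d in search(index, query) if docs[d][1] >= min_score]


def posting_count(index):
    """Total number of (term, document) postings, a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())


full_index = build_index(DOCS)
pruned_index = build_index(DOCS, min_score=SPAM_THRESHOLD)

post_filtered = search_post_filter(full_index, "web search", DOCS, SPAM_THRESHOLD)
index_pruned = search(pruned_index, "web search")

print(post_filtered, index_pruned)
print(posting_count(full_index), posting_count(pruned_index))
```

In this toy setting both strategies return the same documents, while the pruned index holds fewer postings. With a real ranking function the two need not coincide exactly, since pruning also changes collection statistics such as document frequencies.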
Field | Value |
---|---|
Author(s): | Zuccon, Guido, Leelanupab, Teerapong, Nguyen, Anthony and Azzopardi, Leif |
Item type: | Book Section |
ID code: | 66533 |
Keywords: | efficiency, index pruning, information retrieval, spam, web search, indexing (of information), search engines, document ranking, Electronic computers. Computer science, Computer Graphics and Computer-Aided Design, Computer Science Applications, Software |
Subjects: | Science > Mathematics > Electronic computers. Computer science |
Department: | Faculty of Science > Computer and Information Sciences |
Depositing user: | Pure Administrator |
Date deposited: | 10 Jan 2019 15:28 |
Last modified: | 05 Jun 2019 04:33 |
URI: | https://strathprints.strath.ac.uk/id/eprint/66533 |