
A multi-collection latent topic model for federated search

Baillie, M. and Carman, M. and Crestani, F. (2011) A multi-collection latent topic model for federated search. Information Retrieval, 14. pp. 390-412. ISSN 1386-4564


Abstract

Collection selection is a crucial function, central to the effectiveness and efficiency of a federated information retrieval system. A variety of collection selection methods have been proposed that adapt proven techniques from centralised retrieval. This paper defines a new approach to collection selection that models the topical distribution in each collection. We describe an extended version of latent Dirichlet allocation that uses a hierarchical hyperprior to model the different topical distributions found in each collection. Under the model, resources are ranked based on the topical relationship between query and collection. By modelling collections in a low-dimensional topic space, we can implicitly smooth their term-based characterisation with appropriate terms from topically related samples, thereby dealing with the problem of missing vocabulary within the samples. An important advantage of this hierarchical model over current approaches is that it generalises well to unseen documents given small samples of each collection. The latent structure of each collection can therefore be estimated well despite imperfect information about each collection, such as sampled documents obtained through query-based sampling. Experiments demonstrate that this new, fully integrated topical model is more robust than current state-of-the-art collection selection algorithms.
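To make the ranking and smoothing ideas above more concrete, here is a minimal sketch. It assumes a topic model has already been fitted, giving shared topic-word probabilities `phi` (K x V) and per-collection topic proportions `theta` (C x K). It scores each collection by query log-likelihood, with P(w|c) = sum_k theta[c,k] * phi[k,w]. The function name `rank_collections`, the optional linear interpolation with term distributions estimated from sampled documents (`sample_lm`, weight `lam`), and the query-likelihood scoring itself are illustrative assumptions. They are not necessarily the paper's exact formulation, and the hierarchical hyperprior used during estimation is not modelled here.

```python
import numpy as np


def rank_collections(query_ids, phi, theta, sample_lm=None, lam=0.5):
    """Rank collections by log P(query | collection) under a topic model.

    Hypothetical sketch (not the paper's exact method):
      phi       -- (K, V) topic-word probabilities, assumed shared by all collections
      theta     -- (C, K) per-collection topic proportions
      sample_lm -- optional (C, V) term distributions estimated from sampled docs
      lam       -- interpolation weight on the sample estimate (assumed value)
    Returns (order, scores): collection indices sorted best-first, and their scores.
    """
    # Topic-based term distribution per collection: P(w|c) = sum_k theta[c,k] * phi[k,w]
    topic_lm = theta @ phi
    if sample_lm is not None:
        # Smooth sample estimates with topic-based ones, so terms missing from
        # a collection's sample still receive non-zero probability
        lm = lam * sample_lm + (1.0 - lam) * topic_lm
    else:
        lm = topic_lm
    # Sum log-probabilities of the query terms for every collection
    scores = np.log(lm[:, np.asarray(query_ids)]).sum(axis=1)
    # Highest score first; stable sort keeps ties in collection order
    order = np.argsort(-scores, kind="stable")
    return order, scores


# Toy example: two topics over a four-word vocabulary, two collections
phi = np.array([[0.5, 0.5, 0.0, 0.0],
                [0.0, 0.0, 0.5, 0.5]])
theta = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
order, scores = rank_collections([0, 1], phi, theta)
print(order)  # collection 0 ranks first for a query on topic 0's terms
```

With a sampled distribution in which word 2 never occurs for collection 0, the interpolated estimate still assigns that word probability 0.5 * 0.05 = 0.025, which illustrates the "missing vocabulary" smoothing described in the abstract.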

Item type: Article
ID code: 40772
Keywords: collection selection, information retrieval, databases, distributed information retrieval, topic models, Electronic computers. Computer science, Library and Information Sciences, Information Systems
Subjects: Science > Mathematics > Electronic computers. Computer science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date Deposited: 08 Aug 2012 12:59
Last modified: 27 Mar 2014 10:26
URI: http://strathprints.strath.ac.uk/id/eprint/40772
