
Adaptive query-based sampling for distributed IR

Azzopardi, L. and Baillie, M. and Crestani, F. (2006) Adaptive query-based sampling for distributed IR. In: 29th Annual ACM Conference on Research and Development in Information Retrieval, 2006-08-06 - 2006-08-11, Seattle.


    Abstract

    In Distributed Information Retrieval (DIR) systems, the widely accepted solution for resource description acquisition is Query-Based Sampling (QBS) [1]. In the standard approach to QBS, sampling is curtailed once 300-500 unique documents have been retrieved. This threshold was obtained by empirically measuring the estimated resource description against the actual resource, and then considering the corresponding retrieval selection accuracy [1]. However, a fixed threshold may not generalise to collections and environments beyond those on which it was estimated (i.e., a set of resources of uniform size [1]). Cases where the blanket application of such a heuristic would be inappropriate include (1) when the sizes of the resources are highly skewed and (2) when the resources are very heterogeneous. In the former case, if a resource is very large, then undersampling will occur because not enough documents are obtained. Conversely, if a collection is very small, then oversampling will occur, increasing costs beyond necessity. In the latter case, if the resource is varied and highly heterogeneous, then obtaining a sufficiently accurate description requires more documents to be sampled than when the resource is homogeneous. Either way, adopting a flat cutoff will not necessarily provide sufficiently good resource descriptions for all resources.
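
The fixed-threshold behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and not their adaptive method: the toy resource, the random ranking, the function names (`make_toy_resource`, `query_based_sample`), and all parameters other than the 300-document cutoff are assumptions made purely for illustration. The loop issues single-term queries, keeps a few returned documents per query, draws later query terms from the documents seen so far, and stops at a flat cutoff.

```python
import random


def make_toy_resource(num_docs, vocab_size=200, words_per_doc=20, seed=0):
    """Build a hypothetical in-memory resource and a search function over it.

    The search function returns up to k matching documents in random order,
    which stands in for a real ranking function (an assumption of this sketch).
    """
    rng = random.Random(seed)
    docs = {
        doc_id: " ".join(f"w{rng.randrange(vocab_size)}" for _ in range(words_per_doc))
        for doc_id in range(num_docs)
    }
    term_sets = {doc_id: set(text.split()) for doc_id, text in docs.items()}

    def search(term, k):
        hits = [doc_id for doc_id, terms in term_sets.items() if term in terms]
        chosen = rng.sample(hits, min(k, len(hits)))
        return [(doc_id, docs[doc_id]) for doc_id in chosen]

    return docs, search


def query_based_sample(search, seed_term, max_docs=300, docs_per_query=4,
                       max_queries=500, seed=0):
    """Sample documents with single-term queries until a flat cutoff is hit.

    Returns the sampled documents and the number of queries issued.
    """
    rng = random.Random(seed)
    sampled = {}                 # doc_id -> text
    vocabulary = [seed_term]     # candidate query terms learned so far
    known_terms = {seed_term}
    queries = 0
    while len(sampled) < max_docs and queries < max_queries:
        term = rng.choice(vocabulary)
        queries += 1
        for doc_id, text in search(term, docs_per_query):
            if doc_id in sampled:
                continue
            sampled[doc_id] = text
            for word in text.split():
                if word not in known_terms:
                    known_terms.add(word)
                    vocabulary.append(word)
            if len(sampled) >= max_docs:
                break            # flat cutoff reached mid-query
    return sampled, queries


if __name__ == "__main__":
    for size in (40, 5000):
        docs, search = make_toy_resource(size)
        sample, queries = query_based_sample(search, docs[0].split()[0])
        coverage = len(sample) / len(docs)
        print(f"resource size={size} sampled={len(sample)} "
              f"queries={queries} coverage={coverage:.2%}")
```

Under these assumptions, the same cutoff treats the two resources very differently. The 5,000-document resource stops at the 300-document cutoff, so the description rests on a small fraction of the collection. The 40-document resource can never reach the cutoff, so sampling continues until the query budget is exhausted, which is cost spent without any new information.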

    Item type: Conference or Workshop Item (Paper)
    ID code: 2780
    Keywords: distributed information retrieval, query-based sampling, selection accuracy, search algorithm, Electronic computers. Computer science
    Subjects: Science > Mathematics > Electronic computers. Computer science
    Department: Faculty of Science > Computer and Information Sciences
    Depositing user: Strathprints Administrator
    Date Deposited: 05 Apr 2007
    Last modified: 19 Jul 2013 22:09
    URI: http://strathprints.strath.ac.uk/id/eprint/2780
