The accessibility dimension for structured document retrieval

Roelleke, Thomas and Lalmas, Mounia and Kazai, Gabriella and Ruthven, Ian and Quicker, Stefan; Crestani, F. and Dunlop, M. and Mizzaro, S., eds. (2002) The accessibility dimension for structured document retrieval. In: Advances in Information Retrieval. Lecture Notes in Computer Science, 2291. Springer, Germany, pp. 284-302. ISBN 978-3-540-43343-9 (https://doi.org/10.1007/3-540-45886-7)

Accepted Author Manuscript

Abstract

Structured document retrieval aims to retrieve the document components that best satisfy a query, rather than only retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and we carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
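The abstract does not spell out how acc enters the ranking. The sketch below assumes one simple reading: a component's weight for a term is its own tf-idf weight plus the weights of its sub-components scaled by acc, applied recursively so that content two levels down is scaled by acc squared. Under this reading, acc = 0 scores each component on its own text only, while acc = 1 lets a child's content count fully toward its parent. The `Node` class, the functions `augmented_weight` and `rank`, the additive combination, and the toy document are all hypothetical illustrations. The paper itself defines tf-idf-acc in a probabilistic relational algebra, and this sketch does not reproduce that definition.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """A document component (e.g. article, section) with its own terms."""
    name: str
    terms: list[str]
    children: list["Node"] = field(default_factory=list)


def all_nodes(root):
    """Yield every component in the tree, parents before children."""
    yield root
    for child in root.children:
        yield from all_nodes(child)


def tf(term, node):
    """Relative frequency of the term in the component's own text."""
    return node.terms.count(term) / len(node.terms) if node.terms else 0.0


def idf(term, nodes):
    """Classical idf over components, log(N / n_t); 0.0 for unseen terms."""
    n_t = sum(1 for n in nodes if term in n.terms)
    return math.log(len(nodes) / n_t) if n_t else 0.0


def augmented_weight(term, node, nodes, acc):
    """Hypothetical scheme: own tf-idf plus acc times each child's weight."""
    own = tf(term, node) * idf(term, nodes)
    return own + acc * sum(augmented_weight(term, c, nodes, acc)
                           for c in node.children)


def rank(query, root, acc):
    """Score every component for the query and sort by descending score."""
    nodes = list(all_nodes(root))
    scores = {n.name: sum(augmented_weight(t, n, nodes, acc) for t in query)
              for n in nodes}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy structured document (hypothetical data for illustration only).
doc = Node("article", ["retrieval", "structure"], [
    Node("sec1", ["accessibility", "retrieval", "retrieval"]),
    Node("sec2", ["probability", "algebra"]),
])

if __name__ == "__main__":
    for acc in (0.0, 0.5, 1.0):
        print(acc, rank(["accessibility"], doc, acc))
```

For the query "accessibility", which occurs only in sec1, the article scores zero when acc = 0, and its score rises toward sec1's score as acc approaches 1. Estimating a sensible acc value is therefore a question of how much credit a parent should receive for content it contains only through its children, which is what the experiments described above set out to measure.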