The accessibility dimension for structured document retrieval
Roelleke, Thomas and Lalmas, Mounia and Kazai, Gabriella and Ruthven, Ian and Quicker, Stefan; Crestani, F. and Dunlop, M. and Mizzaro, S., eds. (2002) The accessibility dimension for structured document retrieval. In: Advances in Information Retrieval. Lecture Notes in Computer Science, 2291. Springer, Germany, pp. 284-302. ISBN 978-3-540-43343-9 (https://doi.org/10.1007/3-540-45886-7)
Filename: strathprints002463.pdf (Accepted Author Manuscript, 298kB)
Abstract
Structured document retrieval aims to retrieve the document components that best satisfy a query, instead of merely retrieving pre-defined document units. This paper reports on an investigation of a tf-idf-acc approach, where tf and idf are the classical term frequency and inverse document frequency, and acc is a new parameter, called accessibility, that captures the structure of documents. The tf-idf-acc approach is defined using a probabilistic relational algebra. To investigate the retrieval quality and estimate the acc values, we developed a method that automatically constructs diverse test collections of structured documents from a standard test collection, and carried out experiments on these collections. The analysis of the experiments provides estimates of the acc values.
ORCID iDs
Roelleke, Thomas, Lalmas, Mounia, Kazai, Gabriella, Ruthven, Ian
Item type: Book Section
ID code: 2463
Dates: 25 March 2002 (Published)
Subjects: Science > Mathematics > Electronic computers. Computer science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Strathprints Administrator
Date deposited: 23 Jan 2007
Last modified: 29 Jan 2025 17:59
URI: https://strathprints.strath.ac.uk/id/eprint/2463