Revisiting the relationship between document length and relevance

Losada, David E. and Azzopardi, Leif and Baillie, Mark (2008) Revisiting the relationship between document length and relevance. In: CIKM '08 Proceedings of the 17th ACM Conference on Information and Knowledge Management. ACM, New York, NY, USA, pp. 419-428. ISBN 978-1-59593-991-3

Abstract

The scope hypothesis in Information Retrieval (IR) states that a relationship exists between document length and relevance, such that the likelihood of relevance increases with document length. A number of empirical studies have provided statistical evidence supporting the scope hypothesis. However, these studies make the implicit assumption that modern test collections are complete (i.e. all documents are assessed for relevance). As a consequence, the observed evidence is misleading. In this paper we perform a deeper analysis of document length and relevance, taking into account that test collections are incomplete. We first demonstrate that previous evidence supporting the scope hypothesis was an artefact of the test collection, where there is a bias towards longer documents in the pooling process. We then evaluate whether this length bias affects system comparison when using incomplete test collections. The results indicate that test collections are problematic when considering MAP as a measure of effectiveness but are relatively robust when using bpref. The study implies that retrieval models should not be tuned to favour longer documents, and that designers of new test collections should take measures against length bias during the pooling process in order to create more reliable and robust test collections.
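
To illustrate why the two measures named in the abstract can react differently to incomplete judgments, the following is a minimal sketch and not the authors' experimental code. It assumes the usual convention that average precision (the per-query component of MAP) counts unjudged documents as non-relevant, and it uses one common formulation of bpref in which unjudged documents are skipped. The function names, document identifiers, and toy judgments are all hypothetical.

```python
def average_precision(ranking, judgments):
    """Average precision; unjudged documents are treated as non-relevant (assumed convention)."""
    num_rel = sum(1 for rel in judgments.values() if rel)
    if num_rel == 0:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if judgments.get(doc, False):
            hits += 1
            total += hits / rank  # precision at this relevant document
    return total / num_rel


def bpref(ranking, judgments):
    """bpref in one common formulation (an assumption); unjudged documents are ignored."""
    num_rel = sum(1 for rel in judgments.values() if rel)
    num_nonrel = len(judgments) - num_rel
    if num_rel == 0:
        return 0.0
    nonrel_seen = 0
    total = 0.0
    for doc in ranking:
        if doc not in judgments:
            continue  # unjudged: contributes nothing either way
        if judgments[doc]:
            if nonrel_seen == 0:
                total += 1.0
            else:
                penalty = min(nonrel_seen, num_rel) / min(num_rel, num_nonrel)
                total += 1.0 - penalty
        else:
            nonrel_seen += 1
    return total / num_rel


# Hypothetical toy data: d* are judged (pooled) documents, u* were never assessed.
judgments = {"d1": True, "d2": False, "d3": True}
with_unjudged = ["d1", "u1", "d2", "u2", "d3"]
judged_only = ["d1", "d2", "d3"]

for name, ranking in [("with unjudged", with_unjudged), ("judged only", judged_only)]:
    print(name, average_precision(ranking, judgments), bpref(ranking, judgments))
```

In this toy case, interleaving unjudged documents changes the average precision of the ranking but leaves bpref unchanged, because the latter is computed over judged documents only. This shows only how the two measures treat unjudged documents; it does not reproduce the paper's analysis of length bias in pooling.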