Measuring the retrievability of digital library content using analytics data

Jahani, Hamed and Azzopardi, Leif and Sanderson, Mark (2024) Measuring the retrievability of digital library content using analytics data. Journal of the Association for Information Science and Technology. ISSN 2330-1635 (https://doi.org/10.1002/asi.24886)

Text. Filename: Asso_for_Info_Science_Tech_-_2024_-_Jahani_-_Measuring_the_retrievability_of_digital_library_content_using_analytics_data.pdf
Final Published Version
License: Creative Commons Attribution-NonCommercial 4.0


Abstract

Digital libraries aim to provide value to users by housing content that is accessible and searchable. Often such access is afforded through external web search engines. In this article, we measure how easily digital library content can be retrieved (i.e., how retrievable it is) through a well-known search engine (Google) using its analytics platforms. Using two measures of document retrievability, we contrast our results with simulation-based studies that employed synthetic query sets. We determine that estimating the retrievability of content given a Digital Library index is not a strong predictor of how retrievable the content is in practice (via external search engines). Work on retrievability established the notion that search algorithms can be biased. In our work, we find that while such bias is present, much of the variation in retrievability appears to be strongly influenced by the queries submitted to the library, a side of retrievability less examined in past work.
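
To make the idea of a retrievability measure concrete, the following is a minimal sketch and not the article's method. The abstract does not name its two measures, so this assumes the cumulative measure common in the retrievability literature, r(d) = sum over queries q of o_q * [rank of d for q <= c], together with the Gini coefficient as a summary of how unevenly retrievability is spread across documents. The function names, the `rankings` input shape, and the sample data are hypothetical.

```python
def cumulative_retrievability(rankings, cutoff=10, weights=None, all_docs=None):
    """Cumulative retrievability score r(d) for each document.

    rankings: dict mapping a query string to a list of document ids in rank order.
    cutoff:   rank threshold c; a document counts only if it is ranked within c.
    weights:  optional dict mapping a query to its weight o_q (default 1.0 each).
    all_docs: optional iterable of every document id, so that documents that
              are never retrieved still appear with a score of 0.
    """
    scores = {doc: 0.0 for doc in (all_docs or [])}
    for query, ranked_docs in rankings.items():
        weight = 1.0 if weights is None else weights.get(query, 1.0)
        for doc in ranked_docs[:cutoff]:
            scores[doc] = scores.get(doc, 0.0) + weight
    return scores


def gini(values):
    """Gini coefficient of non-negative values: 0 means perfectly equal."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula over ascending-sorted values with 1-based index i.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical sample data: three queries and the documents each one returned.
sample_rankings = {
    "open access thesis": ["doc_a", "doc_b"],
    "retrievability bias": ["doc_a", "doc_c"],
    "digital library search": ["doc_a"],
}
sample_scores = cumulative_retrievability(
    sample_rankings, cutoff=2, all_docs=["doc_a", "doc_b", "doc_c", "doc_d"]
)
sample_gini = gini(sample_scores.values())
```

In this sketch, changing the query set changes the scores even when the ranking function is unchanged, which is the query-side influence the abstract highlights. Passing `all_docs` matters because omitting never-retrieved documents would understate the inequality.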