Analyzing the influence of bigrams on retrieval bias and effectiveness
AlQatan, Abdulaziz and Azzopardi, Leif and Moshfeghi, Yashar; (2020) Analyzing the influence of bigrams on retrieval bias and effectiveness. In: ICTIR 2020 - Proceedings of the 2020 ACM SIGIR International Conference on Theory of Information Retrieval. ACM, NOR, 157–160. ISBN 9781450380676 (https://doi.org/10.1145/3409256.3409831)
Filename: AlQatan_etal_ICTIR_2020_Analyzing_the_influence_of_bigrams_on_retrieval_bias.pdf
Accepted Author Manuscript (1MB)
Abstract
Prior work on using retrievability measures in the evaluation of information retrieval (IR) systems has laid the foundations for investigating the relationship between retrieval effectiveness and retrieval bias. While various factors influencing bias have been examined, there has been no work examining the impact of including bigrams in the index on retrieval bias. Intuitively, how documents are represented, and which terms they contain, will influence whether or not they are retrievable. In this paper, we investigate how the bias of a system changes depending on whether documents are represented using unigrams, bigrams, or both. Our analysis of three retrieval models on three TREC collections shows that a bigram-only representation results in the lowest bias (compared with a unigram-only representation), but at the expense of retrieval effectiveness. However, combining both representations reduces overall bias while also increasing effectiveness. These findings suggest that, when configuring and indexing a collection, the bag-of-words (unigram) approach should be augmented with bigrams to create better and fairer retrieval systems.
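As an illustration only, the sketch below builds the three document representations the abstract compares (unigrams, bigrams, or both) and applies one common way of quantifying retrieval bias. It assumes the cumulative retrievability measure from prior retrievability work, where r(d) counts the queries that return document d within the top c results, and summarises the distribution of r(d) with the Gini coefficient; the abstract itself does not name the measure. The scorer is a hypothetical term-overlap count standing in for the paper's three retrieval models, and the helper names (`represent`, `rank`, `retrievability`, `gini`), the toy documents and the queries are all invented for this example.

```python
from collections import Counter


def represent(text, mode):
    """Hypothetical helper: bag of index terms for 'uni', 'bi' or 'both'."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    if mode == "uni":
        return Counter(tokens)
    if mode == "bi":
        return Counter(bigrams)
    if mode == "both":
        return Counter(tokens + bigrams)
    raise ValueError(f"unknown mode: {mode}")


def rank(query_terms, doc_reps):
    """Toy scorer (assumption, not the paper's models): sum of document term
    frequencies for the distinct query terms. Zero-score documents are not
    retrieved; ties are broken by document id."""
    scores = {d: sum(rep[t] for t in query_terms) for d, rep in doc_reps.items()}
    hits = [d for d, s in scores.items() if s > 0]
    return sorted(hits, key=lambda d: (-scores[d], d))


def retrievability(queries, doc_reps, mode, c):
    """Cumulative r(d): number of queries that retrieve d within the top c."""
    r = {d: 0 for d in doc_reps}
    for q in queries:
        for d in rank(represent(q, mode), doc_reps)[:c]:
            r[d] += 1
    return r


def gini(values):
    """Gini coefficient: 0 when all documents are equally retrievable,
    approaching 1 as retrievability concentrates on a few documents."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    weighted = sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs))
    return weighted / (n * total)


# Hypothetical toy collection and query set.
docs = {
    "d1": "information retrieval bias",
    "d2": "retrieval effectiveness and bias",
    "d3": "bigram indexing for information retrieval",
}
queries = ["information retrieval", "retrieval bias", "bigram indexing", "effectiveness"]

for mode in ("uni", "bi", "both"):
    reps = {d: represent(t, mode) for d, t in docs.items()}
    r = retrievability(queries, reps, mode, c=1)
    print(mode, r, round(gini(r.values()), 3))
```

On a three-document toy collection these numbers are illustrative only and are not expected to reproduce the paper's findings. One detail worth noting: a single-word query such as "effectiveness" yields no bigrams, so it retrieves nothing under the bigram-only representation, which is one way the choice of representation changes which documents are retrievable.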
ORCID iDs
AlQatan, Abdulaziz, Azzopardi, Leif and Moshfeghi, Yashar ORCID: https://orcid.org/0000-0003-4186-1088
Item type: Book Section
ID code: 73422
Dates: 14 September 2020 (Published); 1 June 2020 (Accepted)
Subjects: Bibliography. Library Science. Information Resources > Library Science. Information Science
Department: Faculty of Science > Computer and Information Sciences
Depositing user: Pure Administrator
Date deposited: 03 Aug 2020 20:48
Last modified: 11 Nov 2024 15:22
URI: https://strathprints.strath.ac.uk/id/eprint/73422