Measuring the utility of search engine result pages : an information foraging based measure

Azzopardi, Leif and Thomas, Paul and Craswell, Nick (2018) Measuring the utility of search engine result pages : an information foraging based measure. In: 41st International ACM SIGIR Conference on Research and Development in Information Retrieval, 2018-07-08 - 2018-07-12. (https://doi.org/10.1145/3209978.3210027)

Text. Filename: Azzopardi_Thomas_Craswell_SIGIR_2018_Measuring_the_utlity_of_search_engine_result.pdf
Accepted Author Manuscript
License: Other


Abstract

Web Search Engine Result Pages (SERPs) are complex responses to queries, containing many heterogeneous result elements (web results, advertisements, and specialised “answers”) positioned in a variety of layouts. This poses numerous challenges when trying to measure the quality of a SERP, because standard measures were designed for homogeneous ranked lists. In this paper, we aim to measure the utility and cost of SERPs. To ground this work we adopt the C/W/L framework, which enables a direct comparison between different measures in the same units of measurement, i.e. expected (total) utility and cost. Within this framework, we propose a new measure based on information foraging theory, which can account for the heterogeneity of elements through different costs, and which naturally motivates the development of a user stopping model that adapts behaviour depending on the rate of gain. This directly connects models of how people search with how we measure search, providing a number of new dimensions in which to investigate and evaluate user behaviour and performance. We perform an analysis over 1000 popular queries issued to a major search engine, and report the aggregate utility experienced by users over time. Then, in a comparison against common measures, we show that the proposed foraging-based measure provides a more accurate reflection of the utility and of observed behaviours (stopping rank and time spent).
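To make the C/W/L vocabulary concrete, the sketch below shows, under stated assumptions, how a C/W/L-style user model turns per-element gains and costs into expected total utility, expected total cost, expected depth and the weights W(i). Here C(i) is the probability of continuing from rank i to rank i + 1, and the probability of examining rank i is the product of C(1)..C(i-1). It assumes the gain of every element is known in advance (so the simulated user follows a single path) and that the user always stops after the last element. The function names, the example gain and cost values, and the rate-based continuation rule (with its `threshold`, `high` and `low` parameters) are hypothetical illustrations. They are not the foraging-based measure proposed in the paper, whose exact continuation function is not given in this abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical continuation signature: (rank i, cumulative gain, cumulative cost) -> C(i).
Continuation = Callable[[int, float, float], float]


@dataclass
class CWLResult:
    expected_total_utility: float  # expected gain accumulated before stopping
    expected_total_cost: float     # expected cost (e.g. seconds) spent
    expected_depth: float          # expected number of elements examined
    weights: List[float]           # W(i): normalised attention on rank i


def cwl_measure(gains: Sequence[float], costs: Sequence[float],
                continuation: Continuation) -> CWLResult:
    """Score a ranked list of elements under a C/W/L-style user model.

    Assumes gains are known per rank (the user follows a single path)
    and that the user always stops after the last element.
    """
    if len(gains) != len(costs):
        raise ValueError("gains and costs must have the same length")
    examine: List[float] = []  # P(user examines rank i)
    p = 1.0
    gain_so_far = cost_so_far = 0.0
    for i, (g, c) in enumerate(zip(gains, costs), start=1):
        examine.append(p)
        gain_so_far += g
        cost_so_far += c
        c_i = continuation(i, gain_so_far, cost_so_far)
        if not 0.0 <= c_i <= 1.0:
            raise ValueError(f"C({i}) = {c_i} is not a probability")
        p *= c_i  # probability of reaching rank i + 1
    etu = sum(p_i * g for p_i, g in zip(examine, gains))
    etc = sum(p_i * c for p_i, c in zip(examine, costs))
    depth = sum(examine)
    weights = [p_i / depth for p_i in examine] if depth > 0 else []
    return CWLResult(etu, etc, depth, weights)


def rbp_continuation(phi: float) -> Continuation:
    """Constant continuation probability, the user model behind rank-biased precision."""
    return lambda i, gain, cost: phi


def rate_based_continuation(threshold: float, high: float = 0.9,
                            low: float = 0.3) -> Continuation:
    """Hypothetical foraging-style rule: continue with probability `high` while the
    rate of gain (cumulative gain / cumulative cost) is at least `threshold`,
    otherwise with probability `low`."""
    def continuation(i: int, gain: float, cost: float) -> float:
        rate = gain / cost if cost > 0 else 0.0
        return high if rate >= threshold else low
    return continuation


if __name__ == "__main__":
    # Hypothetical SERP: answer box, ad, two web results.
    gains = [1.0, 0.0, 0.5, 0.0]
    costs = [2.0, 1.0, 5.0, 5.0]  # hypothetical per-element costs in seconds
    for name, cont in [("RBP(0.8)", rbp_continuation(0.8)),
                       ("rate>=0.2", rate_based_continuation(0.2))]:
        r = cwl_measure(gains, costs, cont)
        print(f"{name}: ETU={r.expected_total_utility:.3f} "
              f"ETC={r.expected_total_cost:.3f} depth={r.expected_depth:.3f}")
```

Letting C(i) depend on the cumulative gain and cost is what allows heterogeneous element costs (a cheap-to-skip ad versus a web result that takes longer to read) to change how far the simulated user goes; a constant C, as in the RBP-style rule, ignores both. Because every measure plugged into `cwl_measure` reports expected total utility and cost in the same units, the two rules can be compared directly.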