White box : on the prediction of collaborative filtering recommendation systems' performance

Paun, Iulia and Moshfeghi, Yashar and Ntarmos, Nikos (2023) White box : on the prediction of collaborative filtering recommendation systems' performance. ACM Transactions on Internet Technology, 23 (1). pp. 1-29. 3554979. ISSN 1533-5399 (https://doi.org/10.1145/3554979)

Accepted Author Manuscript
License: Strathprints license 1.0


Abstract

Collaborative Filtering (CF) recommendation algorithms are a popular solution to the information overload problem, aiding users in the item selection process. Relevant research has long focused on refining and improving these models to produce better (more effective) recommendations, and has converged on a methodology to predict their effectiveness on target datasets by evaluating them on random samples of the latter. However, predicting the efficiency of the solutions - especially with regard to their time- and resource-hungry training phase, whose requirements dwarf those of the prediction/recommendation phase - has received little to no attention in the literature. This article addresses this gap for a number of representative and highly popular CF models, including algorithms based on matrix factorization, k-nearest neighbors, co-clustering, and slope one schemes. To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time and memory usage of their training phase. Our contributions further include an adaptive sampling strategy, to address the tradeoff between resource usage costs and prediction accuracy, and a framework that quantifies both the efficiency and effectiveness of CF. Finally, a systematic experimental evaluation demonstrates that our method outperforms state-of-the-art regression schemes by a considerable margin, with an overhead that is a small fraction of the overall requirements of CF training.
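The general idea of combining a complexity equation with input characteristics to predict training cost can be illustrated with a minimal sketch. This is not the authors' method or their equations: the cost term below (ratings x latent factors x epochs for an SGD-trained matrix factorization model) is a commonly cited assumption, the function names are hypothetical, and the timings are synthetic values chosen for illustration. The sketch fits a single constant from timings taken on small samples and extrapolates to the full dataset.

```python
# Illustrative sketch only: NOT the paper's equations or implementation.
# Assumption: SGD matrix factorization training time grows in proportion to
# (number of ratings) * (number of latent factors) * (number of epochs).


def complexity_term(n_ratings, n_factors, n_epochs):
    """Assumed dominant training cost: one factor-vector update per rating per epoch."""
    return n_ratings * n_factors * n_epochs


def fit_constant(samples):
    """Least-squares fit through the origin of seconds ~= c * term.

    samples: list of (term, measured_seconds) pairs.
    """
    numerator = sum(term * secs for term, secs in samples)
    denominator = sum(term * term for term, _ in samples)
    return numerator / denominator


def predict_seconds(c, n_ratings, n_factors, n_epochs):
    """Extrapolate training time from the fitted constant and input characteristics."""
    return c * complexity_term(n_ratings, n_factors, n_epochs)


# Synthetic (made-up) timings, standing in for measurements on small random samples.
measurements = [
    (complexity_term(10_000, 50, 20), 1.0),
    (complexity_term(20_000, 50, 20), 2.0),
    (complexity_term(40_000, 50, 20), 4.0),
]

c = fit_constant(measurements)
estimate = predict_seconds(c, n_ratings=1_000_000, n_factors=50, n_epochs=20)
print(f"fitted constant: {c:.3e} s per unit of work")
print(f"estimated training time on the full dataset: {estimate:.1f} s")
```

In practice the sample sizes would have to balance measurement cost against prediction accuracy, which is the tradeoff the abstract's adaptive sampling strategy is said to address; the fixed three-sample setup here makes no attempt at that.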