Count regression and machine learning approach for zero-inflated over-dispersed count data. Application to micro-retail distribution and urban form

Araldi, Alessandro and Venerandi, Alessandro and Fusco, Giovanni; Gervasi, Osvaldo and Murgante, Beniamino and Misra, Sanjay and Garau, Chiara and Blecic, Ivan and Taniar, David and Apduhan, Bernady O. and Rocha, Ana Maria A.C. and Tarantino, Eufemia and Torre, Carmelo Maria and Karaca, Yeliz, eds. (2020) Count regression and machine learning approach for zero-inflated over-dispersed count data. Application to micro-retail distribution and urban form. In: Computational Science and Its Applications – ICCSA 2020. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Springer Science and Business Media Deutschland GmbH, ITA, pp. 550-565. ISBN 9783030588106 (https://doi.org/10.1007/978-3-030-58811-3_40)

Text. Filename: Araldi_etal_ICCSA_2020_Count_regression_and_machine_learning_approach.pdf
Accepted Author Manuscript

Abstract

This paper investigates the relationship between urban form and the spatial distribution of micro-retail activities. In recent decades, several works have demonstrated that configurational properties of the street network and morphological descriptors of the urban built environment are significantly related to store distribution. However, two main challenges still need to be addressed. On the one hand, the combined effect of different urban form properties should be considered, providing a holistic study of urban form and its relationship to retail patterns. On the other, analytical approaches should account for the discrete, skewed and zero-inflated nature of the micro-retail distribution. To overcome these limitations, this work compares two sophisticated modelling procedures: Penalised Count Regression and Machine Learning approaches. While the former is specifically conceived to account for the retail count distribution, the latter can capture non-linear behaviours in the data. The two modelling procedures are implemented on the same large dataset of street-based measures describing the urban form of the French Riviera. The outcomes of the two modelling approaches are compared in terms of prediction performance and selection frequencies of the most recurrent variables among the implemented models.
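
To make the terms "over-dispersed" and "zero-inflated" concrete, the following minimal sketch computes two simple diagnostics for a list of counts: the variance-to-mean ratio, and the observed share of zeros set against the share a Poisson model with the same mean would predict. It is an illustration only and not the procedure used in the paper; the function name `count_diagnostics` and the store counts are hypothetical and made up for this example.

```python
import math
from statistics import mean, pvariance


def count_diagnostics(counts):
    """Return simple over-dispersion and zero-inflation diagnostics.

    `counts` is a list of non-negative integers with a positive mean.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; diagnostics are undefined")
    var = pvariance(counts)
    observed_zero_share = sum(1 for c in counts if c == 0) / len(counts)
    # Under a Poisson model with the same mean, P(Y = 0) = exp(-mean).
    poisson_zero_share = math.exp(-m)
    return {
        "mean": m,
        "variance": var,
        # Roughly 1 for Poisson data; well above 1 suggests over-dispersion.
        "dispersion_index": var / m,
        "observed_zero_share": observed_zero_share,
        "poisson_zero_share": poisson_zero_share,
    }


# Hypothetical store counts per street segment (invented, not from the paper).
store_counts = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 2, 0, 0, 5, 0, 0, 0, 12, 0, 3]
diag = count_diagnostics(store_counts)

for name, value in diag.items():
    print(f"{name}: {value:.3f}")
```

A dispersion index well above 1, together with an observed zero share well above the Poisson zero share, is the kind of pattern that motivates count models designed for excess zeros rather than plain Poisson or linear regression.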