Identifying psychiatric diagnosis from missing mood data through the use of log-signature features

Wu, Yue and Goodwin, Guy M. and Lyons, Terry and Saunders, Kate E. A. (2022) Identifying psychiatric diagnosis from missing mood data through the use of log-signature features. PLOS One, 17 (11). e0276821. ISSN 1932-6203 (https://doi.org/10.1371/journal.pone.0276821)

Final Published Version
License: Creative Commons Attribution 4.0

Abstract

The availability of mobile technologies has enabled the efficient collection of prospective, longitudinal, ecologically valid self-reported clinical questionnaires from people with psychiatric diagnoses. These data streams have potential for improving the efficiency and accuracy of psychiatric diagnosis, as well as for predicting future mood states and so enabling earlier intervention. However, missing responses are common in such datasets and there is little consensus as to how these should be dealt with in practice. In this study, the missing-response-incorporated log-signature method achieves roughly 74.8% correct diagnosis, with F1 scores of 66% (bipolar disorder), 83% (healthy control) and 75% (borderline personality disorder) for the three diagnostic groups. This was superior to the naive model, which excluded missing data, and to advanced models that implemented different imputation approaches, namely k-nearest neighbours (KNN), probabilistic principal components analysis (PPCA) and random forest-based multiple imputation by chained equations (rfMICE). The log-signature method provided an effective approach to the analysis of prospectively collected mood data in which missing data was common, and it should be considered as an approach for other similar datasets. Because the method treats missing responses as a signal, its superior performance also highlights that missing data conveys valuable clinical information.
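To make the idea of treating missing responses as a signal concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single mood score per time step, encodes missingness as a cumulative count channel alongside a time channel and a forward-filled score channel, and computes a depth-2 log-signature (total increments plus Lévy areas) with NumPy. The function names `embed_with_missingness` and `logsig_depth2`, the choice of channels, and the truncation depth are all illustrative assumptions.

```python
import numpy as np

def embed_with_missingness(scores):
    """Turn a 1-D score series with NaNs into a 3-channel path.

    Channels (all hypothetical choices for illustration):
      0: time index
      1: last observed score (0.0 until the first observation)
      2: cumulative number of missing responses
    """
    scores = np.asarray(scores, dtype=float)
    missing = np.isnan(scores)
    filled = np.empty_like(scores)
    last = 0.0
    for i, s in enumerate(scores):
        if not np.isnan(s):
            last = s
        filled[i] = last
    time = np.arange(len(scores), dtype=float)
    miss_count = np.cumsum(missing).astype(float)
    return np.column_stack([time, filled, miss_count])

def logsig_depth2(path):
    """Depth-2 log-signature of a piecewise-linear path of shape (T, d).

    Returns the d total increments followed by the d*(d-1)/2 Levy areas
    A_ij (i < j), where A_ij = 0.5 * sum_k (x_k^i dx_k^j - x_k^j dx_k^i)
    and x_k is the segment start point relative to the path origin.
    """
    path = np.asarray(path, dtype=float)
    inc = np.diff(path, axis=0)      # segment increments, shape (T-1, d)
    start = path[:-1] - path[0]      # segment start points, shape (T-1, d)
    level1 = inc.sum(axis=0)
    cross = start.T @ inc            # cross[i, j] = sum_k x_k^i dx_k^j
    area = 0.5 * (cross - cross.T)   # antisymmetric part = Levy area
    upper = np.triu_indices(path.shape[1], k=1)
    return np.concatenate([level1, area[upper]])

# Example: five time steps, two of which have no response.
path = embed_with_missingness([3, np.nan, np.nan, 5, 4])
features = logsig_depth2(path)
```

The resulting feature vector has fixed length regardless of how many responses are missing, so it could be passed to any standard classifier. The area terms involving the missingness channel capture when gaps occur relative to changes in the score, which is information that imputation or row deletion would discard.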