Benchmarking network-based gene prioritization methods for cerebral small vessel disease
Zhang, Huayu and Ferguson, Amy and Robertson, Grant and Jiang, Muchen and Zhang, Teng and Sudlow, Cathie and Smith, Keith and Rannikmae, Kristiina and Wu, Honghan (2021) Benchmarking network-based gene prioritization methods for cerebral small vessel disease. Briefings in Bioinformatics, 22 (5). bbab006. ISSN 1467-5463 (https://doi.org/10.1093/bib/bbab006)
Full text (Final Published Version, PDF, 1MB): Zhang-etal-BB-2021-Benchmarking-network-based-gene-prioritization-methods-for-cerebral-small-vessel-disease.pdf
Abstract
Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones, using biological networks of protein interactions, gene–disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein–gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease–gene heterogeneous networks. A screening of algorithms yielded seven representative algorithms for benchmarking. Performance was assessed using both leave-one-out cross-validation (LOOCV) and external validation against the MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed the best LOOCV performance, with a median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had the most GWAS-confirmable genes among its top 200 predictions, while RWRH gave the best ranks for small vessel stroke-associated genes confirmed in the GWAS. In conclusion, RWRH has better overall performance for application in cSVD despite its susceptibility to bias caused by degree centrality. The choice of algorithm should be determined before application to a specific disease. Current purely network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking the algorithms have been made available and can be generalized to other diseases.
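To make the two core ideas in the abstract concrete (propagating scores from known genes with a random walk with restart, and scoring a method by its LOOCV rediscovery rank), here is a minimal sketch. It is an illustrative assumption, not the authors' released tooling: it runs plain RWR on a single homogeneous network rather than RWRH (which adds a disease layer and gene–disease jumps), uses a restart probability of 0.7, defines the rediscovery rank among non-seed genes with ties counted optimistically, and uses a hypothetical six-node toy network in place of a real PGI network.

```python
from statistics import median

import numpy as np


def rwr(adj: np.ndarray, seeds: list[int], restart: float = 0.7,
        tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Plain random walk with restart: p <- (1 - r) W p + r p0.

    W is the column-normalised adjacency matrix; p0 spreads unit mass
    uniformly over the seed (known disease) genes. restart=0.7 is an
    assumed default, not a value taken from the paper.
    """
    degree = adj.sum(axis=0)
    # Divide each column by its node degree; isolated nodes keep a zero column.
    w = np.divide(adj, degree, out=np.zeros_like(adj, dtype=float),
                  where=degree > 0)
    p0 = np.zeros(adj.shape[0])
    p0[seeds] = 1.0 / len(seeds)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:  # L1 convergence check
            return p_next
        p = p_next
    return p


def loocv_ranks(adj: np.ndarray, known: list[int]) -> list[int]:
    """Hold out each known gene in turn and record its rediscovery rank.

    Rank is 1 + the number of non-seed genes scoring strictly higher than
    the held-out gene (a hypothetical definition for this sketch).
    """
    ranks = []
    for held_out in known:
        seeds = [g for g in known if g != held_out]
        scores = rwr(adj, seeds)
        candidates = [g for g in range(adj.shape[0]) if g not in seeds]
        higher = int(np.sum(scores[candidates] > scores[held_out]))
        ranks.append(1 + higher)
    return ranks


def toy_network() -> np.ndarray:
    """Hypothetical undirected network: triangle 0-1-2 with a tail 2-3-4-5."""
    edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5)]
    adj = np.zeros((6, 6))
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1.0
    return adj


ranks = loocv_ranks(toy_network(), known=[0, 1, 2])
print(ranks, median(ranks))  # prints [1, 1, 1] 1
```

In this toy graph each held-out "known gene" sits next to the remaining seeds, so it is rediscovered at rank 1; in a real network of about 19 000 genes the median of these ranks is the kind of summary statistic the abstract reports. Because a column-normalised walk spreads mass along edges, high-degree genes tend to accumulate score, which is one way to see the degree-centrality bias mentioned above.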
ORCID iDs
Zhang, Huayu; Ferguson, Amy; Robertson, Grant; Jiang, Muchen; Zhang, Teng; Sudlow, Cathie; Smith, Keith (ORCID: https://orcid.org/0000-0002-4615-9020); Rannikmae, Kristiina; Wu, Honghan
Item type: Article
ID code: 87339
Dates: 26 February 2021 (Published); 4 January 2021 (Accepted); 19 October 2020 (Submitted)
Subjects: Science > Natural history > Genetics; Medicine > Internal medicine > Neuroscience. Biological psychiatry. Neuropsychiatry; Science > Mathematics > Electronic computers. Computer science
Department: UNSPECIFIED
Depositing user: Pure Administrator
Date deposited: 15 Nov 2023 16:56
Last modified: 25 Sep 2024 12:22
URI: https://strathprints.strath.ac.uk/id/eprint/87339