Handling missing data in a rheumatoid arthritis registry using random forest approach

Alsaber, Ahmad and Al-Herz, Adeeba and Pan, Jiazhu and Al-Sultan, Ahmad T. and Mishra, Divya, KRRD Group (2021) Handling missing data in a rheumatoid arthritis registry using random forest approach. International Journal of Rheumatic Diseases, 24 (10). 1282–1293. ISSN 1756-185X (https://doi.org/10.1111/1756-185X.14203)


Abstract

Missing data in clinical epidemiological research violate the intention-to-treat principle, reduce the power of statistical analysis, and can introduce bias if the cause of missing data is related to a patient's response to treatment. Multiple imputation offers a way to predict the values of missing data. The main objective of this study is to estimate and impute missing values in patient records, using data from the Kuwait Registry for Rheumatic Diseases. Several imputation methods were implemented, and the best method was taken to be the one with the lowest root mean square error (RMSE). Among 1735 rheumatoid arthritis patients, the proportion of missing values ranged from 5% to 65.5% of the total observations. The results show that the sequential random forest method can estimate these missing values with a high level of accuracy. The RMSE varied between 2.5 and 5.0. missForest had the lowest imputation error for both continuous and categorical variables at each missing data rate (10%, 20%, and 30%) and had the smallest prediction error difference when the models used the imputed laboratory values.
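
The evaluation idea in the abstract (hide known values at a fixed missing rate, impute them, and score the imputation by RMSE against the hidden truth) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than registry records, the helper names (`mask_values`, `impute_mean`, `rmse`) are hypothetical, and a simple mean imputer stands in for missForest, which is not reimplemented.

```python
import math
import random


def mask_values(values, rate, rng):
    """Hide a fraction `rate` of the entries; return (masked list, hidden indices)."""
    n_hidden = round(rate * len(values))
    hidden = sorted(rng.sample(range(len(values)), n_hidden))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def impute_mean(masked):
    """Stand-in imputer (not missForest): fill gaps with the observed mean."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def rmse(truth, imputed, hidden):
    """Root mean square error over the artificially hidden entries only."""
    squared = [(truth[i] - imputed[i]) ** 2 for i in hidden]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)
# Synthetic, made-up "laboratory" values; not data from the registry.
truth = [rng.gauss(50.0, 10.0) for _ in range(500)]

results = {}
for rate in (0.10, 0.20, 0.30):  # the missing data rates named in the abstract
    masked, hidden = mask_values(truth, rate, rng)
    results[rate] = rmse(truth, impute_mean(masked), hidden)
    print(f"missing rate {rate:.0%}: RMSE = {results[rate]:.2f}")
```

To compare several imputation methods in this way, each candidate would replace `impute_mean` on the same masked data, and the method with the lowest RMSE would be preferred.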

ORCID iDs

Alsaber, Ahmad, Al-Herz, Adeeba, Pan, Jiazhu, Al-Sultan, Ahmad T. and Mishra, Divya

Item metadata

Item type:	Article
ID code:	77294
Dates:	1 October 2021 (Published); 12 August 2021 (Published Online); 23 July 2021 (Accepted)
Subjects:	Medicine
Department:	Faculty of Science > Mathematics and Statistics
Depositing user:	Pure Administrator
Date deposited:	04 Aug 2021 14:43
Last modified:	06 May 2025 01:02
URI:	https://strathprints.strath.ac.uk/id/eprint/77294