Investigation of metabolomics techniques by analysis of MS propolis data : which pre-treatment method is better?

Alghamdi, Abdulaziz and Gray, Alison and Watson, David (2019) Investigation of metabolomics techniques by analysis of MS propolis data : which pre-treatment method is better? Advances and Applications in Statistics, 58 (1). pp. 13-34. ISSN 0972-3617

Accepted Author Manuscript


    Abstract

    Metabolomics data usually undergo pre-processing of the raw data, followed by further pre-treatment, before any statistical analysis is carried out. Different pre-treatment methods emphasise different aspects of the data, and each method has advantages and disadvantages. The choice of pre-treatment method depends on the biological question of interest, the characteristics of the data and the chosen data analysis. In this paper, we investigate the effects of different pre-treatment methods on four metabolomics data sets arising from chemical analysis of propolis samples collected from honey bee colonies in three different locations in Scotland, and also samples from Libya. Propolis has a variety of biological properties, including anti-protozoal and anti-inflammatory effects. As a complex mixture, its biological activity depends on its exact composition, which can be investigated via metabolomic analysis. Two types of pre-treatment were applied, namely transformation and scaling. The choice of method was found to greatly affect the results of the principal component analysis (PCA) used to explain the variation in the data. The results indicated that the transformation techniques gave little, if any, improvement. For all four data sets, Pareto scaling, which incorporates mean centring, performed better in terms of PCA, the analysis of interest, than the other scaling approaches considered here, because the results explained more of the variation in the data.
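To make the pre-treatment step concrete, the following is a minimal sketch of Pareto scaling followed by a PCA variance summary. It is not the authors' code: it assumes the usual definition of Pareto scaling (mean-centre each variable, then divide by the square root of its standard deviation), computes the PCA variance proportions through a singular value decomposition, and uses synthetic data in place of the propolis measurements. The function names `pareto_scale` and `explained_variance_ratio` are hypothetical.

```python
import numpy as np


def pareto_scale(X):
    """Mean-centre each column, then divide it by the square root of its
    sample standard deviation (the usual definition of Pareto scaling)."""
    X = np.asarray(X, dtype=float)
    centred = X - X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    # Assumption: constant columns are left unscaled to avoid dividing by zero.
    sd[sd == 0] = 1.0
    return centred / np.sqrt(sd)


def explained_variance_ratio(X_centred):
    """Proportion of total variance carried by each principal component.
    Expects column-centred data; uses the singular values of the matrix."""
    singular_values = np.linalg.svd(X_centred, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in for an intensity table (rows = samples, columns = features),
# with columns on very different scales. This is NOT the propolis data.
rng = np.random.default_rng(0)
column_scales = np.array([1.0, 5.0, 10.0, 50.0, 100.0, 500.0])
X = rng.lognormal(mean=0.0, sigma=1.0, size=(20, 6)) * column_scales

scaled = pareto_scale(X)
ratios = explained_variance_ratio(scaled)
print("Proportion of variance per component:", np.round(ratios, 3))
```

Comparing `ratios` across different pre-treatments of the same table is one way to reproduce the kind of comparison the abstract describes, although the criteria used in the paper itself may differ in detail.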