Automated weighted outlier detection technique for multivariate data

Thennadil, Suresh N. and Dewar, Mark and Herdsman, Craig and Nordon, Alison and Becker, Edo (2018) Automated weighted outlier detection technique for multivariate data. Control Engineering Practice, 70. pp. 40-49. ISSN 0967-0661 (https://doi.org/10.1016/j.conengprac.2017.09.018)

Text. Filename: Thennadil_etal_CEP2017_Automated_weighted_outlier_detection_technique_for_multivariate_data.pdf
Accepted Author Manuscript
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0

Abstract

In the chemical and petrochemical industries, spectroscopy-based online analysers are becoming common for process monitoring and control applications. A significant challenge in using these analysers as part of process monitoring and control loops is the large amount of personnel time required for calibration and maintenance of models. These tasks involve decisions such as whether an observation is an outlier, how many latent variables a model should have, which type of pre-processing to use, and when a calibration model has to be updated. Since no one measure works well for all applications, supervision by the process data analyst is required, which invariably involves some level of subjectivity. In this paper, we focus on the detection of multivariate outliers in a calibration set. We propose a method which combines multiple outlier detection techniques to identify a set of outlying observations without operator input. Apart from the overall methodology, this work introduces several novelties. The system uses partial least squares (PLS) instead of principal component analysis (PCA), which is normally used for detecting multivariate outliers. A simple modification to the Mahalanobis distance is also proposed, which appears to be more sensitive to outliers than the conventional Mahalanobis distance. The methodology also introduces the concept of a desirability function to enable automatic decision making based on multiple statistical measures for outlier detection. The methodology is demonstrated using Raman spectroscopy data collected from an industrial distillation process.
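To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes the conventional squared Mahalanobis distance (the abstract does not specify the proposed modification, so it is not reproduced here) and combines two outlier measures through a linear "smaller is better" desirability function and a geometric mean, which is one common way of combining desirabilities. The synthetic score matrix standing in for PLS scores, the second `residual` measure, the thresholds, and all function names are assumptions made for illustration only.

```python
import numpy as np

def mahalanobis_sq(scores):
    """Conventional squared Mahalanobis distance of each row from the sample mean."""
    centered = scores - scores.mean(axis=0)
    cov = np.atleast_2d(np.cov(scores, rowvar=False))
    inv_cov = np.linalg.pinv(cov)  # pseudo-inverse tolerates a singular covariance
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

def desirability(values, low, high):
    """Linear smaller-is-better desirability: 1 at or below `low`, 0 at or above `high`."""
    values = np.asarray(values, dtype=float)
    return np.clip((high - values) / (high - low), 0.0, 1.0)

def overall_desirability(*individual):
    """Geometric mean of the individual desirabilities (zero if any one is zero)."""
    stacked = np.vstack(individual)
    return np.prod(stacked, axis=0) ** (1.0 / stacked.shape[0])

# Synthetic stand-in for latent-variable scores: 50 ordinary samples plus
# one planted outlier in the last row (index 50). Not data from the paper.
rng = np.random.default_rng(0)
scores = np.vstack([rng.normal(size=(50, 2)), [10.0, 10.0]])

# Hypothetical second measure (for example a residual statistic), also synthetic.
residual = np.abs(rng.normal(size=51))
residual[50] = 5.0

d2 = mahalanobis_sq(scores)

# Thresholds are illustrative assumptions, not values taken from the paper.
overall = overall_desirability(
    desirability(d2, low=6.0, high=20.0),
    desirability(residual, low=2.0, high=4.0),
)

# Observations whose overall desirability drops to zero are flagged.
flagged = np.flatnonzero(overall == 0.0)
print("flagged observations:", flagged)
```

Because the geometric mean is zero whenever any single desirability is zero, one strongly violated measure is enough to flag an observation, while moderate values on several measures lower the overall score gradually. A different combination rule or threshold choice would change which observations are flagged.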