The implications of non-parametric distributions : what does it mean when your data's not normal?

Ford, Steven and Miao, Gengyuan (2021) The implications of non-parametric distributions : what does it mean when your data's not normal? In: 2021 #RSCPoster Twitter Conference, 2021-03-02 - 2021-03-03, Virtual.


    Abstract

    Introduction: Many scientists use the mean (µ) and standard deviation (σ) to describe the centre point and ‘range’ of a measured set of data. However, this description is only strictly valid for normally distributed data. How well do µ and σ represent other (non-normal) distributions, and how many datapoints lie ‘outside’ the range defined by µ and σ? This is a key question for manufacturing (i.e. tablet production).

    Methods: Several different types of distribution were modelled in Excel (see left figure). The number of datapoints outside the modelled range (given by µ ± Aσ, where A = 0.1, 0.2, …, 3.0) is compared with the number of datapoints expected outside the same range in a normal distribution, producing a ratio (right figure). 10^5 and 10^6 datapoints are required to reduce error at A values > 1.5.

    Results: These results suggest that considering µ ± 2σ, rather than µ ± σ, is a better guide to underestimating the number of datapoints outside that range.
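The comparison described in the Methods can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Excel model: the uniform and exponential distributions, the seed, the sample size and every function name below are assumptions chosen for the example, since the abstract does not say which distributions were modelled. The normal-distribution reference value uses the standard result that a fraction erfc(A/√2) of a normal distribution lies outside µ ± Aσ.

```python
import math
import random


def normal_fraction_outside(a):
    """Fraction of a normal distribution lying outside mu +/- a*sigma."""
    return math.erfc(a / math.sqrt(2))


def observed_fraction_outside(data, a):
    """Fraction of the sample lying outside its own mu +/- a*sigma."""
    n = len(data)
    mu = math.fsum(data) / n
    # Population standard deviation of the sample.
    sigma = math.sqrt(math.fsum((x - mu) ** 2 for x in data) / n)
    lo, hi = mu - a * sigma, mu + a * sigma
    return sum(1 for x in data if x < lo or x > hi) / n


def outside_ratio(data, a):
    """Observed fraction outside divided by the normal expectation."""
    return observed_fraction_outside(data, a) / normal_fraction_outside(a)


# Hypothetical test distributions (not taken from the poster).
rng = random.Random(0)
n = 10**5
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    "uniform": [rng.uniform(0.0, 1.0) for _ in range(n)],
    "exponential": [rng.expovariate(1.0) for _ in range(n)],
}

for name, data in samples.items():
    for a in (1.0, 2.0, 3.0):
        print(f"{name:12s} A={a:.1f} ratio={outside_ratio(data, a):.3f}")
```

A ratio above 1 means more datapoints fall outside µ ± Aσ than a normal distribution would predict, and a ratio below 1 means fewer. Sweeping A from 0.1 to 3.0 in steps of 0.1, as in the abstract, only requires changing the values iterated over in the inner loop.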