Part 5: Outliers

In statistics, an outlier is an observation that is numerically distant from the rest of the data. Grubbs defined an outlier as follows: “An outlying observation, or outlier, is one that appears to deviate markedly from other members of the sample in which it occurs.”

Outliers can occur by chance in any distribution, but they often indicate either measurement error or a population with a heavy-tailed distribution. In the former case, one may wish to discard them or to use statistics that are robust to outliers; in the latter case, they indicate that the distribution has high kurtosis, and one should be very cautious when using tools or intuitions that assume a normal distribution.
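To illustrate the difference between ordinary and robust statistics, the Python sketch below compares the mean with the median and the median absolute deviation (MAD) on a small sample before and after a single extreme value is appended. It also computes a simple moment-based excess kurtosis, m4 / m2² − 3, without bias correction. The data values and the helper names `mad` and `excess_kurtosis` are hypothetical, chosen only for illustration; MaxStat may compute these quantities differently.

```python
from statistics import mean, median


def mad(values):
    """Median absolute deviation from the median (unscaled)."""
    m = median(values)
    return median([abs(x - m) for x in values])


def excess_kurtosis(values):
    """Moment-based sample excess kurtosis, m4 / m2**2 - 3 (no bias correction)."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


clean = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1]  # hypothetical measurements
with_outlier = clean + [15.0]               # one extreme value appended

for label, data in [("clean", clean), ("with outlier", with_outlier)]:
    print(f"{label:>12}: mean={mean(data):.3f} median={median(data):.3f} "
          f"MAD={mad(data):.3f} excess kurtosis={excess_kurtosis(data):.2f}")
```

The single extreme value shifts the mean by about 0.7, while the median moves by only 0.05 and the MAD is essentially unchanged; the excess kurtosis changes from negative to clearly positive.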

In a sense, Grubbs' definition leaves it to the researcher (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, normal observations must be characterized. MaxStat offers the Grubbs test for detecting outliers when the data are normally distributed. Alternatively, the data can be inspected visually using scatter plots or box plots.
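The two-sided Grubbs test compares G = max|xᵢ − x̄| / s (with s the sample standard deviation) against a critical value derived from Student's t distribution with n − 2 degrees of freedom. The Python sketch below is one possible standard-library implementation; it is not MaxStat's code. The function names (`grubbs_test`, `reg_inc_beta`, `t_upper_tail`) are hypothetical, and the t-distribution tail is evaluated through the regularized incomplete beta function using a Numerical Recipes-style continued fraction, swapped in for a statistics library such as SciPy. Instead of looking up a critical value, it reports the equivalent p-value 2n · P(T > t), which falls below α exactly when G exceeds the usual critical value.

```python
import math
from statistics import mean, stdev


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t with df degrees of freedom, t >= 0."""
    return 0.5 * reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def grubbs_test(data, alpha=0.05):
    """Two-sided Grubbs test for a single outlier.

    Returns (index of the most extreme value, G, p-value, flagged as outlier).
    """
    n = len(data)
    if n < 3:
        raise ValueError("Grubbs test needs at least 3 observations")
    m, s = mean(data), stdev(data)  # stdev uses the n - 1 denominator
    if s == 0:
        raise ValueError("all observations are identical")
    idx = max(range(n), key=lambda i: abs(data[i] - m))
    g = abs(data[idx] - m) / s
    df = n - 2
    denom = (n - 1) ** 2 - n * g * g
    if denom <= 0:  # G at its theoretical maximum (n - 1) / sqrt(n)
        p = 0.0
    else:
        # Invert G_crit = (n-1)/sqrt(n) * sqrt(t^2 / (n-2+t^2)) to get t from G.
        t = math.sqrt(n * df * g * g / denom)
        p = min(1.0, 2 * n * t_upper_tail(t, df))
    return idx, g, p, p < alpha


sample = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 15.0]  # hypothetical data
idx, g, p, flagged = grubbs_test(sample)
print(f"most extreme value {sample[idx]} (G = {g:.3f}, p = {p:.2g}, outlier: {flagged})")
```

Like the Grubbs test itself, this sketch assumes approximately normal data and examines only the single most extreme value per run.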

Outliers can indicate faulty data, erroneous procedures, or areas where a certain theory might not be valid. However, in large samples a small number of outliers is to be expected, even when no anomalous condition is present. Because outliers are the most extreme observations, they may include the sample maximum, the sample minimum, or both, depending on whether they are extremely high or extremely low. The converse does not hold: the sample maximum and minimum are not always outliers, because they may not be unusually far from the other observations. Handling outliers can therefore be difficult, and they should be removed from a data set only after careful consideration. In all cases, the detection and removal of outliers should be reported in any scientific report or publication.
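To see why a few outliers are expected in large samples, consider normally distributed data with known mean and standard deviation, and a hypothetical cut-off of three standard deviations (a common rule of thumb, not a MaxStat setting). The Python sketch below uses the standard library's `statistics.NormalDist` to compute how many observations would fall beyond that cut-off purely by chance; `expected_beyond` is a hypothetical helper name.

```python
from statistics import NormalDist


def expected_beyond(n, k=3.0):
    """Expected count of observations with |z| > k in a normal sample of size n."""
    p_two_sided = 2 * (1 - NormalDist().cdf(k))  # probability of landing in either tail
    return n * p_two_sided


for n in (100, 1_000, 10_000):
    print(f"n = {n:>6}: about {expected_beyond(n):.1f} points beyond 3 SD")
```

With 10,000 observations, roughly 27 points are expected beyond three standard deviations even when nothing is wrong with the data, so a handful of extreme values in a large sample is not by itself evidence of an error.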


