Distribution Analyses

Trimmed and Winsorized Means

When outliers are present in the data, trimmed and Winsorized means are robust estimators of the population mean that are relatively insensitive to the outlying values. Trimming and Winsorization are thus methods for reducing the effect of extreme values in the sample. The k-times trimmed mean is calculated as
\[
\overline{y}_{tk} = \frac{1}{n-2k} \sum_{i=k+1}^{n-k} y_{(i)}
\]

Here, $y_{(i)}$ denotes the i-th smallest observation and n is the sample size. The trimmed mean is computed after the k smallest and k largest observations are deleted from the sample; that is, the observations are trimmed at each end. The k-times Winsorized mean is calculated as
\[
\overline{y}_{wk} = \frac{1}{n} \left\{ (k+1)\, y_{(k+1)} + \sum_{i=k+2}^{n-k-1} y_{(i)} + (k+1)\, y_{(n-k)} \right\}
\]

The Winsorized mean is computed after the k smallest observations are replaced by the (k+1)st smallest observation, and the k largest observations are replaced by the (k+1)st largest observation. In other words, the observations are Winsorized at each end. For a symmetric distribution, the symmetrically trimmed or Winsorized mean is an unbiased estimate of the population mean. However, the trimmed or Winsorized mean does not have a normal distribution, even if the data are from a normal population.
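
The two definitions above can be sketched in a few lines of Python. This is only an illustration, not the software's own implementation: the function names `trimmed_mean` and `winsorized_mean` and the sample data are assumptions made for this example.

```python
def trimmed_mean(values, k):
    """k-times trimmed mean: drop the k smallest and k largest values."""
    y = sorted(values)
    n = len(y)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n/2")
    kept = y[k:n - k]  # order statistics y_(k+1) .. y_(n-k)
    return sum(kept) / (n - 2 * k)


def winsorized_mean(values, k):
    """k-times Winsorized mean: pull the k extreme values at each end inward."""
    y = sorted(values)
    n = len(y)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must satisfy 0 <= k < n/2")
    # The k smallest become y_(k+1); the k largest become y_(n-k).
    w = [y[k]] * k + y[k:n - k] + [y[n - k - 1]] * k
    return sum(w) / n


# Hypothetical sample with one obvious outlier (100).
sample = [2, 4, 5, 6, 7, 8, 9, 11, 12, 100]
print(sum(sample) / len(sample))   # ordinary mean: 16.4
print(trimmed_mean(sample, 1))     # 7.75
print(winsorized_mean(sample, 1))  # 7.8
```

With k = 1 the single outlier no longer dominates either estimate, whereas the ordinary mean is pulled far above the bulk of the data.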

The Winsorized sum of squared deviations is defined as

\[
s^2_{wk} = (k+1) \left( y_{(k+1)} - \overline{y}_{wk} \right)^2 + \sum_{i=k+2}^{n-k-1} \left( y_{(i)} - \overline{y}_{wk} \right)^2 + (k+1) \left( y_{(n-k)} - \overline{y}_{wk} \right)^2
\]

A robust estimate of the variance of the trimmed mean $\overline{y}_{tk}$ can be based on the Winsorized sum of squared deviations (Tukey and McLaughlin 1963). The resulting trimmed t test, which in the form shown tests the hypothesis that the population mean is zero, is given by
\[
t_{tk} = \frac{\overline{y}_{tk}}{\mathrm{STDERR}(\overline{y}_{tk})}
\]

where $\mathrm{STDERR}(\overline{y}_{tk})$ is the standard error of $\overline{y}_{tk}$:
\[
\mathrm{STDERR}(\overline{y}_{tk}) = \frac{s_{wk}}{\sqrt{(n-2k)(n-2k-1)}}
\]
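
As a rough sketch of how these pieces fit together, the following Python computes the Winsorized sum of squared deviations and the trimmed t statistic directly from the formulas above. The helper names (`winsorize`, `winsorized_ss`, `trimmed_t`) are hypothetical and chosen only for this example; the sketch also assumes the data are not all identical, since a zero standard error would make the statistic undefined.

```python
import math


def winsorize(values, k):
    """Return the sorted sample with k values at each end Winsorized."""
    y = sorted(values)
    n = len(y)
    if k < 0 or n - 2 * k < 2:
        # n - 2k - 1 must be positive for the standard error to exist.
        raise ValueError("k must satisfy 0 <= k and n - 2k >= 2")
    return [y[k]] * k + y[k:n - k] + [y[n - k - 1]] * k


def winsorized_ss(values, k):
    """Winsorized sum of squared deviations, s^2_wk."""
    w = winsorize(values, k)
    mean_w = sum(w) / len(w)  # Winsorized mean
    return sum((v - mean_w) ** 2 for v in w)


def trimmed_t(values, k):
    """Return (t_tk, STDERR) for the k-times trimmed mean."""
    y = sorted(values)
    n = len(y)
    s_wk = math.sqrt(winsorized_ss(values, k))  # also validates k
    mean_t = sum(y[k:n - k]) / (n - 2 * k)
    stderr = s_wk / math.sqrt((n - 2 * k) * (n - 2 * k - 1))
    return mean_t / stderr, stderr
```

Setting k = 0 in this sketch reduces the standard error to the usual s / sqrt(n), so the statistic becomes the ordinary one-sample t statistic.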

A Winsorized t test is given by
\[
t_{wk} = \frac{\overline{y}_{wk}}{\mathrm{STDERR}(\overline{y}_{wk})}
\]

where $\mathrm{STDERR}(\overline{y}_{wk})$ is the standard error of $\overline{y}_{wk}$:
\[
\mathrm{STDERR}(\overline{y}_{wk}) = \frac{n-1}{n-2k-1} \, \frac{s_{wk}}{\sqrt{n(n-1)}}
\]
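
The Winsorized t statistic can be sketched the same way. The function name `winsorized_t` is an assumption for illustration, and the code simply transcribes the two formulas above rather than reproducing any particular product's implementation.

```python
import math


def winsorized_t(values, k):
    """Return (t_wk, STDERR) for the k-times Winsorized mean."""
    y = sorted(values)
    n = len(y)
    if k < 0 or n - 2 * k < 2:
        # n - 2k - 1 appears in a denominator, so it must be positive.
        raise ValueError("k must satisfy 0 <= k and n - 2k >= 2")
    # Winsorized sample: k+1 copies of y_(k+1) and of y_(n-k) in total.
    w = [y[k]] * k + y[k:n - k] + [y[n - k - 1]] * k
    mean_w = sum(w) / n
    s_wk = math.sqrt(sum((v - mean_w) ** 2 for v in w))
    stderr = (n - 1) / (n - 2 * k - 1) * s_wk / math.sqrt(n * (n - 1))
    return mean_w / stderr, stderr
```

The factor (n-1)/(n-2k-1) inflates the standard error to account for the replaced observations; with k = 0 it equals 1 and the usual standard error of the mean is recovered.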

When the data are from a symmetric distribution, the distribution of the trimmed t statistic $t_{tk}$ or the Winsorized t statistic $t_{wk}$ can be approximated by a Student's t distribution with n-2k-1 degrees of freedom (Tukey and McLaughlin 1963, Dixon and Tukey 1968).

You can specify the number or percentage of observations to be trimmed or Winsorized from each end in either of two ways: by using the Trimmed/Winsorized Means options dialog, or by choosing Tables:Trimmed/Winsorized Mean:(1/2)N or Tables:Trimmed/Winsorized Mean:(1/2)Percent from the menus and then using the Trimmed/Winsorized Means dialog.

Figure 38.15: (1/2)N Menu

Figure 38.16: (1/2)Percent Menu

If you specify a percentage, 100p%, where 0<p<1, the number of observations trimmed or Winsorized from each end is the smallest integer greater than or equal to np.
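
The rounding rule can be illustrated with a short Python sketch. The function name `trim_count` is hypothetical. The use of exact fractions is a design choice of this example, made because a floating-point product such as 100 * 0.07 can land slightly above the intended integer and round up one too far. The final check that 2k < n is likewise an assumption of the sketch, added because the formulas for the means divide by n-2k.

```python
import math
from fractions import Fraction


def trim_count(n, percent):
    """Number of observations trimmed or Winsorized from each end.

    `percent` is the value 100p, given as an int, float, or string.
    """
    # Going through str() keeps values such as 7 or 2.5 exact.
    p = Fraction(str(percent)) / 100
    if not 0 < p < 1:
        raise ValueError("percent must be strictly between 0 and 100")
    k = math.ceil(n * p)  # smallest integer >= n*p
    if 2 * k >= n:
        # Assumption of this sketch: some observations must remain.
        raise ValueError("percentage leaves no observations after trimming")
    return k
```

For example, with n = 25 and 10 percent, np = 2.5, so 3 observations are affected at each end.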

The Trimmed Mean and Winsorized Mean tables, as shown in Figure 38.17, contain the following statistics:

Figure 38.17: Trimmed Means and Winsorized Means Tables

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.