Many statistical tests rely upon certain properties of the data.
One common property, upon which many linear tests depend, is that of
normality — the data must have been drawn from a normal distribution.
It is therefore necessary to check for normality before deciding upon the
test procedure to use. One way to do this is with the EXAMINE
command.
In the following example, a researcher was examining the failure rates of equipment produced by an engineering company. The file repairs.sav contains the mean time between failures (mtbf) of some items of equipment subject to the study. Before performing linear analysis on the data, the researcher wanted to ascertain that the data is normally distributed.
PSPP> get file='//share/pspp/examples/repairs.sav'.
PSPP> examine mtbf /statistics=descriptives.
This produces the following output:
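[Output table not reproduced here: descriptive statistics for mtbf, including skewness and kurtosis.]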
A normal distribution has a skewness of zero and a kurtosis of zero, where kurtosis is measured as excess kurtosis, that is, relative to the normal distribution. The skewness of mtbf in the output above shows that the mtbf figures have substantial positive skew and are therefore unlikely to have been drawn from a normally distributed variable. Positive skew can often be compensated for by applying a logarithmic transformation, as in the following continuation of the example:
PSPP> compute mtbf_ln = ln (mtbf).
PSPP> examine mtbf_ln /statistics=descriptives.
which produces the following additional output:
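[Output table not reproduced here: descriptive statistics for mtbf_ln, including skewness and kurtosis.]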
The COMPUTE
command in the first line above performs the
logarithmic transformation:
compute mtbf_ln = ln (mtbf).
Rather than redefining the existing variable, this use of COMPUTE
defines a new variable mtbf_ln which is
the natural logarithm of mtbf.
The final command in this example calls EXAMINE
on this new variable.
The results show that both the skewness and
kurtosis for mtbf_ln are very close to zero.
This provides some confidence that the mtbf_ln variable is
normally distributed and thus safe for linear analysis.
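
The effect of the logarithmic transformation on skewness can also be illustrated outside PSPP. The following is a minimal Python sketch, not part of PSPP. It uses simple moment-based formulas, which may differ slightly from the bias-corrected estimates that a statistics package reports. The data values are hypothetical and are not the contents of repairs.sav. They were chosen to be evenly spaced on a logarithmic scale so that the skewness vanishes after transformation.

```python
import math


def central_moment(values, order):
    """Return the order-th central moment of values (divisor n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Moment-based skewness m3 / m2**1.5; zero for symmetric data."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Moment-based excess kurtosis m4 / m2**2 - 3; zero for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Hypothetical failure times (assumption: NOT taken from repairs.sav).
mtbf = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

# Equivalent of "compute mtbf_ln = ln (mtbf)." in the PSPP example.
mtbf_ln = [math.log(v) for v in mtbf]

skew_raw = skewness(mtbf)
skew_ln = skewness(mtbf_ln)
kurt_ln = excess_kurtosis(mtbf_ln)

print(f"skewness before transformation: {skew_raw:.3f}")
print(f"skewness after transformation:  {skew_ln:.3f}")
print(f"excess kurtosis after transformation: {kurt_ln:.3f}")
```

In this toy sample the raw values are strongly right-skewed, while their logarithms are symmetric. The excess kurtosis of the transformed values is still well below zero, because seven evenly spaced points are flatter than a normal distribution. This is a reminder that both statistics should be inspected, as in the EXAMINE output above.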
If no suitable transformation can be found,
it would be worth considering
an appropriate non-parametric test instead of a linear one.
See NPAR TESTS for information about non-parametric tests.