Richard B. Darlington

Cornell University

ALLSETS is a procedure that can scan all these regressions with
surprising speed, computing *R*^{2} for each.
I argue elsewhere that ALLSETS is useful in
identifying collinear sets of variables in a regression.

ALLSETS uses a procedure called SWEEP
which can be applied to a single variable in a
correlation matrix -- that is, to a single row and matching column in the
matrix. By SWEEPing any set of variables in the matrix, one can easily
compute the *R*^{2} for predicting any unswept variable from the
swept variables. For instance, if a matrix had 5 variables and you swept
variables 1, 3, and 5, then the second diagonal entry in the swept matrix
would be 1 - *R*^{2}, where *R*^{2} is the
squared multiple correlation predicting the second variable from variables 1, 3,
and 5. The fourth diagonal entry would have a similar meaning for variable
4.

Each sweep changes every entry in the matrix. Sweeping is reversible -- that is, sweeping a variable twice returns the matrix to its original state. Thus to say that a variable "is" currently swept is to say that it has been swept an odd number of times, and to say it "is not" currently swept is to say it has been swept an even number of times.

ALLSETS works with the correlation matrix of *P* predictor variables
plus *Y*, with *Y* placed last. By performing
2^{P} - 1 sweeps in
that matrix, ALLSETS computes all 2^{P} - 1 nonnull values of
*R*^{2} for predicting *Y*. (We don't need to calclulate
*R*^{2} for the "null" regression with no predictors; we know it's 0.)
Then performing one more sweep returns the correlation matrix to its original state,
except for any accumulated rounding error.

Because of that last point, the last sweep is recommended, since it
allows the user to compare the final matrix with the original one, to see how
much rounding error has accumulated. The less has accumulated, the more
confidence we can have in the *R*^{2} values computed along the
way. Rounding error is discussed in more detail below.

Let *d*(*Y*) denote the last diagonal entry in the matrix -- the diagonal
entry for *Y*. Then after any number of sweeps, *R*^{2} for the set
of swept variables is *R*^{2} = 1 - *d*(*Y*).

The necessary sequence of sweeps is easily described. For *P* = 4 it
is

1 2 1 3 1 2 1 4 1 2 1 3 1 2 1 4

That is, sweep variable 1, then 2, then 1 again, etc.

To construct the list of sweeps for any value of *P*, proceed as
follows:

- Start with a list of 2
^{P}values of 1. - Add 1 to every second entry, then another 1 to every 4th entry, then
another 1 to every 8th entry, etc.
- Stop after (
*P*-1) such additions, or equivalently when the last entry in the list equals*P*.

Variable Variables last currently swept swept 1 1 2 1, 2 1 2 3 2, 3 1 1, 2, 3 2 1, 3 1 3 3 -

As already mentioned, one way to check on the accumulation of
rounding error is to compare the matrix after 2^{P} sweeps with
the original matrix, computing for instance the root mean square change in
the entries. Another way is perhaps more satisfying. This way does take more
computer time, though that is rarely a major problem, since the whole
procedure is very fast. Starting with the final matrix (the matrix after all
2^{P} sweeps), go through the whole process again, performing
2^{P} more sweeps, and compare the values of
R^{2} computed the second time around to the values computed
the first time around. The difference between these two sets of values should
presumably exceed the amount of rounding error in the first set of values,
because more sweeps occurred between those two values than were used to
calculate most of the original values.