The SWEEP Algorithm for Multiple Regression

Richard B. Darlington
Cornell University

The SWEEP procedure is a very efficient way to compute the central statistics used in multiple regression (multiple correlation, residual variance, regression slopes, and standard errors of slopes, plus some other values) from a square correlation or covariance matrix. It is particularly useful when you want to compute a large number of different regressions involving the same dependent variables but different sets of independent variables, as in stepwise regression or all-subsets regression. As explained in another article, I believe all-subsets regression can be useful in identifying patterns of collinearity, and I use SWEEP in a computer program I have written for that purpose.

SWEEP's best-known disadvantage is that if it is applied many times to the same matrix, rounding error can accumulate. However, this is not usually a serious problem, and furthermore there are ways to check for it.

SWEEPing any row j of a correlation or covariance matrix C changes C into a matrix we'll call G. SWEEPing proceeds as follows:

  1. Define a "pivot" value PV as pv=1/c[j,j].

  2. For entries not in row j or column j, compute g[h,i] = c[h,i] -c[h,j]*c[j,i]*pv

  3. For entries in row j but not column j, g[j,i] = c[j,i]*pv. In other words, multiply thse entries by PV.

  4. For entries in column j but not row j, g[j,i] = -c[j,i]*pv. In other words, multiply these entries by -PV.

  5. g[j,j] = pv

In a matrix-oriented computer language like GAUSS, it is most efficient to ignore the "not's" in these directions, since those entries will later be overwritten anyway. That is, after computing PV (step 1), apply step 2 to all entries in the matrix, since the entries for which this yields incorrect values will be overwritten by steps 3, 4, and 5. Similarly, apply step 3 to all entries in row j, and apply step 4 to all entries in column j, since the only entry for which this yields incorrect values -- entry g[j,j] -- will be overwritten in step 5. A GAUSS proc (procedure) for sweeping is shown below.

proc sweep(j,c);
local pv,g;
pv = 1/c[j,j]; @ step 1 @
g = c - c[.,j]*c[j,.]*pv; @ step 2 @
g[j,.] = c[j,.]*pv; @ step 3 @
g[.,j] = -c[.,j]*pv; @ step 4 @
g[j,j] = pv; @ step 5 @

SWEEP is usually applied several times, perhaps many times. In each sweep, after G is created from C, G is usually copied onto C, thus erasing C. Therefore you should really think of SWEEP as a procedure for modifying C. Since C is modified, before sweeping you typically want to save the original form of C under some other name such as C0. The particular GAUSS program just shown would typically be used in a command like "c = sweep(3,c)". That particular command sweeps the third row of C, and denotes the modified matrix as C.

Rows can be swept in any order, but to explain what sweeping does, we shall consider an example in which we sweep rows 1, 2, and 3 of the 5 x 5 covariance matrix below.

     9  3  4 -2  5
     3  8  6  5  4
C =  4  6  7  3  1
    -2  5  3  9  2
     5  4  1  2  8

Sweeping rows 1, 2, 3 of C transforms it into the following 5 x 5 matrix:

  0.1504   0.0226  -0.1053     -0.5038   0.7368
  0.0226   0.3534  -0.3158      0.7744   1.2105
 -0.1053  -0.3158   0.4737      0.0526  -1.3158

  0.5038  -0.7744  -0.0526      3.9624   1.3684
 -0.7368  -1.2105   1.3158      1.3684   0.7895

Spaces separate the swept rows, and corresponding columns, from the unswept ones.

To understand what sweeping does, imagine that we want to run regressions predicting variables 4 and 5 from variables 1, 2, and 3. More generally, usually we usually sweep the predictor or independent variables, and don't sweep the criterion or dependent variables. Sweeping does the following:

  1. The submatrix of swept rows and columns (the upper left 3 x 3 submatrix in this example) equals the inverse of the corresponding submatrix of the original C. This submatrix is always symmetric. This inverse matrix can be used to compute the standard errors and covariances of the regression coefficients, though it is beyond the scope of this article to describe exactly how.

  2. The submatrix of unswept rows and columns (the lower right 2 x 2 submatrix in this example) is the matrix of residual variances and covariances. In this example, in the regression predicting variable 4 from variables 1, 2, 3, the residuals have a variance of 3.9624, in comparison to the total variance of 9 visible in the original C matrix. In the regression predicting variable 5 from variables 1, 2, 3, the residuals have a variance of .7895, in comparison to a total variance of 8. And the covariance between the two sets of residuals is 1.3684. From these values one can easily compute R2 values; for variable 4 we have R2 = 1 - 3.9624/9 = .5597, while for variable 5 we have R2 = 1 - .7895/8 = .9013. All these variances and covariances are defined with N, not N-1, in the denominators. For instance, each variance equals (sum of squared residuals)/N.

  3. The submatrix corresponding to the swept rows and the unswept columns (the upper right 3 x 2 submatrix in this example) contains the regression coefficients for predicting the unswept variables from the swept ones. In this example, -.5038, .7744, and .0526 are the coefficients predicting variable 4 from variables 1, 2, 3, while .7368, 1.2105, and -1.3158 are the coefficients predicting variable 5.

  4. The remaining submatrix, corresponding to the unswept rows and the swept columns (the lower left 2 x 3 submatrix in this example) contains no new information; its transpose is simply -1 times the submatrix described in #3.

SWEEPing is an efficient way to compute a whole series of regressions, as in stepwise regression. Its principal disadvantage is that rounding error can accumulate over many sweeps. One way to check on this is to unsweep all the swept rows -- that is, resweep them, since sweeping is reversible -- to get back to a matrix that will differ from the original one only due to rounding error, and then see how much rounding error has accumulated. In fact, SWEEPing in modern computers is so fast that one can easily do the following:
  1. Starting with the original matrix C0, perform the desired sweeps and calculate the desired statistics.
  2. Unsweep the swept rows, returning to a matrix I'll call C1, which differs from C0 only due to accumulated rounding error.
  3. Perform all the sweeps again, starting this time from C1. Compare all the desired statistics computed in this step to their values computed in step 1.

Presumably the difference between the values found in steps 1 and 3 will exceed the amount of rounding error in the first computed values, because more sweeps separated the two computed values than were performed in computing the first set of computed values. Thus we presumably have an upper limit on the rounding error in those values.