Supplementing Random Assignment with Statistical Control

by Richard B. Darlington
Copyright © Richard B. Darlington. All rights reserved.

Many scientists think of random assignment and statistical control (the use of covariates in linear models) as alternative methods of control. It is well known that random assignment has certain advantages over statistical control; see chapter 4 of my book Regression and Linear Models (hereafter abbreviated RLM). However, there are at least four reasons for using statistical control along with random assignment if the latter is planned. This note outlines those reasons. The first three reasons (control of nonrandom attrition, assessment of indirect effects, and increased power and precision in estimating effect sizes) are well understood by many people. However, I believe that the last of the four reasons has not been described before.

Nonrandom Attrition

The first of the four need be mentioned here only briefly: the existence of nonrandom attrition even in the presence of pure random assignment. Nonrandom attrition can destroy the equivalence of groups so carefully created by random assignment, and statistical control can help ameliorate that problem.

Assessing Indirect Effects

The second reason is that even if random assignment establishes that an effect exists, linear models can help show how the effect works. This can be done with a regression predicting the dependent variable from the independent variable of treatment condition, plus measures of the various mechanisms by which the treatment may work. A significant regression weight for any of these variables suggests that it is at least one of the operative mechanisms.

For instance, consider an experiment in which a randomly assigned half of all subjects are told a mental test indicates they should be especially good at solving problems of a certain type, and are then found to persist longer in trying to solve such problems, which are in fact impossible. Of course, subjects are told the truth at the end of the experiment. Was their persistence produced (a) by self-confidence, or (b) by increased liking of the experimenter who had complimented them, or (c) by some other intervening mechanism, or (d) by some combination of these? These various possibilities can be distinguished by a regression predicting perseverance from the independent variable of treatment condition, plus measures of self-confidence and liking of the experimenter, which are the proposed mechanisms in this example. Under choice A we expect only self-confidence to be significant, under B we expect only liking to be significant, under C we expect only treatment condition to be significant, while D covers all possibilities of two or three significant effects. The use of regression in such cases clarifies not so much the presence of the effect as its nature--that is, the intervening variables that mediate the effect. These are actually measures of indirect effects, which are discussed more fully in RLM Section 7.2.
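
To make this concrete, the following is a minimal sketch in Python, assuming numpy and statsmodels are available; the variable names (treatment, confidence, liking, persistence) and the simulated data are illustrative assumptions, not data from any actual experiment.

    # Mediation-style regression: outcome on treatment plus proposed mechanisms.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 200
    treatment = rng.integers(0, 2, n)                     # 1 = complimented on the test
    confidence = 2.0 * treatment + rng.normal(size=n)     # proposed mechanism (a)
    liking = 0.3 * treatment + rng.normal(size=n)         # proposed mechanism (b)
    persistence = 1.5 * confidence + rng.normal(size=n)   # dependent variable

    X = sm.add_constant(np.column_stack([treatment, confidence, liking]))
    fit = sm.OLS(persistence, X).fit()
    print(fit.summary(xname=["const", "treatment", "confidence", "liking"]))
    # In this simulated setup only confidence should show a significant weight,
    # corresponding to choice (a): the effect operates through self-confidence.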

Increased Precision and Power

The third reason for supplementing random assignment with linear models follows more directly than the first two from basic principles of statistics and logic. This third reason is a potentially large gain in the precision with which the treatment effect size is estimated and the power of hypothesis tests on the effect. This advantage increases with the correlations between covariates and the dependent variable within treatment groups. Therefore, to illustrate the point clearly and simply even without formulas for statistical inference in regression (see RLM Chapter 5), we will use an example in which these correlations are very high.

The figure shows a case in which 7 people in a treatment group (the 7 triangles) and 7 others in a control group (the 7 circles) are measured both before and after a treatment. For simplicity I have made the mean pretest scores exactly equal for treatment and control samples, though the major points below are valid without this condition.

The diagonal lines in the figure represent the model fitted by regressing posttest onto pretest and a dummy treatment variable. The upper line shows the predicted posttest scores of the treatment group, while the lower line shows the predicted scores of the control group. Assuming random sampling from a population, you can tell without any real calculation that the differences between treatment and control groups cannot be explained by chance, because every one of the treatment-group cases is closer to the treatment-group line than to the control-group line, while every one of the control-group cases is closer to the control-group line. The inferential formulas of RLM Chapter 5 agree with this intuitive conclusion; when they are used to test the hypothesis of no treatment effect, that hypothesis is rejected at the .0000045 level of significance (t = 8.33, df = 11).

We can also estimate the size of the treatment effect. The vertical distance between the two diagonal lines is 3.0; thus 3.0 is the estimated effect of the treatment on posttest scores, as explained in RLM Section 3.2.3. Again you can see without calculation that the lines in the figure must closely approximate the population lines, because randomly discarding any one case from the sample would hardly change at all the placement of either line. Therefore the vertical distance of 3.0 between the lines must be an accurate estimate of the treatment effect. Again the formulas in RLM Chapter 5 agree with our intuition; they show a standard error of only .360 for the estimated treatment effect.

But when we ignore regression formulas, and use a simple two-sample t test to test the significance of the difference between the two groups, we ignore the information about each case's horizontal placement in the figure, using only its vertical placement. If this information were all we had, in our noncomputational intuitive testing we wouldn't be nearly so certain about the size or even the existence of the treatment effect, since the treatment-group posttest scores range from 6 to 15, and the control-group scores overlap them substantially, ranging from 2 to 12. The two-sample t test has this same limitation. Because treatment and control groups had exactly the same means on pretest, the estimated difference between groups is the same whether or not we control for pretest. The previously estimated effect size of 3.0 was simply the difference between the two sample means, which forms the numerator of the t test. But because the t test ignores pretest scores, that estimate's standard error is 1.79--nearly five times the value of .360 mentioned above. Thus we find a nonsignificant t of 3/1.79 = 1.68, df = 12, p = .12.
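
The contrast can be illustrated with a small simulation sketch in Python, assuming numpy, scipy, and statsmodels are available; the generated data are illustrative and are not the 14 cases plotted in the figure. The regression that controls for pretest recovers the treatment effect with a much smaller standard error than the two-sample t test applied to posttest alone.

    # Precision with and without statistical control of the pretest.
    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(1)
    n = 7                                          # cases per group, as in the figure
    pretest = np.tile(rng.normal(10, 3, n), 2)     # equal pretest means by construction
    treat = np.repeat([1, 0], n)                   # 1 = treatment, 0 = control
    posttest = pretest + 3.0 * treat + rng.normal(0, 0.5, 2 * n)   # true effect = 3

    # Statistical control: regress posttest on pretest and a treatment dummy.
    X = sm.add_constant(np.column_stack([pretest, treat]))
    fit = sm.OLS(posttest, X).fit()
    print("regression estimate:", fit.params[2], "  SE:", fit.bse[2])

    # Ignoring pretest: an ordinary two-sample t test on posttest alone.
    t, p = stats.ttest_ind(posttest[treat == 1], posttest[treat == 0])
    print("two-sample t:", t, "  p:", p)
    # The regression standard error is far smaller because the pretest explains
    # most of the within-group variation in posttest scores.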

In this example the treatment effect was not significant at even the .05 level without statistical control, but much the same point can apply even if it is. If you had to choose between spending thousands of dollars on a treatment that had been demonstrated effective at just the .05 level, or the same money on another treatment whose effectiveness had been demonstrated at the .001 level, which would you choose? Presumably the latter; after all, 50 times as many ineffective treatments pass tests at the .05 level as at the .001 level. Thus investigators should attempt to show the most significant results they validly can.

I do not mean to imply that one always gains power by indiscriminately adding covariates to a model with random assignment. The more strongly a covariate affects the dependent variable Y, the more power is gained from controlling it. But if a covariate has absolutely no effect on Y, one actually loses a little power by adding it to the model. The power lost is the same as is lost by randomly discarding one case from the sample, so the loss is usually small. But even this small loss suggests that one should not indiscriminately add dozens of extra covariates to the model, just because they happen to be in the data set. Elsewhere I describe and justify a method for selecting a specific set of covariates. In the method's simplest form, you predict the dependent variable from a broad set of relevant covariates, then drop from the model the covariate with the lowest absolute t. As described there, continue dropping covariates one at a time, recomputing the regression after each deletion, until all remaining covariates have absolute t's of 1.42 or higher. Add the independent variable to the regression only after completing this process. Otherwise the covariates correlating highest with the treatment variable--the very ones it is most important to keep--will tend to be deleted because of their redundancy with the treatment variable.
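
A rough sketch of that selection rule, assuming the data sit in a pandas DataFrame and statsmodels is available (the column names here are purely illustrative), might look like this:

    # Backward elimination of covariates before the treatment variable is entered.
    import pandas as pd
    import statsmodels.api as sm

    def select_covariates(df, y_col, covariate_cols, t_min=1.42):
        """Repeatedly drop the covariate with the smallest absolute t and refit,
        until every remaining covariate has an absolute t of t_min or higher."""
        kept = list(covariate_cols)
        while kept:
            fit = sm.OLS(df[y_col], sm.add_constant(df[kept])).fit()
            tvals = fit.tvalues.drop("const").abs()
            if tvals.min() >= t_min:
                break
            kept.remove(tvals.idxmin())            # delete the weakest covariate
        return kept

    # The treatment variable is added only after selection is finished, so that
    # covariates redundant with the treatment are not mistakenly discarded:
    # kept = select_covariates(df, "y", ["c1", "c2", "c3"])
    # final = sm.OLS(df["y"], sm.add_constant(df[kept + ["treatment"]])).fit()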

Invulnerability to Chance Differences Between Treatment Groups

The fourth reason for supplementing random assignment with statistical control is that statistical control increases a conclusion's invulnerability to criticisms based on chance differences between treatment groups. For an extreme example, consider a design with pretest and posttest measures, random assignment to treatment and control groups, and no real-world complications such as attrition. Suppose we find a highly significant posttest difference between the two groups, using a test whose validity is not challenged on distributional or similar grounds. But we happen to notice that every person's score on the posttest exactly equals their score on the pretest, so that the difference between groups is just as significant on pretest as on posttest! The significant difference on posttest no longer gives us any confidence at all that the experimental effect actually exists. Rather, the hypothesis most consistent with the data is that the experimental treatment has no effect at all, that nobody's score on Y changes from pretest to posttest, and that we simply happened to draw a sample that had an exceptionally large between-group difference on pretest.

Once we agree that in this extreme example there is some doubt about the treatment's effectiveness, we must ask how extreme an example must be to raise similar doubts. Perhaps we should be concerned about all significant differences between treatment groups on covariates, despite the familiar argument (given in RLM Section 4.1.2) against this position. But we can avoid the whole problem by using linear models along with random assignment. The problem arises because we presume that the covariates correlate with the dependent variable in the population, so that if by chance we draw a sample in which the covariates correlate with the treatment variable as well, then we must presume the sample correlation between the treatment and the dependent variable is at least partly spurious. But as described in RLM Chapter 2, in a linear model with an independent variable X and several covariates, X's sample regression slope can be thought of as the simple regression slope predicting the dependent variable from the portion of X independent of the covariates. This portion of X is exactly uncorrelated with all covariates in the sample studied, not merely in some hypothetical population. This eliminates the problem, which was that X might conceivably correlate highly with covariates just by chance in the sample studied, even though random assignment assures that this correlation is zero in the population. But the use of a linear model means that we are always using in effect just the portion of X that is independent of the covariates in the sample studied. Even in our extreme example, regression would estimate the treatment effect to be zero, which is the estimate supported by intuition.
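
This property of linear models (often called the Frisch-Waugh-Lovell result) is easy to verify numerically. The sketch below, assuming numpy and statsmodels and using simulated data, shows that the treatment's slope in the full model equals the simple slope obtained when the dependent variable is predicted from the portion of the treatment variable residualized on the covariates in the same sample.

    # Numerical check: the full-model slope for X equals the simple slope on the
    # part of X that is independent of the covariates in this sample.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 100
    covs = rng.normal(size=(n, 2))                   # two covariates
    x = rng.integers(0, 2, n).astype(float)          # randomly assigned treatment
    y = 2.0 * x + covs @ np.array([1.0, -0.5]) + rng.normal(size=n)

    # Full model: y on treatment plus covariates.
    full = sm.OLS(y, sm.add_constant(np.column_stack([x, covs]))).fit()

    # Residualize the treatment on the covariates, then regress y on that portion.
    x_resid = sm.OLS(x, sm.add_constant(covs)).fit().resid
    partial = sm.OLS(y, sm.add_constant(x_resid)).fit()

    print(full.params[1], partial.params[1])         # identical up to rounding error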