REPTOC--An Adaptive Alternative to the Wilcoxon Signed-Ranks Test

Richard B. Darlington

Abstract

REPTOC stands for "rapid exhaustive permutation test on a center." It tests the same null hypothesis as the Wilcoxon signed-ranks test--the hypothesis that scores on a single variable are distributed symmetrically around a specified center. Rejection of this null hypothesis allows more specific conclusions than is often realized. REPTOC requires only a minuscule fraction of the computer time that similar exhaustive permutation tests would require--seconds rather than years.

The Importance of Tests on the Wilcoxon Hypothesis

Many scientists fail to understand the usefulness of the Wilcoxon signed-ranks test and other tests of the same null hypothesis. Most textbooks describe this hypothesis as the hypothesis that scores on a variable are distributed symmetrically around a specified central value M0. Many scientists then ask,

"But what can I conclude if the null is rejected? It might mean that scores are distributed symmetrically around some other value, or it might mean that the distribution is asymmetric, possibly with a mean or median (or both) at M0. Therefore I can't really reach any useful conclusion."

This concern can be addressed through the concept of partial midranges. A population's midrange is the mean of the highest and lowest scores in the population. Define the Kth partial midrange as the average of the distribution's Kth percentile point and its (100-K)th percentile point. For instance, the fifth partial midrange is the average of the 5th and 95th percentile points, and the 50th partial midrange is the median. Even with no assumption of symmetry, a positive rejection of the Wilcoxon hypothesis (that is, a rejection with the rank-sum of positive deviations from M0 greater than the rank-sum of negative deviations) shows that at least some of the distribution's partial midranges must exceed M0. Thus there is at least some sense in which the distribution is centered above M0. Therefore rejection of the Wilcoxon hypothesis can lead to a useful and interesting conclusion. The general nature of this argument was suggested by Pratt and Gibbons (Concepts of Nonparametric Theory, Springer-Verlag, 1981, p. 155), though the term partial midrange is my own.
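
To make the definition concrete, here is a small sketch in Python of the sample analogue of a partial midrange. The function name partial_midrange and the illustrative data are invented here, and numpy's default percentile interpolation is just one reasonable choice.

import numpy as np

def partial_midrange(scores, k):
    # Sample analogue of the Kth partial midrange: the average of the
    # Kth and (100-K)th percentile points. k = 50 gives the median.
    lower = np.percentile(scores, k)
    upper = np.percentile(scores, 100 - k)
    return (lower + upper) / 2.0

data = [1, 2, 2, 3, 4, 4, 5, 6, 7, 9]
print(partial_midrange(data, 50))   # the 50th partial midrange (the median)
print(partial_midrange(data, 5))    # the 5th partial midrange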

An even stronger conclusion can be reached. Once you have used a Wilcoxon or other test to show that a single partial midrange is, say, above M0, often the most parsimonious conclusion (though certainly not the most conservative conclusion) is that all partial midranges fall on the same side of M0. That would imply that the mean, median, and all reasonable measures of central tendency are on that side of M0. Therefore a rejection of the null by the Wilcoxon or other test changes the most parsimonious hypothesis from the null hypothesis to the hypothesis that all reasonable measures of central tendency are on the side of M0 indicated by the test.

Four Familiar Tests on a Distribution's Center

Besides the Wilcoxon test itself, the best-known tests of hypotheses about the center of a distribution are the sign test (which tests the median), the one-sample t test (which tests a mean), and the normal-scores test (which tests the Wilcoxon hypothesis). But the t test may be inaccurate if the sample size is small and the distribution is nonnormal. The Wilcoxon test, the normal-scores test, and the sign test are all powerful tests for certain distributional shapes (uniform, normal, and dichotomous respectively), but all three tests lose power if the distribution differs from its optimum shape.

The Wilcoxon, normal-scores, and sign tests also lose power by losing information. For instance, the Wilcoxon test ranks data points by their absolute distances from M0. If three successive absolute distances were 14, 15, and 20, the test would simply assign adjacent ranks to those points, ignoring the fact that 15 is much closer to 14 than to 20. This loss of information generally produces a loss of statistical power.

The Wilcoxon and normal-scores tests also don't deal well with ties, which are very common in a wide range of problems. For instance, if 20 people respond to a 7-point attitude scale, the number of ties is necessarily large.

Permutation Tests and REPTOC

A permutation test can test the Wilcoxon hypothesis while largely or completely avoiding the problems of ties, rounding of scores, and inappropriate distribution shapes, since such tests adapt their results to the particular ties, scores, and shapes found in the sample. Permutation tests are either exhaustive or nonexhaustive. Exhaustive permutation tests are often impractical even with very fast computers, while nonexhaustive tests are only approximate, and also have the unpleasant property that two different investigators will typically find different significance levels from the very same data. As its name implies, REPTOC is a permutation test which is rapid despite being exhaustive. To introduce it, I first give a quick review of permutation tests for the Wilcoxon hypothesis.

The variable may be either absolutely continuous, allowing scores like 7.3265, or discrete, such as number of offspring or score on a 7-point attitude scale. In a permutation test of the Wilcoxon hypothesis, one can ignore any data points that fall exactly at M0, so I shall define N as the number of other points. For example, suppose 5 data points are 14, 20, 24, 29, and 40, and you want to test the null hypothesis M0 = 20. Ignoring the value of 20, we compute the differences from 20 of the other data points, finding differences of -6, +4, +9, and +20, which sum to 27. We then examine all 2^N sums that could be computed by independently changing the sign of each of these differences. For instance, making all 4 signs positive yields a sum of 39, while making just the first two positive yields a sum of -19, and so on. Then the proportion of these sums that are equal to or greater than the observed sum of 27 can be interpreted as a one-tailed p giving the probability that one would have found such a high sum just by chance, given the set of absolute distances from M0.
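
As a concrete (if deliberately brute-force) sketch, the following Python code enumerates all 2^N sign assignments for the four deviations in this example and counts the sums at least as large as the observed 27. The function name exhaustive_one_tailed_p is invented here purely for illustration.

from itertools import product

def exhaustive_one_tailed_p(deviations):
    # Enumerate all 2^N ways of assigning signs to the absolute deviations,
    # and count how many give a sum at least as large as the observed sum.
    observed = sum(deviations)
    abs_devs = [abs(d) for d in deviations]
    n = len(abs_devs)
    count = 0
    for signs in product((1, -1), repeat=n):
        if sum(s * d for s, d in zip(signs, abs_devs)) >= observed:
            count += 1
    return count / 2 ** n

# Deviations from M0 = 20 in the example above: -6, +4, +9, +20.
print(exhaustive_one_tailed_p([-6, 4, 9, 20]))   # 3/16 = .1875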

The exhaustive permutation test considers 2^N different sums. Since this number doubles for every one-unit increase in N, it may be impractically large even for moderate N. For instance, if N = 50 then 2^N exceeds 10^15; based on trials with my own 486 computer, I estimate it would take that computer over 10,000 years to compute all these possible sums. Even a million-fold increase in computer speed (which is not expected in the foreseeable future) would reduce this to about 4 days, which is still too long for routine use. And a 20-unit increase in N would still increase the required time a millionfold.

A nonexhaustive permutation test samples randomly from the population of possibilities, but this can still take substantial computer time, and also has the obvious disadvantage that two analysts will find different p's even when working with the very same data.

REPTOC is a very minor modification of the exhaustive permutation test that reduces computing time manyfold. For instance, a problem like the one just described took 3 seconds rather than 10,000 years. REPTOC's principal computing requirement is that all absolute distances from M0 be rounded to positive integers. If desired, all the distances may be multiplied by a large positive constant to minimize the importance of the rounding. REPTOC's computing requirements are discussed later in more detail, but they are nearly always modest.

The REPTOC Algorithm

I have already explained how a significance level is computed in an exhaustive permutation test, by examining all 2^N possible sums obtained by changing the signs of the deviations from M0, and counting the number of sums that exceed or equal the observed sum. For instance, if 4 absolute distances from M0 are 2, 2, 3, 5, then changing the signs of these distances produces 2^4 = 16 sums, of which three are

2 - 2 + 3 - 5 = -2
2 + 2 - 3 + 5 = 6
2 + 2 + 3 + 5 = 12

But any sum like these is easily calculated from the sum of the positive entries. Let Sum denote an algebraic sum like those illustrated, and let Sumpos, Sumneg, and Sumabs denote respectively the sum of the positive values, the sum of the negative values, and the sum of the absolute values of the N distances. Then

Sum = Sumpos + Sumneg

Sumabs = Sumpos - Sumneg

The last equation can be rearranged to read

Sumneg = Sumpos - Sumabs

Using the right side of the last equation to replace Sumneg in the expression for "Sum" gives

Sum = 2 x Sumpos - Sumabs
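
As a quick check, take the second of the sums illustrated above, 2 + 2 - 3 + 5 = 6: there Sumpos = 9 and Sumabs = 12, so 2 x Sumpos - Sumabs = 2 x 9 - 12 = 6, matching the direct calculation.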

Changing the signs of the distances from M0 doesn't change Sumabs, so it is constant for all the rows. Therefore the equation for Sum shows that ranking the rows by their sums (or equivalently by their means) is equivalent to ranking them by their Sumpos values--the sums of their positive entries. But it turns out that there is a simple algorithm for finding the number of rows with each possible Sumpos value. I illustrate with the same problem, in which N = 4 and the absolute deviations from M0 are 2, 2, 3, 5. More generally, let D1, D2, D3, ... represent the N absolute deviations from M0. The D's need not be ordered, though computing time is least if they are ordered from low to high.

  1. Create a "row" V0 (that's a zero) containing the single entry 1.
  2. Create a row V0S, where the S denotes "shifted". Row V0S is row V0, with D1 zeros attached to the left end. In the current example, D1 = 2 so V0S = 0 0 1.
  3. Add rows V0 and V0S, padding the shorter row with zeros on the right so the two rows have equal length; call the new row V1. In the example, V1 = 1 0 1.
  4. Create a new row V1S by adding D2 zeros to the left end of row V1. In the example, V1S = 0 0 1 0 1.
  5. Add V1 to V1S to create row V2. In the example, V2 = 1 0 2 0 1.
  6. Keep repeating this process until all the D's have been used; the final row is VN (here V4).
For this example the entire process, starting with step 1, is illustrated below.

V0   1
V0S  0 0 1
V1   1 0 1
V1S  0 0 1 0 1
V2   1 0 2 0 1
V2S  0 0 0 1 0 2 0 1
V3   1 0 2 1 1 2 0 1
V3S  0 0 0 0 0 1 0 2 1 1 2 0 1
V4   1 0 2 1 1 3 0 3 1 1 2 0 1
The final row always has DS+1 entries, where DS is the sum of the N absolute distances. In the current example DS = 2+2+3+5 = 12, and V4 has 13 entries. Label these entries 0 through DS, as shown below, where Numways is the number of ways of finding a particular value of Sumpos.

Sumpos    0 1 2 3 4 5 6 7 8 9 10 11 12
Numways   1 0 2 1 1 3 0 3 1 1  2  0  1

This table shows the number of ways of finding each possible sum of positive entries. For instance the table shows there are two ways of assigning signs to the numbers 2, 2, 3, 5 so that the sum of positive entries is 10; they are -2 +2 +3 +5 and +2 -2 +3 +5. There are 3 ways of making the positive entries sum to 7; they are

+2 +2 +3 -5
+2 -2 -3 +5
-2 +2 -3 +5
The logic behind this method is essentially the same as that used in a computing algorithm for the Wilcoxon test, which is explained elsewhere.

Like the other unshifted rows (V0, V1, V2, etc.), the final row is always symmetrical. Identify the observed sum of positive entries; suppose here it's 2. Then sum the values in the Numways row from that entry out to the nearer end. In this example we have 2+0+1 = 3. That's the total number of ways one could have found the observed sum of positive entries, or a more extreme value in the same direction. Finally, divide by 2^N to convert that count into a one-tailed p. In the current example the one-tailed p is 3/16. The two-tailed p is always twice the one-tailed value; here it's 3/8 or .375.
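
The whole procedure is short enough to sketch in a few lines of code. The following Python sketch is a minimal rendering of the algorithm just described, assuming the deviations from M0 have already been expressed as integers (zeros are ignored, as in the text); the function name reptoc_one_tailed_p is chosen only for illustration and is not part of any published program.

def reptoc_one_tailed_p(deviations):
    # deviations: signed integer deviations from M0; zeros are ignored.
    abs_devs = [abs(d) for d in deviations if d != 0]
    sumpos = sum(d for d in deviations if d > 0)      # observed sum of positive entries
    numways = [1]                                     # row V0
    for d in abs_devs:
        shifted = [0] * d + numways                   # ViS: d zeros attached at the left end
        padded = numways + [0] * d                    # Vi padded with zeros on the right
        numways = [a + b for a, b in zip(padded, shifted)]
    total = 2 ** len(abs_devs)                        # = sum(numways)
    # Sum the Numways entries from the observed Sumpos out to the nearer end;
    # because the final row is symmetrical, that is the smaller of the two tail sums.
    lower_tail = sum(numways[:sumpos + 1])
    upper_tail = sum(numways[sumpos:])
    return min(lower_tail, upper_tail) / total

# The worked example: absolute deviations 2, 2, 3, 5 with observed Sumpos = 2.
print(reptoc_one_tailed_p([2, -2, -3, -5]))           # 3/16 = .1875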

A Larger Example

Suppose we have a sample of 25 cases distributed along a 7-point scale with a neutral point of 4. Suppose the distribution is as follows:

Score     1  2  3  4  5  6  7
Number    1  2  3  5  6  5  3
Then the distribution of absolute deviations from the neutral point is:

Deviation   0  1  2  3
Number      5  9  7  4
There are altogether 2^20 or 1,048,576 possible outcomes, yielding Sumpos values from 0 to 35. For the actual outcome in the data set shown, Sumpos = 6*1 + 5*2 + 3*3 = 25. There are 21936 ways of finding Sumpos = 25, and the numbers of ways of finding Sumpos values of 26 through 35 are respectively 14068, 8408, 4647, 2351, 1075, 435, 151, 43, 9, and 1. The sum of these 11 values (for Sumpos values of 25 through 35) is 53124, so the one-tailed p is 53124/2^20 = .05066. On my 486 computer, the entire calculation took .03 seconds.
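
For readers who want to verify these figures, the sketch given at the end of the algorithm section reproduces them; the dictionary below simply restates the frequency table above and is expanded into individual deviations from the neutral point of 4.

# Frequency table from above: score -> number of respondents (neutral point 4).
counts = {1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 6: 5, 7: 3}
deviations = [score - 4 for score, n in counts.items() for _ in range(n)]

print(reptoc_one_tailed_p(deviations))   # about .05066, i.e. 53124/2^20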

REPTOC's Computing Requirements

As mentioned earlier, REPTOC requires that all absolute distances from M0 be transformed to positive integers, typically by multiplying them by some large positive constant k and then rounding to the nearest integer. Thus if raw data points were numbers between 0 and 10 recorded to 2 decimal places, you could transform them to integers between 0 and 1000 by multiplying them by 100. In that case no rounding would be necessary for the individual scores, but M0 would have to be rounded to the nearest integer on the new 1000-point scale, so that distances from M0 would also be integers. The larger k the less rounding is needed relative to the size of the numbers rounded.
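
As a small illustration of this transformation (the particular scores and M0 below are hypothetical), multiplying by k = 100 and rounding puts both the scores and M0 on the same integer scale:

k = 100
raw_scores = [7.31, 4.85, 9.02]                    # hypothetical scores recorded to 2 decimals
m0 = 6.5                                           # hypothetical M0
m0_scaled = round(m0 * k)                          # M0 on the new integer scale
abs_dist_int = [abs(round(x * k) - m0_scaled) for x in raw_scores]
print(abs_dist_int)                                # [81, 165, 252]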

But the larger k is, the more computing time is needed. As before, let D1, D2, D3, etc. denote the nonzero integerized absolute distances from M0, ranked from small to large. Then computing time is roughly proportional to the sum over i of Di x (N+1-i). This sum in turn is roughly proportional to kN^2, so setting k = 1000 will make computing time roughly 10 times as great as setting k = 100. Further, the number of memory cells needed (typically 8 bytes each) is at least as large as the sum of the D's, so increasing k also increases the computer's memory requirement.
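
The following small sketch, with an invented function name, turns these rough proportionalities into numbers for a given set of integerized distances; it is only a back-of-the-envelope estimate.

def reptoc_cost_estimate(abs_devs, bytes_per_cell=8):
    # Rough estimates: building row Vi takes on the order of D1 + ... + Di additions,
    # so the total is roughly the sum of Di x (N+1-i); the final row has DS+1 cells.
    d = sorted(abs_devs)                      # ranked from small to large
    n = len(d)
    additions = sum(di * (n + 1 - i) for i, di in enumerate(d, start=1))
    memory_bytes = (sum(d) + 1) * bytes_per_cell
    return additions, memory_bytes

print(reptoc_cost_estimate([2, 2, 3, 5]))     # (25, 104) for the small worked example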

But neither time nor memory typically represents a severe limitation for REPTOC. If k is chosen to make the largest distance from M0 equal to 1000, and all the distances from M0 are rounded to the nearest integer on this thousand-point scale, then very little rounding error is introduced. The average Di in this problem would typically be below 500. Suppose that average were 500 and N were 50. That's about the largest N for which most investigators would feel the need for an exact test. But the memory required for even this problem would be about 50*500*8 bytes or 200,000 bytes, well within the capacity of any modern computer, whose memories are measured in millions of bytes. And REPTOC analyses of this size execute rapidly; they take about 3 seconds on my 486 computer--the same computer that would have taken 10,000+ years to compute all 2^50 sums for this problem.

I hasten to note that there are a great many problems in which no rounding at all is needed, and which would run even faster. Chief among them are problems involving simple Likert attitude scales, which might run from 1 to 9 with 5 as "neutral". With such data the neutral point is often M0, so all distances from M0 are automatically integers. Thus no rounding is needed to apply REPTOC to such problems. And if a Likert scale has an even number of points, as with a scale running from 1 to 8 in which 4.5 is thought of as the neutral point, then doubling all distances from the neutral point will make them into integers.

Advantages of REPTOC

As a permutation test, REPTOC gains power by adapting to the particular sample distribution. For instance, suppose you used a 7-point scale centered at 4, but all respondents chose either 1 or 7, so the distribution was dichotomous. Then a sign test (which dichotomizes data) would test the null hypothesis more powerfully than a Wilcoxon or normal-scores test. REPTOC would in this instance yield exactly the same p as a sign test, so its power is also identical. Tests like the sign or Wilcoxon or normal-scores test gain or lose relative power depending on the similarity between the true distribution and the distribution that is implicitly used in the test (dichotomous for sign, uniform for Wilcoxon, normal for normal-scores), but REPTOC adapts to the particular distribution that appears to be present.
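
This equivalence is easy to check numerically with the earlier sketch; the split of 15 versus 5 respondents below is invented purely for illustration.

from math import comb

# 20 respondents on a 1-to-7 scale with neutral point 4; all answer 1 or 7.
deviations = [3] * 15 + [-3] * 5                   # 15 choose "7", 5 choose "1"

reptoc_p = reptoc_one_tailed_p(deviations)
sign_p = sum(comb(20, k) for k in range(15, 21)) / 2 ** 20   # binomial (sign-test) tail
print(reptoc_p, sign_p)                            # both about .0207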

A more detailed analysis of the power characteristics of REPTOC remains to be done; I welcome further investigation by anyone.