REPTOC stands for "rapid exhaustive permutation test on a center." It tests the same null hypothesis as the Wilcoxon signed-ranks test--the hypothesis that scores on a single variable are distributed symmetrically around a specified center. Rejection of this null hypothesis allows more specific conclusions than is often realized. REPTOC requires only a minuscule fraction of the computer time that similar exhaustive permutation tests would require--seconds rather than years.

"But what can I conclude if the null is rejected? It might mean that scores are
distributed symmetrically around some other value, or it might mean that the distribution is
asymmetric, possibly with a mean or median (or both) at M_{0}. Therefore I
can't really reach any useful conclusion."

This concern can be addressed through the concept of *partial midranges*.
A population's midrange is the mean of the highest and lowest scores in the population.
Define the Kth partial midrange as the average of the distribution's Kth percentile point and its
(100-K)th percentile point. For instance, the 5th partial midrange is the average of the 5th and
95th percentile points, and the 50th partial midrange is the median. Even with no
assumption of symmetry, a positive rejection of the Wilcoxon hypothesis (that is, a rejection
with the rank-sum of positive deviations from M_{0} greater than the rank-sum
of negative deviations) shows that at least some of the distribution's partial midranges must
exceed M_{0}. Thus there is at least some sense in which the distribution is
centered above M_{0}. Therefore rejection of the Wilcoxon hypothesis can
lead to a useful and interesting conclusion. The general nature of this argument was
suggested by Pratt and Gibbons (Concepts of Nonparametric Theory, Springer-Verlag, 1981,
p. 155), though the term *partial midrange* is my own.

An even stronger conclusion can be reached. Once you have used a Wilcoxon or
other test to show that a single partial midrange is, say, above M_{0}, often the
most parsimonsious conclusion (though certainly not the most conservative conclusion) is that
all partial midranges fall on the same side of M_{0}. That would imply that the
mean, median, and all reasonable measures of central tendency are on that side of
M_{0}. Therefore a rejection of the null by the Wilcoxon or other test changes
the most parsimonious hypothesis from the null to the hypothesis that all reasonable
measures of central tendency lie on the same side of M_{0} as was indicated
by the test.

The Wilcoxon, normal-scores, and sign tests also lose power by losing information.
For instance, the Wilcoxon test ranks data points by their absolute distances from
M_{0}. If three successive absolute distances were 14, 15, and 20, the test
would simply assign adjacent ranks to those points, ignoring the fact that 15 is much closer
to 14 than to 20. This loss of information generally produces a loss of statistical
power.

The Wilcoxon and normal-scores tests also don't deal well with ties, which are very common in a wide range of problems. For instance, if 20 people respond to a 7-point attitude scale, the number of ties is necessarily large.

The variable may be either absolutely continuous, allowing scores like 7.3265, or
discrete, such as the number of offspring or the score on a 7-point attitude scale. In a
permutation test of the Wilcoxon hypothesis, one can ignore any data points that fall exactly
at M_{0}, so I shall define N as the number of *other* points. For
example, suppose 5 data points are 14, 20, 24, 29, and 40, and you want to test the null
hypothesis M_{0} = 20. Ignoring the value of 20, we compute the differences
from 20 of the other data points, finding differences of -6, +4, +9, and +20, which sum to
27. We then examine all 2^{N} sums that could be computed by independently
changing the sign of each of these differences. For instance, making all 4 signs positive
yields a sum of 39, while making just the first two signs positive yields a sum of -19, and so on.
The proportion of these sums that are equal to or greater than the observed sum of 27
can then be interpreted as a one-tailed *p*: the probability that one would have
found such a high sum just by chance, given the set of absolute distances from
M_{0}.
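As a sketch, the exhaustive test just described can be written in a few lines of Python (the variable names are my own; the data and M_{0} come from the example above):

```python
from itertools import product

data = [14, 20, 24, 29, 40]
m0 = 20

# Ignore points falling exactly at m0; keep the signed differences of the rest.
diffs = [x - m0 for x in data if x != m0]  # [-6, 4, 9, 20]
observed = sum(diffs)                      # 27

# All 2**N sums obtainable by independently flipping each sign.
abs_dists = [abs(d) for d in diffs]
sums = [sum(s * d for s, d in zip(signs, abs_dists))
        for signs in product((-1, 1), repeat=len(abs_dists))]

# One-tailed p: proportion of sums at least as large as the observed sum.
p = sum(1 for s in sums if s >= observed) / len(sums)
print(p)  # 0.1875, i.e. 3/16
```

With only N = 4 nonzero differences there are just 16 sums, so brute force is instant here; the point of REPTOC is that this enumeration becomes impossible for large N.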

The exhaustive permutation test considers 2^{N} different sums. Since
this number doubles for every one-unit increase in N, it may be impractically large even for
moderate N. For instance, if N = 50 then 2^{N} exceeds
10^{15}; based on trials with my own 486 computer, I estimate it would take
that computer over 10,000 years to compute all these possible sums. Even a million-fold
increase in computer speed (which is not expected in the foreseeable future) would reduce
this to about 4 days, which is still too long for routine use. And a further 20-unit increase in N
would again multiply the required time by about a million.

A nonexhaustive permutation test samples randomly
from the population of possibilities, but this can still take substantial computer time, and also
has the obvious disadvantage that two analysts will find different *p*'s even when
working with the very same data.

REPTOC is a very minor modification of the exhaustive permutation test that reduces
computing time manyfold. For instance, a problem like the one just described took 3
seconds rather than 10,000 years. REPTOC's principal computing requirement is that all
absolute distances from M_{0} be rounded to positive integers. If desired, all
the distances may be multiplied by a large positive constant to minimize the importance of
the rounding. REPTOC's computing requirements are discussed later in more detail, but
they are nearly always modest.

Suppose, for example, that N = 4 and the integerized absolute distances from
M_{0} are 2, 2, 3, 5. Each of the 2^{4} = 16 possible sign assignments yields its own
sum; three of them are

2-2+3-5 = -2

2+2-3+5 = 6

2+2+3+5 = 12

But any sum like these is easily calculated from the sum of the positive entries. Let
*Sum* denote an algebraic sum like those illustrated, and let *Sumpos*,
*Sumneg*, and *Sumabs* denote respectively the sum of positive values, the sum
of negative values, and the sum of absolute values of the *N* distances. Then

Sum = Sumpos + Sumneg

Sumabs = Sumpos - Sumneg

The last equation can be rearranged to read

Sumneg = Sumpos - Sumabs

Using the right side of the last equation to replace Sumneg in the expression for "Sum" gives

Sum = 2 × Sumpos - Sumabs
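This identity can be checked mechanically; the following Python sketch verifies it over all 16 sign assignments of the example distances 2, 2, 3, 5:

```python
from itertools import product

dists = [2, 2, 3, 5]
sumabs = sum(dists)  # 12; unchanged by any choice of signs

for signs in product((-1, 1), repeat=len(dists)):
    row = [s * d for s, d in zip(signs, dists)]
    total = sum(row)                       # Sum
    sumpos = sum(v for v in row if v > 0)  # Sumpos
    # Sum is fully determined by Sumpos once Sumabs is fixed.
    assert total == 2 * sumpos - sumabs
```

Because *Sumabs* is the same for every row, ranking rows by *Sum* and ranking them by *Sumpos* give identical orderings.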

Changing the signs of the distances from M_{0} doesn't change
*Sumabs*, so it is constant for all the rows. Therefore the last equation shows that
ranking the rows by their sums (or equivalently by their means) is equivalent to ranking them
by their *Sumpos* values--the sums of their positive entries. But it turns out that
there is a simple algorithm for finding the number of rows with each possible
*Sumpos* value. I illustrate with the same problem, in which N = 4 and the
absolute deviations from M_{0} are 2, 2, 3, 5. More generally, let
D_{1}, D_{2}, D_{3},... represent the entries in
such a row. The D's need not be ordered, though computing time is least if they are ordered
from low to high.

- Create a "row" V0 (that's a zero) containing the single entry 1.
- Create a row V0S, where the S denotes "shifted". Row V0S is row V0 with D_{1} zeros attached to the left end. In the current example, D_{1} = 2, so V0S = 0 0 1.
- Add rows V0 and V0S, padding the shorter row with zeros on the right so the two can be added entry by entry; call the new row V1. In the example, V1 = 1 0 1.
- Create a new row V1S by attaching D_{2} zeros to the left end of row V1. In the example, V1S = 0 0 1 0 1.
- Add V1 to V1S to create row V2. In the example, V2 = 1 0 2 0 1.
- Keep repeating this process until all the D's have been used; the final row is VN.

V0    1
V0S   0 0 1
V1    1 0 1
V1S   0 0 1 0 1
V2    1 0 2 0 1
V2S   0 0 0 1 0 2 0 1
V3    1 0 2 1 1 2 0 1
V3S   0 0 0 0 0 1 0 2 1 1 2 0 1
V4    1 0 2 1 1 3 0 3 1 1 2 0 1

The final row always has DS+1 entries, where DS is the sum of the N absolute distances. In the current example DS = 2+2+3+5 = 12, and V4 has 13 entries. Label these entries 0 through DS, as shown below:
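The shift-and-add procedure can be sketched as a short Python function (the function name is my own):

```python
def numways(dists):
    """Return the final row VN: entry s counts the sign assignments
    with Sumpos = s. dists are the nonzero positive integer
    absolute distances from M0."""
    v = [1]  # row V0: one way to make Sumpos = 0 using no distances
    for d in sorted(dists):              # low-to-high order minimizes work
        shifted = [0] * d + v            # row ViS: d zeros on the left end
        padded = v + [0] * d             # pad Vi so the rows align
        v = [a + b for a, b in zip(padded, shifted)]  # Vi + ViS = V(i+1)
    return v

print(numways([2, 2, 3, 5]))
# [1, 0, 2, 1, 1, 3, 0, 3, 1, 1, 2, 0, 1] -- matches row V4 above
```

The entries of the final row always total 2^{N}, since each of the N distances doubles the number of sign assignments accounted for.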

Sumpos    0  1  2  3  4  5  6  7  8  9  10  11  12
Numways   1  0  2  1  1  3  0  3  1  1   2   0   1

This table shows the number of ways of finding each possible sum of positive entries. For instance the table shows there are two ways of assigning signs to the numbers 2, 2, 3, 5 so that the sum of positive entries is 10; they are -2 +2 +3 +5 and +2 -2 +3 +5. There are 3 ways of making the positive entries sum to 7; they are

+2 +2 +3 -5
+2 -2 -3 +5
-2 +2 -3 +5

The logic behind this method is essentially the same as that used in a computing algorithm for the Wilcoxon test, which is explained elsewhere.

Like the other unshifted rows (V0, V1, V2, etc.), the final row is always
symmetrical. Identify the observed sum of positive entries; suppose here it's 2. Then sum
the values in the Numways row, from that entry out to the nearer end. In this
example we have 2+0+1 = 3. That's the total number of ways one could have found the
observed sum of positive entries, or a more extreme value in the same direction. Finally
divide by 2^{N} to convert that into a one-tailed *p*. In the current
example the one-tailed *p* is 3/16. The two-tailed *p* is always twice
the one-tailed value; here it's 3/8 or .375.
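The conversion from the final row to a one-tailed *p* can likewise be sketched in Python (function and variable names are my own):

```python
def one_tailed_p(ways, n, observed_sumpos):
    """ways is the final (Numways) row; n is the number of nonzero distances."""
    ds = len(ways) - 1                        # ways has DS + 1 entries, 0..DS
    if observed_sumpos <= ds / 2:             # nearer end is the left end
        tail = sum(ways[:observed_sumpos + 1])
    else:                                     # nearer end is the right end
        tail = sum(ways[observed_sumpos:])
    return tail / 2 ** n

ways = [1, 0, 2, 1, 1, 3, 0, 3, 1, 1, 2, 0, 1]  # Numways row for 2, 2, 3, 5
print(one_tailed_p(ways, 4, 2))  # 0.1875, i.e. 3/16 as in the text
```

Because the final row is symmetrical, an observed *Sumpos* of 10 gives the same one-tailed *p* as an observed *Sumpos* of 2.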

For instance, suppose 25 people respond to a 7-point attitude scale whose neutral point is 4, with scores distributed as follows:

Score    1  2  3  4  5  6  7
Number   1  2  3  5  6  5  3

Then the distribution of absolute deviations from the neutral point is:

Deviation   0  1  2  3
Number      5  9  7  4

There are altogether 20 nonzero deviations, so an exhaustive test would have to examine 2^{20}, or more than a million, sign assignments.
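The shift-and-add procedure handles this attitude-scale example without enumerating the 2^{20} sign assignments; here is a sketch, with the score counts taken from the table above:

```python
from collections import Counter

# Score distribution: 25 respondents on a 7-point scale, neutral point 4.
counts = {1: 1, 2: 2, 3: 3, 4: 5, 5: 6, 6: 5, 7: 3}
neutral = 4

dists = [abs(score - neutral)
         for score, n in counts.items() for _ in range(n)
         if score != neutral]
assert Counter(dists) == Counter({1: 9, 2: 7, 3: 4})  # matches the table

# Shift-and-add: v[s] = number of sign assignments with Sumpos = s.
v = [1]
for d in sorted(dists):
    v = [a + b for a, b in zip(v + [0] * d, [0] * d + v)]

print(len(v))   # 36 entries: DS + 1, with DS = 9*1 + 7*2 + 4*3 = 35
print(sum(v))   # 1048576 = 2**20 sign assignments accounted for
```

The final row here has only 36 entries even though it summarizes more than a million sign assignments, which is the source of REPTOC's speed.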

As mentioned earlier, REPTOC requires that all absolute distances from M_{0}
be transformed to positive integers, typically by multiplying them by some large positive
constant *k* and then rounding to the nearest integer. Thus if raw data points were
numbers between 0 and 10 recorded to 2 decimal places, you could transform them to
integers between 0 and 1000 by multiplying them by 100. In that case no rounding would be
necessary for the individual scores, but M_{0} would have to be rounded to the
nearest integer on the new 1000-point scale, so that distances from M_{0}
would also be integers. The larger *k*, the less rounding is needed relative to the
size of the numbers rounded.
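As an illustration of the integerizing step, here is a Python sketch using hypothetical two-decimal scores on a 0-to-10 scale (the data values are invented for illustration):

```python
data = [7.32, 4.15, 6.08, 5.50, 2.91]  # hypothetical two-decimal scores
m0 = 5.50
k = 100  # two decimal places, so multiplying by 100 yields integers

m0_int = round(m0 * k)  # m0 rescaled to the 1000-point scale
dists = [abs(round(x * k) - m0_int) for x in data]
dists = [d for d in dists if d != 0]  # points falling exactly at m0 are ignored
print(dists)  # [182, 135, 58, 259]
```

Here no rounding error is introduced at all, since both the scores and M_{0} land exactly on integers of the rescaled scale.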

But the larger *k*, the more computing time is needed. As before, let
D_{1}, D_{2}, D_{3}, etc. denote the nonzero
integerized absolute distances from M_{0}, ranked from small to large. Then
computing time is roughly proportional to the sum of the N values of
D_{i}(N+1-i). This sum in turn is roughly proportional to
*kN*^{2}, so computing time grows only linearly with *k* but quadratically with N.

But neither time nor memory typically represents a severe limitation for REPTOC. If
*k* is chosen to make the largest distance from M_{0} equal to 1000,
and all the distances from M_{0} are rounded to the nearest integer on this
thousand-point scale, then very little rounding error is introduced. The average
D_{i} in this problem would typically be below 500. Suppose that average
were 500 and N were 50. That's about the largest N for which most investigators would feel
the need for an exact test. But the memory required for even this problem would be about
50*500*8 bytes or 200,000 bytes, well within the capacity of any modern computer, whose
memories are measured in millions of bytes. And REPTOC analyses of this size execute
rapidly; they take about 3 seconds on my 486 computer--the same computer that would have
taken 10,000+ years to compute all 2^{50} sums for this problem.

I hasten to note that there are a great many problems in which no rounding at all is
needed, and which would run even faster. Chief among them are problems involving simple
Likert attitude scales, which might run from 1 to 9 with 5 as "neutral". With such data the
neutral point is often M_{0}, so all distances from M_{0} are
automatically integers. Thus no rounding is needed to apply REPTOC to such problems.
And if a Likert scale has an even number of points, as with a scale running from 1 to 8 in
which 4.5 is thought of as the neutral point, then doubling all distances from the neutral
point will make them into integers.

A more detailed analysis of the power characteristics of REPTOC remains to be done; I welcome further investigation by anyone.