What is your right shoe size? What is your left shoe size?

How many horses? Simple, you count the number of hooves and divide by four.

A man with one watch knows what time it is. A man with two watches is never sure.

***

A colleague asked me to review a Statistical Analysis Plan (SAP) he inherited. I will simplify it and obfuscate the details to protect the guilty.

In it there were one primary and two secondary parameters. They were “Number of Headaches from Week A to Week B”, where A & B were a) 2 & 7 [the primary parameter], b) 1 & 7 [first secondary parameter], and c) 2 & 8 [second secondary parameter].

I can imagine the origin to be something like this:

Stat Consultant: You want to look at number of headaches. From when to when? We need to put that into the SAP.

Medical Monitor: Uh, you need to know that? Can’t we just add that in later?

Stat Consultant: For the key parameter? No. You should have had that in the protocol.

Medical Monitor: OK, let’s make it from the beginning to the end of the study.

Stat Consultant: {Sigh} You’re giving the drug at Day 1, right? The key benefit, the main metabolite, should be barely measurable by Day 4? And it says that the half-life is about a week and a half?

Medical Monitor: Yeah.

Stat Consultant: OK, it should have a peak at Week 2, although by Week 1 a good part should be there. By Week 4 more than half should be gone, by Week 5 it should be down to a quarter, by Week 7 about an eighth. Et cetera. Does that help?

Medical Monitor: More than an eighth sounds a bit low, so why don’t we go from Week 2 to 7. Wait a second, someone will ask about Week 1 too. Hmm, we should have some coverage for later too.

Stat Consultant: You gotta pick one.

Medical Monitor: I don’t know. What if I pick wrong?

Stat Consultant: You gotta pick one.

Medical Monitor: Don’t rush me. 1 to 7. No, 2 to 7. Yeah, 2 to 7. Hey why don’t we pick more, just in case?

Stat Consultant: You could do that for the secondaries, but the primary should still be only one.

Medical Monitor: OK, 2 to 7 is the primary and 1 to 7 is a secondary and so is 2 to 8.

Stat Consultant: You’re the boss.

If one were to examine any two of the above parameters, one should expect a very, very high correlation between them. For example, take the relationship between the first (a) and second (b) parameters above. Let X be the number of headaches from Weeks 2 to 7 (Part) and Y be the number of headaches from Weeks 1 to 7 (Total). Obviously Y = X + the number of headaches from Week 1. The correlation of X and Y is the correlation of X with X plus the small component from Week 1. The correlation of a variable with itself (Part with Part) is 1.0. Therefore the correlation of X with Y must be quite high, since Y is mostly X plus a smidgen of something else. Statistically this is known as a part-whole correlation. Even if the correlation of the two unique components, the number of headaches from Week 1 (let’s call the unique part ‘Q’) and from Weeks 2 to 7 (let’s call the part ‘P’), were zero, the correlation of the part with the whole would still be high. When r_{P,Q} = 0, then r_{P,P+Q} or r_{P,T} would be equal to

r_{P,T} = σ_{P} / σ_{T} = √(σ_{P}^{2} / (σ_{P}^{2} + σ_{Q}^{2}))

Where σ is the standard deviation, σ^{2} is the variance,

P is a part of the total (e.g., Weeks 2 to 7),

Q is the other part, the part of the total unique from P (e.g., Week 1), and

T is the total (e.g., Weeks 1 to 7). T = P + Q.

As the unique part (Q) is small relative to the remainder, this correlation must be much larger than 0.50. Even if we assume a zero correlation of Week 1 with the remaining 6 weeks, and variances proportional to time only, then the correlation between the first and second efficacy parameters is expected to be r_{P,T} = √(6σ^{2}/(6σ^{2} + 1σ^{2})) = √(6/7) = 0.93.

Furthermore, medically we would expect the correlation of the part and the unique component (r_{P,Q}) to be positive, further increasing the r_{P,T} correlation beyond 0.93. They are measuring the same thing after all! Therefore, we should expect the correlations among these three ‘primary’ and ‘secondary’ efficacy endpoints to be quite high, e.g., > 0.90.
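To check the arithmetic, here is a minimal Python sketch of the part-whole correlation. It uses the general form that also allows a non-zero r_{P,Q}, which follows from Cov(P, P + Q) = Var(P) + Cov(P, Q). The function name `part_whole_corr` and the value r_{P,Q} = 0.3 are illustrative assumptions of mine, not part of the SAP.

```python
import math


def part_whole_corr(var_p: float, var_q: float, r_pq: float = 0.0) -> float:
    """Correlation between a part P and the total T = P + Q."""
    sd_p, sd_q = math.sqrt(var_p), math.sqrt(var_q)
    cov_pq = r_pq * sd_p * sd_q
    cov_pt = var_p + cov_pq             # Cov(P, P + Q) = Var(P) + Cov(P, Q)
    var_t = var_p + var_q + 2 * cov_pq  # Var(P + Q)
    return cov_pt / (sd_p * math.sqrt(var_t))


# Weeks 2 to 7 (variance 6 units) against Weeks 1 to 7 (adds Week 1, variance 1 unit).
r_uncorrelated = part_whole_corr(6.0, 1.0)        # reduces to sqrt(6/7)
r_positive = part_whole_corr(6.0, 1.0, r_pq=0.3)  # 0.3 is an arbitrary illustration

print(f"r_PT with r_PQ = 0.0: {r_uncorrelated:.2f}")
print(f"r_PT with r_PQ = 0.3: {r_positive:.2f}")
```

With r_{P,Q} = 0 this reproduces √(6/7), and any positive r_{P,Q} pushes the result higher.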

When you have such a high correlation you are getting almost identical information multiple times. They aren’t asking two or three things; they are giving you the same information three times, like asking about your left and right shoe sizes. I’ve seen this problem often, especially with Quality of Life questionnaires with their subscales and total scores. With the QoL scales and total, if the subscale contributed half of the total’s variance, the correlation between the subscale and the total (assuming the subscales were uncorrelated) must be √(1/2), or about 0.7. If the subscales are all measuring quality of life, and hence are correlated, it should be higher.
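The 0.7 figure can also be checked by simulation. The sketch below is illustrative only: it assumes two independent, equal-variance, normally distributed subscales (made-up data, not from any real questionnaire), and `pearson` is a small helper written here so the example needs only the standard library.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(12345)  # fixed seed so the run is repeatable
n_subjects = 20_000

# Two uncorrelated subscales with equal variance (an assumption for illustration).
subscale_a = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
subscale_b = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
total = [a + b for a, b in zip(subscale_a, subscale_b)]

r_sub_total = pearson(subscale_a, total)
print(f"simulated r(subscale, total) = {r_sub_total:.3f}")
print(f"theoretical sqrt(1/2)        = {math.sqrt(0.5):.3f}")
```

The simulated value should land close to the theoretical one. Making the two subscales positively correlated would raise it further.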

I should point out that the means will differ, the standard deviations will differ, but the information will be the same. What do I mean? Say you asked how many horses were in a herd of stallions, then asked how many hooves they had. The second will be four times the first (if you counted correctly). The mean should be four times greater, and so would the standard deviation, but the correlation between the two should be 1.00. Any inferential analysis (p-values) or effect size for one would give you identical answers for the other; only the means and s.d. would differ, by a constant factor of 4.
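The horses-and-hooves point is easy to demonstrate. In this sketch the herd counts are invented purely for illustration, and I use a Welch-style t statistic as one example of an inferential test. Any statistic that is unchanged by rescaling would behave the same way.

```python
import math
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Two-sample t statistic with unpooled variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical horse counts for herds in two regions (made-up numbers).
horses_a = [12, 15, 9, 20, 14, 11]
horses_b = [8, 10, 13, 7, 9, 12]
hooves_a = [4 * h for h in horses_a]
hooves_b = [4 * h for h in horses_b]

t_horses = welch_t(horses_a, horses_b)
t_hooves = welch_t(hooves_a, hooves_b)
r_horses_hooves = pearson(horses_a + horses_b, hooves_a + hooves_b)

print("mean ratio:", mean(hooves_a) / mean(horses_a))
print("s.d. ratio:", stdev(hooves_a) / stdev(horses_a))
print("same t statistic:", math.isclose(t_horses, t_hooves))
print("correlation is 1:", math.isclose(r_horses_hooves, 1.0))
```

The means and standard deviations scale by 4, but the test statistic and the correlation do not move, so the second parameter tells you nothing new.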

Suggestion: When you have redundant parameters, make your life easier and **eliminate the redundancies**. It shouldn’t make any difference in your conclusions. It will save trees, focus your report, and save analysis costs. Remember, even if very similar parameters can easily be analyzed in an almost identical manner, every parameter needs to be independently QCed (size in inches of output is proportional to cost). How do you pick? Run a pilot trial and select the best parameter (highest effect size or largest statistical test [e.g., chi square or t or F] value). Otherwise, use the literature or your intuitive guess. Otherwise, pick the largest numerical parameter (e.g., Weeks 1 to 8; Total QoL score).