Eat s**t, 300 trillion flies can’t be wrong.

[old joke punchline] “No, I dropped them in that dark alley, but I’d never find them there. That’s why we’re looking under the light post.”

***

I came across a recent rant by a financial consultant (http://www.littlebear.us/wp-content/uploads/ITCI-Little-Bear-July-2015-FINAL-WORD-PDF.pdf) in which they stated a certain stock was a bad idea. The central claim in their ‘post’ was that a small pharmaceutical company should have reported percentage change, because everyone else does, and that since they didn’t report percentage change, they were hiding something. I don’t know if percentage change is the standard for anti-psychotic drugs or if the pharmaceutical company was hiding something. Frankly, I don’t care. As I stated in Blog 18, if percentage change were the ‘industry standard’, I would recommend including percentage change only as a tertiary parameter (i.e., present the median and no p-values or confidence intervals). If they and the industry like a certain scale (PANSS), excellent. If the raw metric is interpretable, see Blog 3 for assessing effect size. If the scale isn’t intuitively interpretable, or their study’s mean or sd is idiosyncratic, see Blog 4 for assessing effect size.

However, this investment firm imputed a percentage change by dividing the average change from baseline by the average baseline. That is simply incorrect math.

Let me review the pre-algebra you learned in grammar school. You probably remember the commutative, associative, and distributive laws.

Commutative law: a+b = b+a or a*b = b*a

Associative law: a+(b+c) = (a+b)+c or a*(b*c) = (a*b)*c

Distributive law: a*(b+c) = a*b + a*c

Let me focus on the distributive law. It works with multiplication, but it DOES NOT work with division: a/(b+c) ≠ a/b + a/c

24/(4+8) = 24/12 = 2, but

24/4 + 24/8 = 6 + 3 = 9

Why is this relevant? Percentage change divides each individual’s change from baseline by their baseline (like 24/4 and 24/8). It is quite different from dividing by the average baseline (like 24/(4+8)).
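To see the distinction in code, here is a minimal sketch (with made-up numbers, not data from any study) contrasting the correct mean of per-patient ratios with the incorrect ratio of means:

```python
# Hypothetical data: two patients' changes and baselines.
baselines = [4.0, 8.0]
changes = [24.0, 24.0]          # change from baseline for each patient

# Correct: average the per-patient ratios (like 24/4 and 24/8)
mean_of_ratios = sum(c / b for c, b in zip(changes, baselines)) / len(baselines)

# Incorrect shortcut: divide the average change by the average baseline
ratio_of_means = (sum(changes) / len(changes)) / (sum(baselines) / len(baselines))

print(mean_of_ratios)   # 4.5  (the average of 6 and 3)
print(ratio_of_means)   # 4.0  (24 divided by 6)
```

The two quantities agree only in special cases (e.g., when every patient has the same baseline); in general they differ, sometimes wildly.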

Let me illustrate the fallacy with a brief example. Say we had a ten-point scale and two patients. One patient, who was almost asymptomatic (1) at baseline, got slightly worse (a change of -1: he went from 1 to 2); a second patient, who was severely ill (9) at baseline, improved moderately (a change of +3: he went from 9 to 6).

-1/1 = -1.00 (or a percentage worsening of 100%)

3/9 = 0.33 (or a percentage improvement of 33%)

If we averaged the baselines, we would get an average baseline of 5. If we averaged the changes from baseline, we would get an average change from baseline of 1. Average change from baseline / average baseline = 1/5 = 0.2, an improvement of a fifth of a point, a pseudo percentage improvement of 20%.

The average of the individual percentage changes is (-1.00 + 0.33)/2 = -0.333, a percentage improvement of MINUS 33.3%. That is, on average the patients got worse, not the 20% better that the shortcut suggests.
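The two-patient example can be worked through in a few lines of code (same numbers as above; positive change = improvement):

```python
# Two patients on a ten-point scale.
baselines = [1, 9]
changes = [-1, 3]               # patient 1: 1 -> 2 (worse); patient 2: 9 -> 6 (better)

# Each patient's own percentage change, then averaged (the correct way)
pct_changes = [c / b for c, b in zip(changes, baselines)]   # [-1.0, 0.333...]
avg_pct_change = sum(pct_changes) / len(pct_changes)

# The investment firm's shortcut: average change / average baseline
avg_change = sum(changes) / len(changes)        # 1.0
avg_baseline = sum(baselines) / len(baselines)  # 5.0
pseudo_pct = avg_change / avg_baseline

print(round(avg_pct_change, 3))  # -0.333 -> an average worsening of 33.3%
print(pseudo_pct)                # 0.2    -> a spurious 20% "improvement"
```

Same data, opposite conclusions: the shortcut flips the sign of the result.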

In sum, it is mathematically incorrect to compute percentage change by dividing an average change by an average baseline. I don’t care if you have no other way to compute average percentage change; it is still wrong. Just ask your 5th grade son. <rolling his eyes> “Oh, Dad!”

Dear Allen,

Thank you for this example. It is quite clear that it is mathematically incorrect. On the other hand, your example uses completely opposite baselines. I believe in almost any study you want your patient population to have similar baseline values, don’t you? In that case, although it is still mathematically incorrect, would it be a plausible approximation?

Here is an example from a study (assessment of intensity of a symptom, VAS):

|                 | Day 0        | Day 4        |
|-----------------|--------------|--------------|
| Active (n=102)  | 3.72 +/- 2.59 | 2.02 +/- 1.62 |
| Placebo (n=106) | 3.35 +/- 2.58 | 2.67 +/- 2.14 |
| p value         | 0.23         | 0.03         |

I gather that saying that in the active group there was a reduction in intensity of 45% from baseline is incorrect (1 - 2.02/3.72)?

How would you interpret these results?

Yes, my example is cooked. However, the mathematics of my statement holds for any set of numbers. The example also illustrates the strong influence of low baseline values on percent improvement, which often produces outliers.

Yes, you are totally correct that you want patients with similar baselines. Unfortunately, that is often not seen. For example, in your case, didn’t you see some VAS scores near zero and some at twice the baseline means? We generally limit the baseline severity (e.g., Inclusion Criterion 3: All patients must have a VAS score of 3 or greater, and no one may have a life-threatening VAS score [VAS > 8]). Furthermore, through randomization, we should expect the baselines of the randomized groups to be comparable. I often suggest a stratified randomization on the baseline severity. [Note: we often don’t compute p-values on baseline parameters (e.g., gender, race, baseline severity). With enough parameters tested, S**T happens and you have to explain them away. I typically exclude such tests in my SAP, although I might have a ‘back pocket’ analysis of baseline p-values. If you stratified on important baseline characteristics (e.g., baseline severity), then the baselines won’t be statistically or clinically significantly different.]

Yes, I am saying you CANNOT compute a percentage improvement (45% is incorrect) by examining mean change divided by mean baseline.

My Suggestion: As I said in Blog 18, test the change from baseline using ANCOVA with the baseline as covariate. Present the ANCOVA’s least-squares mean change with its 95% CI, as well as the effect size. The least-squares mean change will adjust for baseline differences. [Note: Make sure the adjustment is at the study’s pooled baseline mean, not zero – often the default.] If you want, present the median percent change from baseline (but no inferential statistics on percentage change, including CIs).
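The ANCOVA suggestion can be sketched in a few lines of numpy. This is a minimal illustration on simulated (entirely hypothetical) data, not the study’s actual analysis; the baseline mean of 3.5, the baseline slope of 0.3, and the 0.9 treatment effect are assumed values for the simulation only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: baseline score, treatment arm, and change from baseline.
n = 100
baseline = rng.normal(3.5, 1.0, 2 * n)
group = np.repeat([0, 1], n)                  # 0 = placebo, 1 = active
change = 0.7 + 0.3 * baseline + 0.9 * group + rng.normal(0, 1.0, 2 * n)

# ANCOVA as a linear model: change ~ intercept + baseline + group
X = np.column_stack([np.ones(2 * n), baseline, group])
beta, *_ = np.linalg.lstsq(X, change, rcond=None)

# Least-squares (adjusted) mean change for each arm, evaluated at the
# POOLED baseline mean -- not at baseline zero, which is often the default.
pooled_mean = baseline.mean()
ls_mean_placebo = beta[0] + beta[1] * pooled_mean
ls_mean_active = beta[0] + beta[1] * pooled_mean + beta[2]

# The difference of LS means is the baseline-adjusted treatment effect.
print(ls_mean_active - ls_mean_placebo)
```

Note that evaluating both arms at the same (pooled) baseline mean is exactly what makes the comparison baseline-adjusted; their difference reduces to the group coefficient.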

Interpretation: First, you have a fairly good sample size to detect a moderately sized clinical effect. I don’t know where the p-values come from, but I would conclude that there is no evidence that the baselines (which theoretically should be the same) differ from one another. Furthermore, the Day 4 means are likely (statistically significantly) different by some amount. If the VAS is meaningful, I would focus on the mean difference (2.67 – 2.02 = 0.65) with its 95% CI. Otherwise I would present the effect size. [See Blogs 3 and 4.] Most importantly, is the treatment effect (e.g., 0.65 VAS points) clinically meaningful?

One other set of observations on your data. I’m not sure what the numbers following the +/- are (e.g., 2.59); I would guess CIs or standard errors. In either case, they are proportional to the standard deviations when the Ns are close (102 and 106). The baseline variances do not appear different (an F test comparing them is nowhere near significant). However, the Day 4 variances are likely statistically significantly different (p ~ 0.003), with the placebo variance approximately 75% larger than the active variance. This might be due to a ‘floor effect’ of your VAS in the active group. Furthermore, the ratio of the placebo to active variance at Day 4 (~1.75) was far larger than at Day 0 (~1.01).
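The variance comparison above can be sketched with a two-sided F test, treating the +/- values as SDs (if they are SEs, the variance ratio, and hence the F statistic, is essentially unchanged when the Ns are nearly equal). This is an illustrative sketch using scipy, not a recommendation of the F test over more robust alternatives:

```python
from scipy.stats import f

def var_ratio_test(sd1, n1, sd2, n2):
    """Two-sided F test for equality of two variances (larger variance on top)."""
    if sd1 < sd2:
        sd1, n1, sd2, n2 = sd2, n2, sd1, n1
    F = (sd1 / sd2) ** 2
    return F, 2 * f.sf(F, n1 - 1, n2 - 1)

# Day 0: active 2.59 (n=102) vs placebo 2.58 (n=106)
F0, p0 = var_ratio_test(2.59, 102, 2.58, 106)
# Day 4: active 1.62 (n=102) vs placebo 2.14 (n=106)
F4, p4 = var_ratio_test(1.62, 102, 2.14, 106)

# Day 0 variances are indistinguishable; Day 4 variances differ (p < 0.01).
print(F0, p0)
print(F4, p4)
```

The F test is sensitive to non-normality, so for real VAS data (often skewed, with a floor at zero) a Levene or Brown-Forsythe test would be a safer choice.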

Thank you Allen! This is very helpful!