Yesterday was 1 degree Fahrenheit and today is 10. I’m ten times warmer!!

Compassion? We statisticians have evolved beyond such petty human affects.

I received the following question from Simon Wilkinson from New Zealand:

Dear Allen. To set the scene, I am not a statistician or a biostatistician. We are treating patients with secondary progressive multiple sclerosis on a “compassionate basis” with an experimental drug – something that is allowed in our country (NZ). The number of patients is very small, about 15. Each patient is their own unique set of symptoms. We are using an MS-specific QoL patient-reported questionnaire (the MSQLI) to obtain baseline and then 3-monthly data as one means of gauging treatment effect in the absence of biomarkers – one of the challenges of treating this indication. We have been looking at the effect in each patient by using PCFB (percent change from baseline). In some components of the MSQLI a reduction in score is improvement, and in other components an increase in score is improvement. As a lay person an immediate issue arises. A baseline score of 1 (bad) versus a 3-month score of 7 (much improved) equals a PCFB of 600%. For a different component, a baseline of 7 (bad) versus a 3-month score of 1 (much improved) equals a PCFB of -85%. This seems wrong! Subsequent ‘googling’ on the issue reveals the apparent minefield of PCFB!! Fundamentally we are interested in how treatment is impacting each patient, as opposed to an overall effect in a larger population. Can you suggest an appropriate approach? Sincere thanks.

First off, a ratio is only meaningful when zero means zero. I had forgotten to include this assumption when I wrote Blog 18. For a percent change from baseline [100*(X{Month 3} – X{Baseline})/X{Baseline}], the baseline must be on a scale where zero is zero, a complete absence of the attribute (e.g., no debilitating effects of the disease). If your scale can be transformed so that 0 is ‘free of illness’, then you can compute a ratio. Therefore a PCFB can be done for heart rate, weight, and height. But a PCFB cannot be done for temperature on a Celsius or Fahrenheit scale, though it could be done on a Kelvin or Rankine scale. So if, and only if, a score of 7 (‘much improved’) meant free of illness, you could compute x’ = 7 – x, where x’ is the new score and x is the old score. However, that is NOT possible with your QoL scales. This is the root of the discrepancy between your 600% and -85%.
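A minimal sketch of the asymmetry in the letter: the same clinical improvement, scored in opposite directions on two components, yields wildly different PCFB values (the numbers here are the hypothetical scores from the question, not real MSQLI data).

```python
def pcfb(baseline, month3):
    """Percent change from baseline: 100 * (X_month3 - X_baseline) / X_baseline."""
    return 100.0 * (month3 - baseline) / baseline

# Component where HIGHER is better: baseline 1 (bad) -> 7 (much improved)
print(pcfb(1, 7))   # 600.0

# Component where LOWER is better: baseline 7 (bad) -> 1 (much improved)
print(pcfb(7, 1))   # roughly -85.7
```

Both patients improved by the same six scale points, yet the ratio makes one look seven times more dramatic than the other, because the denominator is an arbitrary baseline score, not a true zero.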

I would recommend using a simple change from baseline and forgetting the percentage part; see my blog for other reasons to avoid it. In reporting your results, I recommend you present the mean difference between the Month 3 and Baseline results, along with the CI of the difference. If the CI excludes zero and is positive, then you can say there was a ‘statistically significant improvement’. Alternatively, you could do a paired t-test and get an identical p-value.
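A sketch of that recommended analysis, using made-up scores for fifteen hypothetical patients (not real MSQLI data). The CI of the mean difference comes from the t distribution, and `scipy.stats.ttest_rel` gives the matching paired t-test.

```python
import numpy as np
from scipy import stats

# Hypothetical baseline and Month 3 scores for 15 patients
baseline = np.array([4, 5, 3, 6, 4, 5, 2, 6, 3, 4, 5, 3, 4, 6, 5], dtype=float)
month3   = np.array([5, 6, 4, 6, 5, 6, 3, 7, 4, 5, 6, 4, 5, 7, 6], dtype=float)

diff = month3 - baseline                    # simple change from baseline
n = len(diff)
mean_diff = diff.mean()
se = diff.std(ddof=1) / np.sqrt(n)          # standard error of the mean difference
t_crit = stats.t.ppf(0.975, df=n - 1)       # two-sided 95% critical value
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

# The paired t-test gives the identical inference
t_stat, p_value = stats.ttest_rel(month3, baseline)

print(f"mean change = {mean_diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_value:.4f}")
```

If the lower CI bound is above zero, the p-value from the paired t-test will necessarily be below 0.05 — they are two views of the same calculation.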

To make your presentations easier, one can always do a linear transformation of the form X{Transformed} = a + b*X{Original} on any data for which you report means or medians. Such linear transformations have NO EFFECT on p-values. Means, medians, and confidence intervals are unaffected, except that they will be carried through the same transformation. For example, you can always transform a proportion (a mean of 0/1 data) into a percentage by multiplying it by 100. Correlations are totally unaffected. Standard deviations and standard errors are scaled by SD{Transformed} = |b|*SD{Original}, and variances by b squared. As at least one of your components is a reflection of the other, I would do the transformation X’ = 8 – X. You would then say, “The scales were reflected so that a positive number indicates improvement.”
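A quick sketch (hypothetical data) verifying the claim above: the reflection X’ = 8 – X is a linear transformation with a = 8 and b = -1, so it leaves the paired t-test p-value untouched and simply flips the sign of the mean change.

```python
import numpy as np
from scipy import stats

# Hypothetical scores on a 1-7 component where LOWER is better
baseline = np.array([6, 7, 5, 6, 7, 4, 6, 5], dtype=float)
month3   = np.array([4, 5, 3, 5, 6, 2, 4, 3], dtype=float)

_, p_original  = stats.ttest_rel(month3, baseline)
_, p_reflected = stats.ttest_rel(8 - month3, 8 - baseline)   # X' = 8 - X

assert np.isclose(p_original, p_reflected)   # identical p-values

change_original  = (month3 - baseline).mean()
change_reflected = ((8 - month3) - (8 - baseline)).mean()
print(change_original, change_reflected)     # equal magnitude, opposite sign
```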

To put it in the context of Class 1 of the Statistics 101 class: For ratio-level data (e.g., inches to centimeters), you can always do transformations of the form X’ = bX. For interval-level data (e.g., Fahrenheit into Celsius), you can always do transformations of the form X’ = a + bX. For ordinal data, you can always do a monotone transformation, such that increasing X produces an increased X’. For nominal data, any one-to-one transformation is possible.

Finally, I LOVE the idea of providing experimental medications to patients on a compassionate basis. Providing patients with an opportunity to be treated with a novel compound, which may be of unique benefit to them, is fantastic. This is even more important for patients who are non-responders to the more traditional treatments. I’ve been involved with a large number of such compassionate protocols (e.g., the use of a blood substitute to treat Jehovah’s Witness patients, who might otherwise die facing a major surgery). I was proud to be assisting that company. So congratulations on doing such compassionate treatment.

However, as a statistician (and this is a biostatistics blog), the evaluation of efficacy is frankly a waste of time. I’ve observed that the data are typically much messier than for ‘official Phase X’ studies – no offense to your noble treatment of these very ill patients. Such studies often have greater proportions of missing data, sloppier visit windows, small sample sizes, and poorly written protocols, especially with respect to objectives and how the data will be analyzed. Big pharma often reports only the raw data and does not present any summaries of efficacy for compassionate protocols. If they do provide analyses, these tend to be descriptive statistics only, not p-values.

Even if yours were the best written and executed protocol there was, the primary fault of compassionate-usage protocols is that they are non-randomized, non-comparative, open-label study designs. If there are comparative groups, the groups often differ on a very large host of baseline demographics and medical conditions. There are a host of reasons/biases that make the statement “patients experienced a statistically significant increase in their mean change from baseline” uninterpretable. These include: spontaneous remission, wanting to please the helpful doctor and staff, cognitive dissonance, the progressive nature of the disease, time of year, scale interpretation changing over time, … In other words, such studies are little more than testimonials. The Integrated Summary of Efficacy will possibly link to their report, but exclude them from any summary analysis. That is, the ISE will ignore them. The analysis of safety is typically a sub-population in the Integrated Summary of Safety – an ignored side note in the ISS. You’re lucky that the experimental drug manufacturer is allowing your compassionate-use protocol. Most pharma executives consider it a waste of their resources (money), although they might benefit from the PR.

At least that’s my opinion/observation.

I found a good journal article related to this topic.

http://www.biomedcentral.com/content/pdf/1471-2288-1-6.pdf

Ha – I see you already discussed this article in post 18.

Hi Allen,

Your blog is great – I can skim over the very technical pieces and still understand the take-home messages! I work in a place where anything more complicated than “% change” would be hard to get traction. I have 25 pre-post measures that cover a variety of mental health domains. I need to know which of these measures show more change than the others. Unlike the example above, with its single measure, they are not all scored on the same range. For some, scores can range from 0 to 6; for others, 0 to 32. Any suggestions?

Quick reply. See my blog #4, where I covered this (4. Meaningful ways to determine the adequacy of a treatment effect when you lack an intuitive knowledge of the dependent variable).

I talked about the effect size: the mean difference divided by the standard deviation. You would then look at the measure with the largest effect size. However, as these estimates are random, i.e., measured with error, the ‘best’ in one sample might not stay the best on replication. I recommend you examine the effect size AND ITS CONFIDENCE INTERVAL. Your measures’ intervals are very likely to overlap.
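A sketch of comparing measures by effect size with a confidence interval, on made-up pre/post data. The interval here uses a common normal approximation for the standard error of a paired effect size, SE(d) ≈ sqrt(1/n + d²/(2n)); an exact interval would require the noncentral t distribution.

```python
import numpy as np

def effect_size_ci(pre, post, z=1.96):
    """Paired effect size d = mean(diff)/sd(diff) with an approximate 95% CI."""
    diff = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    n = len(diff)
    d = diff.mean() / diff.std(ddof=1)
    se = np.sqrt(1.0 / n + d**2 / (2.0 * n))   # normal approximation
    return d, (d - z * se, d + z * se)

# Hypothetical scores on one of the 25 measures
pre  = [3, 4, 2, 5, 3, 4, 3, 5, 4, 3]
post = [4, 5, 3, 5, 4, 6, 4, 6, 5, 4]

d, (lo, hi) = effect_size_ci(pre, post)
print(f"d = {d:.2f}, approx 95% CI = ({lo:.2f}, {hi:.2f})")
```

Because the effect size is in standard-deviation units, it is comparable across measures with different score ranges (0–6 vs 0–32); ranking the 25 measures only makes sense after checking whether their intervals overlap.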