Common Designs to Reduce Error Variance
The contralateral design is a very sweet type of study design, but it can seldom be used in pharma or biotech. Devices are another story. With only one group (rather than two) and the statistical properties of the standard error of the difference between means, right off the bat the contralateral trial is one quarter the size of the parallel groups trial.
Say we’re doing an acne study. One could treat one part of a person (e.g., the left side of the face) with one treatment and another part (e.g., the right side) with the second treatment. Of course, the treatments would be randomized to side. The main dependent variable would be the difference between the two treatments, and one can analyze it using a simple paired t-test. One could be a bit fancier and use an ANOVA that models treatment, side, and patient.
Why is this a good design? Well, let’s assume that the location (e.g., left or right side) has a minuscule effect. What would go into a person’s treatment ‘A’ measurement at the study’s endpoint? First off, there would be the individual’s characteristics, for example age, gender, and race; in other words, their demography. There would also be their predisposition, e.g., their baseline severity and skin oiliness. But these are the same for sides A and B, so all of the patient’s unique characteristics drop out when one takes the difference. Other factors could be the flare or remission of the acne and any environmental effects (e.g., for acne, season has a large effect, as do diet, hormones, stress, and other temporal factors). But since those are also the same for both sides, the time and environmental effects cancel in the difference as well.
Simply put, a contralateral design controls for a tremendous amount of between- and within-patient error variance, or noise. The denominator of the effect size, the standard deviation, would be smaller than in a parallel groups study.
[Note: I’ll get to the mechanics in a future blog on ANOVA, but I estimated, from a recent contralateral design, that the s.d. from a parallel groups trial was about one and a half times as large, hence the effect size would be a third smaller. That means an additional increase in the sample size by a factor of 9/4, or 2.25; combined with the fourfold difference above, a contralateral trial needs about 9 times fewer patients than the parallel groups trial.] Of course, a contralateral trial is not possible when a treatment has a ‘central’ component, e.g., when the treatment is absorbed within the body.
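As a rough sketch of the sample-size arithmetic behind the contralateral design, the Python below compares a parallel groups trial with a paired trial using the usual normal-approximation formulas. The function names, the example numbers (a 1-point difference, an s.d. of 2) and especially the left/right correlation `rho` are assumptions made for illustration; none of them come from the trial mentioned in the note.

```python
from math import ceil
from statistics import NormalDist


def _k(alpha, power):
    """Squared sum of the two normal quantiles used in sample-size formulas."""
    z = NormalDist().inv_cdf
    return (z(1 - alpha / 2) + z(power)) ** 2


def n_parallel_total(delta, sd, alpha=0.05, power=0.80):
    """Total patients (both arms) for a two-group comparison of means."""
    per_arm = 2 * _k(alpha, power) * sd ** 2 / delta ** 2
    return 2 * ceil(per_arm)


def n_paired(delta, sd, rho, alpha=0.05, power=0.80):
    """Patients for a paired (contralateral) comparison.

    sd  : between-patient s.d. of a single side's measurement
    rho : assumed correlation between a patient's two sides
    """
    var_diff = 2 * sd ** 2 * (1 - rho)  # variance of the within-patient difference
    return ceil(_k(alpha, power) * var_diff / delta ** 2)


if __name__ == "__main__":
    print("parallel groups, total:", n_parallel_total(1.0, 2.0))
    for rho in (0.0, 0.5, 0.8):
        print(f"paired, rho={rho}:", n_paired(1.0, 2.0, rho))
```

Under these assumptions the ratio of patients is roughly 2 / (1 - rho): about 4 when rho = 0.5, and larger as the two sides of a patient become more alike.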
Let me talk about a very widely used design for analysis of data: the Crossover Design. The simplest case, the two-period crossover, has half the patients receiving one treatment, say Drug A, for a period of time. This period of time is called Period One. Then those patients are allowed to wash out (i.e., receive no treatment) for a period of time. Finally, over an equal length of time, Period Two, they receive the second treatment, say Drug B. The other half of the patients receive Drug B in Period One and Drug A in Period Two. As each patient receives both treatments, one could compare the two treatments using the patient as their own control. Of course, since this trial uses repeated measurements, everything mentioned in the previous blog to control for correlated errors (e.g., AR(1)) must be done for crossover trials.
In theory, this will drastically reduce error variance, making the design quite powerful. In theory!
This is quite a dangerous statistical design. The analysis assumes that the patients who receive a treatment in the second period are comparable to themselves during the first period. This is far too often not the case. Patients change over time (disease progression or remission), diseases wax and wane over time, measurements/ratings change over time, the patient’s life changes over time, etc. Let me illustrate the issue with a simple example. Say we were investigating diabetic foot ulcers. The first period is two months long (Months One and Two), during which the patient receives treatment, followed by a no-treatment washout period of one month (Month Three), then a second treatment period (Period Two) in Months Four and Five. Are the wounds the same in Month Four as in Month One? Obviously they wouldn’t be. The amount the patient could improve during Period One is far greater than the amount they could improve during Period Two. Furthermore, as some treatments are more effective than others (finding that out is, hopefully, the purpose of the trial), the status of the patients at the start of Period Two will depend on the treatment received during Period One. If, and only if, the patient’s wounds are the same at the starts of Period One and Period Two (Months One and Four) can the Crossover Design be used.
When one is doing a study of efficacy, patients often do not return to their Period One baseline severity levels at Period Two.
The typical solution to this typical problem? Ignore all post-Period One data and analyze the data as a simple two treatment independent groups analysis. If one had powered the trial for an efficient crossover analysis, then the study would be drastically underpowered for a two-group t-test type of analysis. Hence the study is likely to fail.
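To put a rough number on how badly the fallback analysis can be underpowered, here is a small sketch using normal-approximation power formulas. The effect size, s.d., within-patient correlation `rho` and the 20-patient trial are made-up assumptions for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_N = NormalDist()


def power_paired(delta, sd, rho, n_patients, alpha=0.05):
    """Approximate power of the within-patient (crossover) comparison."""
    se = sd * sqrt(2 * (1 - rho) / n_patients)
    return _N.cdf(delta / se - _N.inv_cdf(1 - alpha / 2))


def power_two_group(delta, sd, n_per_arm, alpha=0.05):
    """Approximate power of an independent two-group comparison."""
    se = sd * sqrt(2 / n_per_arm)
    return _N.cdf(delta / se - _N.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    # Assumed trial: 20 patients, true difference 1, s.d. 2, rho 0.7.
    planned = power_paired(delta=1.0, sd=2.0, rho=0.7, n_patients=20)
    # Fallback: only Period One is used, so 10 patients per arm.
    fallback = power_two_group(delta=1.0, sd=2.0, n_per_arm=10)
    print(f"crossover analysis as planned: {planned:.2f}")
    print(f"Period One only, two groups:   {fallback:.2f}")
```

With these assumed inputs the planned analysis has power a little above 80%, while the Period One fallback has power of roughly 20%.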
When is a crossover design appropriate? One very frequent application is in pharmacokinetic (PK) trials. This is a trial which measures the amount of a drug (and/or its metabolite) in the blood over time and sees whether a different formulation has a different PK profile. One typically will measure the half-life of a drug, the maximum concentration, the time to maximum concentration, and other PK parameters. The washout is typically at least a week (e.g., at least 10 half-lives of the drug in the body) so that all of the previous drug washes out of the body. This can be empirically verified by examining the pre-drug measurement at Period Two to confirm that it is zero. That is, no drug or metabolite should be detectable in any patient.
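The 10 half-lives rule of thumb is easy to check with a few lines of Python. This sketch assumes simple first-order (exponential) elimination, which is an assumption of the example rather than something stated above; the function names and the 12-hour half-life are hypothetical.

```python
def fraction_remaining(elapsed, half_life):
    """Fraction of drug left after `elapsed` time, assuming first-order elimination.

    `elapsed` and `half_life` must be in the same units (e.g., hours).
    """
    return 0.5 ** (elapsed / half_life)


def washout_needed(half_life, n_half_lives=10):
    """Washout length, in the units of `half_life`, for a given number of half-lives."""
    return n_half_lives * half_life


if __name__ == "__main__":
    half_life_hours = 12  # hypothetical drug
    washout = washout_needed(half_life_hours)
    left = fraction_remaining(washout, half_life_hours)
    print(f"washout: {washout} hours, fraction remaining: {left:.6f}")
```

After 10 half-lives the remaining fraction is 1/1024, or a little under 0.1% of the starting amount, which is why the pre-dose measurement at Period Two is expected to be essentially zero.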
Let me view the crossover trial relative to the contralateral study. The crossover trial controls for patient differences (e.g., demography and baseline severity), which is good. However, the crossover trial does not control for changes over time.
The following is a very common design to also control for patient differences, with none of the problems associated with the patients returning to their baseline, as assumed by the crossover trial.
Parallel Groups Design
Due to the difficulties in running contralateral and crossover trials, the most typical design in biotech/pharma is the between-patient Parallel Groups Trial. In this trial, patients are randomized into as many groups as there are treatments (e.g., two for an active vs. control trial). But fear not, there are still ways to control for patient ‘noise’.
The Historical Control study, to use a technical term, is HORRIBLE. Here one takes data from an older study and attempts to compare it with results in the current study. Why is it horrible? One must ask: ‘what are the reasons why the two studies could differ?’ Well, there can be subtle measurement differences between the doctors/raters now and then. The populations could differ. Anything could account for the differences, anything. I remember one attempt to use some Swedish no-treatment control data to compare with US active treatment data for an orphan drug. The historical control patients had had the disease for a much shorter period of time. They also weren’t as severe. There was also a strong gender difference. Standard of care also changed over time. Any one of these alone would undermine the utility of the data. We also tried to select a matching subset of the no-treatment control data, but too few cases could be extracted. If, and only if, all other factors can be found to have negligible effects (please don’t confuse clinical with statistical significance) can one use a historical control. As the list of potential factors is very long (and the factors are frequently not measured), and by chance some will differ, there will almost always be alternative explanations that make Historical Controls of very little use.
Analysis of Covariance
As mentioned above, a study in which one adjusts for a patient’s own variability is a very powerful analysis. Are there other ways to do this? Yes. Yes, indeed.
One very simple way is to look at the change at the key time point relative to their baseline severity (i.e., improvement). Improvement takes into account the patient’s own pre-treatment severity and also controls for the patient’s other unique qualities.
A second, somewhat more elegant technique is something called analysis of covariance (ANCOVA), using the patient’s baseline severity as the covariate. Simply put, instead of analyzing the patient’s key time point scores, one takes them and analyzes the part of the key time point which is unrelated to the covariate, their baseline severity. One can even have more covariates. Covariates are typically selected from the patient’s pre-treatment parameters. That is, they typically include only the demographics (e.g., age, race, gender) and/or baseline characteristics (e.g., baseline acne, skin oiliness) of the patient.
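To make the baseline adjustment concrete, here is a minimal sketch of a one-covariate ANCOVA for two groups, written with the standard library only. It uses the textbook shortcut of a pooled within-group slope; the function names and the tiny noise-free data set are invented for illustration and are not from any real trial.

```python
from statistics import mean


def _summaries(pairs):
    """Means and centered sums for a list of (baseline, endpoint) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return mx, my, sxx, sxy


def ancova_difference(control, active):
    """Return (pooled slope, raw difference, baseline-adjusted difference).

    Differences are active minus control at the key time point.
    """
    cx, cy, cxx, cxy = _summaries(control)
    ax, ay, axx, axy = _summaries(active)
    slope = (cxy + axy) / (cxx + axx)       # common within-group slope
    raw = ay - cy                           # unadjusted endpoint difference
    adjusted = raw - slope * (ax - cx)      # remove the baseline imbalance
    return slope, raw, adjusted


if __name__ == "__main__":
    # Made-up data: control patients do not change; active patients drop 2 points,
    # but the active group happened to start 1 point more severe on average.
    control = [(2, 2), (3, 3), (4, 4), (5, 5)]
    active = [(3, 1), (4, 2), (5, 3), (6, 4)]
    print(ancova_difference(control, active))
```

In this invented data the raw endpoint difference is only 1 point, because the active group started out more severe. The baseline-adjusted difference recovers the 2-point effect, and here it matches the simple improvement (change from baseline) comparison because the slope is exactly 1.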
Unfortunately one assumption of ANCOVA is that the covariate doesn’t relate to the treatment. This means that the treatment difference doesn’t depend on the covariate. One can actually test for this by including a treatment by covariate (e.g., treatment difference by baseline severity) interaction. Unfortunately this assumption is often not met. Let me illustrate this with (a) a five-point rating (0-no disease, 1-mild, 2-moderate, 3-severe, and 4-life threatening), (b) a completely inert placebo and (c) a perfect active drug. There would be no point in enrolling asymptomatic patients (those with a baseline score of 0). At all four symptomatic baseline severity levels, the placebo patients have no improvement. At each corresponding baseline severity level, the active treatment patients would have a 1 point improvement for the baseline mild patients, 2 points for the moderate patients, 3 points for the severe patients, and 4 points of improvement for the life threatening patients. Hence the treatment difference depends on the baseline severity. There is a complete interaction of baseline severity and treatment difference. The bad news is that there cannot be a single, simple treatment difference to report. The great news is that although the treatment effect depends on the baseline severity, this makes perfect sense! One will need to say that the most severe patients have the greatest improvement, and that even the mild patients improve. At each level of the baseline severity, one can, and should, compare the two treatment groups.
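The inert-placebo versus perfect-drug example can be tabulated in a few lines. This is only a sketch of that hypothetical example: the function names and the enrollment counts are made up, and the pooled figure assumes both arms have the same mix of baseline severities.

```python
LEVELS = {1: "mild", 2: "moderate", 3: "severe", 4: "life threatening"}


def improvement(treatment, baseline):
    """Improvement in points: the perfect drug clears the disease, placebo does nothing."""
    return baseline if treatment == "active" else 0


def difference_by_level():
    """Active-minus-placebo improvement at each symptomatic baseline level."""
    return {b: improvement("active", b) - improvement("placebo", b) for b in LEVELS}


def pooled_difference(patients_per_level):
    """Single overall difference, weighted by how many patients enroll at each level."""
    diffs = difference_by_level()
    total = sum(patients_per_level.values())
    return sum(diffs[b] * n for b, n in patients_per_level.items()) / total


if __name__ == "__main__":
    for level, diff in difference_by_level().items():
        print(f"{LEVELS[level]:>16}: {diff} point(s)")
    print("equal mix:  ", pooled_difference({1: 10, 2: 10, 3: 10, 4: 10}))
    print("mostly mild:", pooled_difference({1: 30, 2: 10, 3: 0, 4: 0}))
```

The per-level differences are 1, 2, 3 and 4 points, while the single pooled number moves from 2.5 to 1.25 depending purely on who was enrolled, which is why the comparison is better reported at each baseline level.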
I, and most statisticians, use ANCOVA very, very frequently to control for patient variability. There are many, many statistical procedures which allow the analysis (officially called the statistical model by us) to do covariate adjusted analyses.
In my next blog I finally get around to explaining what a t-test and analysis of variance are as well as the core statistic everyone must know about – 15. Variance, and t-tests, and ANOVA, oh my!