A colleague asked me to review a trial. I will mask the identity of the trial and obfuscate irrelevant details, like the disease, timing, treatment, and key parameter.
The patients had abnormal parathyroid glands, with hypercalcemia. This was a Phase IV study, meaning that the drug (Calcenese) had already been approved by the agency and the results were oriented exclusively toward marketing the compound. There was a four-week screening/run-in period, followed by randomization. Patients were required to have an abnormal PTH level and serum calcium > 11.5 mg/dL at baseline (following the run-in and before randomization). Treatment commenced the following day. It was an open-label, randomized study with 2 dosing regimens: an od (once a day) dose of 2 tablets of Calcenese and a bid (twice a day) dose of 1 tablet of Calcenese. That is, both regimens delivered the same number of tablets per day. At Week 8, based on the serum calcium level, the patients might receive double the above number of tablets. In other words, the doctors were allowed to titrate the treatment. The key analysis was the change from baseline in calcium level at 30 weeks within each randomized treatment regimen, although an earlier interim analysis (Week 15) was also planned. No treatment comparison of the od vs bid regimens was planned.
The question the client asked was whether they needed to ‘pay’ for an interim analysis alpha level (see Blog 17. Statistical Freebies/Cheapies – Multiple Comparisons and Adaptive Trials without self-immolation). The interim analysis used the data at Week 15 (i.e., prior to the availability of the Week 30 data).
Open-Label: When I first heard that the trial was open-label (i.e., investigators and patients were aware of the treatment), I initially thought it was hopelessly flawed. However, on reflection, since serum calcium is a totally objective laboratory test, I was somewhat mollified. Yes, one might say it is unlikely that the investigators, staff, and patients could directly influence the measurements. Nevertheless, there might be more subtle biases due to the open-label nature of the trial. These potential biases might include differential patient selection, patients opting out prior to their first treatment, differential drop-out rates, etc. Most of these might be discounted, as the different treatment arms had identical daily dosages and all patients were treated.
Regression to the mean: A more subtle, and more likely, source of bias in the analysis of the change score is the natural day-to-day variability in the patients and the accuracy of the laboratory test. Suppose a patient’s observed score can be given as X = μ + e, where μ is their true baseline serum calcium level and e is the sum of the patient’s natural variation and the laboratory error in assessment. If the error (e) were unusually high at baseline, then a second baseline assessment would likely be lower. Similarly, if e were unusually low, then a second baseline assessment would likely be higher. In both cases, one would expect the replicate score (X) to be closer to the true value (μ). In statistics, this is referred to as regression to the mean. As the baseline serum calcium is required to be > 11.5 mg/dL, those included in the study are expected to have a true baseline (μ) that is lower than their observed baseline. In other words, even with no treatment effect, a later measurement is expected to be lower than baseline, so the change from baseline is biased in the drug’s favor.
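To make the selection effect concrete, here is a minimal simulation sketch of X = μ + e with an entry cutoff of 11.5 mg/dL. Everything other than the cutoff is an assumption chosen for illustration (the population mean, the between-patient and within-patient standard deviations, and the function name `simulate_regression_to_mean`); none of it comes from the trial. No treatment is applied at all, so any average change among the enrolled patients is pure regression to the mean.

```python
import random
import statistics


def simulate_regression_to_mean(n_patients=100_000, cutoff=11.5,
                                true_mean=11.0, sd_between=0.6,
                                sd_within=0.4, seed=1):
    """Measure untreated patients twice; enroll only if the first value > cutoff.

    All numeric defaults except the cutoff are hypothetical, for illustration.
    """
    rng = random.Random(seed)
    changes = []
    for _ in range(n_patients):
        mu = rng.gauss(true_mean, sd_between)       # patient's true level
        baseline = mu + rng.gauss(0, sd_within)     # X = mu + e at baseline
        if baseline > cutoff:                       # entry criterion
            retest = mu + rng.gauss(0, sd_within)   # fresh e, still no treatment
            changes.append(retest - baseline)
    return statistics.mean(changes), len(changes)


mean_change, n_enrolled = simulate_regression_to_mean()
print(f"enrolled: {n_enrolled}, mean untreated change: {mean_change:.2f} mg/dL")
```

Under these assumed settings the enrolled patients show an average drop in serum calcium on the second measurement even though nothing was done to them. The size of the drop depends entirely on the assumed variances, so the number itself should not be read as an estimate for this trial.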
Change from Baseline Bias (quasi-experimental design): A larger bias is due to a patient being actively treated. People are changed by that! They were aware that their serum calcium was high, then they were told that it was > 11.5 mg/dL, severe enough to be admitted into the trial. How would many people react? Perhaps diet to lose weight, perhaps diet totally avoiding calcium-rich foods, perhaps exercise, … The gold standard for clinical trials is the placebo-controlled study, with the key comparison being the placebo vs active difference. Why? Let me tell you of my first professionally analyzed study. It was a change from baseline analysis. The placebo had a 7 point change (statistically significantly different from 0). Fortunately the active had a 14 point change, which was different from 7. Moral: A significant change from baseline is seldom a valid result. Hence the key analysis for this trial is frankly not credible without comparison to a credible reference group (there was none for this trial).
Interim Analysis at Week 15: Let me return to the question of the Week 15 interim analysis. Does one need to ‘pay’ for doing both the interim and final analysis? The simple answer is no. The Week 15 and Week 30 serum calcium are two different parameters. If the key parameter is Week 30, and Week 15 is secondary, then one need not ‘pay’ for doing the interim analysis.
Multiple Comparisons: However, the actual analysis was to analyze patients who were randomized to the od and bid regimens. There were two regimens. Hence there were two significance tests, not one. There were two ways in which a statistically significant change from baseline could be seen: once for the od regimen and once for the bid regimen. Therefore, using a Bonferroni adjustment, a 0.025 alpha level would be used for each test. In this case, the client was doing two-sided confidence intervals, so each should have been a 97.5% CI, or 0.0125 on each of the four sides (two per CI).
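The Bonferroni arithmetic above can be sketched in a few lines. The helper name `bonferroni_ci_level` is hypothetical, and the critical values use a normal approximation purely for illustration; the client's actual CI method is not stated, and a t-based interval would give slightly different multipliers.

```python
from statistics import NormalDist


def bonferroni_ci_level(n_tests, family_alpha=0.05):
    """Split a familywise alpha equally across n_tests two-sided intervals."""
    per_test_alpha = family_alpha / n_tests   # 0.05 / 2 regimens = 0.025
    ci_level = 1 - per_test_alpha             # 97.5% CI for each regimen
    per_tail = per_test_alpha / 2             # 0.0125 in each tail
    return per_test_alpha, ci_level, per_tail


per_test_alpha, ci_level, per_tail = bonferroni_ci_level(n_tests=2)

# Normal-approximation multipliers for the half-width of each interval.
z_unadjusted = NormalDist().inv_cdf(1 - 0.05 / 2)   # usual 95% CI
z_adjusted = NormalDist().inv_cdf(1 - per_tail)     # Bonferroni 97.5% CI

print(f"per-test alpha = {per_test_alpha:.4f}, CI level = {ci_level:.3f}, "
      f"per tail = {per_tail:.4f}")
print(f"z (95% CI) = {z_unadjusted:.3f}, z (97.5% CI) = {z_adjusted:.3f}")
```

Under the normal approximation the adjusted multiplier is about 2.24 rather than 1.96, so each interval is roughly 14% wider than an unadjusted 95% CI.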
Could they do two 95% CIs instead? Well, this is a Phase IV trial. They can try, and if the referee doesn’t comment about it … If the referee does comment, then they can pool the data of both regimens into a single Calcenese treatment group for a single 95% CI. A Phase III trial for the FDA will have rigorous statistical review, whereas a journal reviewer is seldom a statistician and would prefer the more familiar 95% CI.
Drug Titration Study: Let me tell you a story of a dose titration study I did 35 years ago. Most investigators in this study didn’t do any titration. Only one investigator actively titrated. Fortunately he also enrolled the largest number of patients. I analyzed his patients alone. Each study week he either increased the dosage if the patient wasn’t doing well or decreased the dosage if the patient improved. For that investigator I observed that patients who had a high dosage had little improvement and those who had a low dosage had the most improvement. Think about it – this should have been the expected result! The naive conclusion would be that treatment was harmful and little/no treatment was beneficial. From that analysis onward I did not allow my clients to do an active drug titration study ever again.
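The titration story can be reproduced with a toy simulation. This is only a sketch under invented assumptions: the response model, the dose range of 1 to 4 units, the titration threshold, and the function name `simulate_titration` are all hypothetical. By construction the drug is beneficial here (every extra dose unit adds to the improvement), yet the titration rule alone makes high doses look bad.

```python
import random
import statistics


def simulate_titration(n_patients=5000, n_weeks=8, seed=7):
    """Titrate weekly on response, then compare improvement by final dose.

    Hypothetical model: improvement = 0.5 * dose - severity + noise,
    so a higher dose always helps a given patient.
    """
    rng = random.Random(seed)
    low_dose, high_dose = [], []
    for _ in range(n_patients):
        severity = rng.gauss(0, 3)   # patient-specific prognosis (assumed)
        dose = 2                     # everyone starts on the same dose
        for _ in range(n_weeks):
            improvement = 0.5 * dose - severity + rng.gauss(0, 1)
            observed = (dose, improvement)   # dose actually taken this week
            # Titration rule from the story: raise the dose when the patient
            # is not doing well, lower it when the patient has improved.
            if improvement < 1.0:
                dose = min(dose + 1, 4)
            else:
                dose = max(dose - 1, 1)
        final_dose, final_improvement = observed
        group = high_dose if final_dose >= 3 else low_dose
        group.append(final_improvement)
    return statistics.mean(low_dose), statistics.mean(high_dose)


mean_low, mean_high = simulate_titration()
print(f"mean improvement, low final dose:  {mean_low:.2f}")
print(f"mean improvement, high final dose: {mean_high:.2f}")
```

In this sketch the low-dose group shows the larger average improvement, because the dose is a consequence of how the patient was doing rather than a cause assigned in advance. That is the same trap as grouping patients by any post-baseline variable.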
The secondary analysis for the interim was for the patients within each dosage regimen. Remember, patients were to receive 2 tablets daily up to Week 8, then they could be titrated to 4 tablets daily under either the od or bid regimen. Therefore, there could be 4 treatment groups (2 tablets od, 1 tablet bid, 4 tablets od, and 2 tablets bid). As those patients who were allowed to double their dosage are likely to have higher serum calcium levels, their change from baseline must be larger than that of those who didn’t double their dosage. Most statisticians very strongly avoid ALL grouping based on POST-BASELINE (e.g., Week 8) data.