I’m surprised I never mentioned randomizations. This step occurs just after the study is designed, the physicians have identified the important factors, and all (or at least most) investigators have been identified. [At a minimum, you need an overestimate of how many centers there will be. Shit happens. Investigators fib about their patient availability. Centers drop out.] If you want a randomized blinded clinical trial, randomization is the key issue. For now, I’ll also assume the trial is blinded. Blinding isn’t always possible (see DNC below), but randomization is always possible. If randomization isn’t used, the study is irrevocably biased. As one critic quipped, “I really liked the paper. Unfortunately someone typed meaningless gibberish all over it.”
Stratification by center – This is extremely useful and is used in almost every trial, with the possible exception of central randomizations. The major concern is that there might be very strong regional or investigator biases. For example, assume that Center A has very, very ill patients, or that Center A’s investigator always rates patients high on the key parameter of a rating scale¹. If every patient at Center A got placebo, the study would be irrevocably biased. How to compensate easily? Create a separate randomization for Center A (and for B and C …). Within each center, the patients would then be randomized equally to the treatments.
Block size – In a randomized trial, a treatment assignment sometimes becomes known. I remember an HIV trial where a patient had his treatment analyzed by a lab and returned to the investigator, demanding he be treated with the ‘real’ treatment. Of course, that patient was dropped. Although this should be very rare, some patients might have severe side effects that warrant breaking the code to identify what the patient is receiving. On my randomizations, I always provided the center with tamper-proof, opaque envelopes for each patient. The decision to unblind should be made by the company, not the investigator.
If we used a pure randomization, it might be possible that the first 10 patients were all given placebo and the second 10 patients all got active (see DNC below for a real life example). What statisticians do is block patients. I’ll assume only two treatments, A and B, but this is trivial to expand: for multiple treatments, the block size would be a multiple of the number of treatments. Smallest block-size: With two treatments, the smallest block is 2. The first patient would be randomly assigned to, say, Treatment A, and then patient #2 would get Treatment B. However, with the example given above, if patient #1 were unblinded, patient #2 would be too: if #1 were assigned to treatment A, then #2 must be receiving treatment B. A problem. I typically use a minimum block size of 4. The first four patients would be assigned the treatments A, A, B, and B in a random order (e.g., A, A, B, B or A, B, B, A, etc. [there are only 6 possible sequences for 2 treatments and four patients]). To make it harder to deduce a treatment when the blinds of the first patients in a block are broken, I might use randomly sized blocks (e.g., the first four patients form a block of 4, the next two a block of 2, the next six a block of 6; the 4, 2, and 6 were chosen at random). I’ve done that.
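The blocking idea above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the program used for any real trial: the function names (block_sequences, permuted_blocks), the candidate block sizes (2, 4, 6), and the seed are all hypothetical. It lists the distinct orderings of one balanced block and then builds a schedule out of randomly sized blocks.

```python
import itertools
import random


def block_sequences(treatments=("A", "B"), block_size=4):
    """Return every distinct ordering of one balanced block."""
    per_arm = block_size // len(treatments)
    block = [t for t in treatments for _ in range(per_arm)]
    # permutations() repeats identical orderings, so de-duplicate with a set.
    return sorted(set(itertools.permutations(block)))


def permuted_blocks(n_patients, block_sizes=(2, 4, 6),
                    treatments=("A", "B"), seed=None):
    """Build a schedule from randomly sized, internally balanced blocks.

    The schedule always ends on a complete block, so it may be slightly
    longer than n_patients (assumption: spare assignments are acceptable).
    """
    if any(size % len(treatments) for size in block_sizes):
        raise ValueError("block sizes must be multiples of the number of treatments")
    rng = random.Random(seed)  # record the seed so the list can be regenerated
    schedule = []
    while len(schedule) < n_patients:
        size = rng.choice(block_sizes)
        block = [t for t in treatments for _ in range(size // len(treatments))]
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


print(len(block_sequences()))          # 6 orderings of A, A, B, B
print(permuted_blocks(12, seed=2024))  # hypothetical seed
```

Because every block is complete and balanced, the two treatments always end up with equal counts over the full schedule.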
The benefit of a small block-size (e.g., a block-size of 2) is that within each center the maximum imbalance is 1: one more Treatment A (or B). As each center is randomized separately, that should balance out. On the other hand, if a patient were unblinded, the next (or previous) patient is also unblinded. A block-size of 4 would have a maximum imbalance of 2; again, with many centers, that should even out. A centralized randomization would control the imbalance and make a small block-size reasonable, but might still have the unlikely situation where Center A gets all Treatment A. Because knowing the block size makes it easier to deduce assignments, I do not include a description of the block size in a study protocol.
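The imbalance claims above are easy to check numerically. The following is a minimal sketch, assuming two treatments and a fixed block size; fixed_blocks and max_imbalance are names I made up for the illustration. It tracks the running difference between the number of A and B assignments within one center and reports the worst case.

```python
import random


def fixed_blocks(n_blocks, block_size, seed=None):
    """Schedule for one center: n_blocks balanced blocks of a fixed size."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_blocks):
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        schedule.extend(block)
    return schedule


def max_imbalance(schedule):
    """Largest |#A - #B| seen at any point while patients are enrolled."""
    diff = 0
    worst = 0
    for treatment in schedule:
        diff += 1 if treatment == "A" else -1
        worst = max(worst, abs(diff))
    return worst


for size in (2, 4):
    worst = max(max_imbalance(fixed_blocks(25, size, seed)) for seed in range(200))
    print(size, worst)  # the worst case never exceeds half the block size
```

The check only looks at one center at a time; it says nothing about how imbalances across centers add up.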
Stratification by other factors – Sometimes there are other important factors, for example disease severity. To keep matters simple, you could divide each center’s randomization into groups (in multiples of the block-size). For example, you can reserve the first 12 patient slots for low severity patients, the next 24 for medium severity patients, and the next 12 for highly severe patients. Note that the numbers of patients in these strata are unequal. I tend to make things slightly more complicated on my end with my patient numbering sequence. The first two digits could be the center number, the next digit(s) the stratification factor(s), and the last two digits (assuming each center/stratum has a maximum of 99 patients) the patient. For example, patient 04325 would be the patient number for Center #04, severity 3, and the 25th patient in that center/severity. I also often gave the patient number a prefix for the study number.
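The numbering scheme above can be written as a small pair of helper functions. This is a sketch under stated assumptions: a two-digit center, a single one-digit stratification factor, and a two-digit patient sequence, as in the 04325 example. The function names and the "X1-" study prefix are hypothetical.

```python
def make_patient_number(center, stratum, sequence, study_prefix=""):
    """Compose a patient number: 2-digit center, 1-digit stratum, 2-digit sequence."""
    if not (0 <= center <= 99 and 0 <= stratum <= 9 and 1 <= sequence <= 99):
        raise ValueError("center 0-99, stratum 0-9, sequence 1-99 expected")
    return f"{study_prefix}{center:02d}{stratum:d}{sequence:02d}"


def parse_patient_number(number, prefix_len=0):
    """Split a patient number back into its parts (prefix length must be known)."""
    body = number[prefix_len:]
    return {
        "center": int(body[:2]),
        "stratum": int(body[2]),
        "sequence": int(body[3:]),
    }


print(make_patient_number(4, 3, 25))         # 04325
print(make_patient_number(4, 3, 25, "X1-"))  # X1-04325 (made-up study prefix)
print(parse_patient_number("04325"))
```

With more than one stratification factor, or more than nine levels, the fixed widths would need to be widened accordingly.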
One can have multiple stratification factors (e.g., gender, ethnicity, left- or right-handedness). But this can very easily become unmanageable, because the number of strata multiplies with every factor added. While I like multiple stratification factors, I’d reserve that for very large studies. Another point to consider: some statisticians believe in “as randomized, analyzed”, that is, any factor used to stratify the randomization should also appear in the analysis.
Summary: The end result of a randomization would be a list giving, for each patient, the stratum, patient number, and treatment. I would recommend doubling (or more) the number of centers and multiplying the expected number of patients per center by a factor of at least three. Paper is cheap. On the other hand, the largest center should not be more than three times the size of the average center. One copy (for each investigation) could be sent to the pharmacist at that center/company who creates the treatment for that patient. A copy would also be given (in a tamper-proof, opaque envelope) to each investigator. Tamper-proof, opaque envelopes would also be given for each patient at that center. If there is an independent Data Monitoring Committee, a copy would be given to them as well. A version in both paper and electronic form (including a CD) would be sent to the company, along with the program and the random seed used. At the end of the trial, all envelopes would be collected and checked to see whether they had been opened. The company copy of the randomization would be signed, dated, and then opened (unblinded), with a note to the file recording the unblinding date.
Democratic National Committee (DNC)
A week ago, the DNC had 20 presidential candidates randomly selected ten at a time. They might have used a bingo ping pong ball to select the first ten. DUMB! REALLY DUMB! One of the complaints was that the two groups of ten were not balanced, with far more top tier candidates on Day 2. What would I have done? Blocked stratification!
I would rank them on one factor (e.g., average polling rating); see below for multiple criteria. We would then have twenty candidates in numerical rank order (#1 to #20). I would take the two best candidates (#1 and #2) and for candidate #1 I’d flip a coin: heads for Day 1 and tails for Day 2. If #1 was assigned to Day 2, candidate #2 would be in Day 1, or vice versa. I’d then repeat for #3 and #4, all the way to #19 and #20. If there were an odd number, for example because #20 no longer met the cut, #19 could still have a coin tossed to randomly select which day s/he would participate in.
The resultant blocked, stratified randomization is fair to all the candidates and ensures that both days have a comparable mix of high and low tiered candidates.
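The pairing procedure described above can be sketched as follows. It is an illustration only: paired_assignment is a hypothetical name, the candidates are represented by their rank numbers, and a pseudo-random draw stands in for the coin.

```python
import random


def paired_assignment(ranked, seed=None):
    """Split a ranked list into two days, one coin flip per adjacent pair.

    ranked[0] is the top candidate. Within each pair (#1/#2, #3/#4, ...),
    the flip decides the day of the higher-ranked candidate and the other
    candidate gets the opposite day. A leftover candidate (odd count) is
    placed by a flip of their own.
    """
    rng = random.Random(seed)
    day1, day2 = [], []
    for i in range(0, len(ranked), 2):
        pair = ranked[i:i + 2]
        heads = rng.random() < 0.5  # heads: higher-ranked candidate on Day 1
        first, second = (day1, day2) if heads else (day2, day1)
        first.append(pair[0])
        if len(pair) == 2:
            second.append(pair[1])
    return day1, day2


day1, day2 = paired_assignment(list(range(1, 21)), seed=2019)  # hypothetical seed
print("Day 1:", day1)
print("Day 2:", day2)
```

Each day gets exactly one candidate from every pair, so neither day can end up with both of the top two, or both of the bottom two.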
Multiple Criteria on the Stratification
What do you do if you have multiple criteria (e.g., poll results and money collected)? I’d compute a z-score for each parameter, add them up, and use that composite. A z-score is z[ij] = (X[ij] – Mean[j])/sd[j], where z[ij] and X[ij] are candidate i’s z-score and raw score on parameter j. A z-score is a unit free metric and would allow one to ask whether a candidate did better on money collected than on poll results. That is, it lets you compare apples and oranges – in this case poll results and money collected. After I got a z-score on each parameter, I’d add them up (z[i*]) and rank on this composite rating. [Note: if the DNC thought polling is twice as important, then weight the polling z-score by 2 and the money collected z-score by 1 and, to average, divide by 3.]
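The composite can be sketched like this. It is a minimal illustration with assumptions of my own: the sample standard deviation is used for sd[j], the weights are divided by their total as in the note above, and the candidate names and numbers are invented purely for the example.

```python
from statistics import mean, stdev


def z_scores(values):
    """z = (x - mean) / sd for each value (sample sd is an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite_rank(names, criteria, weights=None):
    """Rank names by the weighted average of their z-scores.

    criteria maps a parameter name to a list of raw scores aligned with names.
    weights maps the same parameter names to weights (default: all equal).
    """
    weights = weights or {k: 1.0 for k in criteria}
    total_weight = sum(weights[k] for k in criteria)
    z = {k: z_scores(v) for k, v in criteria.items()}
    composite = [
        sum(weights[k] * z[k][i] for k in criteria) / total_weight
        for i in range(len(names))
    ]
    order = sorted(range(len(names)), key=lambda i: composite[i], reverse=True)
    return [(names[i], composite[i]) for i in order]


# Invented numbers, for illustration only.
names = ["P", "Q", "R", "S"]
criteria = {"polls": [30, 20, 10, 5], "money": [25, 20, 6, 3]}
for name, score in composite_rank(names, criteria, weights={"polls": 2, "money": 1}):
    print(name, round(score, 2))
```

The ranked list produced here could then be fed to the pairing step, with #1 and #2 being the two highest composites.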
Of course, this composite score is also applicable to clinical factors (e.g., z-scores for severity and blood hemoglobin).
¹If all of Investigator A’s patients received the same treatment (same for Inv. B, C, …), then one could consider the investigator as the experimental unit, not the patient at the center.