Lies, Damn Lies, and Statistics
Let me start this blog with one of my pet peeves. I abhor the quote ‘Lies, Damn Lies, and Statistics’. For me, a statistician, it has as much truth as saying that ‘the earth is flat’.
I T I S N O T T R U E ! ! !
The misquote is mistakenly attributed to Mark Twain, who himself mistakenly attributed it to Disraeli. Actually the quote might be from Leonard Henry Courtney, who gave a speech on proportional representation ‘To My Fellow-Disciples at Saratoga Springs’, New York, in August 1895, in which this sentence appeared: ‘After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, “Lies – damn lies – and statistics,” still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of.’
If you do quote it (and please don’t quote it to me), then please don’t take it out of context. Make sure that you also say that ‘facts are facts’ and ‘there are some easy figures the simplest must understand, and the astutest cannot wriggle out of.’
For those statisticians, like me, who work in medical clinical trials, even the most fiendish, diabolical, dishonest statistician CANNOT lie with statistics. That evil villain must do the analyses as specified in the protocol. MUST!!! I wrote two science fiction novels in which the hero was a cyborg, a human-computer hybrid. Everything the hero said or did was electronically stored. He, like the statistician, must tell the truth. Statisticians must follow the details of the analysis plan. They must stress the key parameters/time periods/comparisons. Tertiary parameters cannot become primary. All of this is specified before the study is unblinded. We must be honest. We cannot lie with statistics. The worst we could do is present the secondary (or tertiary) parameters using the same analyses as the primary. Others might disregard the key comparison and give extensive reasons why failed key analyses should be ‘set aside’ in favor of other analyses or a look at the ‘pattern’ of results, but we statisticians must include the key comparisons.
So, the first reason why the analysis plan (also known as the statistical analysis plan or SAP) is necessary is that it keeps the industry honest. The Agency can always look at the time-stamp on any analysis plan and determine when it was written/finalized.
About twenty-five years ago, I created my first analysis plan. I didn’t call it an analysis plan. I had never even heard of an analysis plan. I’m sure one was independently invented elsewhere, before mine. But I invented the statistical analysis plan. Why? I was working alone on a BLA for H-Flu vaccine. I told the medical monitor that I could get the analyses done in an incredibly short time, but I must have buy-in from her and her team. I wrote up my statistical methodology (insertable into the submission), details of how I would proceed, how I would handle problematic data (e.g., values below the limit of detection), and example tables with all the statistics I would produce. The team and I spent the better part of a month on the details. Finally the data came in on a Friday, the data manager came in on Saturday to check the last-minute data irregularities, the study was unblinded on Monday, and the medical writer received the full report, complete with all tables, on Tuesday. The total submission was completed within a week of data availability. This was the first agency-approved submission in nine years at Lederle Laboratories, and I was the sole statistician on the project.
What did I learn from this?
- Statisticians must get complete agreement with the clinical team before the analysis is done. While I try to hear what my client/team is saying, nothing is as helpful as letting them look at pseudo-output/tables and asking them if this is what they wanted/could use. Showing them what they’ll be receiving saves a lot of re-work. The SAP is a contract with my medical directors and medical writing team.
- I have to add that my worst clients are those who either cannot or will not review the SAP. It is the ‘measure once, cut twice’ approach to timelines. They are never happy with the tables, listings and figures. My favorite clients are those who carefully review and edit each paragraph and re-do all my tables. They get what they want! Everyone is happy at the end of the study (assuming the results are positive).
- Knowing what I need to generate ahead of time allows me (or programmers) to get statistical programs written. A program written by a professional programmer to produce a camera-ready table has been estimated to take, on average, half a day. This assumes that many tables are very, very similar to one another or that the table is a previously written ‘standard’ table. It must be remembered that one must first make sure that the data has been correctly read in and is non-pathological, that the analysis is using the correct variable, and that it is doing the analysis correctly with the correct ‘statistical options’ (e.g., the number of decimal places to which the mean is presented), etc. [Note: I’m not a professional programmer; it takes me longer, or I just present the non-camera-ready computer output that the program generates.] Many submissions have hundreds of tables. If you have an ‘adaptive’ method of generating tables (i.e., no analysis plan, or you are winging it), you cannot get results quickly. No way. The SAP is a contract with my programmers.
- It’s a CYA for the statistician. If a sponsor were to change the SAP at the last minute, the analysis could always be changed. But I do tell them that I will need to re-write the programs, QC the programs, and do all necessary documentation (i.e., I include a document, ‘Changes in analysis after the SAP has been finalized’; I also include in the SAP a section called ‘Deviations from the Protocol’). All this work will take extra time and be ‘out of scope’. The SAP is a contract with me.
- The protocol states, often in a general way, the planned analyses. But the SAP has all the details. It discusses the mechanics of all the key analyses, all the supplemental analyses (e.g., completer analyses or the non-ITT ‘per protocol’ analyses), all the confirmatory analyses (e.g., non-parametric analysis), all the exploratory analyses (e.g., a model with all non-significant interactions or a model stratified by subgroups), and all the expected problems (e.g., missing data, outliers, interactions). That is, it spells out how we handle the details of the analysis. The Agency often reviews the SAP just for these details. The SAP is a contract with the agency.
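To make ‘all the details’ concrete, here is a minimal Python sketch of how two pre-specified rules might be pinned down in a table program before unblinding: how values below the limit of detection are handled, and how many decimal places the mean is shown to. Everything in it is an illustrative assumption, not the method used in any study described here: the LOD value, the LOD/2 substitution (one common convention among several), the decimal settings, and the function names are all hypothetical.

```python
from statistics import mean, stdev

# Hypothetical rules an SAP might fix in advance (illustrative values only).
LOD = 0.15          # assumed assay limit of detection
MEAN_DECIMALS = 2   # assumed decimal places for the mean
SD_DECIMALS = 3     # assumed decimal places for the SD


def impute_below_lod(values, lod=LOD):
    """Replace results flagged as below the LOD (None here) with LOD/2.

    LOD/2 substitution is an assumed convention for this sketch.
    """
    return [lod / 2 if v is None else v for v in values]


def summary_row(label, values):
    """Format one summary-table row (n, mean, SD) using the fixed rules."""
    clean = impute_below_lod(values)
    return (
        f"{label:<10} n={len(clean):>3}  "
        f"mean={mean(clean):.{MEAN_DECIMALS}f}  "
        f"SD={stdev(clean):.{SD_DECIMALS}f}"
    )


if __name__ == "__main__":
    # Made-up numbers, with None standing in for a below-LOD result.
    print(summary_row("Group A", [0.40, None, 0.90, 1.30]))
```

Because the rules live in one place and are agreed before the data arrive, changing them later is a visible, documented change rather than a quiet tweak.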
If we want to get the analysis on time and on budget, the SAP is essential.