Let me again start with a truism:
Failure to reject the null hypothesis is not the same as accepting it. One can ONLY reject the null hypothesis.
To many, failure to reject the null hypothesis is equivalent to saying that the difference is zero. This is absurd. It is wrong. As I’ve said previously, inability to reject the null hypothesis directly implies that the scientists had utterly failed to run the correct study, especially with regard to doing an adequate power analysis. To say it directly:
Failure to reject the null hypothesis means the scientists were INCOMPETENT. Failure to reject the null hypothesis does NOT MEAN THE DIFFERENCE WAS ZERO, only that the difference might be zero, along with an infinite number of non-zero values, some of which might be clinically important.
To repeat my conclusions about testing the null hypothesis from my second blog, I summarized:
In my previous blog I said that the p-value, which tests the null hypothesis, is a nearly meaningless concept. This was based on:
In nature, the likelihood that the difference between two different treatments will be exactly any number (e.g., zero) is zero. [Actually mathematicians would say ‘approaches zero’ (1/∞), which in normal English translates to ‘is zero.’] When the theoretical difference is different from zero (even infinitesimally different) the Ho is not true. That is, theoretically the Ho cannot be true.
Scientists do everything in their power to make sure that the difference will never be zero. That is, they never believed in the Ho. Scientifically, the Ho should not be true.
With any true difference, a large enough sample size will reject the Ho. Practically, the Ho will not be true.
We can never believe (accept) the Ho, we can only reject it. Philosophically, the Ho is not allowed to be true.
The Ho is only one of many assumptions which affect p-values; others include independence of observations, similarity of distributions in subgroups (e.g., equal variances), distributional assumptions, etc. We have trouble knowing if it is the Ho which isn’t true.
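The third point, that any true difference is eventually detected, can be illustrated with a rough sketch. It assumes a two-sided z-test for two proportions under the normal approximation; the function name `approx_power` and the 50% versus 51% rates are made up for illustration and are not from any study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    Hypothetical helper: normal approximation, equal group sizes.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    shift = abs(p1 - p2) / se
    # Chance the test statistic falls beyond either critical value
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)


# A trivial 1% true difference: power climbs toward 1 as N grows
for n in (100, 10_000, 1_000_000):
    print(n, round(approx_power(0.50, 0.51, n), 3))
```

With a small N the test almost never rejects; with a huge N it almost always does, even though the true difference has not changed.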
Why do I keep on ranting? I received an e-mail which referred to a Lancet article (http://www.thelancet.com/journals/laneur/article/PIIS1474-4422%2810%2970107-1/fulltext). The authors of the Lancet article stated “There were no differences in intellectual outcome, subsequent seizure type, or mutation type between the two groups (all p values >0·3).”
I replied to the e-mail questioner with the following:
A few comments:
- You mentioned that N=40 (actually N < 40), but failed to mention that the key group (Vaccination proximate) had an N of 12. Furthermore, power is further decreased by a ratio of Ns > 2 (28/12=2.33). Power is highest when the Ns are equal and decreases as the Ns become more unequal.
- The key statistic is percentages. Percentages are the weakest statistics by which one can achieve statistical significance. For example, Regression has a 14% treatment difference (95% CI on this difference [-47%, 18%]), yet the p-value appears to be p=0.5. This lack of ability to reject the Ho is no doubt due to the N=12 also. One change in patient status in regression in the small N group produces a (1/12=) 8.3% percentage difference.
- “There were no differences in …” is patently false. What was actually meant was that there were no differences which achieved statistical significance. The regression difference (p = 0.49) was 36% – 50% = -14%. This is a numeric 14% difference. The 95% CI on the difference in proportions with an Exact approach is (-47%, +20%). Yes, this cannot be distinguished from a 0% difference, or a -47% difference, or a +20% difference. I would doubt that any clinician would regard a 47% difference as clinically unimportant.
- Many, many people do not differentiate between the inability to reject the null hypothesis (difference could be zero) and the notion that the difference IS zero, e.g., the authors of this Lancet study. Inability to reject the null hypothesis is typically due to the scientists’ inability to run a valid study. The scientist(s) who wrote the Lancet article failed. They failed to collect sufficient data. The null hypothesis is never true! It is a straw argument. See my blog 1. Statistics Dirty Little Secret (http://allenfleishmanbiostatistics.com/Articles/2011/09/statistics-dirty-little-secret/) and blog 8. What is a Power Analysis (http://allenfleishmanbiostatistics.com/Articles/2011/11/8-what-is-a-power-analysis/).
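The width of that interval is easy to check roughly. The sketch below uses a simple Wald (normal approximation) interval, not the Exact approach quoted above, so the limits will differ slightly. The counts 6/12 and 10/28 are my assumption, back-calculated from the reported 50% and 36% with group sizes 12 and 28; `wald_ci_diff` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_diff(x1, n1, x2, n2, level=0.95):
    """Wald CI for p1 - p2 (normal approximation, not an exact method)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff, diff - z * se, diff + z * se


# ASSUMED counts: 10 of 28 (about 36%) versus 6 of 12 (50%)
diff, lo, hi = wald_ci_diff(10, 28, 6, 12)
print(f"difference = {diff:+.1%}, 95% CI = ({lo:+.1%}, {hi:+.1%})")

# One patient changing status in the group of 12 moves its rate by 1/12
print(f"one-patient shift in the small group = {1 / 12:.1%}")
```

Under these assumed counts the interval spans zero but also reaches roughly as far as the Exact limits quoted above, which is the whole point: the data cannot rule out a very large difference.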
I agree with the commenter of the Lancet article who said that this study was incapable of distinguishing the difference from zero, due to the authors’ inappropriate study design, especially the collection of insufficient data in the key vaccination-proximate group. Unfortunately, their other comment that patients “near vaccination have more severe cogitative issues” may also be premature, until a better trial is completed.
Let me be clear: for many tests, testing a p-value is mathematically equivalent to determining whether a confidence interval (CI) includes zero. Just take the equation of the t-test, replace the t-value with a critical t and rearrange the values. You get a CI. Equivalent ≡ Identity. If you use a 5% error, this is the same as looking at the 95% CI. If it includes zero, then zero is a possibility. In that Lancet article, so was a value of +1% or +20% or -1% or -47%. That is why the CI is so far superior to a p-value. A p-value only examines one value (zero), while the CI examines the infinity of other credible values. So the result of the study could have been zero. It also could be -47% or +20%. The above quote “There were no differences in intellectual outcome …” makes the invalid assumption that one is only testing against zero. In truth, another point in the CI (mathematically equivalent to a p-value, remember) was -47%. Unless one can say ‘a difference of 47% or less is clinically meaningless’, a claim no sane clinician would make, one MUST conclude that ‘there may be huge differences in intellectual outcome’.
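For the t-test, the rearrangement can be written out in one line. The symbols are notation chosen here for illustration: \hat{d} is the observed difference, SE(\hat{d}) its standard error, and t_crit the critical t for the chosen error rate.

```latex
\[
|t| \;=\; \frac{|\hat{d} - 0|}{SE(\hat{d})} \;<\; t_{\text{crit}}
\quad\Longleftrightarrow\quad
\hat{d} - t_{\text{crit}}\,SE(\hat{d}) \;<\; 0 \;<\; \hat{d} + t_{\text{crit}}\,SE(\hat{d})
\]
```

The left side is "fail to reject the Ho"; the right side is "zero lies inside the CI". They are the same statement, and the right side also shows every other value the data cannot rule out.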
My overall comment? Do not publish these inadequate studies as science! If you want to ‘Prove the Null Hypothesis’, one actually needs to prove that the difference is less than a clinically important difference (e.g., π1 – π2 < 0.10). See my blog 5. Accepting the null hypothesis (http://allenfleishmanbiostatistics.com/Articles/2011/10/accepting-the-null-hypothesis/). Unfortunately, this requires a rather large N. For example, if a 10% difference is deemed clinically important and if one doesn’t know the true control group or experimental treatment group rates, then one would need a total of 822 patients (411 per group) to demonstrate that the difference is not clinically important. You used 12 patients? I laugh at this Lancet study.
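To give a feel for why the N must be large, here is a minimal sketch of one textbook normal-approximation sample size formula for showing a difference is below a margin. Everything in it is my assumption: a one-sided alpha of 0.025, 80% power, a true difference of zero, and both rates taken as 50% (the worst case when the rates are unknown). It is not the calculation behind the 822 figure above, which depends on the error rates and method chosen, so the result is not expected to match it exactly.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(margin, p1=0.5, p2=0.5, alpha=0.025, power=0.80):
    """Per-group N to show |p1 - p2| is below `margin` (hypothetical helper).

    Normal approximation, one-sided alpha, assumes the true difference is zero.
    """
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / margin ** 2)


print(n_per_group(0.10))  # 10% margin
print(n_per_group(0.05))  # halving the margin roughly quadruples N
```

Whatever the exact settings, the answer is hundreds of patients per group, not 12.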
My only recommendation to the Lancet editors is to demand a CI be presented, perhaps instead of p-values. If they had observed the very, very large CI, which included potentially huge differences, they would have quashed such opinion pieces masquerading as science.