Statistics – a specialty of mathematics whose basic tenet is ‘Exceptions prove the rule’.
(Almost) from The Graduate
It’s always darkest before the dawn. I see the light! I see a light at the end of the tunnel. Hey, I’m not in a tunnel. Whoops, I didn’t see a light. I just thought I did.
I had completed my BS, entered graduate school, and was assigned a teaching assistantship for the lab section of an introductory statistics class. The professor of the class asked me, in private, what the most important statistic was. I said the mean. He sagely shook his head no; the correct answer for understanding statistics is the variance. He was right, oh so right.
First let me show you the standard formula for the variance.
s² = Σ (Xi – M)²/(N – 1),
where Xi is some individual’s data, M is the mean of all the data, and N is the number of observations.
If you squint your eyes, this looks quite close to a simple average, Σ Xi/N. Except that the denominator is not N but N – 1. [Actually some statisticians recommend using N, as it is a maximum likelihood estimator of the variance. I’ve even heard a great argument for using N + 1 (a minimum mean square error estimator of the variance). But N – 1 gives us something called an unbiased estimator, and N – 1 is so traditional that it is almost always used as the denominator.] The ‘– 1’ appears because the variance uses one parameter, the mean, which was itself estimated from the same data.
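A minimal sketch of these competing denominators in plain Python, using a small made-up data set:

```python
# Hypothetical data, for illustration only.
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = sum(data) / n

# Sum of squared deviations from the mean -- the numerator of the variance.
sum_sq = sum((x - mean) ** 2 for x in data)

var_unbiased = sum_sq / (n - 1)  # the traditional N - 1 denominator
var_mle = sum_sq / n             # the maximum likelihood version
```

With this little data set the mean is 5 and the sum of squares is 10, so the two versions give 2.5 and 2.0; the gap between them shrinks as N grows.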
The important thing about the variance is that it’s a type of average: the average of the squared differences from the mean, (Xi – M)². Now if we took everyone’s score, subtracted the mean, and averaged that, the sum would have to be zero. All the negative and positive differences from the mean would cancel out. What we could do is ignore the sign and average that; this is called the absolute deviation. Unfortunately (or fortunately) statisticians prefer to square things; squaring has some very, very useful properties, especially for normally distributed data, which we often see. So squaring is something we often do in statistics. Be forewarned, this will be on the test.
One problem with the variance is the units. If you are measuring in inches, the term inside the numerator’s parentheses is in inches; when you square it, it becomes inches squared. Not useful, so we take the square root to get back to the original unit (e.g., inches again). This, of course, is the standard deviation. The standard deviation (often abbreviated sd or s.d.) is, roughly speaking, the average difference from the mean.
Returning to the variance, we can see that the variance is a measure of how people differ from the average. Let’s consider this. If you were asked the height of people who visited your favorite drug store, you wouldn’t guess 3 inches, nor 8 feet, unless you were being silly. You’d probably use a number like the mean height, M in the above equation. Just how good is your guess? Well, that’s exactly what the standard deviation is telling you. It’s the average of how much you were off (Xi – M), the average exception to your rule. Let me give another way to calculate the numerator; I’ll spare you the algebraic proof. If you took the difference between every person in your sample and every other person, squared those differences, and divided by the proper ‘fudge factor’, you’d get identical results. That is, you’d take (X1 – X2)², (X1 – X3)², … and (XN-1 – XN)² and divide by the appropriate N. The variance/standard deviation is a measure of how much people (the scores) differ from everyone else. We’ll return to this alternative viewpoint later when discussing the analysis of variance.
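You can check that identity numerically. In the pairwise version the ‘fudge factor’ works out to N(N – 1); the data below are invented for illustration:

```python
from itertools import combinations

# Hypothetical data.
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = sum(data) / n

# First viewpoint: squared deviations from the mean.
var_from_mean = sum((x - mean) ** 2 for x in data) / (n - 1)

# Alternative viewpoint: squared differences between every pair of scores.
pairwise = sum((a - b) ** 2 for a, b in combinations(data, 2))
var_from_pairs = pairwise / (n * (n - 1))
```

Both routes give the same number, which is why the variance can be read as ‘how much everyone differs from everyone else’.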
By the first viewpoint, the standard deviation is simply a measure of the average error in using the mean as a way to summarize all people’s scores. It can be thought of as a measure of noise.
I’ll ignore the equation of the t-test; I’ve put you through enough already with the variance. What is a t-test? A t-test is simply a ratio of a signal, typically the difference between means, to ‘noise’. Another way to consider the signal is to think of it as the amount we know by following a model, a ‘rule’. If the signal is meaningfully larger than the noise, we say something might be there. Imagine yourself in a totally dark room (or tunnel) and someone may or may not have turned on a very weak light. Did you see this brief flash or did you imagine it? Was there a signal (light) or was it just eye-noise? This is a signal-to-noise ratio.
What is meaningfully larger? For a two-tailed t-test with a 0.05 alpha level, any ratio larger than about 2. A value of 2 makes intuitive sense to me. If the ratio of signal to noise were around 1, then the signal isn’t really larger than the average noise level. So with a ratio of 1 or less, how could you be really sure it’s a signal and not noise? Well, you can’t be sure! Mathematicians have actually worked things out: if there were no signal at all, a signal-to-noise ratio (i.e., a t-test) between +1 and -1 would be seen approximately two-thirds of the time, assuming a normal distribution. (It’s actually 68.27%, but why quibble.)
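That 68.27% figure comes straight from the normal distribution; Python’s standard library can reproduce it with the error function:

```python
import math

# Probability that a standard normal value lands between -1 and +1,
# i.e., within one standard deviation of the mean.
within_one_sd = math.erf(1 / math.sqrt(2))
print(round(within_one_sd * 100, 2))  # prints 68.27
```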
One thing I glossed over was ‘noise’. It is very closely related to the standard deviation, but the standard deviation measures how well we can guess an individual’s score. With the typical t-test, we’re looking at differences in means. A mean is a more stable estimate than a single person’s score would be. How much better is usually a function of the number of observations one has. A mean of a million observations would likely be dead on. A mean of two observations wouldn’t be expected to be very accurate. I’ll also gloss over the derivation (it’s the standard deviation divided by the square root of N), but the accuracy of the mean is measured by something called the standard error of the mean. You can’t say that we statisticians are very creative in naming things. At least it makes it easy for us to remember. Not like biologists, who name things like Ulna or Saccule.
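As a quick sketch (with the same kind of made-up sample as before), the standard error of the mean is just sd divided by the square root of N:

```python
import math

# Hypothetical sample.
data = [4.0, 7.0, 6.0, 3.0, 5.0]
n = len(data)
mean = sum(data) / n

# Standard deviation, with the usual N - 1 denominator.
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# Standard error of the mean: sd / sqrt(N).
sem = sd / math.sqrt(n)
```

Quadrupling the sample size halves the standard error, which is why big samples pin the mean down so well.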
In any case, the t-test is simply the ratio of the difference between means (the signal) to the standard error of (the difference between two) means (the noise). This has a well-known distribution – what you’d expect to see. Again, we uncreative statisticians called it the t-distribution.
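Putting the pieces together, here is a hand-rolled two-sample t statistic in the pooled-variance form (this sketch assumes equal variances; the function name and any data you feed it are my own invention, not something from the text):

```python
import math

def t_stat(group1, group2):
    """Signal (difference in means) over noise (standard error of the
    difference between two means), using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)       # the 'noise', squared
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff                     # signal / noise
```

Feeding in two similar groups gives a ratio near zero; a ratio beyond about ±2 is where we start suspecting a real signal.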
The ratio of the signal, or amount explained by the model or ‘rule’, divided by the noise, or amount unexplained – the exceptions – is the basic method to validate a model. Hence my quip at the beginning of this blog: Statistics – a specialty of mathematics whose basic tenet is ‘Exceptions prove the rule’.
I’ve been discussing the t-test in terms of means. Yes we can compare two observed means. We can compare a treatment mean with a hypothetical mean (e.g., is the difference equal to 0.0 or 1.0). We can do both (e.g., the difference between the means is equal to 2). We could also replace the mean with other things, like correlations.
Test time: How would a statistician transform the t-test?
Please take out your blue books and fill in the answer.
Put your pencils down: we’d square it. Don’t say I didn’t warn you that we love to square things. If we squared the t-test, we’d get the
Analysis of Variance (ANOVA):
The numerator of the t-test for two means is M1 – M2, with the subscripts indicating the two means (e.g., active and control). So what do we do if we have more than two groups? Well, we’d like to take the pairwise difference between each mean and every other mean. Yes, we saw something like that before – the alternative viewpoint of the variance. Going back to the variance, we could do something almost identical: Σ (Mi – M.)²/(Ng – 1). Instead of each individual’s score, we use each treatment group’s mean, Mi. M. is the mean of all the means; some people call it the ‘grand’ mean. Ng – 1 is like N – 1, but with Ng as the number of means, the number of treatment groups. This is the numerator of the analysis of variance: how the means differ from one another, the signal.
The denominator is still the ‘noise’. In this case, within each group our best guess is that group’s mean, so we look at the errors in using that mean within each group. For example, in group 1 we would compute the squared deviation of each of group 1’s scores from the group 1 mean. We then do the same for each of the other groups and add them all up. Next we divide that by something like N – 1, actually N – Ng: ‘– Ng’ because, like the ‘– 1’, we are using Ng means to compute the errors or noise. Finally, like the t-test, we divide the signal by the noise and come up with a ratio, the F test.
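Here is that recipe as a sketch in code. One caveat: in the textbook form each group’s squared deviation from the grand mean is weighted by the group’s size, which is what makes F equal the square of t when there are exactly two groups. The function name and data are hypothetical.

```python
def f_stat(groups):
    """One-way ANOVA F: between-group mean square over within-group
    mean square."""
    ng = len(groups)                          # number of groups, Ng
    n = sum(len(g) for g in groups)           # total observations, N
    grand = sum(sum(g) for g in groups) / n   # the 'grand' mean
    means = [sum(g) / len(g) for g in groups]

    # Signal: group means' squared deviations from the grand mean,
    # each weighted by group size, divided by Ng - 1.
    ms_between = sum(len(g) * (m - grand) ** 2
                     for g, m in zip(groups, means)) / (ng - 1)

    # Noise: each score's squared deviation from its own group's mean,
    # summed over all groups, divided by N - Ng.
    ms_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g) / (n - ng)

    return ms_between / ms_within
```

With exactly two groups, this F reproduces the square of the two-sample t statistic, exactly as promised.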
As I stated before, when we’re comparing two groups the t-test squared, with its squared t-distribution, is identical to the ANOVA’s F-test with its F-distribution. However, the ANOVA can also handle more than two groups. In that lie its power and adaptability, as well as its weakness. More about that in the next blog.