In statistics, ‘error’ doesn’t always mean that you got it wrong. It certainly can mean that. Scientists can read instruments wrongly, we can make typos and so on. This is why detail-oriented people like me can be good scientists. We check and recheck, we shrug off the insults about being ‘sad’ and we push back those pesky frontiers of knowledge.
Two more ways that ‘error’ can mean that you got it wrong are the Type 1 and Type 2 Statistical Errors. The easiest way to remember which is which is to call the Type 1 error a false positive, or a case of crying wolf. I’ve seen it happen in science. If you want there to be a pattern, human nature tends to see a pattern. Or a wolf. A friend who may be reading this told me about Bayesian statistics, a method which (if I understand correctly) starts from a prior belief about the pattern and updates that belief as the data come in. I want to learn more about it! But just now, I’m writing about the more conventional way of doing stats. We start by assuming that there’s no pattern. A Type 1 error means seeing a pattern that isn’t there.
A Type 2 error means failing to see a pattern that really is there: a false negative. It’s why n, the number of observations, is key. We assume that the more we look, the more likely we are to see a pattern if there truly is one. Then P, the probability of getting a result at least as striking as the one we observed if chance alone were at work, gets smaller. When P is really small we believe that the pattern is real and that we’re not crying wolf, in other words not committing a Type 1 error. I’ve just noticed that n is lower-case whereas P is upper-case and I can’t say that I care why that’s so! What I do care about is that when a stats test shows P<0.05, there’s less than a 5% probability of seeing an association this strong by chance alone; when P<0.01 that’s less than 1%; and when P<0.001 we get really excited. Strictly speaking, P is not the probability that the association is true. It’s a measure of how surprising our data would be if there were no pattern at all.
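To make the link between n and the two error types concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own choosing: the data are simulated, the standard deviation is taken as known (so a simple z-test will do), and the function names `p_value` and `rejection_rate` are made up for this example.

```python
import random
from statistics import NormalDist, fmean


def p_value(sample, sd=1.0):
    """Two-sided z-test P value for 'the true mean is 0', assuming a known SD."""
    z = fmean(sample) / (sd / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rate(true_mean, n, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated experiments in which P < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if p_value(sample) < alpha:
            hits += 1
    return hits / trials


# No real pattern (true mean 0): every 'significant' result is a Type 1 error.
type1_rate = rejection_rate(true_mean=0.0, n=20)

# A real pattern (true mean 0.5): every non-significant result is a Type 2 error.
type2_small_n = 1 - rejection_rate(true_mean=0.5, n=10)
type2_large_n = 1 - rejection_rate(true_mean=0.5, n=100)

print(f"Type 1 rate with no pattern:    {type1_rate:.3f}")
print(f"Type 2 rate with n = 10:        {type2_small_n:.3f}")
print(f"Type 2 rate with n = 100:       {type2_large_n:.3f}")
```

With no pattern in the data, the Type 1 rate should hover around the 0.05 threshold itself. With a real pattern, the Type 2 rate should drop sharply as n goes from 10 to 100, which is the whole case for looking more.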
The reliance on P values contributes a lot to scientists' image problem, I think. Can't we just make our minds up and give an answer? Er, no. What we can do is to report the current state of our hypotheses.
Another source of confusion can arise around the phrase 'statistical error'. When the ancient Greeks invented perfect straight lines, perfect circles and so on, they laid the foundations of the Maths we rely on. It's fabulous but it's also not strictly true. Yes, the Earth is a sphere. But it's slightly flattened between its two poles. And if you walk across your land, even if you live in the Netherlands or on the Prairies, even allowing for the Earth's curvature you won't find a perfectly flat surface. You might find footprints, fallen leaves, wind-raised drifts, animal poo. Those small departures from the ideal shape are statistical error, also known as noise. Nobody got anything wrong; real measurements just scatter around the tidy model.
Statistical error is shown on graphs by error bars. I want to illustrate this blog post with a bar chart, showing the mean values of a continuous variable in different categories and showing the standard deviation, the 95% confidence intervals, and maybe some other kinds of error bar. But my attempts to make such a graph about the dataset I showed you recently, about lambs' birthweight, have been thwarted. I must be missing something because I haven't found the method for making such a graph in Excel, SPSS or R.
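In the meantime, the numbers behind such error bars are easy enough to work out by hand. Here is a rough sketch in Python (not one of the three packages above, just what I'd reach for). The birthweights are made up purely for illustration and are not the real lamb dataset, the group names and the `error_bars` helper are hypothetical, and the 95% interval uses the normal multiplier of about 1.96, which is an approximation: with groups this small, a t-based multiplier would give somewhat wider intervals.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Made-up birthweights in kg, purely for illustration (not the real dataset).
birthweights = {
    "single": [5.1, 4.8, 5.6, 5.3, 4.9, 5.4],
    "twin": [4.2, 4.5, 3.9, 4.4, 4.1, 4.3],
    "triplet": [3.4, 3.8, 3.1, 3.6, 3.3, 3.5],
}


def error_bars(values, confidence=0.95):
    """Return the mean plus three candidate error-bar sizes for one category."""
    n = len(values)
    sd = stdev(values)                 # sample standard deviation (divides by n - 1)
    sem = sd / sqrt(n)                 # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return {"n": n, "mean": mean(values), "sd": sd, "sem": sem, "ci": z * sem}


summary = {group: error_bars(vals) for group, vals in birthweights.items()}

for group, s in summary.items():
    print(
        f"{group:8s} mean={s['mean']:.2f}  SD={s['sd']:.2f}  "
        f"SEM={s['sem']:.2f}  95% CI=±{s['ci']:.2f}"
    )
```

Each bar on the chart would be drawn at `mean`, with whiskers of plus or minus `sd`, `sem` or `ci` depending on which kind of error bar you want to show. If matplotlib happens to be installed, its `bar` function takes a `yerr` argument, so passing one of those columns there should draw the whiskers.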
[Edit] A few months later I found out how to do it in Excel. Will blog about this in due course.