Saturday, March 28, 2015

Two things to stop saying about null hypotheses


There is a currently fashionable way of describing Bayes factors that resonates with experimental psychologists. I hear it often, usually as a description of one specific use of Bayes factors. For example, one might say, “I needed to prove the null, so I used a Bayes factor,” or “Bayes factors are great because with them, you can prove the null.” I understand the motivation behind this sort of language but please: stop saying one can “prove the null” with Bayes factors.

I also often hear other people say “but the null is never true.” I'd like to explain why we should avoid saying both of these things.


Null hypotheses are tired of your jibber jabber

Why you shouldn't say “prove the null”


Statistics is complicated. People often come up with colloquial ways of describing what a particular method is doing: for instance, one might say a significance test gives us “evidence against the null”; one might say that a “confidence interval tells us the 95% most plausible values”; or one might say that a Bayes factor helps us “prove the null.” Bayesians are often quick to correct misconceptions that people use to justify their use of classical or frequentist methods. It is just as important to correct misconceptions about Bayesian methods.

In order to understand why we shouldn't say “prove the null”, consider the following situation: You have a friend who claims that they can affect the moon with their mind. You, of course, think this is preposterous. Your friend looks up at the moon and says “See, I'm using my abilities right now!” You check the time.

You then decide to head to the local lunar seismologist, who has good records of subtle moon tremors. You ask her about what happened at the time your friend was looking at the moon, and she reports back to you that lunar activity at that time was stronger than it typically is 95% of the time (and thus passes the bar for “statistical significance”).

Does this mean that there is evidence for your friend's assertion? The answer is “no.” Your friend made no statement about what one would expect from the seismic data. In fact, your friend's statement is completely unfalsifiable (as is the case with the typical “alternative” in a significance test, \(\mu\neq0\)).

But consider the following alternative statements your friend could have made: “I will destroy the moon with my mind”; “I will make very large tremors (with magnitude \(Y\))”; “I will make small tremors (with magnitude \(X\)).” How do we now regard your friend's claims in light of what happened?
  • “I will destroy the moon with my mind” is clearly inconsistent with the data. You (the null) are supported by an infinite amount, because you have completely falsified your friend's statement that they would destroy the moon (the alternative).
  • “I will make very large tremors (with magnitude \(Y\))” is also inconsistent with the data, but if we allow a range of uncertainty around the claim, it may not be completely falsified. Thus you (the null) are supported, but not by as much as in the first situation.
  • “I will make small tremors (with magnitude \(X\))” may support you (the null) or your friend (the alternative), depending on how the predicted and observed magnitudes compare.
Here we can see that the support for the null depends on the alternative at hand. This is, of course, as it must be. Scientific evidence is relative. We can never “prove the null”: we can only “find evidence for a specified null hypothesis against a reasonable, well-specified alternative”. That's quite a mouthful, it's true, but “prove the null” creates misunderstandings about Bayesian statistics, and makes it appear that it is doing something it cannot do.
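The dependence on the alternative can be made concrete with a minimal sketch. Everything numerical here is an assumption made for illustration and does not come from the story above: a single observed tremor `y` with normally distributed measurement noise of known standard deviation `sigma`, a null of "no effect" (\(\mu = 0\)), and each of the friend's claims encoded as a normal prior on the true magnitude \(\mu\). The function names are hypothetical. The computation is done on the log scale so that extreme alternatives do not underflow.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) distribution evaluated at x."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def log_bf_null_vs_alt(y, sigma, alt_mean, alt_sd):
    """Log Bayes factor for the null (mu = 0) against one specified alternative.

    Assumed model: y ~ Normal(mu, sigma^2).
    Assumed alternative: mu ~ Normal(alt_mean, alt_sd^2), so the marginal
    distribution of y under the alternative is
    Normal(alt_mean, sigma^2 + alt_sd^2).
    Positive values favor the null; negative values favor the alternative.
    """
    log_null = normal_logpdf(y, 0.0, sigma)
    log_alt = normal_logpdf(y, alt_mean, math.sqrt(sigma**2 + alt_sd**2))
    return log_null - log_alt


# Made-up numbers: an observed tremor two noise standard deviations above zero.
y, sigma = 2.0, 1.0

# Each claim as (predicted magnitude, uncertainty around it); all hypothetical.
claims = {
    "small tremors (X)": (2.0, 0.5),
    "very large tremors (Y)": (10.0, 1.0),
    "near-destruction": (100.0, 1.0),
}

for label, (alt_mean, alt_sd) in claims.items():
    print(f"{label}: log BF (null vs. claim) = "
          f"{log_bf_null_vs_alt(y, sigma, alt_mean, alt_sd):.2f}")
```

With these made-up numbers, the same observation favors the small-tremor claim over the null, favors the null over the large-tremor claim, and favors the null far more strongly still over the near-destruction claim. The data never change; only the alternative does.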

In a Bayesian setup, the null and alternative are both models and the relative evidence between them will change based on how we specify them. If we specify them in a reasonable manner, such that the null and alternative correspond to relevant theoretical viewpoints or encode information about the question at hand, the relative statistical evidence will be informative for our research ends. If we don't specify reasonable models, then the relative evidence between the models may be correct, but useless.
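In symbols, the relative evidence is the ratio of the two models' marginal likelihoods. The notation below (\(\mathcal{M}_0\) for the null model, \(\mathcal{M}_1\) for the alternative, \(y\) for the data, \(\theta\) for the alternative's parameter) is introduced here for illustration and is not used elsewhere in this post:

\[
B_{01} = \frac{p(y \mid \mathcal{M}_0)}{p(y \mid \mathcal{M}_1)},
\qquad
p(y \mid \mathcal{M}_1) = \int p(y \mid \theta, \mathcal{M}_1)\, p(\theta \mid \mathcal{M}_1)\, d\theta .
\]

The prior \(p(\theta \mid \mathcal{M}_1)\) is where the alternative gets specified. Changing it changes the denominator, and therefore \(B_{01}\), even when the data \(y\) stay exactly the same.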

We never “prove the null” or “compute the probability of the null hypothesis”. We can only compare a null model to an alternative model, and determine the relative evidence.

[See also Gelman and Shalizi (2013) and Morey, Romeijn and Rouder (2013)]

Why you shouldn't say “the null is never true”


A common retort to tests including a point null (often called a 'null' hypothesis) is that “the null is never true.” This is backed up by four sorts of “evidence”:
  • A quote from an authority: “Tukey or Cohen said so!” (Tukey was smart, but this is not an argument.)
  • Common knowledge / “experience”: “We all know the null is impossible.” (This was Tukey's “argument.”)
  • Circular: “The area under a point in a density curve is 0.” (Of course if your model doesn't have a point null, the point null will be impossible.)
  • All models are “false.” (Even if this were true, and I think it is actually a category error, it would apply equally to all alternatives.)
The most attractive seems to be the second, but it should be noted that people almost never use techniques that allow finding evidence for null hypotheses. Under these conditions, how is one determining that the null is never true? If a null were ever true, we would not be able to accumulate evidence for it, so the second argument definitely has a hint of circularity as well.

When someone says “The null hypothesis is impossible/implausible/irrelevant”, what they are saying in reality is “I don't believe the null hypothesis can possibly be true.” This is a totally fine statement, as long as we recognize it for what it is: an a priori commitment. We should not pretend that it is anything else; I cannot see any way that one can find universal evidence for the statement “the null is impossible”.

If you find the null hypothesis implausible, that's OK. Others might not find it implausible. It is ultimately up to substantive experts to decide what hypotheses they want to consider in their data analysis, and not up to methodologists or statisticians to tell experts what to think.

Any automatic behavior, whether automatically rejecting all null hypotheses or automatically testing null hypotheses, is bad. Hypothesis testing and estimation should be considered and deliberate. Luckily, Bayesian statistics allows both to be done in a principled, coherent manner, so informed choices can be made by the analyst and not by the restrictions of the method.

2 comments:

  1. Great post and I appreciate that "the null is never true" is not an appropriate statement. But researchers should be pushed to reflect whether the point null is actually an interesting thing to reject, and this is the spirit in which I think about the statement "the null is never true". For example, one could say in a priming experiment "Will you really have accomplished something if you reject the null, if the difference in population means is actually very tiny, such as 0.000001? Would such a difference really have consequences for psychological theory?" I can imagine someone usefully saying "The null is never true" in such a discussion, e.g. in a priming experiment there might be some subjects who consciously see the prime, ruminate on it, and then it affects their behavior. So here, contamination means the null is never true. But that way in which the null isn't true, leading to a tiny effect, isn't what the researcher was interested in.

    This is part of the broader problem that psychologists so often only attempt to reject the null rather than pursuing a more cumulative science of effect sizes and theories that actually predict them. But of course that's a great thing about Bayesianism! You have to specify an alternative hypothesis, which often leads one to concentrate on coming up with a predicted effect size.

    Replies
    1. Hi Alex,

      I broadly agree with you (which is why I said that data analysis should be a deliberate process), but I'd like to point out that saying "I can imagine...contamination means the null is never true" is itself a theoretical statement. It makes theoretical assumptions about how the system works that may be completely reasonable, or may be wrong. Biological systems have low- and high-pass filters; it is not the case that just because something has an effect on one part of a biological system, it must have an effect on another. If we knew so much about a biological system that we could confidently say that the null is never true in a particular circumstance, we'd be far further along in our understanding of how things work (in psychology, in particular).

      It is important that we recognise that these sorts of arguments are scientific, and they must be *at least* testable, if not actually tested. And if they are to be testable, we must have the statistical machinery to test them. The best we can do is ask people to be deliberate. If there is not good reason to think the null is false and someone is doing parameter estimation, we should ask "Do you really know enough to assume that the null is false?"; if there is good reason to think the null is false, and they're doing point-null hypothesis testing, we should ask "Do you really want to assume that the null hypothesis can be true?"
