A comment by Deborah Mayo on a recent post by Andrew Gelman started a heated and surprisingly subtle discussion about the correct (frequentist) definition of the p-value, which currently continues on Larry Wasserman's blog. Read for yourself – I for my part am still happy to write p(D>d|H0) ;).

The discussion on Gelman’s blog is certainly lively. Your definition p(D>d|H0) is correct.

To be precise – I do understand why one would want to write p(D>d ; H0), or p_{H0} (D>d), where the latter is preferable imo, but I don't see any harm arising from writing p(D>d|H0) either; in fact, I find it more intuitive. The different notations seem to me more a statement about how one likes to think about the null hypothesis than an inherent necessity of frequentist analysis. You can decide that you don't want to think about the decision between H0 and !H0 as a probability distribution; you can then calculate p_{H0} (D>d) and conclude from the p-value that !H0 must be the case, maintaining all the while that there was never a probability on H0 or !H0. But it seems to me that you will arrive at exactly the same conclusions when thinking p(D>d|H0) in all practical situations. Whether H0 is measure-zero seems to me more a philosophical problem of frequentism than a problem for this notation.

Interesting! Glad that you relayed this, Florian… I am now a convinced p_{H0} (D>d) person 🙂 It might be, as you say, that one can reach exactly the same conclusions from frequentist and Bayesian p-values; yet given the degree of confusion about p-values taught in the context of classic frequentist theory, it seems worthwhile to be precise.

Actually, the problem with the notation is similar to the wording issue. Wikipedia defines the p-value as "the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true".
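To make that wording concrete, here is a small numerical sketch of the definition (my own toy example, not from the discussion): a one-sample z-test of H0: mu = 0 with known sigma, where the p-value is just the tail probability of the test statistic under H0.

```python
# Toy example (assumed numbers): one-sample z-test, H0: mu = 0,
# known sigma = 1, n = 25, hypothetical observed sample mean 0.44.
import math

n, sigma = 25, 1.0
xbar = 0.44                          # hypothetical observed sample mean
z = xbar / (sigma / math.sqrt(n))    # observed test statistic d

def norm_sf(x):
    # Standard normal survival function P(Z > x)
    return 0.5 * math.erfc(x / math.sqrt(2))

p_one_sided = norm_sf(z)             # "at least as extreme", upper tail
p_two_sided = 2 * norm_sf(abs(z))    # "at least as extreme" in both tails
print(round(p_one_sided, 4), round(p_two_sided, 4))  # → 0.0139 0.0278
```

Whether you read the tail probability as p(D>d|H0) or as p_{H0} (D>d), the number computed here is the same.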

I think this is much better than saying "conditional on the null hypothesis", as is often heard (and written by Gelman)…

My impression is that the disagreement stems largely from the difference between the mathematical and the everyday meaning of "conditional". If "conditional" is used with its day-to-day meaning, there is of course no problem, but it becomes wrong when the mathematical definition of conditioning is applied. Given that the bar is used to denote mathematical conditioning, I'd say there's a strong potential for misinterpretation. I'll try to keep it in mind for teaching.

I guess that's the classic problem with words that have one meaning in common language and another in a scientific discipline (especially within mathematical models, where they must be defined very precisely). Other contenders, if you ever feel like it: "neutral" in population genetics or community ecology, "altruism" in behavioural ecology, "rationality" in economics…

Hi Fred, hehe, seems this is the perfect topic for starting a fight in the pub after a stats conference 🙂

I agree that we should be precise; the thing is that I feel "assuming H0" actually leaves space for both interpretations.

If you want to define “the probability of D>d in a world that happens to be H0”, then I agree, you should write p_{H0} (D>d)!

However, maybe I'm a spoiled Bayesian, but I confess I do think that H0 and !H0 must have probabilities underlying them, even if we don't directly calculate these probabilities in an NHST framework (I feel we implicitly discuss them when we look at Type I/II errors, though), and thus I would tend to write p(D>d|H0), and really mean conditional.

As I said, so far I haven't been persuaded by any of the arguments that this creates a problem. It seems to me that p(D>d|H0) will produce exactly the same values as p_{H0} (D>d), so I feel the concept of the p-value simply has a degree of freedom here that everyone can fill with their own inferential philosophy.

Glad to be persuaded though by any good argument against that. So far the only problem I see is that students could be confused by the different notations, but I doubt that this is a practical problem in teaching.

If what you say is true – not saying it isn't – then p-values computed the Bayesian way (the tail probability of the statistic conditioned on some parameter theta = theta_0) should be equal to those obtained by frequentist computations, irrespective of priors on theta.

Any idea how often this is true?

I can't think of an occasion where they would not give the same value, so I would say p_{H0} (D>d) = p(D>d|H0), the equals sign here meaning that they will give you the same value, not that the two expressions are identical in meaning.
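A quick Monte Carlo sanity check of that claim (my own toy setup, same z-test numbers as above): simulate data "conditionally on theta = theta_0" and compare the simulated tail probability with the analytic frequentist calculation p_{theta_0}(D>d). Since we condition on theta_0, any prior on theta drops out.

```python
# Toy check (assumed numbers): fix theta_0 = 0, simulate the z statistic
# conditionally on theta = theta_0, and compare the Monte Carlo tail
# probability with the exact normal-tail computation.
import math, random

random.seed(1)
n, theta0, sigma = 25, 0.0, 1.0
d = 2.2  # hypothetical observed value of the z statistic

# Monte Carlo estimate of P(D > d | theta = theta_0)
reps = 200_000
exceed = 0
for _ in range(reps):
    xbar = random.gauss(theta0, sigma / math.sqrt(n))
    z = (xbar - theta0) / (sigma / math.sqrt(n))
    exceed += z > d
p_mc = exceed / reps

# Frequentist computation: P_{theta_0}(D > d) from the normal tail
p_exact = 0.5 * math.erfc(d / math.sqrt(2))
print(round(p_mc, 4), round(p_exact, 4))
```

Both routes describe the same sampling distribution, so the two numbers agree up to Monte Carlo error – which is exactly the point of the "=" above.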

However, p(D>d|H0) suggests that H0 is one possible value drawn from a probability distribution, while the other notation doesn't – which seems to be what all the arguments circle around.

OK, my bad, I read a little more and this became (a bit) clearer. I got confused by arguments about posterior predictive p-values, but these are different from the "Bayesian" p-values we discuss here.

Sure – if theta ~ L(theta_0), then the rules of conditional probability ensure that P(D>d|theta=theta_0) is well defined whenever the law L is discrete or absolutely continuous.

I wonder if the supremum used in the frequentist definition is there to deal with weird sets for H_0, leading to cases where we cannot write down a conditional probability, or something else altogether…
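For the standard composite-null case, at least, the supremum has a mundane job (my own sketch, assumed numbers): with H_0: mu <= 0 in a one-sided z-test, the tail probability P_mu(D>d) increases with mu, so sup over mu in H_0 is attained at the boundary mu = 0.

```python
# Sketch (assumed setup): composite null H0: mu <= 0 for a one-sided z-test.
# The tail probability P_mu(D > d) grows with mu, so the supremum over H0
# sits at the boundary mu = 0 - i.e. the sup handles composite nulls.
import math

def tail_prob(mu, d, n=25, sigma=1.0):
    # P_mu( Xbar / (sigma/sqrt(n)) > d ) for Xbar ~ N(mu, sigma^2/n)
    z = d - mu * math.sqrt(n) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

d = 2.2  # hypothetical observed statistic
probs = [tail_prob(mu, d) for mu in (-1.0, -0.5, -0.1, 0.0)]
print([round(p, 5) for p in probs])  # increasing; maximum at mu = 0
```

So in the textbook cases the sup is about composite H_0 rather than measure-zero pathologies – whether it also rescues genuinely weird null sets is the part I can't answer.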

Hi Fred, I wondered about the latter as well, but all in all, it seems more a philosophical question to me.