Why S-values (surprisals) Instead of P-values?

The other day, a colleague asked me an important question. When the hypothesis under consideration is not the null hypothesis of no effect (e.g., the difference between two mean values is 50 and not 0), a P-value equal to 1 indicates that—assuming chance is the only factor at play—we would obtain a "result" greater than or equal to the observed one 100% of the time in numerous future equivalent experiments.

So, how can chance, assuming it operates alone, always generate a "result" greater than or equal to the observed experimental one?

The apparent problem is easily solved by noting that the so-called "result" (or, better, the "statistical result") does not refer to the magnitude of the effect under examination but to the value of the test statistic. If we fix the hypothesis "difference = Δ = 50" and observe a difference of 50, then the test statistic (i.e., the statistical result) turns out to be 0! And what is the probability of obtaining a test statistic as or more extreme than 0, assuming chance is operating alone? Obviously, 100% (every possible value of the test statistic is as or more extreme than 0). That's the meaning of p = 1.
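As a quick numerical check, here is a minimal sketch (using a normal approximation and a made-up standard error of 10, purely for illustration): whenever the observed difference equals the hypothesized one, the test statistic is 0 and the two-sided P-value is exactly 1.

```python
import math

def two_sided_p(diff_obs, diff_hyp, se):
    """Two-sided P-value for the hypothesis 'difference = diff_hyp',
    using a normal approximation for the test statistic."""
    z = (diff_obs - diff_hyp) / se  # the 'statistical result'
    # P(|Z| >= |z|) under a standard normal distribution
    return math.erfc(abs(z) / math.sqrt(2))

# Observed difference exactly matches the hypothesis: z = 0, so p = 1
print(two_sided_p(diff_obs=50, diff_hyp=50, se=10))  # 1.0
```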



Surprisal, I love u <3

If we adopt surprisals, this problem does not arise: indeed, the S-value represents the number s of consecutive heads we would need to obtain, by fairly tossing a coin s times, to match the statistical information entailed by the result. In this case, the researcher is perfectly aware that the coin toss is just a point of comparison and has nothing to do with the investigated scientific phenomenon. Specifically, the S-value tells us how surprised we should feel when observing an experimental outcome compared to the prediction of the hypothesis under consideration. For instance, s = 4 implies that we should be as surprised as when we get 4 consecutive heads in 4 fair coin tosses. In the case of p = 1, the corresponding s = 0 (since p = (1/2)^s, i.e., s = -log_2 p) tells us that we should not feel surprised at the observed result compared to the fixed hypothesis (as they are in perfect agreement).
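In code, the conversion s = -log_2 p is a one-liner; a minimal sketch:

```python
import math

def s_value(p):
    """Surprisal in bits: the s such that p = (1/2)^s, i.e., s = -log2(p)."""
    return -math.log2(p)

print(s_value(1.0))     # 0.0 -> no surprise at all
print(s_value(0.0625))  # 4.0 -> as surprising as 4 heads in 4 fair tosses
```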

No, the S-value is not a hippie-naive version of the P-value

P-values present numerous interpretive difficulties beyond the one described above. For example, two pairs of probabilities (90%, 95%) and (5%, 10%) might seem equally "distant" from each other since 95-90 = 10-5 = 5. Yet, the first two events are almost equally probable (in fact, 0.95/0.90 ≈ 1.06), while the second two are not (0.10/0.05 = 2, i.e., one is twice as likely as the other). Nonetheless, even the probability ratio has its problems. For instance, an event with a probability of 0.04 (4%) has an expected frequency 4 times higher than an event with a probability of 0.01 (1%); however, when compared to the number of consecutive heads when fairly flipping a coin, the difference in "rarity" between these two equals the rarity of the event "getting 2 consecutive heads when fairly flipping a coin 2 times" (since 1/2 · 1/2 = 1/4 or, equivalently, log_2(0.04/0.01) = 2). In other words, the difference between the two events under consideration is generally not substantial in probabilistic terms. Therefore, S-values describe statistical information in a more balanced way than P-values. As a 'side' note, remember that all this is conditional on the validity of the statistical model (which needs to be checked)!
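The comparisons above can be replayed numerically; a minimal sketch, where the surprisal difference log_2(p_large / p_small) measures the information gap in bits:

```python
import math

def s_diff(p_small, p_large):
    """Difference in surprisal (bits) between two probabilities:
    log2(p_large) - log2(p_small) = log2(p_large / p_small)."""
    return math.log2(p_large / p_small)

# Equal absolute differences (0.05 and 0.03), very different information gaps:
print(s_diff(0.90, 0.95))  # ~0.08 bits: almost equally probable events
print(s_diff(0.05, 0.10))  # 1.0 bit: one event twice as likely as the other
print(s_diff(0.01, 0.04))  # 2.0 bits: like 2 extra consecutive heads
```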

And what is information?

Sander Greenland, Valentin Amrhein, and Blake McShane [thank you for your patience and your help] have helped me a lot in understanding the essential importance of this concept in statistics. However, although (I believe and hope) everything is clear to me from an operational standpoint, the definition of 'information' still eludes me. In this regard, I believe there is no universally valid way to quantify information as this is intrinsically tied to our perception (decoding of content). In this specific context, we could say that the fact that statistical information is 'log-scaled' is the mathematical way of saying that the communicability of its content generally improves due to this transformation (although there are also solid mathematical reasons). For some practical examples of why this is the case, you can refer to our very recent article, section "A less innovative but still useful approach" [again, thank you Sander, thank you Valentin, and thank you Blake] [1].


References

1. Rovetta, A., Mansournia, M. A., & Vitale, A. (2024). For a proper use of frequentist inferential statistics in public health. Global Epidemiology, 100151. https://doi.org/10.1016/j.gloepi.2024.100151
