I don't think that the 'null hypothesis of no effect' and 'chance being the sole factor at play' are equivalent concepts.
PREMISE
I am not a professional statistician, so I would like this blog to be understood as a space for sharing my reflections rather than a collection of "lessons" (which is why I named it "Statistical Thoughts" rather than "Statistical Certainties"). Of course, here I try to express my opinion in a convincing - but hopefully honest - way!
ARGUMENTATION
Here I critique (and welcome any rebuttals to) the idea that the 'null hypothesis of no effect or association' is equivalent to the statement 'chance is the only factor at play.' Indeed, within a frequentist-inferential statistical model, chance is always the only factor at play! A P-value is a number calculated from other numbers (the observed test statistic and the degrees of freedom) that retain no memory of how they were generated, nor any "awareness" of the practical meaning we assign to the quantities entered into the corresponding formulas (e.g., means, standard deviations, sample sizes, etc.). Therefore, assuming a priori that chance is the only factor at play, all the P-value tells us is the long-run frequency with which we would obtain numbers (future 'observed test statistics') as extreme as or more extreme than the one provided (the current 'observed test statistic').
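This frequency interpretation can be checked directly with a small simulation. The sketch below (with made-up numbers, and a known-variance z-test chosen purely for simplicity) computes a two-sided P-value from an observed test statistic, then generates many replicate experiments assuming the tested hypothesis (mean = 0) and pure chance: the fraction of replicates with a test statistic as or more extreme than the observed one approximates the P-value.

```python
import math
import random

random.seed(0)

def z_stat(sample):
    # z = sqrt(n) * mean / sigma, with sigma assumed known and equal to 1
    n = len(sample)
    return math.sqrt(n) * (sum(sample) / n)

def p_value(z):
    # Two-sided P-value under the standard normal reference distribution:
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|))
    return math.erfc(abs(z) / math.sqrt(2))

n = 30
# Hypothetical study data (true mean 0.3, chosen arbitrarily for illustration)
observed = [random.gauss(0.3, 1.0) for _ in range(n)]
z_obs = z_stat(observed)
p = p_value(z_obs)

# Replicate the experiment many times under the tested hypothesis (mean = 0),
# where chance is, by construction, the only factor at play
reps = 20000
extreme = sum(
    abs(z_stat([random.gauss(0.0, 1.0) for _ in range(n)])) >= abs(z_obs)
    for _ in range(reps)
)
print(round(p, 3), round(extreme / reps, 3))
```

The two printed numbers agree (up to simulation noise): the P-value is nothing more than this long-run frequency, computed under the assumption that chance alone generates the data.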
Therefore, it is up to the user to interpret the whole scenario in light of their objectives. For example, it is we (and not the model on our behalf) who decide that a value of 0.5 for the quantity we call 'hazard ratio' (HR) implies high treatment effectiveness. Accordingly, we define the tested hypothesis 'HR = 0.5' as the hypothesis of treatment effectiveness. Such a choice must stem from a careful examination of the actual costs, risks, and benefits (e.g., does the beneficial effect justify the adverse events?).
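Mechanically, such a user-chosen hypothesis is tested exactly like the conventional null. A minimal sketch (with made-up numbers: a hypothetical estimated HR of 0.62 with standard error 0.15 on the log scale) of a Wald-type z-test against HR = 0.5 rather than HR = 1:

```python
import math

# Hypothetical study results: estimated HR and standard error of log(HR)
hr_hat, se_log_hr = 0.62, 0.15
hr_null = 0.5  # the user-chosen hypothesis of 'treatment effectiveness'

# Wald z-statistic on the log scale, against log(0.5) instead of log(1)
z = (math.log(hr_hat) - math.log(hr_null)) / se_log_hr
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P-value vs. HR = 0.5
print(round(z, 2), round(p, 2))
```

The formula is indifferent to which number we plug in as the hypothesized value; the meaning of 'HR = 0.5' as "treatment effectiveness" exists only in our interpretation, not in the computation.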
Thus, the P-value is calculated assuming that the errors are randomly distributed (according to a predetermined distribution) around the parameter specified in the target hypothesis. But, I repeat (based on my understanding), the test "knows" nothing about the complexities related to the existence or absence of the investigated phenomenon: it always "sees" chance as the sole phenomenon at play. Similar considerations can be made for the concept of 'coverage probability' concerning confidence intervals.
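The coverage-probability point can be illustrated the same way. In the toy simulation below (known-variance normal model, arbitrary true mean), about 95% of the 95% confidence intervals computed from repeated random samples contain the true parameter, precisely because chance, as modeled, is the only factor at play:

```python
import math
import random

random.seed(1)

n, z_crit = 25, 1.96   # sample size; critical value for a 95% interval
mu_true = 2.0          # arbitrary true mean (sigma assumed known = 1)

trials = 5000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu_true, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    half = z_crit / math.sqrt(n)  # half-width of the interval
    covered += (mean - half) <= mu_true <= (mean + half)

print(round(covered / trials, 3))  # close to 0.95
```

The 95% refers to the long-run behavior of the procedure under the assumed random mechanism, not to the probability that any particular computed interval contains the parameter.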
Consequently, all frequentist-inferential models that include numerical hypotheses unconditionally assume that chance is the only factor at play, i.e., the only determinant of the P-value once the observed test statistic and the degrees of freedom (which, in many common tests, depend solely on the sample sizes) are provided.
PRACTICAL REASONS
The P-value is often confused with the probability that the target hypothesis (typically the null hypothesis of no effect or association) is true. This interpretation is wrong on two levels: first, statistically, because the P-value is not a posterior probability; it is calculated under the assumption that the tested hypothesis is true. Second, epistemically, because the hypothesis under consideration is not the existence or non-existence of a phenomenon but rather the equality between a variable called 'parameter' and a specific number.
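The statistical half of this confusion can be made concrete with a toy simulation (all numbers assumed for illustration: 90% of tested hypotheses truly null, effect size 0.5, n = 30, known variance). Among results declared "significant" at P < 0.05, the share coming from true nulls, which is a posterior-type quantity, turns out to be far larger than 0.05:

```python
import math
import random

random.seed(2)

def p_val(z):
    # Two-sided P-value under the standard normal reference distribution
    return math.erfc(abs(z) / math.sqrt(2))

n, effect = 30, 0.5      # sample size; assumed real effect when present
prior_null = 0.9         # assumed share of truly null hypotheses
null_and_sig = alt_and_sig = 0

for _ in range(20000):
    is_null = random.random() < prior_null
    mu = 0.0 if is_null else effect
    z = math.sqrt(n) * (sum(random.gauss(mu, 1.0) for _ in range(n)) / n)
    if p_val(z) < 0.05:
        if is_null:
            null_and_sig += 1
        else:
            alt_and_sig += 1

# Share of 'significant' results that come from true nulls
false_discovery = null_and_sig / (null_and_sig + alt_and_sig)
print(round(false_discovery, 3))
```

Under these assumptions the printed share is roughly a third, not 0.05: the P-value and the probability that the tested hypothesis is true are simply different quantities.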
Common frequentist models typically assume that this number is immutable (indeed, probability - including the P-value - is understood as the limiting frequency of an event in an infinite population of equivalent experiments); however, in fields like medicine or epidemiology, the actual effect may vary substantially due to unidentified causal factors. For instance, the efficacy of a drug after COVID-19 might differ greatly from its efficacy before the pandemic due to substantial variations in underlying conditions (e.g., changes in the immune system caused by the disease and mass vaccinations, different health behaviors such as the increased use of sanitizers and masks, the psycho-physical effects of social distancing, etc.). Randomizing patients into treatment and placebo groups does not protect against this kind of variation.
For these reasons, keeping the mathematical-statistical framework separate from the real-world context is not only logically and epistemically correct; it also helps us understand i) the limitations of the frequentist approach, and ii) the crucial role of humans in choosing the mathematical model that most accurately represents the scientific experimental situation.