What is it about?

In a new paper in the European Journal for Philosophy of Science, I consider Fisher's criticism that the Neyman-Pearson approach to hypothesis testing relies on the assumption of “repeated sampling from the same population.”

Why is it important?

This criticism is problematic for the Neyman-Pearson approach because it implies that test users need to know, for sure, what counts as the same or equivalent population as their current population. If they don't know this, then they can't specify a procedure that repeatedly samples from this population rather than from other, non-equivalent populations. Without this specification, Neyman-Pearson long run error rates become meaningless.

Perspectives

I argue that, by definition, researchers do not know for sure which features of their current populations are relevant and which are irrelevant. For example, in a psychology study, is the population “1st year undergraduate psychology students” or, more narrowly, “Australian 1st year undergraduate psychology students” or, more broadly, “psychology undergraduate students” or, even more broadly, “young people,” etc.? Researchers can make educated guesses about the relevant and irrelevant aspects of their population. However, they must concede that those guesses may be wrong.

Consequently, if a researcher imagines a long run of repeated sampling, then they must imagine that they would make incorrect decisions about their null hypothesis due to not only Type I errors and Type II errors but also Type III errors, that is, errors caused by accidentally sampling from populations that are substantively different to their underspecified alternative and null populations. As Dennis et al. (2019) recently explained, “the ‘Type 3’ error of basing inferences on an inadequate model family is widely acknowledged to be a serious (if not fatal) scientific drawback of the Neyman-Pearson framework.”

To be clear, the Neyman-Pearson approach does consider Type III errors. However, it considers them outside of each long run of repeated sampling. It does not allow Type III errors to occur inside a long run of repeated sampling, where the sampling must always be from a correctly specified family of “admissible” populations (Neyman, 1977, p. 106; Neyman & Pearson, 1933, p. 294).

In my paper, I argue that researchers are unable to imagine a long run of repeated sampling from the same or equivalent populations as their current population because they are unclear about the relevant and irrelevant characteristics of their current population. Consequently, they are unable to rule out Type III errors within their imagined long run. I conclude that neither Neyman nor Pearson adequately rebutted Fisher’s “repeated sampling” criticism. I then briefly outline Fisher’s own significance testing approach and consider how it avoids his “repeated sampling” criticism.
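
The following is a rough illustration only and is not taken from the paper. It sketches, under assumed numbers, what can happen to a long run error rate when some samples in the run come from a population that the test's model does not cover. The test is a two-sided z-test that assumes a standard deviation of 1 and a true null mean of 0. The function name `long_run_rejection_rate`, the proportion `p_other`, and the alternative standard deviation `other_sd` are all hypothetical choices made for this sketch.

```python
import random
from statistics import NormalDist, fmean


def long_run_rejection_rate(n_runs, n, p_other, other_sd, alpha=0.05, seed=0):
    """Share of runs in which a z-test (assuming sd = 1) rejects a true null mean of 0.

    With probability p_other, a run samples from a population with sd = other_sd
    instead of the assumed sd = 1 (an illustrative stand-in for a Type III error).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_runs):
        # Decide which population this run actually samples from.
        sd = other_sd if rng.random() < p_other else 1.0
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        # Test statistic computed under the (possibly wrong) assumption sd = 1.
        z = fmean(sample) * n ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_runs


# Long run in which every sample comes from the assumed population.
clean = long_run_rejection_rate(n_runs=20000, n=20, p_other=0.0, other_sd=2.0)
# Long run in which 30% of samples come from a different population.
mixed = long_run_rejection_rate(n_runs=20000, n=20, p_other=0.3, other_sd=2.0)
print(clean, mixed)
```

In this sketch, the first rate should sit close to the nominal alpha of 0.05, whereas the second should be noticeably higher, even though the null mean is true in every run. The size of the gap depends entirely on the assumed values of `p_other` and `other_sd`, which a researcher would not know in practice.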

Prof Mark Rubin
Durham University

Read the Original

This page is a summary of: “Repeated sampling from the same population?” A critique of Neyman and Pearson’s responses to Fisher, European Journal for Philosophy of Science, September 2020, Springer Science + Business Media, DOI: 10.1007/s13194-020-00309-6.