What’s wrong with null models?

A guest post by Carsten F. Dormann

Over the last few years, I have used null models more often than I would have liked. I had to, whenever there was no other way to figure out whether an ecological pattern was unexpected or trivial. Inspired by some recent (and also some older) posts, I thought I might throw around a few ideas that have been collecting dust in the back of my head, for what they’re worth.

Here is the summary of what I am going to say: on a philosophical level, nothing is wrong with null models. However, several things are suboptimal to the point of being almost as bad as wrong. Here’s my list; afterwards I go through the points one by one, using examples from interaction network analyses:

1. What exactly does a given null model control for? Some relevant quotes are “Null models will always be contentious.” and “Keep everything constant apart from the mechanism of interest.” (Gotelli & Graves 1996)
2. Communicating null model reasoning is difficult: how is the result of a complex randomization algorithm to be interpreted ecologically?
3. Coding null models can easily lead to errors: it may look like a null model, but is it an unbiased null model?
4. Null models are only an in-between step: in the end, we really want a parametric model!

Preamble:

The idea of statistics, and null models in particular, is to investigate whether an observed pattern could have arisen by chance. In other words, we don’t want to be fooled into seeing patterns where there aren’t any (type I error).

But why do we need a null model in the first place? Why can we not interpret the observed pattern at face value, or use a standard statistical model? Let me try to explain with an example, loosely following Diamond’s (1975) birds-on-islands story, later dissected ad infinitum (read the full story in Gotelli & Graves 1996). The observed pattern is a species-by-island matrix in which some species do not co-occur with others, and the question is whether this pattern is random. Diamond calls it a checkerboard if, on two islands, two bird species each occur only where the other does not (leading to a 01/10 pattern), and interprets this as a signal of competition. Let’s say Diamond finds 22 checkerboard patterns for 10 birds on 20 islands. Is that a lot (indicating a strong imprint of competition on the co-occurrence pattern), or is it to be expected given the bird abundances on these islands even without competitive effects?

The classical way to answer this question would be parametric statistics. In parametric statistics, we assume our data to be the result of a data-generating model, which consists of some systematic dependencies combined with some random variates. If we cannot define or fit such a model, the null model comes in: the idea is to re-shuffle the data in some intelligent way to get an idea of what a random pattern would look like. Null modelling is thus a technique to get a feeling (a well-formalised feeling, that is) for what the data may look like without a systematic effect (usually the hypothesised mechanism at work). The obvious place to look up definitions and examples is Gotelli & Graves (1996).

The key idea for our island bird problem (and many others) is that we must control for the fact that some species are rare and others are common. If a species is ubiquitous, it will have no “checkers”, and if it occurs on only one island, it can have at most one “checker” (although possibly with several species). Ecologically, prevalence may be related to generalism in feeding and nesting requirements, and/or to dispersal abilities. The potential for checkerboards peaks at prevalences of half of the islands. So, one reason for a null model is to control for the different prevalences of the species. (Gotelli et al. 2010 put predictors as weights into a null model; still, this paper fails to convince me.)
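To make the claim about prevalence concrete, one can count checkerboard units, i.e. 2×2 submatrices with a 01/10 pattern, for a single species pair. If species A occupies k of n islands, each unit combines one island holding only A with one island holding only the partner species, so the maximum for that pair is reached when the partner occupies exactly the remaining n − k islands (this counting convention is an illustration chosen here, not necessarily the one Diamond used):

```latex
C_{\max}(k) = k\,(n - k), \qquad
\frac{\mathrm{d}C_{\max}}{\mathrm{d}k} = n - 2k = 0 \;\Rightarrow\; k = \frac{n}{2}
```

With n = 20 islands, this gives C_max(1) = 19, C_max(10) = 100 and C_max(20) = 0: a ubiquitous species contributes nothing, and the potential is largest at intermediate prevalence.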

The same reasoning applies to the islands: some host many species, others only a few. Again, those with half the total number of species will have the highest potential for checkerboards, while islands hosting all or no species will have none. (Ecologically, this may suggest that some islands have a higher diversity of habitats, typically because they are larger.) So, another reason for a null model is to control for the different habitat richness of islands.

We can implement this thinking by devising some kind of randomization mechanism that shuffles species identities around, preserving prevalences and island diversity, and from that deduce a null expectation for the checkerboard pattern, which we can compare with the observed checkerboard pattern.
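A minimal sketch of this procedure in Python, assuming a small hypothetical presence/absence matrix (rows are species, columns are islands) and using a plain 2×2 swap as the randomization. The names `checkerboard_units`, `swap_randomize` and `null_distribution` are illustrative, not an established package, and the number of swap attempts per randomized matrix is an arbitrary choice. Because a swap only ever flips a 10/01 submatrix to 01/10, every randomized matrix keeps each species’ prevalence and each island’s species count.

```python
import random
from itertools import combinations

def checkerboard_units(m):
    """Sum, over all species pairs, of the 2x2 submatrices showing a 01/10 pattern.

    m is a list of rows (species), each a list of 0/1 values (islands).
    """
    total = 0
    for a, b in combinations(m, 2):
        only_a = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
        only_b = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
        total += only_a * only_b  # each (only-a island, only-b island) pair is one unit
    return total

def swap_randomize(m, n_swaps, rng):
    """Randomized copy with identical row totals (prevalences) and column totals (island richness)."""
    m = [row[:] for row in m]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(n_rows), 2)
        c1, c2 = rng.sample(range(n_cols), 2)
        if (m[r1][c1], m[r1][c2], m[r2][c1], m[r2][c2]) in ((1, 0, 0, 1), (0, 1, 1, 0)):
            # flip 10/01 <-> 01/10: row and column totals stay unchanged
            m[r1][c1], m[r1][c2] = m[r1][c2], m[r1][c1]
            m[r2][c1], m[r2][c2] = m[r2][c2], m[r2][c1]
    return m

def null_distribution(m, n_null=200, n_swaps=1000, seed=1):
    """Checkerboard counts of n_null independently randomized copies of m."""
    rng = random.Random(seed)
    return [checkerboard_units(swap_randomize(m, n_swaps, rng)) for _ in range(n_null)]

# Hypothetical toy data: 4 species (rows) on 5 islands (columns)
observed = [
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]
obs = checkerboard_units(observed)  # 13 for this matrix
null = null_distribution(observed)
p_upper = sum(c >= obs for c in null) / len(null)
print(obs, p_upper)
```

Here `p_upper` is the fraction of randomized matrices with at least as many checkerboard units as observed. Whether this particular randomization is the right one is exactly the question of point 1.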

However, this brings us straight to point 1.

Ad 1: What does a given null model actually control for?

All too often, I find it difficult to understand what ecological processes a null model controls for. We may know why we hypothesize a specific mechanism (say, competition) to be behind a pattern. However, how can we be sure that a given randomization algorithm removes all but the effects of this mechanism? Actually, this is what the dispute about Diamond’s null models boils down to, and I don’t want to take sides, but as a first shot, as said above, I would be interested in randomising in such a way that each island retains the number of species it has, and that each species retains the number of islands it occupies. Why? Well, I would like to know how much potential for the observed pattern there is, given the general constraints set by the data. If all randomisations that preserve prevalences and diversities lead to the same (observed) pattern, then clearly no additional competitive ingredient is necessary to explain this pattern. (Note that this does not mean that species don’t interact. It “only” means that no interaction is necessary to yield this pattern.)

However, let me (and others) take issue with my own proposal: why should prevalence or habitat richness per island be considered constant? Is what we just set up really the right model to test for competition? I shall not attempt to answer this. It only serves to make my point: what do we actually control for?

Ad 2: Communicating null models.

My second point builds on the first: given that it is usually difficult to understand what is controlled for in a null model, how do we communicate the results? The problem arises because ecologists use English, rather than mathematics, as their language of communication. As a mathematical / algorithmic rule, a null model is perfectly well-defined. After having rejected such a null model, however, we have to translate this result into language and meaning.

I think it is already a major step forward if we recognise that these two issues need to be communicated. The next step could be to imagine that the reader disagrees with our reasoning. He or she may think that we should treat all islands as equal, without a priori differences (“neutral”). That would require a different null model. Well, so be it. One cannot anticipate all possible null models, but we can disclose the data and let people do whatever they want to do with them, preferably starting in the peer-review phase. In fact, the main criticism against Diamond that I have found really convincing is that he never released his original data.
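To illustrate how small the coding change can be while the ecological assumption changes substantially, here is a sketch of such a “neutral islands” alternative, assuming the same kind of hypothetical presence/absence matrix as before. The hypothetical function `fixed_equiprobable` keeps each species’ prevalence but places its occurrences on islands chosen uniformly at random, so island richness is no longer preserved (a variant often called fixed-equiprobable).

```python
import random

def fixed_equiprobable(m, rng):
    """Keep each species' prevalence (row total); every island is equally likely to receive it."""
    n_islands = len(m[0])
    randomized = []
    for row in m:
        chosen = set(rng.sample(range(n_islands), sum(row)))
        randomized.append([1 if j in chosen else 0 for j in range(n_islands)])
    return randomized

# Hypothetical toy matrix: rows = species, columns = islands
observed = [
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 0],
]
rng = random.Random(3)
draw = fixed_equiprobable(observed, rng)
print([sum(r) for r in draw])        # species prevalences: always [3, 2, 3, 4]
print([sum(c) for c in zip(*draw)])  # island richness: free to differ from the observed one
```

The same test statistic (for example a checkerboard count) would then be compared against draws from this model instead; the disagreement between the two null models is ecological, not computational.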

So, next time you model nulls, give reason for the why and the how, and allow tests of robustness against critical assumptions.

Ad 3: Implementation errors.

The third point relates to the implementation, which is often non-trivial. I want to give an example from my own work.

For one project, we needed a null model for a matrix of pollinator visits to plants that maintains the marginal totals (i.e. the total number of visits per plant and per pollinator) and connectance (i.e. the number of zeros in the matrix). I followed the swap algorithm used for binary matrices, but modified it so that it also works for values other than 0 and 1. Without wanting to spend too much time on the details, it works like this: randomly choose two rows and two columns. Add the minimum value of the counterdiagonal to the diagonal, and subtract it from the counterdiagonal (i.e. “move” the value). This creates a 0 on the counterdiagonal. Repeat until the required number of 0s (connectance) is reached. Sounds great – but it turns out that this is biased, in the sense that it does not keep some crucial properties of the data constant. The reason is that species with high abundance are less likely to have their numbers moved to the diagonal. This violates the idea that each cell should have the same probability of being subject to a “move”. I have not been able to correct this, so we resorted to using an additional null model (Vázquez 2005), which does not strictly maintain marginal totals. Luckily, the analyses we had done using the faulty null model (Dormann et al. 2009) were not qualitatively affected by the correction, but it taught me a lesson.
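The move described above can be sketched in Python as follows. This is an illustration of the step as described here, not the code of any published package; the starting matrix is assumed to already have the desired marginal totals, and `quantitative_move` and `fill_to_connectance` are hypothetical names. The sketch also shows why the bias is easy to miss: every move keeps all row and column totals exactly, so the obvious checks pass.

```python
import random

def count_zeros(m):
    return sum(v == 0 for row in m for v in row)

def quantitative_move(m, rng):
    """One 'move' as described in the text, applied in place to a random 2x2 submatrix."""
    r1, r2 = rng.sample(range(len(m)), 2)
    c1, c2 = rng.sample(range(len(m[0])), 2)
    d = min(m[r1][c2], m[r2][c1])  # minimum of the counterdiagonal
    m[r1][c1] += d                 # add it to both diagonal cells ...
    m[r2][c2] += d
    m[r1][c2] -= d                 # ... and subtract it from both counterdiagonal cells,
    m[r2][c1] -= d                 # leaving at least one counterdiagonal cell at 0

def fill_to_connectance(start, target_zeros, rng, max_iter=10_000):
    """Repeat moves until the matrix has at least target_zeros empty cells (or give up)."""
    m = [row[:] for row in start]
    for _ in range(max_iter):
        if count_zeros(m) >= target_zeros:
            break
        quantitative_move(m, rng)
    return m

# Hypothetical visit counts: rows = plants, columns = pollinators
start = [[3, 2, 1], [1, 2, 3], [2, 2, 2]]
result = fill_to_connectance(start, target_zeros=3, rng=random.Random(0))
print(result)
print([sum(r) for r in result], [sum(c) for c in zip(*result)])  # margins unchanged
```

Note that a move can also turn a zero on the diagonal into a non-zero value, and nothing in these checks reveals whether all cells had the same chance of being moved, which is where the bias described above hides.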

The point is: null models are highly specialized, so it’s likely that they will have to be hand-coded by an ecologist. Apart from typos, new code may be defective in a more subtle way, as described above. I am ignorant of how to prevent such errors; the main reason is that we usually have no expectation of what a correct null model should deliver (otherwise I would not need it), so how can we rigorously test whether a given piece of code works correctly?
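For the simplest binary case, one partial safeguard is possible: on a matrix small enough to enumerate, list every matrix with the required margins and check that the sampler visits each of them about equally often. The sketch below assumes that uniform sampling over fixed-margin binary matrices is the intended behaviour; the function names and the 3×3 example are hypothetical, and the check does not extend to quantitative matrices, where defining the correct target distribution is precisely the difficulty.

```python
import random
from collections import Counter
from itertools import product

def margins(m):
    """Row totals and column totals of a matrix given as a list of rows."""
    return tuple(map(sum, m)), tuple(map(sum, zip(*m)))

def all_matrices(row_tot, col_tot):
    """Enumerate every 0/1 matrix with the given margins (brute force: tiny matrices only)."""
    r, c = len(row_tot), len(col_tot)
    found = []
    for cells in product((0, 1), repeat=r * c):
        m = [list(cells[i * c:(i + 1) * c]) for i in range(r)]
        if margins(m) == (tuple(row_tot), tuple(col_tot)):
            found.append(tuple(map(tuple, m)))
    return found

def trial_swap(m, n_swaps, rng):
    """Binary swap chain; attempts that hit no checkerboard leave the matrix unchanged."""
    m = [row[:] for row in m]
    for _ in range(n_swaps):
        r1, r2 = rng.sample(range(len(m)), 2)
        c1, c2 = rng.sample(range(len(m[0])), 2)
        if (m[r1][c1], m[r1][c2], m[r2][c1], m[r2][c2]) in ((1, 0, 0, 1), (0, 1, 1, 0)):
            for i, j in ((r1, c1), (r1, c2), (r2, c1), (r2, c2)):
                m[i][j] = 1 - m[i][j]  # flip all four cells: 10/01 <-> 01/10
    return m

start = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
targets = all_matrices([1, 1, 1], [1, 1, 1])  # the 6 matrices with these margins
rng = random.Random(42)
n_runs = 3000
counts = Counter(tuple(map(tuple, trial_swap(start, 50, rng))) for _ in range(n_runs))
for t in targets:
    print(t, round(counts[t] / n_runs, 3))  # an unbiased sampler gives roughly 1/6 each
```

A sampler with a hidden bias would show clearly uneven frequencies here, though passing this test on one tiny matrix is no guarantee for larger ones.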

Ad 4: We want parametric models!

During an airport conversation a few years ago with Bob O’Hara, I was taken aback by his blunt statement that he doesn’t like null models. At that time, I was very much in love with null models! Now, after a demonstration paper by Konstans Wells and Bob (2013), I finally understand his point – and agree. Null models are somewhat clumsy tools that serve us until we have figured out a way to actually specify a parametric model.

In the Diamond story, we are really interested in whether specialists outcompete generalists, which make up for this by being better dispersers. So what we really want is an ecological model representing these processes, which we then fit to the data, while correcting for the effect of habitat diversity on differently sized islands, for the dispersal-related traits of the species, and so on. This dream model represents the actual ecology we’re interested in. Our data are likely to be too few, too noisy and too unspecific to fit such a model, but doesn’t that imply that no null model will be able to address our question either? And if there are enough data to inform the parameters of our dream model, doesn’t a highly constrained reshuffling of the data in a null model seem an unnecessarily circuitous route to the result?

Apart from the data problem, fitting complex stochastic models is also technically challenging. This point connects to another topic that has been discussed on this blog: Approximate Bayesian Computation (ABC). Fitting mechanistic models to (sets of) data can be a tedious little nightmare. But in my impression, it is a much clearer and, in the long run, much less contentious route than null modelling.
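For readers unfamiliar with ABC, here is a minimal rejection-ABC sketch in Python. It is deliberately a toy: `simulate` is a hypothetical one-parameter stand-in for a mechanistic model (every species occupies every island independently with probability p), and the observed summary statistic of 80 occurrences in a 10 × 20 matrix is made up for illustration. Parameters are drawn from a uniform prior, the model is simulated, and only draws whose simulated summary lies within a tolerance of the observed one are kept.

```python
import random
from statistics import mean

N_SPECIES, N_ISLANDS = 10, 20   # hypothetical dimensions
OBSERVED_OCCURRENCES = 80       # hypothetical observed summary statistic

def simulate(p, rng):
    """Toy stand-in for a mechanistic model: each species occupies each island with probability p."""
    return sum(rng.random() < p for _ in range(N_SPECIES * N_ISLANDS))

def abc_rejection(n_draws=5000, tolerance=2, seed=7):
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # draw from a uniform prior on [0, 1)
        if abs(simulate(p, rng) - OBSERVED_OCCURRENCES) <= tolerance:
            accepted.append(p)  # keep parameters whose simulations resemble the data
    return accepted

posterior = abc_rejection()
print(len(posterior), round(mean(posterior), 3))  # accepted draws approximate the posterior of p
```

In a real application, `simulate` would be the ecological model of interest and the summary statistics would have to be chosen with care; the expensive part is that each candidate parameter value requires a full simulation.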

Conclusions

Null models are here to stay for the immediate future, whether I like it or not. While this is the case, I guess the minimum standard would be to a) communicate the aim of the null model(s); b) communicate the idea of the algorithm of the null model; and c) provide data and code of the analysis. All this does not ensure we’ll be doing it correctly, but at least we err reproducibly.

References

Diamond, J.M. (1975) Assembly of species communities. Ecology and Evolution of Communities (eds M. Cody & J.M. Diamond), pp. 342–444. Belknap Press, Cambridge, MA.

Dormann, C.F., Blüthgen, N., Fründ, J. and Gruber, B. (2009) Indices, graphs and null models: Analyzing bipartite ecological networks. The Open Ecology Journal, 2, 7–24.

Gotelli, N.J. and Graves, G.R. (1996) Null Models in Ecology. Smithsonian Institution Press, Washington D.C. [available for free on the first author’s homepage, as the book is out of print.]

Gotelli, N.J., Graves, G.R. and Rahbek, C. (2010) Macroecological signals of species interactions in the Danish avifauna. Proceedings of the National Academy of Sciences of the USA, 107, 5030–5035.

Vázquez, D.P. (2005) Degree distribution in plant–animal mutualistic networks: forbidden links or random interactions? Oikos, 108, 421–426.

Wells, K. and O’Hara, R.B. (2013) Species interactions: estimating per-individual interaction strength and covariates before simplifying data into per-species ecological networks. Methods in Ecology and Evolution, 4, 1–8. doi: 10.1111/j.2041-210x.2012.00249.x

8 thoughts on “What’s wrong with null models?”

1. When we calculate a specific statistical metric (let’s say t) for a randomized dataset pooled over two groups of data, and then compare the values obtained from many randomizations with the t value for the original data, in the end we are using the same rationale as a null model. When we mix the two groups and sample from them, it is like assuming that the data come from the same distribution of possible values, which is the assumption used to build the distribution of t. Therefore, I guess that any metric calculated from resampling ultimately needs to be communicated to the same standards that you propose. I always saw null models as a middle ground between statistical models (where we compare our metric to an expected value based on an assumption) and parameterized models. But when conducting inference through resampling, my feeling is that I am performing a null model.


• Rafael, with your example, you mean that we have samples for two groups, and randomize the group labels? Yes, I would also call this a null model.

I suspect Carsten’s perspective comes from the more complex community-ecology null models where various things are randomized, but I guess it’s a good point that not every randomization is problematic (at least in the example you mention, I don’t see any major problems for either interpretation or implementation).


• Hi Florian, Carsten’s perspective is indeed much more complex. I always struggled to choose the best binary null model for my simulations (swap, r0, etc.) when using oecosimu in R. I think his perspective points to the basic information you need to make such a choice.


2. I agree with the sentiment about parametric models, but what does that mean practically? Probably we are speaking not only about parametric, but also about mechanistic models for which equilibria can only be obtained by simulation, right? Because, if we would have the option to construct a statistical model for which equilibrium and likelihood are tractable, we probably wouldn’t have used a null model in the first place.

We review the efforts to develop mechanistic models for the community / metacommunity / biogeographic context in Cabral et al., Ecography, 2017. I’m generally much in favor of this line of research and think ecology will learn a lot by formulating explicit mechanistic hypotheses and simulating them through. That being said, however, one has to admit that there are still major computational challenges with inferring even a limited number of parameters from simulation in such a context. Even if we throw vastly larger computational efforts at the problem than a typical null model would require, we will probably still have to make big compromises regarding the model complexity, which creates a problem somewhat similar to your problem #1: we can infer the parameters, but we know the model is wrong (= incomplete), so what do they mean? Can we trust a parameter of niche strength if the dispersal is wrong?

But I don’t want to be too negative – on the upside, at least we have a parameter that we can compare, meta-analyze and so on. Because that’s a flaw that null models share with all hypothesis tests: a yes / maybe is much less informative than a number, and probably as much controlled by the size of the dataset as by the underlying strength of the ecological process.


• I think there is not so much a fundamental difference between using null models and fitting parametric models, so much as a matter of degree. In some sense, testing null models or computing parameter estimates are both ultimately constrained by the assumptions of the model and how well it approximates the data/reality. In a kind of meta-model sense, simulating a null model that excludes a mechanism is a bit like estimating a binary hyper-parameter for the inclusion of that mechanism. In the end, I think it can be useful to think of both null models and fitting parametric models as within a model selection framework.


• Well, yes and no, it depends on your definition of “fundamental”. For me, there are two main differences:

a) In general, a null model performs a significance test. And I would say that, although one can use NHST / MS / Parameter for asking whether a process is there, they are still asking this question in a slightly different way.

b) The second point is about the data-generating model. The randomization null models that are used in community analysis don’t really have an explicit data-generating model. Do we care? No idea, but I feel there is a difference from parametric models, which do have a data-generating model.
