As a result of this insight, a large number of statistical approaches exist where we try to optimise an objective of the form:

Quality(M) = L(M) – complexityPenalty(M)

where M is the model, L(M) is the likelihood, and complexityPenalty(M) adds some penalty for the model’s complexity. Examples of this structure are information criteria such as the AIC / BIC, shrinkage estimators such as the lasso / ridge (L1 / L2) penalties, or the wiggliness penalty in GAMs.

When these techniques are introduced in stats classes, they are usually motivated as a means to reduce overfitting, based on the arguments I gave above. It is well known (though possibly less widely so) that many of these penalties can be reinterpreted as a Bayesian prior. For example, shrinkage penalties such as the lasso (L1) or the ridge (L2) are equivalent to a double-exponential (Laplace) or a normal prior on the regression parameters, respectively (see Fig. 1). Likewise, wiggliness penalties in GAMs can be reinterpreted as priors on functional simplicity (see Miller, 2019).
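The ridge case is easy to verify numerically: the penalised least-squares estimate coincides with the posterior mode (MAP) under a normal prior on the coefficients. Here is a minimal sketch in Python with simulated data (all names and values are made up for illustration):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, lam = 50, 3, 2.0
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)

# closed-form ridge estimate: argmin ||y - X b||^2 + lam * ||b||^2
b_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# negative log posterior: Gaussian likelihood (sigma^2 = 1)
# plus a normal prior b ~ N(0, (1/lam) I) on the coefficients
def neg_log_post(b):
    return 0.5 * np.sum((y - X @ b) ** 2) + 0.5 * lam * np.sum(b ** 2)

b_map = minimize(neg_log_post, np.zeros(p)).x
print(np.allclose(b_ridge, b_map, atol=1e-4))  # → True
```

The two estimates agree because maximising the log posterior and minimising the penalised loss are the same optimisation problem, with the penalty strength playing the role of the prior precision.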

One may therefore be tempted to re-interpret complexity penalties from statistical learning such as L1/L2 as an a priori preference for simplicity, similar to Occam’s razor. This, however, misses an important point: in statistical learning, the strength of the penalty is usually estimated from the data. L1/L2 complexity penalties, for example, are usually optimised via cross-validation. Thus, the simplicity preference in these statistical learning methods is not really a priori (which is what you would expect if we had a fundamental / scientific, data-independent preference for simplicity), but something that is adjusted from the data to optimise the bias-variance trade-off. Note also that, in low-data situations, the penalty may easily favour models that are far simpler than the truth.
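To see how data-dependent the penalty is, here is a toy Python sketch (simulated data, hypothetical helper names) that selects the ridge penalty strength by k-fold cross-validation rather than fixing it a priori:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)

def ridge_fit(X, y, lam):
    # closed-form ridge solution
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(lam, k=5):
    # k-fold cross-validated prediction error for a given penalty
    folds = np.array_split(np.arange(n), k)
    err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        b = ridge_fit(X[train], y[train], lam)
        err += np.sum((y[test] - X[test] @ b) ** 2)
    return err / n

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
best = min(lams, key=cv_error)
print(best)
```

Whatever value is selected here reflects this particular data set; a different sample size or noise level would generally select a different penalty, which is the point made above.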

This is the reason why classical L1/L2 regularisations are better interpreted as “empirical Bayesian” rather than fully Bayesian. Empirical Bayesian methods are methods that use the Bayesian framework for inference, but with priors that are estimated from the data. Empirical and fully Bayesian perspectives can be switched or mixed, though. One could, for example, add additional data-independent priors on simplicity in a model, and in some sense the common Bayesian practice of using “weakly informative” (data-independent) priors on regression parameters could be interpreted as a light fundamental preference of Bayesians for simplicity.

How does that help us in practice? Well, for example, I am a big fan of shrinkage estimators and would nearly always prefer them over variable selection. The reason why they are rarely used in ecology, however, is that frequentist regression packages that use shrinkage (such as glmnet) don’t calculate p-values. The underlying reason is that obtaining calibrated p-values or CIs with nominal coverage for shrinkage estimators is hard, showing that the latter are probably better understood as a statistical learning method that optimises predictive error than as a frequentist method with controlled error rates. If we re-interpret the shrinkage estimator as a prior in a Bayesian analysis, however, we naturally get posterior estimates that can be interpreted fairly straightforwardly for inference. Thus, if you want to apply L1 / L2 penalties in a regression without losing the ability to discuss the statistical evidence for an effect, just do it Bayesian!

References

Miller, D. L. (2019). Bayesian views of generalized additive modelling. *arXiv preprint arXiv:1902.01330*.

Polson, N. G., & Sokolov, V. (2019). Bayesian regularization: From Tikhonov to horseshoe. *Wiley Interdisciplinary Reviews: Computational Statistics*, *11*(4), e1463.

Park, T., & Casella, G. (2008). The Bayesian lasso. *Journal of the American Statistical Association*, *103*(482), 681-686.

What is the cause, and what are the processes, that gave rise to Earth’s biodiversity patterns through space and time? Much research has been devoted to describing these patterns, and over the years, the fields of macroecology and macroevolution have slowly transitioned from a mainly correlational to a more mechanistic perspective _{(1, 2)}. The challenge with understanding the mechanisms of macroevolution is that, while evolution in principle follows simple general rules, it operates across a complex, dynamic world. As a result, there is only so much we can understand with simple theoretical and empirical models – for a more detailed understanding of the diversification of life on Earth, we will require models that reflect not only ecological and evolutionary processes, but also the complexity of the spatio-temporal drivers of the system, in particular changes in climatic and geographic patterns over evolutionary time scales. Such flexible eco-evolutionary models built on realistic dynamic landscapes allow us to realistically compare candidate processes leading to the emergence of biodiversity patterns (such as past and present α, β, and γ diversity, species ranges, ecological traits, and phylogenies) against empirical evidence.

In this post, I share the story of the development of *gen3sis* _{(1)}, an exciting new simulation engine that will hopefully bring us closer to uncovering some of the mysteries behind Earth’s biodiversity. *gen3sis* stands in the tradition of scientists moving from simple mathematical to more complex computational models _{(3)}, and my academic development followed a somewhat similar path. Around 2013, I developed a generalized phylogenetic tree simulator (TreeSimGM), based on multiple probability density functions for speciation and extinction (Bellman-Harris model), together with T. Stadler _{(4)}. We found that age-dependent speciation best explained empirical topologies (tree shape balance) _{(5)}. However, linking such abstract probability functions to real processes is difficult and limited to hypothesis formulation and further speculation. In 2016, I dug deeper into the biological mechanisms underlying biodiversity dynamics by adding more detailed ecological processes to an existing spatially explicit macroevolutionary model (SPLIT), written by P. Descombes, T. Gaboriau, F. Leprieur, L. Pellissier and others _{(6, 7)}. This allowed us to investigate the emergence of global biodiversity patterns.

Informed by my previous experience of generalising a birth-death model in a new context, I wanted to build a more modular and flexible simulation engine, which became *gen3sis*: the *general engine for eco-evolutionary simulations* _{(1)}. The idea was to overcome the limitations of simple models that do not consider explicit spatio-temporal changes, and of spatial models that are built around fixed assumptions and thus ignore or limit experimentation with ecological and evolutionary processes and their complex interactions. By allowing for custom ecological and evolutionary processes and interactions in an explicit dynamic landscape, we can better predict and understand diversification under changing conditions and expose complex processes at multiple temporal and spatial scales.

During this time, *gen3sis*’ architecture changed multiple times, and its development involved multiple interdisciplinary contributions, including dialog between software engineers, geologists, modelers, and empiricists. For example, Benjamin Flück, a software engineer, joined the team and helped to optimize code (e.g. porting R to C++) and to move selected functions into a customizable configuration file. Exposing the eco-evolutionary rules through a configuration file demanded further thought on the naming of functions and parameters, for a proper categorization of mechanisms and intuitive model use. Important for this naming and process definition was the involvement of the Landscape Ecology group and an sDiv synthesis group, with participants from multiple backgrounds and specific ecological or evolutionary perspectives. Finding a balance between speed, generality and usability was a long trial-and-error process.

The result is a modelling engine that, for the first time, offers the ability to simulate almost any scenario, promising extraordinary insights into life on Earth at deep-time and large spatial scales. *Gen3sis* keeps track of the differentiation between populations, allowing differentiation to decrease again after secondary contact, while permitting multiple traits that can evolve and interact with biotic and abiotic components, thus linking ecological and evolutionary processes. Fixed and central to the model are the calculations of clusters of connected populations, which are based on universal principles of gene flow between populations in a spatial context and depend on dispersal abilities. Initial conditions as well as the other modelled processes, including speciation, dispersal, trait evolution and ecology, are changeable and interconnected in a very customizable and intuitive way.
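The cluster idea can be illustrated generically: populations are linked if they lie within dispersal distance of each other, and clusters are the connected components of the resulting graph. Below is a toy Python sketch of this computation (purely illustrative; it is not gen3sis’ actual R/C++ implementation):

```python
import numpy as np

def population_clusters(coords, dispersal):
    """Group populations into clusters connected by dispersal:
    two populations are linked if their distance is <= dispersal;
    clusters are the connected components of that graph (union-find)."""
    n = len(coords)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(np.asarray(coords[i]) - np.asarray(coords[j])) <= dispersal:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# two groups of sites separated by a gap wider than the dispersal distance
coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(population_clusters(coords, dispersal=1.5))  # → [[0, 1, 2], [3, 4]]
```

In this toy example, the two site groups form separate clusters because the gap between them exceeds the dispersal distance, which is exactly the condition under which allopatric divergence can begin.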

For example, take speciation, which is essential to understanding the emergence of biodiversity. In most phylogenetic macro-evolutionary simulators, speciation happens according to a probability density function in a space-less fashion. In *gen3sis*, new species result from a set of rules (functions informed by a user-defined configuration file), and speciation happens in allopatry, after populations have been spatially isolated for a certain period of time. This isolation can depend on: (1) species dispersal abilities, which can evolve and trade off with other traits; (2) landscape connectivity, which can consider barriers (e.g. land for aquatic or water for terrestrial organisms) and change over time (e.g. a reconstructed paleolandscape); (3) ecological processes, which can modulate abundances or presences considering abiotic and biotic conditions; as well as (4) evolutionary processes that dictate persistence under changing conditions or adaptation to new settings. Additional mechanisms and feedbacks are possible, such as the inclusion of temperature effects on mutation or metabolic rates. Consequently, model complexity is customizable, allowing us to test whether we can differentiate between models.

*Gen3sis* is more than just an eco-evolutionary model developed to answer one specific question. *Gen3sis* is a general engine that allows the formalization and testing of ecological and evolutionary processes happening in complex and dynamic landscapes. *Gen3sis*’ flexibility opens up a wide range of future applications, demonstrated in a case study accompanying the methods publication in PLOS Biology addressing the latitudinal diversity gradient _{(1)}. In another – soon to be published – study, *gen3sis* revealed the importance of palaeoenvironmental dynamics, rather than current climatic factors, for the formation of the uneven distribution of biodiversity across tropical regions. Currently, I am using *gen3sis* to study local processes and to better scale mechanisms in space, time and levels of complexity using regional metacommunity eco-evolutionary experiments.

Other exciting possible future applications could address causal links between biodiversity and: (a) orogenetic and/or erosion models; (b) aquatic ecological and/or evolutionary processes; (c) temperature and/or water availability; (d) climatic variations; (e) intraspecific genetic variability; (f) functional traits such as niche width and dispersal abilities; as well as (g) emerging interaction networks. Practical uses could involve long-term conservation planning, such as wildlife corridors, or modeling the spread of infectious diseases under multiple scenarios (e.g. COVID). Alternatively, and personally very interesting to me, *gen3sis* can contribute to fields that traditionally do not rely on biological principles, such as cultural and technological evolution. For more (non-exhaustive) expected applications of *gen3sis*, see Table 4 in _{(1)}.

While we are far from predicting the emergence of biodiversity patterns on Earth, *gen3sis* offers an open-source tool able to simulate gradual changes influenced by multiple factors in constant interaction over long periods of time. This has the potential to advance knowledge in multiple, interdisciplinary research areas. *Gen3sis* is available as an R package on CRAN, along with beginners’ tutorials, to facilitate its use by, and dialog with, other scientists looking to piece together key puzzles of the Earth’s astonishing biodiversity. Available on GitHub under GPL-3, *gen3sis* aims to foster open model development within a critical and varied community. You are more than welcome to join!

I thank Florian Hartig, Laura Méndez and Emma Ladouceur for comments and feedback.

- for a short historical perspective, see the monography introduction (~25 min read)
- another blog post commenting on gen3sis (~15 min read)
- R package GitHub and CRAN repositories

1. O. Hagen, B. Flück, F. Fopp, J. S. Cabral, F. Hartig, M. Pontarp, T. F. Rangel, L. Pellissier, gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth’s biodiversity. *PLOS Biol.* **19**, e3001340 (2021).

2. M. Pontarp, L. Bunnefeld, J. S. Cabral, R. S. Etienne, S. A. Fritz, R. Gillespie, C. H. Graham, O. Hagen, F. Hartig, S. Huang, R. Jansson, O. Maliet, T. Munkemuller, L. Pellissier, T. F. Rangel, D. Storch, T. Wiegand, A. H. Hurlbert, The latitudinal diversity gradient: Novel understanding through mechanistic eco-evolutionary models. *Trends Ecol. Evol.* **34**, 211–223 (2019).

3. M. Weisberg, *Simulation and Similarity: Using Models to Understand the World* (OUP USA, 2013; https://books.google.de/books?id=rDu5e532mIoC), *Oxford Studies in Philosophy of Science*.

4. O. Hagen, T. Stadler, TreeSimGM: Simulating phylogenetic trees under general Bellman-Harris models with lineage-specific shifts of speciation and extinction in R. *Methods Ecol Evol*. **9**, 754–760 (2018).

5. O. Hagen, K. Hartmann, M. Steel, T. Stadler, Age-dependent speciation can explain the shape of empirical phylogenies. *Syst. Biol.* **64**, 432–440 (2015).

6. F. Leprieur, P. Descombes, T. Gaboriau, P. F. Cowman, V. Parravicini, M. Kulbicki, C. J. Melian, C. N. de Santana, C. Heine, D. Mouillot, D. R. Bellwood, L. Pellissier, Plate tectonics drive tropical reef biodiversity dynamics. *Nat. Commun.* **7**, 11461 (2016).

7. P. Descombes, F. Leprieur, C. Albouy, C. Heine, L. Pellissier, Spatial imprints of plate tectonics on extant richness of terrestrial vertebrates. *J. Biogeogr.* **44**, 1185–1197 (2017).

At the time, there was a heavy backlash against the study, and probably rightly so, as the statistical analysis turned out to be highly unstable to changes in the regression formula. You can find some links here. Over the years, however, I have found that this study has at least one virtue: it is an excellent example for teaching students about the importance of selecting the right functional relationship when running an analysis, and about the substantial “dark” uncertainty that can arise from these researcher degrees of freedom.

The reason why the hurricanes make such an excellent pedagogical example is that, as I point out here, the effect of femininity is highly unstable and depends strongly on which predictors you select, presumably because of high collinearity, the considered interaction(s) and the unbalanced femininity / mortality distribution.

In the stats course that I just finished teaching, I gave the students the task of re-analyzing the hurricane data, which also led me to run some DHARMa residual checks on the original negative binomial model fitted by Jung et al. Here is the residual analysis of the model with DHARMa, for technical reasons fitted with glmmTMB and not with mgcv (as in the original study). The main DHARMa residual plot shows a somewhat funky pattern, but this is not flagged as significant by the tests:

If we plot residuals against NDAM, however, we get a clear and highly significant misfit. The original model is obviously not acceptable, and the student teams that did the re-analysis practically all spotted this immediately. This serves as a reminder of how effective systematic residual checks for GLMMs are. In defense of the authors: DHARMa was not available at the time, although the pattern was also visible in standard Pearson residuals, as pointed out by Bob O’Hara at the time.

We also find (light) spatial autocorrelation, but with a negative lag 1. One may speculate that this could arise if people are more careful after a particularly deadly hurricane in the previous year, but it is equally possible that this is a fluke / false positive.

If you want to repeat the residual analysis, here’s the code. The data is conveniently stored by PNAS.

Overdispersion is a common problem in GL(M)Ms with fixed dispersion, such as Poisson or binomial GLMs. Here is an explanation from the DHARMa vignette:

GL(M)Ms often display over/underdispersion, which means that residual variance is larger/smaller than expected under the fitted model. This phenomenon is most common for GLM families with constant (fixed) dispersion, in particular for Poisson and binomial models, but it can also occur in GLM families that adjust the variance (such as the beta or negative binomial) when distribution assumptions are violated.

The main issues with overdispersion are that:

- p-values tend to be too small, thus leading to inflated Type I error rates
- CIs will be too small, thus leading to overconfidence about the precision of the estimate
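To make the phenomenon concrete, here is a minimal Python sketch with simulated data (an intercept-only model, so the fitted Poisson mean is simply the sample mean) computing the Pearson dispersion statistic, which should be close to 1 for a correctly specified model:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
mu = 10.0

def pearson_dispersion(y):
    # intercept-only Poisson GLM: the fitted mean is the sample mean,
    # and the dispersion statistic is the mean squared Pearson residual
    m = y.mean()
    return np.sum((y - m) ** 2 / m) / (len(y) - 1)

# equidispersed Poisson data
y_pois = rng.poisson(mu, n)
# overdispersed data: Poisson with lognormal noise on the mean
y_over = rng.poisson(mu * rng.lognormal(0.0, 0.5, n))

print(pearson_dispersion(y_pois))  # close to 1
print(pearson_dispersion(y_over))  # clearly larger than 1
```

Dispersion tests such as those in DHARMa essentially ask whether a statistic of this kind deviates from its expectation under the fitted model.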

Several R packages, notably DHARMa, allow testing GL(M)Ms for overdispersion. For version 0.3.4, we added a new parametric dispersion test, and we also recently ran a large number of additional comparative analyses of their power in different situations. The gist of this work is: the tests are pretty good at picking up on dispersion problems in a range of models.

But are those tests good enough? Or are they maybe too good, meaning: if you get a significant dispersion test, should you switch to a variable-dispersion GLM (e.g. negative binomial), even if the dispersion problem is mild? To check this quickly, I ran a few simulations using the DHARMa::runBenchmarks function.

```
library(DHARMa)

returnStatistics <- function(control = 1){
  testData = createData(sampleSize = 200, family = poisson(),
                        overdispersion = control, fixedEffects = 0,
                        randomEffectVariance = 0)
  fittedModel <- glm(observedResponse ~ Environment1, data = testData,
                     family = poisson())
  x = summary(fittedModel)
  res <- simulateResiduals(fittedModel = fittedModel, n = 250)
  out <- c("Type I error GLM slope" = x$coefficients[2, 4],
           "DHARMa testDispersion" = testDispersion(res, plot = FALSE)$p.value)
  return(out)
}

out = runBenchmarks(returnStatistics, controlValues = seq(0, 1.5, 0.05), nRep = 500)
plot(out, xlab = "Added dispersion sd", ylab = "Prop significant", main = "n = 200")
```

The idea of these simulations is to slowly increase the overdispersion in a Poisson GLM where the true slope of a tested effect (Environment1) is zero. As overdispersion increases, we will get an increasing Type I error rate (false positives) on the slope. We can then compare the power of the DHARMa::testDispersion function with the rate of Type I errors, to see if DHARMa would have warned us early enough about the problem, or if the dispersion tests warn too early, i.e. before calibration issues in the p-values of the GLM get serious (see Fig. 1, calculated for different sample sizes n).

The results suggest that the power of DHARMa overdispersion tests depends more strongly on sample size than the increase of Type I error caused by the overdispersion. More precisely, for small sample sizes (n = 10/40), overdispersion tests pick up a signal roughly at the same point where overdispersion starts to create problems in the GLM estimates (i.e. false positives in regression effects).

For larger sample sizes (in particular n = 5000), however, even small levels of overdispersion are picked up by DHARMa, whereas the GLM Type I error is surprisingly unimpressed by the sample size. I have to say I was a bit surprised by the latter behaviour, and still do not fully understand it. It seems that the increase of Type I error in a Poisson GLM mainly depends on the nominal dispersion and not so much on the sample size. Please comment if you have any idea why this would be the case; I would have expected sample size to play a role as well.

Whatever the reason for the GLM behaviour, my conclusions (disclaimer: this is of course all only for a simple Poisson GLM; one should check whether this generalises to other models) are as follows:

- In my simulations, problems with overdispersion were only substantial if a) tests are significant and b) the dispersion parameter is large, say > 2.
- This suggests that one could ignore significant overdispersion tests in DHARMa if n is large AND the estimated dispersion parameter is close to 1 (the idea being that the tests get very sensitive for large n, thus picking up on minimal dispersion problems that are unproblematic for the GLM estimates)
- That being said: given that n is large, it seems unproblematic to support an additional parameter for adjusting the dispersion, e.g. through a negative binomial, so the question is whether there is any reason to make this call

My overall recommendation would still be to move to a variable-dispersion GLM as soon as DHARMa dispersion tests flag overdispersion. But if you have particular reasons for avoiding this, you could ignore a positive test if n is large and the dispersion is close to 1.

**Edit 25.3:** in response to a question by **Furchtk**, I have run some more simulations varying the intercept of the Poisson (Fig. 2). What we can clearly see is that lower intercepts behave a bit similarly to lower n, which is to be expected, as the integer stochasticity of the Poisson increases towards lower means. I’m not sure I see anything else happening here, but again, one would probably have to check more systematically. It is a good point, though, that we should see n in relation to the mean as well: if we have a Poisson mean of 0.01, n = 20 means something different than if the mean is 10.

**Update April 2020:** this paper has now been published in Nature, with a comment by Mark Pagel. From skimming the published version, it seems to me that the text has been a bit condensed, and that the implications were possibly a bit toned down, but I believe that the comments here largely remain valid for the published version as well.

**Update July 2020:** Hélène Morlon, Stéphane Robin and I have written a more in-depth analysis of the article, which is available here.

Consider the following analysis task, which is arguably one of the most important in macroevolutionary research:

- We have a time-calibrated phylogeny for all extant species of a clade, but no information about the extinctions that presumably happened during its diversification from the last common ancestor to the present day.
- We want to fit statistical models (so-called birth-death models) to draw inference about speciation rates (birth = b) and extinction rates (death = d), and how those rates changed over time, so we are looking to infer b(t), d(t).

Let’s stay with the assumption of constant birth / death rates b, d for the moment. It may be surprising that it is indeed possible to simultaneously infer b and d from an extant tree. Surprising, because one might think that an increase in d could always be counteracted by increasing b to arrive at the same number of extant species, which would naively render b and d unidentifiable. However, the model with constant birth-death rates is identifiable, although the uncertainty in the difference b − d is generally much lower than that in the turnover d/b, i.e. it is easier to estimate an effective diversification rate b − d than the precise values of b and d (Nee, 2006), a result that we also find in Maliet et al. (2019).

My intuition about this was (up to now) the following: yes, b and d trade off with regard to the final number of species, but combinations with larger b and d that produce the same number of final species will create more variation within the phylogeny, which makes it possible to separate the parameters. Although, after reading the paper, I have to say I suspect that the reason may be a different one. Anyway.
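This intuition can be probed with a quick simulation (a toy Python sketch, not the paper’s analysis): two constant-rate birth-death processes with the same net diversification rate b − d but different turnover b + d have similar expected species numbers, yet very different variability between replicates.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_bd(b, d, t_max=10.0, n_max=5000):
    """Gillespie simulation of a linear birth-death process from 1 lineage."""
    n, t = 1, 0.0
    while n > 0 and n < n_max:
        rate = n * (b + d)                # total event rate
        t += rng.exponential(1.0 / rate)  # waiting time to next event
        if t > t_max:
            break
        n += 1 if rng.random() < b / (b + d) else -1
    return n

reps = 2000
low = [simulate_bd(0.3, 0.1) for _ in range(reps)]   # net rate 0.2, low turnover
high = [simulate_bd(1.1, 0.9) for _ in range(reps)]  # net rate 0.2, high turnover

# similar mean number of extant species (theory: e^{0.2 * 10} ≈ 7.4) ...
print(np.mean(low), np.mean(high))
# ... but much larger variance between replicates under high turnover
print(np.var(low), np.var(high))
```

The higher-turnover process produces the same expected diversity but far noisier outcomes, which is the extra “variation within the phylogeny” the intuition above appeals to.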

Macroevolutionary analysis does of course not stop at constant b/d models. A big interest of the field is to understand how diversification rates change over time, e.g. to examine the effect of environmental conditions, key innovations, etc. on speciation and extinction rates. A large range of statistical models have been proposed and fitted that allow time or the environment to affect speciation or extinction rates (Condamine et al., 2013), or that allow shifts in diversification rates at certain points in time or for certain clades (e.g. Rabosky et al., 2014).

Against this background, the main claims of Louca & Pennell are that:

- The likelihood of a given diversification model depends only on the lineage-through-time (LTT) plot (i.e. the diversity through time is a sufficient statistic for this type of problem)
- Asymptotically (i.e. for many species), the LTT plot can be modeled by a set of differential equations, which describe the temporal change dM/dt of the number of species M in the LTT. Analysis of these equations shows that a large family of functions b(t), d(t) can produce the same M(t), i.e. the same LTT plot.
- Thus, if we assume that birth and death rates can be arbitrary functions of time, it is not possible to simultaneously identify b(t) and d(t). Rather, there are multiple diversification histories that will produce the same LTT. Louca & Pennell thus propose to only consider an effective diversification parameter (what they call the pulled speciation rate), which is identifiable, but maps onto multiple, possibly quite different b(t), d(t) combinations.

For constant b, d, the pulled diversification rate will not be constant, but is given by a differential equation. Because this is really the central point of the paper, I copy the relevant part of the paper in full.

Analysis of these equations reveals that very different b(t), d(t) models can have identical pulled speciation / diversification rates and thus produce identical LTTs.

Louca & Pennell also address the question of why no one has noticed this before (I’m sure they got the same question from the reviewers). They argue that most common models that are fit to data specify functions b(t), d(t) that will only intersect with one value of the pulled speciation rate. A visualisation of this idea is provided below:

A first, somewhat tangential comment is about claim 1, which defines the scope of the paper: Louca & Pennell consider models for which speciation and extinction rates are functions of time, or of some other variable that acts uniformly across the phylogeny. It is true that this is the assumption of many models, but there are other important models where b/d rates differ between lineages / subclades rather than over time. My feeling was that if the arguments in Louca & Pennell have merit (more on that below), it should be possible to generalize them to such more general conditions (including those we consider in Maliet et al., 2019). In any case, the point that the likelihood depends only on the LTT plot (or M(t)) seems to me more like a simplifying assumption than a result.

The main question, however, is clearly about points 2 and 3 above (i.e. the claim that it is not possible to distinguish between quite different diversification histories), which obviously has profound implications for macroevolutionary analysis. I would like to approach this claim from two sides:

- Is the proof that leads to the claim that models with identical pulled speciation rates have the same likelihood correct?
- If so, how much does it matter, given that most currently used models seem to be identifiable?

I’ll be brief here: I don’t know. The proof looks overall convincing to me, except for one concern, which is that Louca & Pennell first consider asymptotics (to express M(t) as a smooth, differentiable function), and then derive the likelihood based on this smooth M(t). This feels a bit like switching limits in a mathematical series. Or in other words: we are first making the LTT smooth by taking the limit n → ∞, and then calculating the likelihood on this smooth LTT, whereas strictly speaking, we should first calculate the likelihood, and then take the limit in n. My concern is that there might be local variation in the LTT that contains information for the inference, but that is now hidden by the fact that we take the limit of tree size to infinity first. I had always thought (see my comments above) that the differences in stochasticity between different b, d combinations are at least in part responsible for their identifiability. It seems to me that Louca & Pennell suggest that this is not so, and that the shape of the LTT is the only reason for identifiability.

However, I’m not sure that this is in fact an issue, and the idea of taking this limit goes back to at least Morlon et al. (2011). Still, I guess I’d simply like to convince myself, with a very thorough, large number of replicate simulations, that there is no information hidden in the stochasticity. Maybe someone with more insight into this has thoughts / comments?

Let’s say the proof holds. The question then is how much this matters for the field and the existing methods. Louca & Pennell concede that most models that are currently fit are identifiable. I would argue that this shows that the situation is maybe not as bleak as they suggest, in the sense that what most people have so far found worth testing is testable.

This is especially true when we consider that the fact that arbitrary b(t), d(t) functions are not identifiable is not surprising, simply from counting degrees of freedom. If we have M branching events, we can never fit a model with a change in b(t), d(t) at each branching event – such a model would be desperately over-parameterized. Just algebraically, we can only hope to fit a model with a change in b(t), d(t) at every second branching event. If we add stochasticity to the equation, e.g. via the rule of thumb that you typically need 10x the data to constrain one degree of freedom, we would arrive at around 20 branching events for every degree of freedom in the functions specifying b(t), d(t) (and with the strong trade-offs between b and d in the likelihood, possibly more). These back-of-the-envelope calculations suggest to me that, for a 100-species phylogeny, we could anyway only hope to fit 1-3 parameters each for b(t) and d(t). This is probably not complex enough to produce the shapes in Fig. 1.

So, effectively, if all the things that we could have hoped for anyway are doable, is it really so important that we can’t distinguish between a large number of crazy scenarios? Don’t get me wrong: if the proof of Louca & Pennell is right, I think it’s a useful point, and the difference between models that (supposedly) lead to the same likelihood is quite impressive. I’m just wondering whether it makes any difference for the typical analyses in the field, which are run over small clades where model complexity is limited by the data anyway. And for large clades, there are many other assumptions of the b(t), d(t) model that are probably violated, including that diversification rates are homogeneous across subclades and independent between lineages.

A final point: Louca & Pennell suggest that inference should concentrate essentially on the pulled speciation and diversification rates, which define the “congruent sets” of diversification scenarios that are compatible with an LTT. I didn’t get what we gain by that. In the end, this is just a re-transformation of the LTT plot that is hard to interpret. The insight that a problem is over-parameterized or unidentifiable would suggest to me that we have to think harder about how to make it fittable, e.g. by reducing the number of parameters in the model, or by adding regularization on the parameters (as we did, e.g., in Maliet et al., 2019, where we fit a change in diversification rate at each time step with a regularization that assumes that diversification rates tend to stay similar between time steps). So, if the results hold, what I would take from the paper is that macroevolution has to think hard (possibly harder than before) either about specific candidate mechanisms and hypotheses, whose predictions can then be contrasted with the data, or about statistical priors, regularisations or null assumptions that make the problem identifiable.

Louca, S., Pennell, M.W., 2019. Phylogenies of extant species are consistent with an infinite array of diversification histories. bioRxiv 719435. https://doi.org/10.1101/719435

Maliet, O., Hartig, F., Morlon, H., 2019. A model with many small shifts for estimating species-specific diversification rates. Nat. Ecol. Evol. 3, 1086–1092. https://doi.org/10.1038/s41559-019-0908-0

Pontarp, M., Bunnefeld, L., Cabral, J.S., Etienne, R.S., Fritz, S.A., Gillespie, R., Graham, C.H., Hagen, O., Hartig, F., Huang, S., Jansson, R., Maliet, O., Münkemüller, T., Pellissier, L., Rangel, T.F., Storch, D., Wiegand, T., Hurlbert, A.H., 2018. The Latitudinal Diversity Gradient: Novel Understanding through Mechanistic Eco-evolutionary Models. Trends Ecol. Evol.

Nee, S., 2006. Birth-Death Models in Macroevolution. Annu. Rev. Ecol. Evol. Syst. 37, 1–17. https://doi.org/10.1146/annurev.ecolsys.37.091305.110035

Condamine, F.L., Rolland, J., Morlon, H., 2013. Macroevolutionary perspectives to environmental change. Ecol. Lett. 16, 72–85.

Rabosky, D.L., et al., 2014. BAMMtools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods Ecol. Evol. 5, 701–707.

Morlon, H., Parsons, T.L., Plotkin, J.B., 2011. Reconciling molecular phylogenies with the fossil record. Proc. Natl. Acad. Sci. 108, 16327–16332.

BioScience has just published the latest installment of “Scientists’ Warnings”. There have been two previous such Warnings, the latest organised by the same authors in 2017. Quite a few scientists have signed this Warning. I chose not to, although I had signed the previous one in 2017.

I have been heckled, by a colleague from the Economics department, about why I don’t rush to present and justify my research activities to the public. Partaking in societal deliberations about climate change is one such “outreach”. He implied that it is “irresponsible” to work in an ivory tower (although it may actually be more of an ivory basement). Reading the repeated Scientists’ Warning, I got a better feeling for why I disagree, and why I didn’t sign this time round. And it has nothing to do with whether I agree, as a private person, with the statement (for the record: I do).

In Neal Stephenson’s book Anathem, scientists are separated from the rest of the world and live in cloisters. They work in different castes, if you like, differentiated by how often they have contact with the outer, secular world: every year, every ten years, every century and every millennium. In between, they receive no mail, no books, no information from the world outside (apart from hearing planes flying over and from discussions with their brothers and sisters in the other castes, who are forbidden to touch any topic topical and current in these conversations). As a result, the “Unarians” discuss and work on issues of a near-term, almost immediate nature, while at the other extreme the Millenarians take the long view.

The explosion of human population size and the resulting devastation that we humans inflict on ourselves and the planet (climate change, deforestation, desertification, water pollution, you name it) poses a challenge to those ecologists who sympathise with a 100- or 1000-year view of their research. I sometimes half-jokingly refer to science as that bit of research that is still true in 500 years. That over 11,000 scientists signed the last Warning is perceived as a very strong statement by the “general public” (or so my non-scientific friends tell me). The strength comes from the fact that scientists, by and large, are perceived as impartial, rational and as taking the long view.

I decided not to sign the latest Scientists’ Warning, because my long (or at least mid-term) view is currently extremely clouded. The cacophony of current affairs, media outbursts, scientific and funding rush to Climate Change and (loss of) Biodiversity pushes past reflexion, arguments and understanding. I perceive an increasing proportion of the work in my field to be tainted by advocacy and short-termism. I cringe at oversimplified podium statements of do-gooders of my own discipline, at newspaper interviews and podcasts, for example describing building dams as “destroying biodiversity” because several hectares of riparian forest are lost. (Here the term “biodiversity” is used as synonymous with “nature” or “wild stuff”, not in any of its already too vague actual meanings.) Asked something “simple”, such as “How can we decrease chemical contamination of our environment?”, stuff that we teach in the Bachelor programme, I drop my gaze and stare at my shoes: this is not the right question; this is about moral judgement, about societal values, about political attitude. But these are “short-term views”, and, in my above definition, not science. (The scientific answer is obvious, even to the layperson asking.)

So, for the time being, as a scientist I pull out of street marches, petitions, twitter tirades (well, that was easy) and public calls for “them” to do “something” against climate change and insect decline. My (private, but science-infused) longer-term view identifies overpopulation, slack in social norms and socially encouraged egotism (“Get rich or die trying”) as underlying problems. As a scientist, I am not qualified to comment on this.

One example is the set of **validity concepts** (e.g. construct validity, internal validity, external validity) that are heavily taught in the social sciences and economics (see Wikipedia). In short, these concepts categorize “failure modes” of inference. I would guess that ecologists are aware of these problems on an abstract level, but they are not taught and used as a framework in the community. I find this regrettable, as my experience is that the “validity checklist” is immensely helpful for students to examine the strength of the evidence provided by a study.

Another example is causal inference, and specifically the concept of **mediators, confounders and colliders.** This goes back at least to Pearl 2000 (see also Pearl 2009a,b), and with the popularity of SEMs in ecology, I’m sure that people have at least heard about causal inference in general. However, when reading **the really excellent and highly recommended paper Lederer et al., 2019** *“Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals.”* in our group seminar, we also discussed that the practical implementation of these ideas is probably not very far advanced in ecology (it seems further along in medicine, given that this is a joint paper by the editors of these journals).

Lederer et al. first nicely establish an **operational concept of causality** that I would broadly agree with for ecology as well: assume we look at the effect of a target variable (something that could be manipulated = predictor) on another variable (the outcome = response) in the presence of other (non-target) variables. The goal of a causal analysis is to control for these other variables in such a way that we obtain the same effect size for the target variable that we would obtain if the target predictor were manipulated in a controlled **intervention** (= experiment).

You probably learned in your intro stats class that, to infer causal effects, we have to control for confounders. I am less sure, however, whether everyone is clear about what a confounder is. In particular, **confounding is more specific than having a variable that correlates with predictor and response**. The direction of the causal arrows is crucial for identifying true confounders. For example, Fig. 1C of the Lederer paper shows a collider, i.e. a variable that is influenced by both predictor and response. Although it correlates with predictor and response, correcting for it (or including it) in a multiple regression will create a **collider bias** on the causal link we are interested in (corollary: **including all variables is not always a good thing**). The bottom line of this discussion (and the essence of Pearl 2000, 2009) is that to establish causality for a specific link, we have to close the so-called back-door paths for this link, by

- Controlling for confounders (back-doors, blue paths in the figure)
- Not controlling for colliders, M-Bias, and other similar relationships (red paths)
- It depends on the question whether we should control for mediators (yellow paths)
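Collider bias is easy to demonstrate with a few lines of simulation. A minimal hypothetical sketch in Python (variable names are mine): x causally affects y with slope 1, and c is a collider influenced by both; “controlling” for c destroys the estimate of the causal effect.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)           # predictor; true causal effect on y is 1
y = x + rng.normal(size=n)       # response
c = x + y + rng.normal(size=n)   # collider: influenced by predictor AND response

# Simple regression y ~ x recovers the true effect (~1)
slope_plain = np.polyfit(x, y, 1)[0]

# Multiple regression y ~ x + c, i.e. "controlling" for the collider
X = np.column_stack([np.ones(n), x, c])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_collider = coef[1]         # biased towards 0

print(round(slope_plain, 2), round(slope_collider, 2))
```

In this particular setup, the population value of the x coefficient after conditioning on c is exactly zero, i.e. the true causal effect disappears entirely from the fitted model.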

My impression is that these types of arguments are well-established in the medical and economic literature (in the sense that people regularly use them to defend the inclusion / exclusion of variables in a regression), but that they are rarely invoked in the ecological literature, where the selection of predictors happens via an eclectic mix of methods, ranging from AIC selection to “ecological plausibility”, but rarely based on formal causality arguments.

Moreover, what I really liked about the Lederer paper is their discussion of the **Table 2 fallacy.** The paper recommends that variables included as confounders should NOT be discussed and should not be presented in the regression table at all (this is typically Table 2 in a paper, hence the name), because they are themselves usually not corrected for confounding (and they shouldn’t, or at least don’t have to, be corrected for; see Pearl 2000 / the discussion above). Sensible advice, but I think it runs contrary to common practice in standard and SEM regression reporting in ecology.

A cynical (but possibly accurate) explanation for why the Table 2 fallacy is the norm in ecology is that we rarely have a clear target variable / hypothesis, and thus we feel that all variables that were used have to be discussed. A side effect is that this makes for the most boring result / discussion sections, where the effect of one variable after the other has to be discussed and interpreted. More importantly, however, each variable that is discussed as a causal effect must be controlled for confounding, or else we should make a clear distinction between the variables that are controlled and those that aren’t. As I said, Lederer et al. recommend not mentioning uncontrolled variables at all. I’m not sure if that is practical for ecology (as analyses are often semi-explorative), but I have recently been wondering about the option of separating reasonably controlled from possibly confounded variables by a bar or an extra section in the regression table.

My only small quibble with the otherwise excellent Lederer paper relates to their comments about significance. First, I strongly support their call for concentrating on parameters and CIs instead of p-values. However, I find their recommendation to avoid the word “not significant” in favor of a vague term such as “the estimate is imprecise” a bad one (this is btw. similar to some other recent papers, e.g. Dushoff et al., 2019, Amrhein et al, 2019, which would make a nice topic for another post). The idea behind this recommendation is that researchers tend to misinterpret n.s. as “no effect”, but it seems to me the response should be to better educate researchers about what n.s. means, not to muddy the waters by hiding the fact that a test was done.

Lederer, D. J., Bell, S. C., Branson, R. D., Chalmers, J. D., Marshall, R., Maslove, D. M., … & Stewart, P. W. (2019) Control of confounding and reporting of results in causal inference studies. Guidance for authors from editors of respiratory, sleep, and critical care journals. *Annals of the American Thoracic Society*, *16*(1), 22-28.

Pearl, J. (2009) Causal inference in statistics: An overview. *Statistics surveys* 3, 96-146.

Pearl, J. (2000 / 2009) *Causality*. Cambridge University Press, 1st / 2nd ed.

Dushoff, J., Kain, M.P. and Bolker, B.M., 2019. I can see clearly now: reinterpreting statistical significance. *Methods in Ecology and Evolution*.

Amrhein, V., Greenland, S. and McShane, B., 2019. Scientists rise up against statistical significance. *Nature*, 567, 305–307.

The relationship between species richness and ecosystem function is a field of ecology that has always puzzled me. I learned the scientific ropes in a department of vegetation ecologists: vegetation was the result of environmental conditions, and indeed a substantial part of their research was to quantify what a plant species indicates about its environment (think Ellenberg indicator values). While of course species may be absent from a community due to competition, those species that *are* there reflect climate, soil, management.

Thus, when I see a paper showing a **strong** effect of species richness, I feel that there must be something amiss. (This paranoid and blanket scepticism goes far beyond “biodiversity” effects.) Can it really be true that in a give-or-take “natural” system we can boost productivity by 100-200% by having more species? Looking out of my office window, I can make out the Black Forest, and a nice large monoculture of spruce. Will adding *a random local tree species* increase the productivity? And does a mixture of, say, beech and spruce with a higher productivity demonstrate an effect of tree species richness (TSR) on productivity (P)?

Actually, this blog post is an appetizer for our re-analysis of Liang et al. (2016, Science). But bear with me for another brief excursion. Let me first repeat an argument I read in Donald Maier’s scathing critique of “biodiversity research” (Maier 2012: “What’s So Good About Biodiversity?”, Springer): When we plot species richness on the x-axis, we assume that the species we count are equivalent. If they aren’t, their number is not helpful, and we should quantify something else, e.g. a trait or their abundance or their composition; but not their *number*. And, when investigating the effect of TSR, the x-axis implies random species composition. If it wasn’t random, then richness would be confounded with something else. (Admittedly, Maier put it better, but also more verbosely.)

Liang et al. (2016, Science: “Positive biodiversity-productivity relationship predominant in global forests”) present such a figure, with an increase in productivity from around 3.5 to well over 10 m^{3}ha^{-1}yr^{-1} as “relative species richness” increases from near zero to 100% on the x-axis. Such a figure rings my alarm bells. So, together with two BSc students, we re-analysed the data presented in that paper.

There are various points that we consider problematic (be it extremely unrealistic values for P; Euclidean distances between plots on a spherical world; non-stratified sampling of biomes; computation of “bootstrapped” error bars), and we investigated them one by one, but the pivotal point is the x-axis: What does “relative species richness” mean? Quite simply, it is the number of tree species in a plot divided by 270, the highest species richness in the data set considered. (Now that is a tiny bit unfair, but it is essentially what it is. In the rundown of the re-analysis we of course use Liang et al.’s definition.) So, a 10-species plot in Finland receives a value of 3%, while a plot in Panama gets a value of 100%. Can you spot the problem? Yes: the TSR gradient is in fact a latitudinal gradient. That, in turn, means that the plot does not depict the effect of TSR on P, but of latitude on P!

We were still charmed by the idea of constructing an x-axis that is relative. Instead of “relative to the highest richness in the tropics”, however, we constructed a tree-species richness relative to the highest number of tree species observed *in that region*. So 100% means “as many as you can get around here”, and varies between 5 tree species in Siberia and 500 in Panama.
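To make the difference between the two x-axes concrete, here is a hypothetical Python sketch (the plot values are invented for illustration; only the maximum of 270 species and the 10-species Finnish plot are taken from the text):

```python
# Each plot: region and observed tree species richness
plots = [
    {"region": "Finland", "richness": 10},
    {"region": "Panama",  "richness": 270},
    {"region": "Panama",  "richness": 120},
]

# Liang et al.-style: relative to the single global maximum
global_max = max(p["richness"] for p in plots)

# Our alternative: relative to the maximum in the plot's own region
regional_max = {}
for p in plots:
    r = p["region"]
    regional_max[r] = max(regional_max.get(r, 0), p["richness"])

for p in plots:
    p["rel_global"] = p["richness"] / global_max
    p["rel_regional"] = p["richness"] / regional_max[p["region"]]

# The Finnish plot: ~4% on the global axis, but 100% of what is
# possible regionally
print(plots[0]["rel_global"], plots[0]["rel_regional"])
```

On the regional axis, the latitudinal gradient no longer masquerades as a richness gradient.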

Using this definition (and stratifying by biome, and correcting for spatial distances on a sphere, and using subsampling correction for error bars) we find — **nothing**. (A tiny effect, to the eye indistinguishable from a horizontal line.)

Of course, when looking at each biome separately, we find more or less positive effects, but never as strong as in the original global analysis.

Interested? Read more in our preprint on bioRxiv here!

What to take home? Well, perhaps that observational data are tricky for estimating richness effects. It’s so easy to miss effects and then wrongly attribute changes in productivity to species richness. (And yes, I include Duffy et al.’s 2017 meta-analysis in this criticism; it’s part of my paranoid scepticism.)

Artificial neural networks, especially deep neural networks (DNNs) and (deep) convolutional neural networks ((D)CNNs), have become increasingly popular in recent years, dominating most machine learning competitions since the early 2010s (for reviews of DNNs and (D)CNNs, see LeCun, Bengio, & Hinton, 2015). In ecology, there are a large number of potential applications for these methods, for example image recognition, analysis of acoustic signals, or any other type of classification task for which large datasets are available.

Fig. 1 shows the principle of a DNN – we have a number of input features (predictor variables) that are connected to one or several outputs through several hidden layers of “neurons”. The different layers are connected, so that a large value in a previous layer will create corresponding values in the next, depending on the strength of the connection. The latter is learned / trained by adjusting connections / weights to produce a good fit on the training data.

So, how does one build this kind of model in R? A particularly convenient way is the Keras implementation for R, available since September 2017. Keras is essentially a high-level wrapper that makes the use of other machine learning frameworks more convenient; TensorFlow, Theano, or CNTK can be used as the backend. As a result, we can create an ANN with n hidden layers in a few lines of code.

As an example, here is a deep neural network fitted to the iris data set (the data consist of three iris species classes, each with 50 samples of four descriptive features). We scale the input variables to the range (0,1) and “one hot” encode (= create dummy features for) the response variable. In the output layer, we define three nodes, one for each class. We use the softmax activation function to normalize the outputs so that each lies in (0,1) and they sum to one. To evaluate model quality, Keras will split the data into a training and a validation set. The code in Keras is as follows:

```r
library(keras)

use_session_with_seed(1, disable_parallel_cpu = FALSE)

data = iris[sample(nrow(iris)), ]
y = data[, "Species"]
x = data[, 1:4]

# scale to [0,1]
x = as.matrix(apply(x, 2, function(x) (x - min(x)) / (max(x) - min(x))))

# one hot encode classes / create dummy features
levels(y) = 1:nlevels(y)
y = to_categorical(as.integer(y) - 1, num_classes = 3)

# create sequential model
model = keras_model_sequential()

# add layers, first layer needs input dimension
model %>%
  layer_dense(input_shape = ncol(x), units = 10, activation = "relu") %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dense(units = 3, activation = "softmax")

# add a loss function and optimizer
model %>%
  compile(
    loss = "categorical_crossentropy",
    optimizer = "adagrad",
    metrics = "accuracy"
  )

# fit model with our training data; training makes 200 passes (epochs) over the data
fit = model %>%
  fit(
    x = x,
    y = y,
    shuffle = T,
    batch_size = 5,
    validation_split = 0.3,
    epochs = 200
  )

plot(fit)
```

A common concern with this type of network is overfitting (the error on test data deviates considerably from the training error). We want our model to achieve good generalization (a low test error). There are several approaches to regularization, such as introducing weight penalties (e.g. L1, L2), early stopping, or weight decay.

The dropout method is one simple and efficient way to regularize our model. Dropout means that nodes and their connections are randomly dropped with probability p during training. This way, an ensemble of thinned sub-networks is trained and averaged for predictions (see Srivastava et al., 2014 for a detailed explanation).

```r
use_session_with_seed(1, disable_parallel_cpu = FALSE)

model = keras_model_sequential()

model %>%
  layer_dense(input_shape = ncol(x), units = 10, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(units = 10, activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(units = 3, activation = "softmax")

model %>%
  compile(
    loss = "categorical_crossentropy",
    optimizer = "adagrad",
    metrics = "accuracy"
  )

fit = model %>%
  fit(
    x = x,
    y = y,
    shuffle = T,
    validation_split = 0.3,
    epochs = 200,
    batch_size = 5
  )

plot(fit)
```

There is no general rule for how to set the network architecture (depth and width of layers). In general, the optimization gets harder with the depth of the network. Network parameters can be tuned, but beware of overfitting (e.g. implement an outer cross-validation).

So, what have we gained? In this case, we have applied the methods to a very simple example only, so benefits are limited. In general, however, DNNs are particularly useful where we have large datasets, and complex dependencies that cannot be fit with simpler, traditional statistical models.

The disadvantage is that we end up with a “black box model” that can predict, but is hard to interpret for inference. This has often been named as one of the main problems of machine learning, and there is much research on new frameworks to address this issue (e.g. DALEX, lime; see also Staniak & Biecek, 2018).

By Betteridge’s law, the answer to this question is of course no. Or better: we don’t know. But let’s back up a bit:

Almost a year ago, LaManna and coauthors published a paper in *Science* (1), claiming that conspecific negative density dependence (CNDD) in forests, defined as the effect of local conspecific adult density on the recruit-to-adult ratio in 10x10m and 20x20m quadrats, increases toward the tropics and for rare species.

The strength and clarity of the identified effects were astonishing (at least to us), as were the implied consequences: both in the original *Science* paper and in their press releases (i, ii), the authors interpret their results as suggesting that CNDD controls species abundance and diversity distributions, thus explaining causally why some species are rare and some are common, and why there is a latitudinal diversity gradient. They repeat these statements on YouTube:

In a Technical Comment, published today in *Science* (2), we suggest an alternative, albeit somewhat less glamorous, explanation for the results: the statistical CNDD estimators used in LaManna et al. were severely biased, and the strength of the bias depended on species abundance and on several other process and community characteristics that potentially correlate with latitude (Fig. 1; more details in our comment, see also our code on GitHub here). Because of this dependence, all the patterns reported in the original publication can emerge even when no CNDD is present whatsoever. We conclude that the methods used in LaManna et al. cannot even reliably detect the mere presence of CNDD, let alone any of the reported differences in CNDD with latitude or species abundance.

*Science* published a second technical comment by Ryan Chisholm and Tak Fung along with our comment, which reports similar results (Ryan also wrote a blog post about their study here). Moreover, we heard informally that Matteo Detto and colleagues had submitted another comment that was, however, not accepted for publication. We invited both to give a short summary of their conclusions regarding the study:

By Ryan Chisholm: In Chisholm and Fung (3), we show in more detail why the bias arises. LaManna et al. used an unusual “statistical trick”, whereby they transformed some data points but not others prior to model fitting, in order to account for the presence of quadrats with saplings but no adults. This “selective transformation” affected more data points in tropical than in temperate plots, which ultimately led to a greater bias in CNDD estimates in tropical plots and an artefactual latitudinal gradient in CNDD. A second statistical problem with the model was the lack of an intercept term, even though an intercept was clearly suggested by the data and is biologically needed to account for immigration. After identifying the source of the bias, we performed a more appropriate statistical analysis, which does not use a “selective transformation” and includes an intercept in the model, and, on the same data, found no statistically detectable latitudinal trend in CNDD.

By Matteo Detto: I simulated a spatial neutral model in which individuals reproduce and displace their offspring according to Gaussian dispersal, and saplings become adults without interacting with neighbors. Both the within-site pattern (the rare-species bias) and the between-site pattern (the latitudinal gradient) produced by the neutral model were similar to the original patterns presented in LaManna et al., suggesting again that the patterns reported in LaManna et al. may be solely a result of a biased statistical estimator (Fig. 2).

We did not see the response by LaManna et al. [to us, to C&F] before yesterday. If we had seen it earlier, we would have been happy to point out a few errors and misrepresentations of our arguments, in particular:

- That the statistical method for estimating CNDD used in LaManna et al. is biased is a mathematically irrefutable fact (see above / our analysis). LaManna et al. still seem to have problems grasping this reality when they state, with respect to our null simulations: “Some of these simulations produce spuriously strong CNDD for rare species, leading them to **suggest** that our methods **might** be biased.” (emphasis our own). We do not know how they define bias, but in our book, a method is biased if it produces wrong estimates in reasonable situations. Everyone who doubts that this is the case is welcome to run our code – ~~unfortunately, the reverse is not true, because the code by LaManna et al. is again not made available by the authors~~ [Edit: 27.5.18 – it seems the code has now been made available here].
- The only question is how severe the bias is in the specific situation of this paper, and whether anything other than the bias is responsible for the results. We agree that this question is more difficult to answer, but the arguments brought forward by LaManna et al. to defend the existence of a real signal are not convincing. For example, they state “If this [the bias] were correct, then our estimates of CNDD would be biased toward stronger effects for rare species at any latitude”, completely disregarding a whole paragraph in our comment and even a sentence in our abstract where we explain that a number of processes and factors (including the number of rare species) affect the bias, and that any of these processes might (and in the case of rare species certainly does) change with latitude, which explains why the bias may change with latitude.
- In everything that follows, LaManna et al. conveniently disregard all of the other processes that we have shown to create bias, concentrating entirely on dispersal. In doing so, they first misrepresent how we simulated dispersal, stating “That is why analyses that assume global dispersal, as in Hülsmann and Hartig, underestimate or fail to detect CNDD when it is actually present”, before graciously admitting that we also considered non-global dispersal. This argument is doubly wrong: first, because we did not assume global dispersal, except for a single simulation where we varied the dispersal parameter from zero to global, and second, because what they state is exactly the opposite of what we found (under global dispersal, we ALWAYS find CNDD, regardless of whether it is present or not, so there is no way we could “fail to detect CNDD”).
- Going on about dispersal, LaManna et al. suggest that a different dispersal kernel would be more appropriate. We agree that their new kernel corresponds better to measured ecological dispersal kernels, but a) the dispersal kernel we used is (in terms of shape) the dispersal kernel they used in the simulations of their original Science paper, so it is surprising that they are so critical of this choice, and b) given our simulations (see also the results by Matteo Detto above), we doubt that the change of kernel significantly alters our conclusions. However, we will have to look at this in more detail. ~~Unfortunately, data and code for reproducing their results are again not made available by the authors, and the description of the model in the text is certainly not sufficient to reproduce their results~~ [Edit: 27.5.18 – it seems the code has now been made available here].

In conclusion, reading all comments and the responses by LaManna et al., we see no reason to revise our statements that

- The statistical methods used in this paper are severely biased, and it is certainly suspicious that the bias creates patterns in null models that look very similar to the reported results.
- We wouldn’t know how to properly correct this bias, but we found none of the authors’ arguments or simulations convincing enough to rule out the hypothesis that all of the presented patterns are caused by processes and factors other than CNDD, in combination with the context-dependent bias.

As a last point: even if the claimed correlation could be demonstrated more convincingly, we think one should be careful about claims of causality between CNDD and large-scale diversity patterns. For example, temperature could be a cause both of higher diversity (via productivity) and of a stronger importance of pathogen control (CNDD) in the tropics. In such a scenario, CNDD and diversity might appear to be causally linked, but the correlation would in fact be caused by another process that affects both CNDD and diversity. Therefore, while we think that local CNDD (if it exists) likely has strong effects on local community structure and abundance, in particular spatial patterns, we would be hesitant to postulate that this scales up, i.e. that local CNDD is a major factor for relative abundance at scales > 50 m.

**Side note on data / code availability**

*Science* states that the journal aims at increasing the “transparency regarding the evidence on which conclusions are based”, including open data and code, but neither the code nor the data for the study were deposited at *Science* or in another independent data repository. After several emails with the authors, we were able to obtain parts of the code, but not the data. The authors referred us to existing data-sharing agreements with (mostly) their coauthors, which did not allow them to pass on the data and would have required us to request each single dataset from the responsible PI. In the end, we only used the BCI dataset, which was already available to us. We think journals should make stronger efforts to ensure that code and data are deposited in appropriate, permanent repositories. Even if data are not fully open, there should be a mechanism to make data available for reproducibility checks upon request, for example through appropriate data use agreements that must be confirmed prior to access.

**References**

1. LaManna, J.A., et al., 2017. *Science* 356, 1389–1392.
2. Hülsmann, L., Hartig, F., 2018. *Science*, eaar2435.
3. Chisholm, R.A., Fung, T., 2018. *Science*, eaar4685.
